VTB approached us to finish their voting application.
The client came with the core of the app already built. We needed to test user identification through an NFC tag, refresh the app's design, and add a feature that reads text aloud and recognizes the user's spoken reply.
They sent us a tablet for testing, a tag reader, and the tag itself, which the user scans to log into the system.
We modified the request that identifies the user against the database.
We also updated the design.
We solved the voice module with the native iOS Speech framework, the same technology the Siri voice assistant is built on.
The app was released in summer 2019.
We discussed the app's requirements with the customer and wrote the technical specification. Then we created a prototype in Figma and started development. In total, the work took about 2.5 months.
When a user logged in with the tag, the request failed because the data formats did not match. This happens when the server side of an application is updated but the client the user interacts with is not. To make authorization work, we updated the data format.
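The kind of mismatch described above can be illustrated with a small sketch. It is written in Python for brevity and is not the app's code. The field names (`tagId`, `tag_id`, `version`) and both payload shapes are hypothetical, since the real formats are not given here.

```python
import json


def parse_login_request(body: str) -> str:
    """Server-side parser that only accepts the (hypothetical) new format."""
    payload = json.loads(body)
    if payload.get("version") != 2 or "tag_id" not in payload:
        raise ValueError("unsupported login payload format")
    return payload["tag_id"]


def build_login_request_old(tag_id: str) -> str:
    """What an outdated client might send: old field name, no version."""
    return json.dumps({"tagId": tag_id})


def build_login_request_new(tag_id: str) -> str:
    """Client updated so that its payload matches what the server expects."""
    return json.dumps({"version": 2, "tag_id": tag_id})
```

With these assumptions, the server rejects the old payload with a `ValueError`, while the payload from the updated client is parsed and the tag identifier is returned.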
We had never worked with voice recognition before, so we had to learn a technology that was new to us.
To turn questions into speech and recognize the user's voice, we used the native iOS Speech framework. The text of the question and the answer options are passed to the framework and read out in the Siri voice. Then the application switches to the "Listen to the answer" mode and compares the user's speech with the proposed options.
Technically, the speech is converted to text, and the application looks for matches between what was said and the answer options. If the speaker's answer matches one of the proposed options, their vote is counted. If the app doesn't find a match, the Siri voice asks the user to repeat the answer more clearly.
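The matching step can be sketched roughly as follows. This is a platform-neutral illustration in Python, not the app's code: `speak` and `listen` are hypothetical stand-ins for the framework's text-to-speech and speech-to-text calls, and the whole-word matching rule and the limit of three attempts are assumptions.

```python
import re
from typing import Callable, List, Optional


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Yes.' and 'yes' compare equal."""
    return " ".join(re.findall(r"\w+", text.lower()))


def match_option(transcript: str, options: List[str]) -> Optional[str]:
    """Return the option found in the transcript, or None if zero or several match."""
    spoken = f" {normalize(transcript)} "
    # Padding with spaces makes this a whole-word (whole-phrase) comparison.
    hits = [opt for opt in options if f" {normalize(opt)} " in spoken]
    return hits[0] if len(hits) == 1 else None


def ask(
    question: str,
    options: List[str],
    speak: Callable[[str], None],
    listen: Callable[[], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Read the question aloud, then listen until one option is recognized."""
    speak(question + " " + ", ".join(options))
    for _ in range(max_attempts):
        choice = match_option(listen(), options)
        if choice is not None:
            return choice  # the vote can be counted
        speak("Please say your answer again more clearly.")
    return None
```

Treating a transcript that contains several options as "no match" is a deliberate choice in this sketch: for a vote, asking the user to repeat is safer than guessing.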
During development, we ran into incorrect translation from English into Russian, so we also had to work on localization.
Initially, we planned to record the voice into an audio file and send it to the server. There it would be parsed and matched against the options, and the user would receive a line of text in response. But the native framework gave much faster and better results, so we abandoned that approach.