The idea
The project started as a university project centered on smart cities and IoT technologies. The goal was to design an AI-based system that could be integrated into smart city infrastructure. We came up with an info panel that switches languages based on the language spoken by the people around it. The practical case we had in mind was an art gallery panel, since the speech around a work of art is fairly predictable and the environment has little noise. Apart from switching languages to adapt to visitors, the system would also send data to the cloud for later analysis, for example to see which type of tourist enjoys which parts of the exhibition.
The neural network
We started by working on the neural network for keyword recognition. We used the online platform "Edge Impulse", as it provided an easy way to capture and label the data as well as train the network. We started with three words per language (English, Slovenian, Spanish) and a category for silent snippets. We quickly realized that we would need a huge number of samples to get reliable recognition, so we shrank the data set to just one word per language. In the end we also added a "random word" category in an effort to bring down false positives.
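To illustrate how the extra categories can help against false positives, here is a minimal sketch of a decision rule over the classifier output. The label names, the `ClassResult` shape, and the 0.8 threshold are assumptions made for illustration; they are not taken from our Edge Impulse project.

```typescript
// Hypothetical label names: one keyword class per language plus the two
// "reject" classes described above (silence and random word).
type Label = "english" | "slovenian" | "spanish" | "silence" | "random";
type Language = "english" | "slovenian" | "spanish";

// Assumed shape of one classification score (label plus confidence 0..1).
interface ClassResult {
  label: Label;
  value: number;
}

const LANGUAGES: readonly string[] = ["english", "slovenian", "spanish"];

// Returns the detected language, or null when the best-scoring class is
// "silence"/"random" or when its confidence is below the threshold.
function pickLanguage(results: ClassResult[], threshold = 0.8): Language | null {
  let best: ClassResult | null = null;
  for (const r of results) {
    if (best === null || r.value > best.value) {
      best = r;
    }
  }
  if (best === null || best.value < threshold) {
    return null;
  }
  return LANGUAGES.includes(best.label) ? (best.label as Language) : null;
}
```

With a rule like this, a snippet that scores highest as "random" never triggers a language switch, no matter how confident the classifier is.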
Web implementation
At first we wanted to capture data using an Arduino that would then send it to an Android device. As Android devices are typically capable of collecting audio on their own, we decided to drop the Arduino and build a web app with the same functionality in one place. We chose a web app over a native app for its cross-platform compatibility and because of our familiarity with web development tools.
Edge Impulse allows relatively easy deployment of models in several ways, including a WebAssembly library, which we used in our React app. Unlike with the Arduino export, we had to take care of audio capture and preprocessing ourselves.
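As a rough idea of what that preprocessing can involve, the sketch below downsamples browser audio and rescales it to the 16-bit integer range. The target rate of 16 kHz and the int16 scaling are assumptions about what the model expects, and the function names are made up for this example; the actual values depend on how the Edge Impulse project is configured.

```typescript
// Downsample by averaging each group of input samples that maps onto one
// output sample. Browser audio is often captured at 44.1 or 48 kHz.
function downsample(
  input: Float32Array,
  inputRate: number,
  targetRate: number
): Float32Array {
  if (targetRate > inputRate) {
    throw new Error("targetRate must not exceed inputRate");
  }
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += input[j];
    }
    out[i] = sum / (end - start);
  }
  return out;
}

// Web Audio delivers floats in [-1, 1]; clamp and scale them to the
// signed 16-bit range (an assumption about the expected input format).
function toInt16Range(samples: Float32Array): number[] {
  return Array.from(samples, (s) =>
    Math.round(Math.max(-1, Math.min(1, s)) * 32767)
  );
}

// Example: one second captured at 48 kHz becomes 16000 samples.
const oneSecond = new Float32Array(48000);
const features = toInt16Range(downsample(oneSecond, 48000, 16000));
```

Averaging is a crude low-pass filter; it keeps the sketch short, but a proper resampler would give cleaner input.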
Instead of opening the info page directly, the app starts with a prompt to click a button. This is because browsers require user interaction before they allow audio recording.
Features
- Text fields for the artwork title, author and description
- An image of the artwork
- Manual language switching buttons
- Automatic language switching in the background
- Detected language upload to ThingSpeak
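The upload of a detected language could look roughly like the sketch below, which builds a request URL for ThingSpeak's channel update endpoint. The numeric language codes, the use of `field1`, and the function name are assumptions made for this example; the write API key is a placeholder.

```typescript
// Hypothetical numeric codes for the three languages, so the channel
// field can be plotted and aggregated later.
const LANGUAGE_CODES = { english: 1, slovenian: 2, spanish: 3 } as const;
type DetectedLanguage = keyof typeof LANGUAGE_CODES;

// Builds the update URL; sending it writes one entry to the channel
// that owns the given write API key.
function buildThingSpeakUpdateUrl(
  writeApiKey: string,
  language: DetectedLanguage
): string {
  const params = new URLSearchParams({
    api_key: writeApiKey,
    field1: String(LANGUAGE_CODES[language]),
  });
  return `https://api.thingspeak.com/update?${params.toString()}`;
}

// In the browser the URL can be sent with fetch, for example:
//   await fetch(buildThingSpeakUpdateUrl("YOUR_WRITE_KEY", "slovenian"));
```

Keeping the URL construction separate from the network call makes it easy to check without contacting ThingSpeak.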