Story
Today there are four main types of audio content: radio services, streaming music services, podcasts, and audiobooks. The first is gaining popularity. Why is this happening? Because in 2018 neither I nor you have time to select music, so we rely on the taste of a DJ. At the same time, I have a rich personal digital music collection, and artificial intelligence could select tracks from it for playback by recognizing a person's emotions. I plan to create an algorithm for training a neural network that selects music based on an analysis of the listener's emotions: it will build a profile of the listener and pick the optimal compositions for them according to their emotional state at a given moment.
Before I obtained the Thundercomm AI Kit, I had used the MobileNet SSD model due to its relatively small size and the fact that it already had a method for deployment in an Android app. SSD is a unified framework for object detection with a single network, and its code can be used to train and evaluate a network for an object detection task. Having received the AI Kit, I turned to the Face SDK and Object Detection SDK from ThunderSoft. The key task at the moment is to use their face recognition and emotion detection capabilities to select and play music from the database.
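The selection logic itself can start as a simple mapping from the detected emotion to tracks in the collection. Here is a minimal sketch; the playlist contents and file names are hypothetical, and the ThunderSoft SDK call that actually produces the emotion label is not shown:

```python
import random

# Hypothetical mapping from a detected emotion to tracks in my collection.
# In the final system this profile would be learned per listener rather
# than hard-coded.
PLAYLISTS = {
    "happiness": ["upbeat_01.mp3", "upbeat_02.mp3"],
    "sadness":   ["calm_01.mp3", "piano_02.mp3"],
    "anger":     ["ambient_01.mp3"],
    "neutral":   ["mix_01.mp3", "mix_02.mp3"],
}

def pick_track(emotion: str) -> str:
    """Return a track for the detected emotion, falling back to neutral."""
    tracks = PLAYLISTS.get(emotion, PLAYLISTS["neutral"])
    return random.choice(tracks)
```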
Theory
Face recognition is a classic topic that has been studied for decades and still attracts much attention in the fields of computer vision and pattern recognition. Emotion recognition is challenging because several input modalities play a significant role in understanding it. Recognizing emotions is difficult mostly for two reasons: 1) there is no large, widely available database of training images, and 2) classifying an emotion is not simple, because it depends on whether the input is a static image or a sequence of frames evolving into a facial expression. The latter difficulty applies mostly to real-time detection, where facial expressions change continuously. There are six basic expressions (surprise, fear, happiness, anger, disgust, and sadness) that are common among human beings, and the large overlap between the emotion classes makes the classification task very difficult. The facial recognition process consists of three main stages: acquisition, feature extraction, and emotion classification. The figure shows the six basic emotions plus the neutral state as outputs.
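As a minimal sketch of those three stages, assuming placeholder functions for the extractor and classifier (whatever the project ends up using, e.g. a trained CNN):

```python
import cv2
import numpy as np

# The seven output classes: six basic emotions plus the neutral state.
EMOTIONS = ["surprise", "fear", "happiness", "anger", "disgust", "sadness", "neutral"]

def recognize_emotion(frame, extract_features, classify):
    """Three-stage pipeline: acquisition -> feature extraction -> classification.

    `extract_features` and `classify` are hypothetical placeholders, not a
    real SDK API.
    """
    # Stage 1: acquisition - convert the captured frame to grayscale.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Stage 2: feature extraction.
    features = extract_features(gray)
    # Stage 3: emotion classification - pick the most probable class.
    scores = classify(features)  # length-7 array of class scores
    return EMOTIONS[int(np.argmax(scores))]
```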
Recently, a new family of machine learning techniques emerged, namely deep learning, which automatically discovers adequate and relevant representations from raw data such as images. These techniques extract several levels of representation, moving from the low-level input to higher and more abstract ones. In the case of an image, the first layer of representation detects the presence or absence of edges at specific orientations and locations. The next one detects motifs by spotting particular arrangements of edges. The third layer combines the detected motifs to spot parts of the object. The last layers merge the detected parts to match the entire object.
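This layered idea maps directly onto a small convolutional network. Here is a minimal sketch in Keras; the 48x48 grayscale input and the layer sizes are my illustrative assumptions, not the final architecture:

```python
import tensorflow as tf

# A minimal CNN for 7-class emotion classification on 48x48 grayscale faces.
# Each conv block corresponds to one level of representation described
# above: edges -> motifs -> object parts -> whole object.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # edges
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),   # motifs
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation="relu"),  # object parts
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),      # merge parts
    tf.keras.layers.Dense(7, activation="softmax"),     # 6 emotions + neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```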
Tests
AlgoSample's main features are face registration, face recognition, object recognition, built-in camera/USB camera switching, AI Kit LED light control, and more. It can be built with Android Studio. I use it to detect faces and add them to the database. The captured dataset then needs to be trained with an OpenCV training algorithm. The idea is to create a database of face emotions: capture images of the face, compare them with the basic ones, and determine the emotion.
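A minimal sketch of that training step, assuming the captured faces are stored as grayscale images grouped in one folder per emotion label (the directory layout is my assumption; LBPH is one OpenCV recognizer suitable for this, available in opencv-contrib):

```python
import os
import cv2
import numpy as np

# Assumed layout: dataset/<emotion_label>/<image>.png
DATASET_DIR = "dataset"

images, labels, label_names = [], [], []
for label_id, emotion in enumerate(sorted(os.listdir(DATASET_DIR))):
    label_names.append(emotion)
    folder = os.path.join(DATASET_DIR, emotion)
    for fname in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, fname), cv2.IMREAD_GRAYSCALE)
        if img is not None:
            images.append(cv2.resize(img, (200, 200)))
            labels.append(label_id)

# Train an LBPH recognizer on the captured dataset and save the model.
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(images, np.array(labels))
recognizer.save("emotions.yml")

# Later, a new face crop is compared against the basic emotions:
# label_id, distance = recognizer.predict(new_face_gray)
```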