In this project, I'll try to clarify some audio terminology before explaining how the project works and showing the final result.
1) Sound acquisition: A digital MEMS microphone is a sensor that converts acoustic pressure waves into a digital signal. STM32 MCUs acquire the digital data from the microphone(s) through dedicated peripherals, then process it and convert it into standard audio data formats. The microcontroller then handles the audio data according to the target audio application.
A PDM digital microphone converts its buffered analog signal into a serial, pulse-density-modulated signal.
PDM (pulse density modulation) is a form of modulation used to represent an analog signal in the digital domain. It is a high-frequency stream of 1-bit digital samples. In a PDM signal, the relative density of the pulses corresponds to the analog signal's amplitude: a large cluster of 1s corresponds to a high (positive) amplitude, a large cluster of 0s corresponds to a low (negative) amplitude, and alternating 1s and 0s correspond to zero amplitude.
PCM (pulse code modulation): in a PCM signal, each sample's amplitude is encoded as a binary value. A PCM stream has two basic properties that determine its fidelity to the original analog signal:
• the sampling rate;
• the bit depth.
The sampling rate is the number of samples taken per second to represent the signal digitally. The bit depth is the number of bits used to encode each sample, and therefore sets how many distinct amplitude levels can be represented.
To convert the PDM stream into PCM samples, the PDM stream must be filtered and decimated. In the decimation stage, the sampling rate of the PDM signal is reduced to the target audio sampling rate (16 kHz, for example). Keeping 1 out of every M samples reduces the sampling rate by a factor of M. Therefore, the PDM data frequency (the frequency of the microphone clock) is M times the target audio sampling frequency, where M is the decimation factor.
PDM frequency = Audio sampling frequency × decimation factor
The decimation factor is generally in the range of 48 to 128. The decimation stage is preceded by a low-pass filter to avoid distortion from aliasing.
4) SAI peripheral: This section describes how to connect a digital MEMS microphone to one of the SAI (Serial Audio Interface) peripherals embedded in the STM32F746 in a mono configuration.
The digital microphone is connected to one of the sub-blocks of the SAI peripheral in mono configuration, and that sub-block is configured in master receive mode. In this configuration, the SAI sub-block provides the clock to the microphone. It also acquires the audio samples from the microphone's data output (DOUT) pin through its Serial Data (SD) pin.
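PS: The microphone's L/R channel selection (LR) pin can be tied either to Vdd or to GND. It selects the clock edge on which the microphone drives its data. In a mono setup, either choice works as long as the SAI samples the data on the matching edge.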
5) Sensory library: A custom vocabulary can be created in a few steps:
1. Define a custom vocabulary model from the Sensory VoiceHub portal at voicehub.sensory.com
2. Build the model in VoiceHub
3. Download the model
4. Extract the files and copy them into the appropriate project folder
In this video, you will see how to use the Sensory VoiceHub website to build your model.
Building Custom Language Models with ST for the STM32 on Sensory’s VoiceHub - YouTube
6) The algorithm: You can find the code at the link below:
Najibkassab/STM32F7-Sensory-library-voice-recognition: This project uses a sensory library to demonstrate local voice recognition on the STM32F746 board. (github.com)
The final result: