Use the documents in the following download for the full procedure: http://avnet.me/maaxRT-voice-lab
Overview
To rapidly implement custom applications on the MaaXBoard RT development board, users have access to several high-value resources from NXP and Avnet:
1) MCUXpresso SDK for RT1170-EVK (from NXP)
Available by searching for “RT1170-EVK” on the NXP SDK builder site: https://mcuxpresso.nxp.com/en/select
2) Reference designs for MaaXBoard RT (from Avnet)
A number of system-level, multi-threaded FreeRTOS demo implementations are provided. Examples are listed below:
- Reference designs summary page: http://avnet.me/maaxRT-demo-apps
- Out-of-box GUI demo: http://avnet.me/maaxRT-gui-demo and http://avnet.me/maaxRT-gui-guider-demo
- Wi-Fi webserver sensor demo: http://avnet.me/maaxRT-wifi-webserver-demo
- BLE health thermometer demo: http://avnet.me/maaxRT-ble-ht
- TensorFlow Lite object-recognition demo: http://avnet.me/maaxRT-run-tf
- VIT voice-UI complex system demo: http://avnet.me/maaxRT-voice-maestro-demo
- VIT voice-UI basic control demo: http://avnet.me/maaxRT-voice-control
The two Avnet VIT-based voice-UI applications differ significantly in complexity. This App Note provides:
a) A brief description and demo of the voice-UI complex system demo (using pre-compiled binaries)
b) A step-by-step procedure to customize, build and run the voice-UI basic control demo (using the MCUXpresso IDE)
VIT Voice-UI Complex System Demo
This headless application is partitioned into multiple FreeRTOS tasks on the M7 and M4 cores of the RT1176:
- M7-based voice processing, USB MSD, MP3 audio decoding, HTTP webserver and Wi-Fi networking
- M4-based I2C sensor monitoring (requires inexpensive add-on hardware):
  - Pi 2 Click Shield / HAT (MikroE $8.00)
  - 6DOF IMU 3 Click board (MikroE $7.00)
  - LightRanger 8 Click board (MikroE $12.00)
Functions in the M7 application
Local Voice UI (uses the VIT voice-recognition function in the NXP Maestro audio framework):
- Uses 1 to 3 of the 4 onboard PDM microphones
- Local playback control of MP3 audio files from a USB thumb drive
- Local control of the board’s GPIOs (RGB LEDs)
Remote Web-UI (smartphone browser-based UI)
An HTTP webserver provides GUI webpages (via an 802.11ac Wi-Fi soft-AP or client connection) for:
- Navigating the playlist of MP3 files on the USB storage
- Remote status and control of the board GPIOs (RGB LEDs)
- Wi-Fi scanning and configuration
- Display of sensor measurements (6-axis IMU and range sensor, streamed using websockets)
USB MSD storage access (FAT32 based file system with MP3 audio files)
MP3 Audio File Player (uses the third-party Helix MP3 decoder, not the MP3 decoder within the Maestro framework)
The table below shows typical utilization of the M7 processor core by the different FreeRTOS tasks:
A custom wake word plus a set of 12 voice commands has been predefined for this application, using NXP’s web-based VIT text-to-speech voice-modelling tool at https://vit.nxp.com/
The NXP Maestro software framework supports multiple options for audio sources and audio sink devices. This reference design, however, continuously listens for voice commands, so its use of Maestro is limited to:
- Audio source = Microphone(s)
- Voice processing = VIT
- Audio sink = Audio speaker (via Codec)
The M4 FreeRTOS application continuously samples sensor measurements (via I2C) from two MikroE Click sensor boards:
- 6DOF IMU 3 Click (NXP FXOS8700CQ 6-axis IMU motion sensor)
- LightRanger 8 Click (ST VL53L3CX Time-of-Flight sensor)
Programming the M7 and M4 binaries into Flash Memory
TBD
VIT Voice-UI Basic Control Demo
This simpler Cortex-M7 FreeRTOS application is the one built and executed from the MCUXpresso IDE.
It supports voice control of the RGB LED outputs, as well as audio record and playback functions.
On recorded samples, a dynamic range compression algorithm is applied in real time to minimize audio clipping, and triangular dithering is applied to minimize quality loss during the conversion from 24-bit to 16-bit.
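To illustrate the 24-bit to 16-bit step, here is a minimal sketch of TPDF (triangular probability density function) dithering in C. It is an independent illustration of the technique, not the project's code and not Miniaudio's implementation. The function name, the xorshift noise source, and the assumption that each 24-bit sample arrives right-justified and sign-extended in an int32_t are all illustrative; if your PDM driver places the 8-bit padding in the low byte instead, shift each word right by 8 first.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical noise source: xorshift32 PRNG (not thread-safe; one state per task if shared). */
static uint32_t dither_state = 0x12345678u;

static uint32_t xorshift32(void)
{
    uint32_t x = dither_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    dither_state = x;
    return x;
}

/* Uniform integer in [-128, 127]: +/- half of one 16-bit LSB at 24-bit scale. */
static int32_t rand_half_lsb(void)
{
    return (int32_t)(xorshift32() & 0xFFu) - 128;
}

/* Floor division by 256 that does not rely on >> of negative values. */
static int32_t floor_div256(int32_t v)
{
    return (v >= 0) ? v / 256 : -((-v + 255) / 256);
}

/* Convert signed 24-bit samples (assumed right-justified in int32_t) to 16-bit
 * with TPDF dither, round-to-nearest, and clipping to the int16_t range. */
void convert_24_to_16_tpdf(const int32_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Sum of two uniform values gives a triangular PDF spanning about +/-1 LSB (16-bit). */
        int32_t noise = rand_half_lsb() + rand_half_lsb();
        int32_t v = floor_div256(in[i] + noise + 128);   /* +128 rounds to nearest */
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        out[i] = (int16_t)v;
    }
}
```

Adding triangular noise before requantizing decorrelates the rounding error from the signal, trading a small, constant noise floor for the removal of audible distortion on quiet passages.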
The VIT “wake-word”, plus the set of voice commands and actions implemented on the board in response to these commands, are all fully customizable. For convenience, the application is provided with a default wake-word plus 8 voice commands that control the following on the board:
VIT_Model version : v5.4.0
WakeWord supported : " HEY AVNET "
Voice Commands supported
Cmd_Id : Cmd_Name
0 : UNKNOWN
1 : PLAY SAMPLE
2 : RECORD
3 : PLAY RECORD
4 : LED RED
5 : LED GREEN
6 : LED BLUE
7 : LED OFF
8 : PLAY COMPRESSED
Note: The SAI peripheral is configured for sample_rate: 16 kHz, bit_width: 16-bit/32-bit, channel: mono. The PDM peripheral is configured to read a single-channel microphone at sample_rate: 16 kHz, bit_width: 32-bit (24-bit sample + 8-bit padding).
Description of Commands
- PLAY SAMPLE: Plays back the demo audio (channel: 1, bit_width: 16, sample_rate: 16 kHz), which is stored as a byte array in sample_mono.h in the project source folder.
- RECORD: Records microphone PCM data into the SDRAM memory region:
  __attribute__ ((section(".secSdram"))) uint8_t pcmBuffer[PCM_SIZE] = {[0 ... PCM_SIZE-1] = 0x00};
  PCM_SIZE is defined in main.h. The record duration is ~6.4 sec. The beep_mono.h audio is used to signal the beginning and end of recording.
- PLAY RECORD: Plays back the PCM signal previously stored in pcmBuffer. Note: SDRAM is volatile memory, so the recording is lost after a power cycle.
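The relationship between PCM_SIZE and the ~6.4 s record time is simple arithmetic. The sketch below assumes the recording is stored as 16-bit mono PCM at 16 kHz (the same format as the demo sample); the macro and function names are hypothetical, and the actual PCM_SIZE value should be taken from main.h. Under that assumption each millisecond needs 16000 × 2 / 1000 = 32 bytes, so 6.4 s corresponds to 204,800 bytes.

```c
#include <stdint.h>

/* Assumed recording format (not taken from main.h). */
#define REC_SAMPLE_RATE_HZ   16000u
#define REC_BYTES_PER_SAMPLE 2u      /* 16-bit */
#define REC_CHANNELS         1u      /* mono */

/* Bytes per millisecond of audio in the assumed format: 32 */
#define REC_BYTES_PER_MS (REC_SAMPLE_RATE_HZ * REC_CHANNELS * REC_BYTES_PER_SAMPLE / 1000u)

/* Buffer size needed to hold `ms` milliseconds of audio. */
uint32_t pcm_bytes_for_ms(uint32_t ms)
{
    return REC_BYTES_PER_MS * ms;
}

/* Recording length in milliseconds that fits in a buffer of `bytes`. */
uint32_t pcm_ms_for_bytes(uint32_t bytes)
{
    return bytes / REC_BYTES_PER_MS;
}
```

To lengthen the recording, PCM_SIZE would have to grow proportionally, within the SDRAM space reserved for the .secSdram section.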
To create a custom beep prompt and audio sample, tools downloadable from the following sites can be used:
- Use Audacity to convert any audio format to a 16 kHz, 16-bit mono WAV file.
- Use wavToCode to generate a C array. Note: Windows OS only.
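Before running wavToCode, it can help to confirm that the exported file really is 16 kHz, mono, 16-bit PCM. The following is a minimal sketch of such a check on the WAV header bytes; the function name is hypothetical, and it assumes the "fmt " chunk immediately follows the RIFF/WAVE header (the canonical 44-byte layout). Files with extra chunks before "fmt " would need a parser that walks the chunk list.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

/* WAV fields are little-endian. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* True if `hdr` is a canonical PCM WAV header for 16 kHz, mono, 16-bit audio. */
bool wav_is_16k_mono_16bit(const uint8_t *hdr, size_t len)
{
    if (len < 36) return false;                          /* need bytes 0..35 */
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0) return false;
    if (memcmp(hdr + 12, "fmt ", 4) != 0) return false;  /* assumed position */
    return rd_le16(hdr + 20) == 1        /* audio format 1 = integer PCM */
        && rd_le16(hdr + 22) == 1        /* channels */
        && rd_le32(hdr + 24) == 16000u   /* sample rate in Hz */
        && rd_le16(hdr + 34) == 16;      /* bits per sample */
}
```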
The application executes two FreeRTOS tasks:
- Playback task: plays the sample or recorded audio stored in SDRAM/flash.
- Voice task: VIT voice recognition, using the custom wake word and 8 voice commands.
The tasks communicate through a FreeRTOS queue: the "voice task" is the producer and the "playback task" is the consumer.
The queue data structure is defined in main.h:
typedef struct _queue_command
{
    uint8_t command_type;   /* PLAYER_CMD_PLAY or PLAYER_CMD_RECORD */
    uint8_t taskId;
    uint8_t buffer[24];     /* buffer[0] selects the audio to play */
} queue_command_t;
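FreeRTOS queues copy each item by value, with the item size fixed when the queue is created (for example xQueueCreate(length, sizeof(queue_command_t))). Because the whole 26-byte struct is copied on every xQueueSend(), it is worth clearing it before filling it in so that no stale bytes from an earlier command travel with it. A minimal sketch of such a helper, assuming the struct layout above; make_player_command is a hypothetical name, and the PLAYER_CMD_* value is passed in rather than redefined here:

```c
#include <stdint.h>
#include <string.h>

/* Same layout as in main.h: 26 bytes, all uint8_t, so no padding in practice. */
typedef struct _queue_command
{
    uint8_t command_type;   /* PLAYER_CMD_PLAY or PLAYER_CMD_RECORD */
    uint8_t taskId;
    uint8_t buffer[24];     /* buffer[0] selects the audio to play */
} queue_command_t;

/* Hypothetical helper: build a zeroed command with the given type and audio selector. */
queue_command_t make_player_command(uint8_t command_type, uint8_t source)
{
    queue_command_t cmd;
    memset(&cmd, 0, sizeof cmd);    /* clear taskId and unused buffer bytes */
    cmd.command_type = command_type;
    cmd.buffer[0] = source;
    return cmd;
}
```

In the voice task this would be used as, for example, voice_command = make_player_command(PLAYER_CMD_PLAY, 2); before calling xQueueSend().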
The "voice task" handles voice commands as shown below and communicates with the "playback task" via the queue. For more detail, please refer to source/vit_proc.c Line#363.
/* Please enter your custom code in here. */
switch (VoiceCommand.Cmd_Id)
{
    case CMD_PLAY_SAMPLE:       // 1: play the demo sample
        voice_command.command_type = PLAYER_CMD_PLAY;
        voice_command.buffer[0] = 0;
        xQueueSend(*player_commandQ, (void *) &voice_command, 10);
        break;
    case CMD_RECORD:            // 2: record into pcmBuffer
        voice_command.command_type = PLAYER_CMD_RECORD;
        xQueueSend(*player_commandQ, (void *) &voice_command, 10);
        break;
    case CMD_PLAY_RECORD:       // 3: play back pcmBuffer
        voice_command.command_type = PLAYER_CMD_PLAY;
        voice_command.buffer[0] = 2;
        xQueueSend(*player_commandQ, (void *) &voice_command, 10);
        break;
    case CMD_LED_RED:           // 4: LED RED
        set_led(RED);
        break;
    case CMD_LED_GREEN:         // 5: LED GREEN
        set_led(GREEN);
        break;
    case CMD_LED_BLUE:          // 6: LED BLUE
        set_led(BLUE);
        break;
    case CMD_LED_OFF:           // 7: LED OFF
        set_led(BLACK);
        break;
    case CMD_PLAY_COMPRESSED:   // 8: play compressed audio
        voice_command.command_type = PLAYER_CMD_PLAY;
        voice_command.buffer[0] = 3;
        xQueueSend(*player_commandQ, (void *) &voice_command, 10);
        break;
    case 9:                     // 9-12: unused, available for custom commands
        break;
    case 10:
        break;
    case 11:
        break;
    case 12:
        break;
    default:
        break;
}
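The switch statement maps each Cmd_Id either to an LED change or to a queue message whose buffer[0] selects the audio (0 = sample, 2 = recorded buffer, 3 = compressed). As an illustrative alternative, not the project's code, the same mapping can be kept in a table so that the unused IDs 9 to 12 can be assigned by adding a row. The action_kind_t, led_color_t and lookup_action names are hypothetical, and the CMD_* enum simply mirrors the Cmd_Id listing; the project's own definitions may differ in form.

```c
#include <stdint.h>

/* Cmd_Id values mirroring the VIT model listing. */
enum {
    CMD_UNKNOWN = 0,
    CMD_PLAY_SAMPLE,      /* 1 */
    CMD_RECORD,           /* 2 */
    CMD_PLAY_RECORD,      /* 3 */
    CMD_LED_RED,          /* 4 */
    CMD_LED_GREEN,        /* 5 */
    CMD_LED_BLUE,         /* 6 */
    CMD_LED_OFF,          /* 7 */
    CMD_PLAY_COMPRESSED   /* 8 */
};

/* Hypothetical types used only in this sketch. */
typedef enum { ACT_NONE, ACT_LED, ACT_PLAY, ACT_RECORD } action_kind_t;
typedef enum { LED_COLOR_BLACK, LED_COLOR_RED, LED_COLOR_GREEN, LED_COLOR_BLUE } led_color_t;

typedef struct {
    action_kind_t kind;
    uint8_t arg;    /* ACT_PLAY: value for buffer[0]; ACT_LED: led_color_t */
} action_t;

static const action_t action_table[] = {
    [CMD_UNKNOWN]         = { ACT_NONE,   0 },
    [CMD_PLAY_SAMPLE]     = { ACT_PLAY,   0 },                /* sample_mono.h */
    [CMD_RECORD]          = { ACT_RECORD, 0 },
    [CMD_PLAY_RECORD]     = { ACT_PLAY,   2 },                /* pcmBuffer */
    [CMD_LED_RED]         = { ACT_LED,    LED_COLOR_RED },
    [CMD_LED_GREEN]       = { ACT_LED,    LED_COLOR_GREEN },
    [CMD_LED_BLUE]        = { ACT_LED,    LED_COLOR_BLUE },
    [CMD_LED_OFF]         = { ACT_LED,    LED_COLOR_BLACK },
    [CMD_PLAY_COMPRESSED] = { ACT_PLAY,   3 },
};

#define ACTION_COUNT (sizeof action_table / sizeof action_table[0])

/* Map a VIT Cmd_Id to an action; IDs without a row (e.g. 9-12) do nothing. */
action_t lookup_action(uint32_t cmd_id)
{
    if (cmd_id >= ACTION_COUNT) {
        action_t none = { ACT_NONE, 0 };
        return none;
    }
    return action_table[cmd_id];
}
```

In the voice task, ACT_LED would still translate to a set_led() call and ACT_PLAY or ACT_RECORD to filling voice_command and calling xQueueSend(), exactly as in the switch; supporting a new command then means adding one table row.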
The "playback task" waits for queue data. When data is received, it processes the command and plays the requested audio (see audio_player.c line #334):
/* Block for up to 100 ticks waiting for a command from the voice task */
xResult = xQueueReceive(*player_commandQ, &(audio_recvd_cmd), 100);
if (xResult == pdTRUE)
{
    switch (audio_recvd_cmd.command_type)
    {
        case PLAYER_CMD_PLAY:
            PRINTF("[Audio] Playing recorded data\r\n");
            /* buffer[0] selects the audio: 0 = sample, 2 = pcmBuffer, 3 = compressed */
            play_music(audio_recvd_cmd.buffer[0]);
            PRINTF("[Audio] *** Player stopped ***\r\n");
            break;
        case PLAYER_CMD_RECORD:
            PRINTF("[Audio] Recording data\r\n");
            play_music(1);          /* beep marks the start of recording */
            enable_record(true);
            /* Wait up to 12 s for the notification that recording has finished */
            xResult = xTaskNotifyWait(pdFALSE, 0xffffffff, &ulNotifiedValue, 12000/portTICK_PERIOD_MS);
            if (xResult == pdTRUE)
            {
                PRINTF("[Audio] Record success\r\n");
                enable_record(false);
                play_music(1);      /* beep marks the end of recording */
            }
            else
            {
                PRINTF("[Audio!] Record error\r\n");
            }
            break;
        case 2:                     /* falls through to default */
        default:
            PRINTF("[Audio] Unknown command received from voice task\r\n");
            break;
    }
}
Reference links:
- VIT: Creating a custom voice command model
- Miniaudio: 24-bit to 16-bit conversion using a dithering algorithm
- OpenAudio_ArduinoLibrary: dynamic range compression algorithm