The aim of this project is to build an Arduino-like development environment that takes advantage of the FlexIO advanced driver module to build new drivers and interface with high-speed multimedia devices such as cameras and digital microphones, among others. This project shows how the K82 Cortex-M4 MCU with FlexIO technology enables a new level of designs that require multimedia handling, such as image and audio, while maintaining a level of performance not seen before on an Arduino-compatible platform. The software platform with ported libraries that forms the so-called "Flexduino" platform offers benefits for users of different expertise: advanced users will benefit from highly customizable hardware and the advanced debugger of the Kinetis SDK for the most demanding tasks, while entry-level users will feel comfortable with the Arduino-like coding style. A complete demonstration of the presented work is a QQVGA web video stream running at 7.0 fps while doing audio playback through the DAC. The FlexIO module was also used to implement a digital MEMS microphone driver capable of running in the "background". A TCP/IP server hosts the video and example jQuery/JavaScript pages by means of the WINC1500 WiFi IC, which in conjunction with the K82 MCU achieved 1.65 Mbps without DMA during tests.
Platform Overview
Rather than giving an introduction to FlexIO and the FRDM-K82F board and its features, this section presents the developed hardware and software, clarifying the techniques and implementation carried out to bring this project to life; readers are kindly referred to the NXP website for documentation. The next sections cover the hardware followed by the software, since an understanding of the first may be required to understand the code.
The Flexduino Logo to start.
The FRDM-K82F design was a perfect fit for integration with the Arduino platform because it provides a similar footprint; nonetheless, the different electrical interfaces required for this project made it imperative to build a custom shield board in order to provide:
- WiFi module
- micro SD card
- MEMS digital microphone
- Audio output circuitry
- Camera Interface
The last item above was not strictly necessary, since the K82 development board already has an expansion connector for the OV7670; still, a new camera interface was implemented in hardware, explained further in the next section.
The following picture shows the top view of the shield. The MEMS sensors used come on a very nice development board, which means less to worry about regarding signal integrity.
TCP/IP connectivity - WiFi to the rescue.
In order to transmit multimedia these days in a modern style, WiFi was the preferred choice, especially for a web cam with built-in audio. The WiFi options are not as numerous as the wired ones, but a good one with Arduino support is the WINC1500 IC. A module was used to speed up the hardware design. The interface with the K82 was done using the SPI module; during the tests it worked even at 48 MHz, but it was noticed that above 40 MHz there was no improvement in bandwidth when testing with netperf.
File System - micro SD cards are getting cheaper these days!
A micro SD card push-pull socket sits on top of the custom shield to provide the platform file system. The connection was made through the second SPI module available in the K82 MCU. Astonishing read and write speeds were achieved with this MCU running bare metal. Below is some statistical data; the test can be enabled or disabled in code and executes at start-up. As you might guess, the speed is very dependent on the SD card used; different cards were tested and the best one (SanDisk 1 GB) is shown below.
- WRITE TOOK 73 ms, rate = 4383.56 kbit/s
- READ TOOK 70 ms, rate = 4571.43 kbit/s
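For reference, the printed rates follow from simple arithmetic. A quick sketch of the calculation, assuming the test moves a 40000-byte block (an inference from the printed rates above, not stated explicitly in the log):

```c
#include <stdint.h>

/* Throughput in kbit/s from bytes moved and elapsed milliseconds:
 * bytes * 8 bits / ms == bits per millisecond == kbit/s. */
static double rate_kbit_s(uint32_t bytes, uint32_t ms)
{
    return bytes * 8.0 / ms;
}
/* rate_kbit_s(40000, 73) -> ~4383.56 (write), rate_kbit_s(40000, 70) -> ~4571.43 (read) */
```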
Audio Recording - Let's see what FlexIO is made of :)
The digital MEMS sensor used was a Cirrus Logic WM7236E device. The interface consists of a clock signal (CLK) and a data signal (DAT). The audio samples come out modulated as pulses by an internal 1-bit ADC; this is known as PDM. The difficult part is that you need to acquire a large amount of data at relatively high speed and do some filtering and decimation; thanks to the brilliant mind that invented the CIC filter, nowadays we can implement this in C. Of course, don't forget that here FlexIO is mandatory. Since the goal is a development environment for further experiments, the shield board includes two interfaces, for the left and right channels. One nice feature of these sensors is that they can share the data line, and it is the task of the driver to sample each channel at the correct clock edge, a simple task for the FlexIO module! Since we want to experiment with voice in this project, the MEMS sample frequency was fixed at 500 kHz, enough to get 16-bit PCM samples out of it at 8000 Hz after the CIC stage.
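To make the PDM-to-PCM idea concrete, here is a minimal 2nd-order CIC decimator sketch in C. The filter order, the decimation ratio R and the output scaling are illustrative assumptions, not the values from the project's driver:

```c
#include <stdint.h>

/* Hypothetical 2nd-order CIC decimator: each PDM bit is integrated
 * (0 -> -1, 1 -> +1); after every CIC_R input bits the comb section
 * runs on the decimated sample. */
#define CIC_R 64              /* decimation ratio (example value)   */

typedef struct {
    int32_t i1, i2;           /* integrator states                  */
    int32_t c1, c2;           /* comb (differentiator) delay states */
    uint32_t count;           /* input bits since last output       */
} cic2_t;

/* Feed one PDM bit; returns 1 and writes *out when a PCM sample is ready. */
static int cic2_push(cic2_t *f, int bit, int16_t *out)
{
    f->i1 += bit ? 1 : -1;    /* first integrator  */
    f->i2 += f->i1;           /* second integrator */
    if (++f->count < CIC_R)
        return 0;
    f->count = 0;
    int32_t d1 = f->i2 - f->c1;  f->c1 = f->i2;   /* first comb  */
    int32_t d2 = d1    - f->c2;  f->c2 = d1;      /* second comb */
    /* DC gain of a 2nd-order CIC is R^2; scale back to 16 bits. */
    *out = (int16_t)(d2 * 32767LL / ((int64_t)CIC_R * CIC_R));
    return 1;
}
```

Feeding a constant all-ones PDM stream drives the output to full scale after the start-up transient, while an alternating 1/0 stream settles to zero, which is the expected DC behavior of a PDM demodulator.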
In this project a development board from Cirrus Logic was used. A very nice little board to try.
Audio Output stage. Let's try the 12-bit DAC.
The DAC was used as the output device; the nice thing about this highly advanced module is its DMA capability. The DAC output is connected to an amplifier stage built around the legendary LM386; input and output filters are provided as well as gain control. A separate analog ground was implemented, joined at a single point on the PCB, and of course bypass caps all the way around. Despite these precautions, it was noticed that some noise gets amplified during heavy processing tasks such as transferring a file from the web browser. It is suspected that the 10K potentiometer at the input stage, used for volume control, is picking up some RF noise, but nothing that cannot be fixed in a second hardware revision.
Camera Interface - Push the limits but step by step.
The shield board has an on-board CMOS sensor connected to an LVDS serializer IC; the idea was to explore FlexIO's capability to handle LVDS signals at high speed. This would allow a designer to place the camera far away from the CPU, perhaps using a flex cable, something not possible (at least for a commercial use case) with the parallel interface. Since the contest was too short on time, this part of the hardware ended up untested. But to keep progressing with the application, the popular OV7670 module was used and, guess what, FlexIO again to the rescue.
About the prototype.
The PCB is a double-layer board with components on the top only. I made a mistake in the routing: specifically, the orientation of the double header connectors. Fortunately it was prototyped by hand and fixed in a couple of hours or less. The schematic shows the connections as they actually are. The only thing I missed was the VREF capacitor, which is needed in order to use VREF; a 100 nF 0402 capacitor was placed between two pins of the connector (find it in the picture if you can!)
Software Overview
The software was developed under the KSDK; this platform was preferred since there was no learning curve for its Eclipse-like IDE, and also because of the freely available tools.
Similar to the hardware part above, sections follow to detail the design of the software, pointing to the different libraries used to implement each feature as well as their usage.
File System.
Instead of trying to port the Arduino SD card libraries, a port of Elm Chan's FatFs is provided. This library is an embedded icon and very robust in my opinion. The only problem is that it doesn't mimic the Arduino SD library style. In other projects I have used it as-is, but for Flexduino it was better to find a solution, and I came across a C/C++ wrapper library made for Elm Chan's FatFs and Arduino.
The library was OK, but I modified it to allow multiple files to be opened at the same time, as is possible with the FatFs library.
Usage is explained below.
To open a file:
if (fname) {
    Serial << F("Creating ") << fname << "\r\n";
    fh = file.open(fname, FA_WRITE | FA_CREATE_ALWAYS);
    free(fname);
}
Notice that you need to keep track of the file handle you have opened so you can close it later.
To write for example:
if (file.isOpen(fh)) {
    file.write(fh, buffer, size);
    total_size += size;
}
And finally close it:
file.close(fh);
fh = -1;
I always end up invalidating the handle by writing -1 to it, as this value is never valid. For sure this library can be made better, but it really works for our case.
WiFi Module
As a good engineer, never start something from scratch unless there is no other way ;). The Atmel WINC1500 has a proprietary binary blob inside, but Atmel provides a C library with an API to use it. Several things had to be done to port the library to the Kinetis platform: first the low-level SPI driver interface, followed by some timer functionality needed by the upper-layer stack. Arduino makes use of the millis() function, so why not do the same, right?
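As a sketch of what such a timing port can look like: a free-running millisecond counter bumped from a periodic timer interrupt, with a polling helper on top. The names here (SysTick_Handler, wait_for_flag) are illustrative, not the project's actual port:

```c
#include <stdint.h>

/* Hypothetical Arduino-style millis() built on a free-running counter
 * incremented from a 1 ms periodic timer ISR (SysTick or a PIT channel). */
static volatile uint32_t s_ms_ticks = 0;

/* Called from the 1 ms timer interrupt. */
void SysTick_Handler(void) { s_ms_ticks++; }

uint32_t millis(void) { return s_ms_ticks; }

/* The WINC1500 host driver can then poll for completion the Arduino way.
 * Returns 0 when the flag was raised, -1 on timeout. */
int wait_for_flag(volatile int *flag, uint32_t timeout_ms)
{
    uint32_t start = millis();
    while (!*flag) {
        /* unsigned subtraction handles counter wrap-around correctly */
        if ((uint32_t)(millis() - start) >= timeout_ms)
            return -1;
    }
    return 0;
}
```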
Here is an extract of the SPI BSP driver port file (nm_bsp_wrapper_mk82f.c), located under the bsp_wrapper source folder. It shows the initialization of SPI in nm_bus_init and the spi_rw function that writes data to the bus.
static sint8 spi_rw(uint8* pu8Mosi, uint8* pu8Miso, uint16 u16Sz)
{
    dspi_transfer_t masterXfer;
    uint8 u8Dummy[TRANSFER_SIZE] = {0U};
    /* At least one direction must be provided; substitute the dummy
       buffer for whichever side the caller left NULL. */
    if (!pu8Mosi && !pu8Miso) {
        return M2M_ERR_BUS_FAIL;
    }
    if (!pu8Mosi) {
        pu8Mosi = &u8Dummy[0];
    }
    if (!pu8Miso) {
        pu8Miso = &u8Dummy[0];
    }
    /* Start master transfer */
    masterXfer.txData = pu8Mosi;
    masterXfer.rxData = pu8Miso;
    masterXfer.dataSize = u16Sz;
    masterXfer.configFlags = kDSPI_MasterCtar0 | WINC_DSPI_MASTER_PCS | kDSPI_MasterPcsContinuous;
    DSPI_MasterTransferBlocking(WINC_DSPI_MASTER_BASEADDR, &masterXfer);
    return M2M_SUCCESS;
}
/*
* @fn nm_bus_init
* @brief Initialize the bus wrapper
* @return M2M_SUCCESS in case of success and M2M_ERR_BUS_FAIL in case of failure
*/
sint8 nm_bus_init(void *pvinit)
{
    sint8 result = M2M_SUCCESS;
    /* Structure for SPI configuration. */
    dspi_master_config_t masterConfig;
    uint32_t srcClock_Hz;
    /* Master config */
    masterConfig.whichCtar = kDSPI_Ctar0;
    masterConfig.ctarConfig.baudRate = TRANSFER_BAUDRATE;
    masterConfig.ctarConfig.bitsPerFrame = 8U;
    masterConfig.ctarConfig.cpol = kDSPI_ClockPolarityActiveLow;
    masterConfig.ctarConfig.cpha = kDSPI_ClockPhaseFirstEdge;
    masterConfig.ctarConfig.direction = kDSPI_MsbFirst;
    masterConfig.ctarConfig.pcsToSckDelayInNanoSec = 100;
    masterConfig.ctarConfig.lastSckToPcsDelayInNanoSec = 100;
    masterConfig.ctarConfig.betweenTransferDelayInNanoSec = 0;
    masterConfig.whichPcs = kDSPI_Pcs1;
    masterConfig.pcsActiveHighOrLow = kDSPI_PcsActiveLow;
    masterConfig.enableContinuousSCK = false;
    masterConfig.enableRxFifoOverWrite = false;
    masterConfig.enableModifiedTimingFormat = false;
    masterConfig.samplePoint = kDSPI_SckToSin0Clock;
    srcClock_Hz = CLOCK_GetFreq(DSPI_MASTER_CLK_SRC);
    DSPI_MasterInit(WINC_DSPI_MASTER_BASEADDR, &masterConfig, srcClock_Hz);
    return result;
}
Other relevant functions of the port are located under the bsp folder. For example, the WINC1500 uses an interrupt pin to inform the MCU of task completion, so the function init_chip_pins configures that interrupt as well as the reset pin. The timer function nm_bsp_sleep provides a blocking millisecond delay, implemented using a PIT timer as shown below.
void _delay_ms(int time)
{
    /* Set timer period for channel 1 */
    PIT_SetTimerPeriod(PIT, kPIT_Chnl_1, MSEC_TO_COUNT(time, PIT_SOURCE_CLOCK));
    /* Start channel 1 */
    PIT_StartTimer(PIT, kPIT_Chnl_1);
    while (!PIT_GetStatusFlags(PIT, kPIT_Chnl_1))
        ;
    PIT_ClearStatusFlags(PIT, kPIT_Chnl_1, PIT_TFLG_TIF_MASK);
    PIT_StopTimer(PIT, kPIT_Chnl_1);
}
Another important task was to provide functionality for the library's debug macro, M2M_DBG, in order to print useful error conditions to the console. It is important to note that the WINC1500 library version used in this project is based on the latest Atmel ASF and the WiFi101 Arduino library; the latter is the upper-layer software that provides TCP/IP and UDP. Some issues were found with those libraries; they were mostly fixed during the project without waiting for a mainstream update (for more info, see this and this).
About the TCP/IP server libraries
A very complete port of libraries is available for the newborn Flexduino platform. The following figure shows a folder structure that might explain itself to Arduino fans.
Notice the supporting libraries such as Print, WString, Stream, etc., which make it highly attractive to use existing libraries from the Arduino ecosystem.
Most of the basic supporting libraries for connectivity like TCP, UDP, and the upper layer Client and Server are located under the winc1500 folder.
WebServer. The very nice TinyWebServer library now at your fingertips ;)
This library has been my favorite for C/C++ embedded platforms that don't provide an integrated networking stack. I discovered it years ago and have now ported it to the Flexduino platform, and have also contributed a couple of changes upstream. Perhaps the demo app for the camera doesn't show all its features (there is a video below), but the library makes it very easy to deploy a web server, especially a jQuery/JavaScript one, because there is upload functionality using the PUT command, which makes it easy to upload files to the SD card without having to remove it from the embedded hardware. I used this couple of curl commands to upload and delete files.
upload worker.js file
curl -0 -T worker.js http://192.168.1.60/upload/
and delete as needed
curl http://192.168.1.60/remove/worker.js
Of course you should restrict that functionality for production.
It also has some modifications to allow submitting forms. Without those nice features, the web road wouldn't be so pleasant: it took me dozens of debugging iterations of the HTML5 websockets code, which required changing those scripts each time.
The webserver menu is shown below. It uses jQueryUI menu, so we load javascript/jQuery only once while navigating the web site.
And this is printed on the console at start-up.
Initializing Flexduino Platform - NXP contest!
INIT SPI PORT for SD card completed
966 MB total drive space.
968 MB available.
SD card mount OK
WRITE TOOK 74 ms, rate = 4324.32 kbit/s
READ TOOK 69 ms, rate = 4637.68 kbit/s
<dir> System Volume Information
5541 camera.html
40960 Data.bin
4158 favicon.ico
22184 flexduino.png
2734 hacksterio.png
289 home.html
38466 image.bmp
5033 index.html
273199 jquery.js
816452 male.raw
126562 moh_8sec.raw
7825 nxp.png
984 nxpcss.css
1801 pagelinks.js
349 settings.html
80002 sine_1000Hz.raw
9899 test.jpeg
346 voice.html
1497 worker.js
778 about.html
1696 menu.html
94080 record.raw
<dir> .Trash-1000
(APP)(INFO)Chip ID 1503a0
(APP)(INFO)Firmware ver : 19.4.4
(APP)(INFO)Min driver ver : 19.3.0
(APP)(INFO)Curr driver ver: 19.3.0
Attempting to connect to SSID: MOOI
SSID: MOOI
IP Address: 192.168.1.60
signal strength (RSSI):-36 dBm
Websockets - Highly needed for transferring binary data.
A websockets library was ported to allow binary data transfer. The idea is to use the recent HTML5 additions, or any other browser websockets library, to transfer the digital image as binary data; this allows doing it right and fast.
Here is a snippet showing the library API callback where a developer does their magic.
void webSocketEvent(uint8_t num, WStype_t type, uint8_t * payload, size_t length) {
    switch(type) {
        case WStype_DISCONNECTED:
            PRINTF("[%u] Disconnected!\r\n", num);
            break;
        case WStype_CONNECTED:
            {
                webSocket.sendTXT(num, "start");
            }
            break;
        case WStype_TEXT:
            PRINTF("[%u] get Text: %s\r\n", num, payload);
            if(!strncmp((const char*)payload, "image_onfile", strlen("image_onfile"))){ // change this from image to image_onfile...
                if(ws_send_file(num, "test.jpeg")){
                    PRINTF("Send JPEG image test\r\n");
                    webSocket.sendTXT(num, "end");
                }else{
                    PRINTF("an error occur during image send test\r\n");
                    webSocket.sendTXT(num, "error");
                }
            }else if(!strncmp((const char*)payload, "image", strlen("image"))){
                Serial.println("receive command for snapshot");
                // need to inform about its ready, javascript will disable picture to SD file if not in video streaming mode
            }else if(!strncmp((const char*)payload, "video_start", strlen("video_start"))){
                Serial.println("receive command for start video");
                camera.startvideo(num);
#if fps_avg
                print_period = millis();
                nframes_time = millis();
                fcount = 0;
#endif
            }else if(!strncmp((const char*)payload, "video_stop", strlen("video_stop"))){
                Serial.println("receive command for stop video");
                if(camera.videomode() == bitbang){
                    Serial.println("stopping video");
                    camera.stopvideo();
                }else{
                    //another way to stop it? ...
                }
            }
            break;
        case WStype_BIN:
            PRINTF("[%u] get binary length: %u\r\n", num, length);
            uint8_t i;
            for (i = 0; i < length; i++)
            {
                if (i > 0) PRINTF(":");
                PRINTF("%02X", payload[i]);
            }
            PRINTF("\n");
            // send message to client
            // webSocket.sendBIN(num, payload, length);
            break;
        case WStype_ERROR:
            //
            break;
        default:
            break;
    }
}
From Camera to Web browser
Several things were done here to allow transferring an image from the FlexIO camera to the web browser. We already mentioned websockets, and the snippet above is just part of the mechanism that integrates the solution, specifically the control API.
The FlexIO camera module was configured to gather a YUV422 image. If you are landing on the territory of image processing for the first time, you might ask yourself why that image format is used; it has many use cases, but for this project it is simply because it can be easily compressed to JPEG. The FlexIO driver has been tweaked to allow a different acquisition approach in order to improve compression. Instead of having a double buffer of 38400 bytes each to store YUV422 frames, only one frame buffer is used, and the ping-pong scheme changes from two buffers for two images to fifteen buffers for one image. That's right, 15 buffers. The idea comes from the JPEG compressor macroblock unit and the number of macroblocks needed to compress a macroline. Sorry, but it's time to show the fun part.
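As a quick sanity check on the buffer arithmetic (the macro names mirror the definitions used in the code, but this derivation is mine):

```c
/* QQVGA YUV422 frame split into 15 macroline buffers:
 * 160 x 120 pixels x 2 bytes/pixel = 38400 bytes per frame,
 * and 38400 / 15 = 2560 bytes per buffer, which is also a
 * whole number of the 16-byte EDMA bursts used later. */
enum {
    QQVGA_W      = 160,
    QQVGA_H      = 120,
    BYTES_PER_PX = 2,                                /* YUV422 */
    FRAME_BYTES  = QQVGA_W * QQVGA_H * BYTES_PER_PX, /* 38400  */
    MACROLINES   = 15,
    MACROBLOCK   = FRAME_BYTES / MACROLINES          /* 2560   */
};
```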
Here are the buffer definitions:
volatile __attribute__((aligned(32), section(".myRAM"))) uint8_t g_FlexioCameraMacroblockBuffer[MACROLINES][MACROBLOCK];
The original firmware example provided by NXP uses DMA to fill the buffer but without actually using the EDMA interrupt; the only interrupt used is the VSYNC one. For this project we still keep the VSYNC interrupt, but in order to handle the buffering, the EDMA interrupt was configured as follows (the complete code for this is in the flexio_ov7670.c file):
    DMAMUX_Init(DMAMUX0);
    /* Configure DMA */
    edma_config_t edmaConfig;
    EDMA_GetDefaultConfig(&edmaConfig);
    edmaConfig.enableDebugMode = true;
    EDMA_Init(DMA0, &edmaConfig);
    EDMA_CreateHandle(&g_EDMA_Camera_Handle, DMA0, FLEXIO_DMA_CHANNEL);
    video_preemption_cfg.channelPriority = 1;
    video_preemption_cfg.enablePreemptAbility = true;
    video_preemption_cfg.enableChannelPreemption = false;
    EDMA_SetChannelPreemptionConfig(DMA0, FLEXIO_DMA_CHANNEL, &video_preemption_cfg);
    EDMA_SetCallback(&g_EDMA_Camera_Handle, Camera_Handler, NULL);
    s_TcdMemoryPtrFlexioToFrame->SADDR = FLEXIO_CAMERA_GetRxBufferAddress(&s_FlexioCameraDevice);
    s_TcdMemoryPtrFlexioToFrame->SOFF = 0;
    s_TcdMemoryPtrFlexioToFrame->ATTR =
        DMA_ATTR_SSIZE(kEDMA_TransferSize16Bytes) | DMA_ATTR_DSIZE(kEDMA_TransferSize16Bytes);
    s_TcdMemoryPtrFlexioToFrame->NBYTES = 16;
    s_TcdMemoryPtrFlexioToFrame->SLAST = 0;
    s_TcdMemoryPtrFlexioToFrame->DADDR = (uint32_t)&g_FlexioCameraMacroblockBuffer[0][0];
    s_TcdMemoryPtrFlexioToFrame->DOFF = 16;
    s_TcdMemoryPtrFlexioToFrame->CITER = ((MACROBLOCK >> 4));
    s_TcdMemoryPtrFlexioToFrame->DLAST_SGA = 0;
    s_TcdMemoryPtrFlexioToFrame->CSR = DMA_CSR_INTMAJOR_MASK;
    s_TcdMemoryPtrFlexioToFrame->BITER = ((MACROBLOCK >> 4));
    EDMA_ChannelTransferInit(DMA0, FLEXIO_DMA_CHANNEL, (s_FlexioCameraDevice.shifterStartIdx + 1U),
                             s_TcdMemoryPtrFlexioToFrame);
    FLEXIO_Ov7670VsynInit();
}
Besides the aforementioned benefit of better JPEG streaming, another important problem this change almost solves is that with the original firmware all 8 FlexIO shifters were in use, so we could not add our digital MEMS FlexIO driver, for example. One of the changes is to use 4 shifters, so the transfers are 16 bytes at a time.
Notice that we use EDMA_SetCallback to have a function called every time a JPEG macroblock buffer is acquired.
void Camera_Handler(edma_handle_t *handle, void *param, bool transferDone, uint32_t tcds)
{
    EDMA_ClearChannelStatusFlags(DMA0, FLEXIO_DMA_CHANNEL, kEDMA_InterruptFlag);
    gpfJpegMacroBlkCb(mblk_cnt_);
    mblk_cnt_++;
    DMA0->TCD[FLEXIO_DMA_CHANNEL].DADDR = (uint32_t)&g_FlexioCameraMacroblockBuffer[mblk_cnt_][0];
    return;
}
In the above code, gpfJpegMacroBlkCb is a callback function residing in the upper-layer software (the C++ library camera.cpp) that notifies it that a new macroline is ready to be processed.
Perhaps I have been lazy in defining or explaining what exactly a macroblock is. It happens that the JPEG compression algorithm makes use of the discrete cosine transform, or DCT. I am not going to go deep there, just to highlight that for the DCT you need a matrix of pixels; several sizes can be used, but here a 16 x 16 pixel macroblock was defined. For QQVGA, a width of 160 pixels means a macroline contains 10 of those macroblocks.
The encoder library is under the encoder folder; there you can find the file jpegenc.c, and inside it a function called encode_line_yuv, where exactly one line is encoded. One thing to note is that the FlexIO camera output is not technically YUV422 but rather YCbCr; the problem is solved by making the algorithm pick up the right values from the buffer, as shown below.
for (b = 0; b < num_blocks; b++) {
    for (r = 0; r < 8; r++)
        for (c = 0; c < 8; c++)
        {
            // get pixel index and extract YUV values
            unsigned int n = 2*(160*r + 16*b + 2*c);
            // first four pairs of pixels get put into Y8x8[0],
            // and last four pairs get put into Y8x8[1]
            unsigned int yindex = c < 4 ? 0 : 1;
            // OV7670 order is Cb(n) Y(n) Cr(n) Y(n+1) ...
            Cb8x8[r][c] = _line_buffer[n+0] - 128;
            Y8x8[yindex][r][(2*c)%8+0] = _line_buffer[n+1] - 128;
            Cr8x8[r][c] = _line_buffer[n+2] - 128;
            Y8x8[yindex][r][(2*c)%8+1] = _line_buffer[n+3] - 128;
After that, the DCT and Huffman coding are applied to those matrices.
A function has to finish the work; since this is a bare-metal app so far, we place webCam.process(); in the main loop, and here is what it does:
void webCam::process(){
    if(cur_stack_macroblk_ < cur_flexio_macroblk_){
        if(cur_stack_macroblk_ < 0){
        }else{
        }
        uint8_t *dat = (uint8_t *)&g_FlexioCameraMacroblockBuffer[cur_stack_macroblk_][0];
        compress_macroblock(dat);
        if(cur_stack_macroblk_ >= (MACROLINES-1)){
            cur_stack_macroblk_ = -1;
        }
    }
}
Basically it keeps track of the remaining macroblocks to convert, and if the FlexIO camera finishes first (it's a joke, of course), it just stops the conversion process momentarily since, as already mentioned, we only have a single buffer. The advantage is that for a QQVGA image at 25 fps, FlexIO takes roughly 70 ms to gather it, but from the very first macroblock gathered we can start feeding the JPEG compressor. That makes it possible to stream at 7.0 fps (there is some code to test this too, which prints the rate to the console).
What do we do with the image? Well, two things can happen on the current platform: save it to the SD card or send it to the websocket endpoint. I will show you how the latter is done.
Remember the websocket event handler API shown above? The function camera.startvideo(num); keeps the websocket id and starts the FlexIO camera module. You can surely guess what the stopvideo() call does.
Inside the camera library the compressor does a macroline at a time, but internally the JPEG compressor library makes a callback to a write function when it has a buffer ready to dispatch, either to the SD card or to the websocket. The buffer size can be adapted, but it's better to work with multiples of 512 bytes for SD card access. The websocket limitation is in the network layer; I didn't test different sizes, just kept it at 1024 since the WINC1500 buffers have a 1400-byte maximum MTU.
Here is the code:
void webCam::write_jpeg(const unsigned char * _buffer, const unsigned int _n){
    if(video_type_ != no_video){
        webSocket.sendBIN(wsnum_, (const uint8_t*)_buffer, _n);
    }else{
        if(file_jpg_ < 0)
            return;
        // disable the PIT2 audio interrupt during SD card access
        PIT_DisableInterrupts(PIT, kPIT_Chnl_2, kPIT_TimerInterruptEnable);
        file.write(file_jpg_, (void*)_buffer, _n);
        PIT_EnableInterrupts(PIT, kPIT_Chnl_2, kPIT_TimerInterruptEnable);
    }
}
If video mode is active, it just sends the binary data using webSocket.sendBIN; otherwise it saves to a file. What does the PIT have to do here, you might ask? The current demo plays back audio from the SD card at the same time as the web cam might save to a file, so this admittedly brute-force mechanism prevents concurrent access to the shared resource.
For sure it's not the best way to do it, but I had a hard time finding how to build a critical section to prevent usage of the same resource by these two "processes". There is always something we can make better, right?
Last but not least, JavaScript also comes to the rescue on the front end. The example app uses HTML5 websockets with a web worker running in the background on the client machine for a better experience. Here is just an extract:
if (evt.data instanceof ArrayBuffer) {
    var len = evt.data.byteLength;
    // console.log("binary message received of length = " + len);
    if (inTransfer == false) {
        buffer = new Uint8Array(evt.data);
        inTransfer = true;
    } else {
        buffer = _array_concat(buffer, evt.data);
    }
} else {
    // control logic here
    cmd = evt.data;
    switch (cmd) {
        case "start":
            inTransfer = false;
            break;
        case "end":
            $("#shot").text("Shot");
            $("#video").prop("disabled", false);
            $("#shot").prop("disabled", false);
            break;
        case "error":
            break;
        default:
            break;
    }
}
}
A binary buffer is built up when the transfer starts by calling _array_concat, located later in this file. After all the binary data is in the resulting buffer, the HTML img tag is constructed. Now I will stop boring you and show you some action.
Please note that the background audio you hear is playback from an SD file happening at the same time as the video streaming.
Audio Overview
Playback using the DAC.
The DAC module was configured for EDMA transfers from a ping-pong buffer. A library was written to handle the ping-pong buffer states, read data from the SD card, and transfer it automatically to the DAC.
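The ping-pong idea in isolation looks like this; a minimal sketch with illustrative names, not the AudioFlex implementation itself:

```c
#include <stdint.h>

/* Ping-pong double buffer: while the DMA drains one half toward the DAC,
 * the CPU refills the other half from the SD card. */
#define HALF_SAMPLES 160

typedef struct {
    int16_t buf[2][HALF_SAMPLES];
    int dma_half;                 /* half currently owned by the DMA  */
} pingpong_t;

/* Called from the DMA major-loop interrupt: swap ownership and report
 * which half the CPU must now refill. */
static int pingpong_swap(pingpong_t *pp)
{
    int cpu_half = pp->dma_half;  /* DMA just finished draining this half */
    pp->dma_half ^= 1;            /* DMA moves on to the other half       */
    return cpu_half;              /* CPU refills the drained one          */
}
```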
The main library file is AudioFlex.cpp, which follows the Arduino style of .begin() and .process() functions. The example code uses the .play() function to retrieve a file from the SD card and start playback.
Here is a snippet of the initialization code, which resides in a separate C file called audio_dma.c:
void Init_Audio(void)
{
    ClearAudioBuffers();
    Set_Audio_Playback_Dma();
    Playing_Buff_A = TRUE;
    Set_VREF();   // VREF for DAC
    Init_DAC();   // DACs ON
    Init_FTM2();
}
The DMA configuration is as follows
void Set_Audio_Playback_Dma(void)
{
    edma_channel_Preemption_config_t audio_preemption_cfg;
    // Enable clocks for DMAMUX and DMA
    SIM->SCGC6 |= SIM_SCGC6_DMAMUX_MASK;
    SIM->SCGC7 |= SIM_SCGC7_DMA_MASK;
    DMAMUX0->CHCFG[DAC_DMA_CHANNEL] = DMAMUX_CHCFG_ENBL_MASK;
    edmaConfig.enableDebugMode = true;
}
Notice that here, similar to the camera EDMA, we attach a callback to swap the buffers.
The callback function is in the C++ library, and the following code shows the ping-pong manager that is activated in the main loop.
volatile void AudioFlexClass::pingPongManager(void){
    signed short * pSample;
    signed short * p_buf;
    uint16_t size;
#if debug
    static int count = 0;
#endif
    if(playing_){
        if(stop_){
            stop_ = false;
            file.close(wav_hlr_);
            wav_hlr_ = -1;
            Stop_Audio();
            return;
        }
        p_buf = (Playing_Buff_A == 1) ? &Audio_Source_Blk_B[0] : &Audio_Source_Blk_A[0]; // if DMA0 reads buffer A, the CPU reloads B, otherwise the CPU reloads A
        if(file.isOpen(wav_hlr_)){
            if((size = file.read(wav_hlr_, file_buf, sizeof(file_buf))) < sizeof(file_buf)){
                // this is the last buffer of the file
                ClearAudioBuf(p_buf);
                stop_ = true;
#if debug
                Serial.print("Playback comes to an end, count = ");
                Serial.println(count);
                count = 0;
#endif
            }
        }else{
            wav_hlr_ = -1;
#if debug
            Serial.print("[Error] : File was closed, count = ");
            Serial.println(count);
            count = 0;
#endif
            Stop_Audio();
            stop_ = true;
            return;
        }
        pSample = (signed short *) file_buf; // here we can do some logic to gather the correct buffer stream or adjust the volume
        for(uint16_t i=0; i < size/2; i++)
        {
            *p_buf = (unsigned short)(((pSample[i])+32768))>>4;
#if VOLUME_CTRL
            *p_buf >>= 1;
#endif
            p_buf++;
        }
#if debug
        count ++;
#endif
    }
}
Notice how we did some experiments to control the volume. The voice was sampled and can be heard and understood, but there was some clipping from the high gain; this improves the situation a little. One thing that might help is a better speaker; another is to feed the LM386 from a +12 V rail, since we were actually using the noisy 5 V board supply.
The DAC output ranges from (1/4096)·Vi to Vi, where Vi in our case is the internal VREF, so we are actually placing up to 1.2 V at the amp input, and with a gain of 20 that's not good.
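The clipping is easy to see on paper. A quick check, assuming the LM386's default gain of 20 and the 1.2 V VREF full-scale swing from the DAC:

```c
/* Requested output swing: input swing times amplifier gain.
 * 1.2 V full scale from the DAC times a gain of 20 asks for 24 V of
 * swing, far beyond what a 5 V supply can deliver, hence the clipping. */
static double out_swing_vpp(double vin_vpp, double gain)
{
    return vin_vpp * gain;
}
/* out_swing_vpp(1.2, 20.0) -> 24.0 V */
```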
Audio Recording. The beauty of MEMS.
This was the part of the project that made me realize how FlexIO works and how powerful it is. Why? Because I wrote the MEMS sensor driver, since none of the available ones (SPI, I2S, I2C) really fit the MEMS sensor's needs.
First, it needs a continuous clock and buffering of the sensor bit-stream data, most conveniently using DMA; second, the audio samples can become really numerous and overwhelming for the CPU to handle if not done correctly.
The design (which can surely be improved) proves that FlexIO is up to the task. It works by first providing a continuous clock without triggering (most of the drivers use triggers, be it a shift-buffer write or another timer output), and secondly by using EDMA with shifter-sized transfers to a memory buffer. The PDM bits are gathered one by one until a shift register is filled (controlled by a timer, of course); the 32-bit word is then transferred to a bigger buffer, and so on until it's time for the ping-pong swap.
The ping-pong buffer size was chosen to allow storage of 20 ms of audio. Some of you surely know the underlying reason for that time slot: it is used all over VoIP and makes sense for data transmission over TCP/IP, for example.
For 20 ms of audio at an 8000 Hz sampling rate, using a 500 kHz PDM rate and this CIC filter design, a 1280-byte buffer is needed, which down-converts to 160 bytes (octets) after the CIC stage. Really nice.
The CIC filter takes one of those buffers every 20 ms and does its magic.
The MEMS mic works and you can listen to the recording with a test snippet; there was no time to do the websockets part. The only problem is that the gain seems low when playing it back in Audacity.
The FlexIO MEMS driver configuration is shown below; the rest of the code is under the drivers folder, in a file called fsl_flexio_mems.c, following the NXP nomenclature.
void FLEXIO_MEMS_Init(FLEXIO_MEMS_Type *base, const flexio_mems_config_t *config)
{
    assert(base && config);
    flexio_shifter_config_t shifterConfig = {0};
    flexio_timer_config_t timerConfig = {0};
    /* Ungate flexio clock. */
    CLOCK_EnableClock(kCLOCK_Flexio0);
    FLEXIO_Reset(base->flexioBase);
    /* Set shifter for MEMS Rx Data */
    shifterConfig.timerSelect = base->bclkTimerIndex;
    shifterConfig.pinSelect = base->rxPinIndex;
    shifterConfig.timerPolarity = kFLEXIO_ShifterTimerPolarityOnNegitive;
    shifterConfig.pinConfig = kFLEXIO_PinConfigOutputDisabled;
    shifterConfig.pinPolarity = kFLEXIO_PinActiveHigh;
    shifterConfig.shifterMode = kFLEXIO_ShifterModeReceive;
    shifterConfig.inputSource = kFLEXIO_ShifterInputFromPin;
    shifterConfig.shifterStop = kFLEXIO_ShifterStopBitDisable;
    shifterConfig.shifterStart = kFLEXIO_ShifterStartBitDisabledLoadDataOnEnable;
    FLEXIO_SetShifterConfig(base->flexioBase, base->rxShifterIndex, &shifterConfig);
    /* Set Timer to MEMS bit clock */
    timerConfig.triggerSelect = 0;
    timerConfig.triggerPolarity = kFLEXIO_TimerTriggerPolarityActiveHigh;
    timerConfig.triggerSource = kFLEXIO_TimerTriggerSourceExternal; //kFLEXIO_TimerTriggerSourceInternal;
    timerConfig.pinSelect = base->bclkPinIndex;
    timerConfig.pinConfig = kFLEXIO_PinConfigOutput;
    timerConfig.pinPolarity = kFLEXIO_PinActiveHigh;
    timerConfig.timerMode = kFLEXIO_TimerModeDual8BitBaudBit;
    timerConfig.timerOutput = kFLEXIO_TimerOutputZeroNotAffectedByReset;
    timerConfig.timerDecrement = kFLEXIO_TimerDecSrcOnFlexIOClockShiftTimerOutput;
    timerConfig.timerReset = kFLEXIO_TimerResetNever;
    timerConfig.timerDisable = kFLEXIO_TimerDisableNever;
    timerConfig.timerEnable = kFLEXIO_TimerEnabledAlways;
    timerConfig.timerStart = kFLEXIO_TimerStartBitDisabled;
    timerConfig.timerStop = kFLEXIO_TimerStopBitDisabled;
    timerConfig.timerCompare = 0;
    FLEXIO_SetTimerConfig(base->flexioBase, base->bclkTimerIndex, &timerConfig);
    /* Clear flags. */
    FLEXIO_ClearShifterErrorFlags(base->flexioBase, (1 << base->rxShifterIndex));
    FLEXIO_ClearTimerStatusFlags(base->flexioBase, 1U << (base->bclkTimerIndex));
}
It's worth noticing that the timer is used in dual 8-bit counter/baud mode, which makes it possible to stream the clock at the desired frequency by adjusting the lower byte of the compare register, while the upper byte sets the number of bits per transfer. Since we want to fill up the shift buffer, we set the bit width to 32; that, coupled with eDMA, makes it easy to transfer the 1280 bytes without CPU intervention.
The eDMA configuration is in another file outside the drivers folder since I really didn't like working with the eDMA high-level functions; perhaps someone at NXP can adjust it ;) Anyway, here is a snippet of the eDMA config:
static void MEMS_FLEXIO_EDMA_Init(void)
{
    flexio_mems_format_t memsFormat;

    FLEXIO_MEMS_GetDefaultConfig(&s_FlexioMemsConfig);
    FLEXIO_MEMS_Init(&s_FlexioMemsDevice, &s_FlexioMemsConfig);

    memsFormat.bitWidth = s_FlexioMemsDevice.bitWidth;
    memsFormat.sampleRate_Hz = s_FlexioMemsDevice.sampleRate_Hz;
    FLEXIO_MEMS_MasterSetFormat(&s_FlexioMemsDevice, &memsFormat, CLOCK_GetFreq(kCLOCK_CoreSysClk));

    /* Enable the flexio edma request */
    FLEXIO_EnableShifterStatusDMA(s_FlexioMemsDevice.flexioBase, (1U << s_FlexioMemsDevice.rxShifterIndex), true);

    MEMS_EDMA_Init();
    EDMA_CreateHandle(&g_EDMA_MEMS_Handle, DMA0, FLEXIO_MEMS_DMA_CHANNEL);
    EDMA_SetCallback(&g_EDMA_MEMS_Handle, MEMS_Handler, NULL);

    s_TcdMemoryPtrMEMSToFrame->SADDR = (uint32_t) FLEXIO_MEMS_RxGetDataRegisterAddress(&s_FlexioMemsDevice);
    s_TcdMemoryPtrMEMSToFrame->SOFF = 0;
    s_TcdMemoryPtrMEMSToFrame->ATTR = DMA_ATTR_SSIZE(kEDMA_TransferSize4Bytes) | DMA_ATTR_DSIZE(kEDMA_TransferSize4Bytes);
    s_TcdMemoryPtrMEMSToFrame->NBYTES = 4;
    s_TcdMemoryPtrMEMSToFrame->SLAST = 0;
    s_TcdMemoryPtrMEMSToFrame->DADDR = (uint32_t) &PDMRxData[0][0];
    s_TcdMemoryPtrMEMSToFrame->DOFF = 4;
    s_TcdMemoryPtrMEMSToFrame->CITER = PDM_TRANSFER_SIZE/4;
    s_TcdMemoryPtrMEMSToFrame->DLAST_SGA = 0;
    s_TcdMemoryPtrMEMSToFrame->CSR = DMA_CSR_INTMAJOR_MASK;
    s_TcdMemoryPtrMEMSToFrame->BITER = PDM_TRANSFER_SIZE/4;
    EDMA_ChannelTransferInit(DMA0, FLEXIO_MEMS_DMA_CHANNEL, (s_FlexioMemsDevice.rxShifterIndex + 1U), s_TcdMemoryPtrMEMSToFrame);
}
And here is the CIC implementation, which takes the idea from here.
bool MemsCIC::worker(uint8_t buf){
    CICREG Rout2;
    CICREG stage1, stage2, stage3;
    static int pcm;
    static int pcm0, pcm1;

    setPDMbuf(buf);
    setPCMbuf(pcm_in_use_);

    for(uint8_t k = 0; k < PCM_FRAME; k++){
        // extract PDM samples, taking care of the reorder caused by the FlexIO EDMA
        for(uint8_t j = 0; j < CIC2_R/2; j++) {
            // integrator stages (body elided in the original listing)
        }
        pdm_ += 4;
        for(uint8_t j = 0; j < CIC2_R/2; j++) {
            // integrator stages (body elided in the original listing)
        }
        pdm_ += 4;

        // comb stages at the decimated rate
        Rout2 = s2_sum3;
        stage1 = Rout2 - s2_comb1_2;
        s2_comb1_2 = s2_comb1_1;
        s2_comb1_1 = Rout2;
        stage2 = stage1 - s2_comb2_2;
        s2_comb2_2 = s2_comb2_1;
        s2_comb2_1 = stage1;
        stage3 = stage2 - s2_comb3_2;
        s2_comb3_2 = s2_comb3_1;
        s2_comb3_1 = stage2;

        // High pass filter to remove DC; roll-off at alpha=127/128 and 7812.5 Hz
        // implies RC=16.256 ms, or Fc=9.79 Hz
        //   alpha at 7812.5 Hz      Fcut
        //   255/256   32.64  ms   4.876 Hz
        //   127/128   16.256 ms   9.79  Hz
        //   63/64      8.064 ms  19.74  Hz
        //   31/32      3.968 ms  40.1   Hz
        // (HPF update elided in the original listing; pcm0/pcm1 hold its state)
        pcm = stage3;

        // queue the finished PCM filtered sample
        PCMData[pcm_in_use_][k] = pcm;
    }

    if(record2file_){
        file.write(rec_fh_, &PCMData[pcm_in_use_][0], 2*PCM_FRAME);
    }

    // swap PCM buffers - Ping Pong
    if(pcm_in_use_ == 0){
        pcm_in_use_ = 1;
    }else{
        pcm_in_use_ = 0;
    }
    return true;
}
An adjustment to read the buffer in order was really needed, since the LSb of each 32-bit word is the bit that follows, in time, bit number 31 of the previous word.
A nice improvement would be to use a double buffer, since every outer-loop iteration of the previous code feeds the CIC filter with 8 bytes.
Here are a couple of pictures of the audio data in Audacity: the first one without the low-pass filter software feature of the code above, and the second one with it. Notice the difference. The recording is me saying "Aló" and "prueba" ("hello" and "test") in Spanish.
Attached are those recordings so you can play them back; just use 7812.5 Hz as the sampling frequency when importing the raw audio in Audacity.
Conclusion
This project proved to be a challenge of long nights and days without rest, but in the end I could sit back and look at the work in retrospect while writing this long explanation, the moment when I realized FlexIO and NXP rock!
Acknowledgements
I appreciate all the people behind this contest, which made it possible for me to learn more; this is my gift to the community.
I also appreciate all the people who trusted in me and made this possible, especially my wife :)
Enjoy!
Finally, here is a whole view of the setup.