Introduction
This project documents how to build a device that serves as an aid for individuals with auditory and visual impairments. Such individuals often develop advanced tactile sensory skills, so the idea is to build a gadget that uses a circular microphone array with beamforming algorithms to detect the direction of arrival of sound, coupled with a camera pipeline that provides depth/distance information. The direction of arrival is used to activate tiny motors on a collar that notify the user of the direction of the sound. This write-up focuses only on the audio algorithms. The device is envisioned as an assistive device with sound localization capabilities for people with reduced hearing and vision.
Hardware design
The system block diagram is as below:
A circular array of 8 I2S microphones was developed. The microphones are grouped in pairs. Each I2S microphone exposes the following signals:
- WS - word select (frame clock)
- LR - left/right channel select: HIGH (right channel) / LOW (left channel)
- SCK - serial clock
- SD - serial data output
The LR pin was hardwired for each microphone pair so that one microphone is allocated to the left channel and the other to the right. An FMC mezzanine board was used to interface the microphone board with the FPGA.
On the FPGA SoC side, PYNQ runs a Python script that reads the microphone channels and sends them via TCP to the Coral Mini Edge TPU. While the complete processing could be done on the FPGA SoC, the idea behind this project was to evaluate the Edge TPU accelerator.
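As a rough sketch of this streaming step (the address, port, and data format below are placeholder assumptions, not the project's actual values), the PS-side sender could look like:

import socket
import numpy as np

# Hypothetical sketch of the PS-side streaming step. The address, port and
# data format are placeholder assumptions, not the project's actual values.
CORAL_ADDR = ("192.168.1.50", 5000)

def send_chunk(chunk):
    # `chunk` is assumed to be a NumPy array holding one block of 8-channel audio
    with socket.create_connection(CORAL_ADDR) as sock:
        sock.sendall(np.ascontiguousarray(chunk, dtype=np.int32).tobytes())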
FPGA design
The FPGA block diagram is shown below. The FPGA is used to acquire the sound stream and camera frames. The design described here consists of the audio pipeline.
The audio pipeline makes use of the Xilinx I2S receiver and the Xilinx Audio Formatter.
The Xilinx I2S receiver is used to interface with the I2S microphones. It feeds the Xilinx Audio Formatter, which supports multiple channels. The output audio stream is sent to the Processing Subsystem (PS).
On the PS side, the data is split into chunks, converted to spectrograms, and sent over Ethernet to the TPU. PYNQ is used to receive the data from the audio pipeline.
The first problem to address is how to determine the direction of arrival (DOA) of a sound source. This requires more than one microphone, so a linear or circular microphone array has to be used. Circular arrays avoid some disadvantages of linear arrays, such as front-back ambiguity, so an 8-microphone circular array was constructed.
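To make the geometry concrete, the sketch below computes the expected inter-microphone time delays for a far-field source and a circular array; the 5 cm radius is an assumed value for illustration only, not the actual board dimension.

import numpy as np

# Illustrative sketch: expected inter-microphone time delays for a far-field
# source and a circular array. The 5 cm radius is an assumption.
c = 343.0                                   # speed of sound, m/s
radius = 0.05                               # assumed array radius, m
n_mics = 8
mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
mic_xy = radius * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)

def expected_delays(doa_deg):
    """Arrival-time offset of each microphone relative to the array centre
    for a plane wave arriving from doa_deg (degrees)."""
    doa = np.deg2rad(doa_deg)
    direction = np.array([np.cos(doa), np.sin(doa)])
    return mic_xy @ direction / c           # seconds, one value per microphone

print(expected_delays(45.0))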
The microphone sensors use the I2S protocol. They are grouped into four pairs, each pair providing the left and right channels of one I2S receiver input. The device tree has to be edited to add the correct instantiation.
The idea is to read the audio from the 8 channels in chunks, then use librosa to build mel spectrograms. These spectrograms are converted to a log scale and can then be used to train a neural network for direction of arrival.
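A minimal sketch of this per-chunk feature extraction, assuming a 16 kHz sample rate, one-second chunks and 64 mel bands (all illustrative values):

import librosa
import numpy as np

# Sketch only: `chunk` stands in for one second of a single microphone channel
sr = 16000
chunk = np.random.randn(sr).astype(np.float32)
mel = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)   # log-scaled mel spectrogram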
The plots obtained from the audio are both magnitude and phase spectrograms, and both have to be used to get an accurate estimation, since the phase differences between channels carry the directional information.
Software
A LeNet-5 network was built and ported to the Coral Mini.
DOA detection will obviously require a more sophisticated network, but this serves as a stepping stone.
Note that classical algorithms such as MUSIC and GCC-PHAT can be used to detect the direction of arrival, but the main intention here was to design a neural network to do it.
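For reference only, a minimal GCC-PHAT time-delay estimate between two channels can be sketched as follows; this is the classical baseline mentioned above, not code from the project:

import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (in seconds) of sig relative to ref."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # Phase transform: normalize the magnitude so only phase information remains
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)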
The librosa library was used to compute the spectrograms of the audio channels. For experimental purposes, the data was read from a WAV file.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
def plot_magnitude_and_phase(wav_file):
    # Load the audio file
    y, sr = librosa.load(wav_file)
    # Compute the Short-Time Fourier Transform
    stft_result = librosa.stft(y)
    magnitude, phase = librosa.magphase(stft_result)
    log_magnitude = librosa.amplitude_to_db(magnitude, ref=np.max)
    # Plot the magnitude spectrogram
    plt.figure(figsize=(12, 8))
    plt.subplot(2, 1, 1)
    librosa.display.specshow(log_magnitude, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Log Magnitude Spectrogram')
    # Plot the phase spectrogram (magphase returns the phase as unit-magnitude
    # complex values, so np.angle is used to extract the angle in radians)
    plt.subplot(2, 1, 2)
    librosa.display.specshow(np.angle(phase), sr=sr, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.2f rad')
    plt.title('Phase Spectrogram')
    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    wav_file_path = '0_03_1.wav'
    plot_magnitude_and_phase(wav_file_path)
Both the phase and the magnitude of the sound data chunks are needed to reconstruct the signal. The idea is to concatenate the magnitude and phase spectrogram images and pass them to a neural network to detect the direction of the sound.
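One way to build such a combined input is sketched below; stacking the raw arrays as channels (rather than concatenating rendered plot images) is an assumption made for illustration, reusing the WAV file from the example above.

import numpy as np
import librosa

# Sketch: stack log-magnitude and phase as two channels of one network input.
# Stacking the raw arrays is an assumption; the project describes concatenating
# the rendered plot images instead.
y, sr = librosa.load('0_03_1.wav')
magnitude, phase = librosa.magphase(librosa.stft(y))
features = np.stack(
    [librosa.amplitude_to_db(magnitude, ref=np.max), np.angle(phase)],
    axis=-1,
)   # shape: (n_freq_bins, n_frames, 2)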
Real-world sounds often originate from moving sources and are affected by Doppler shift, so a neural network that handles direction of arrival from moving sources needs more capacity than a simple LeNet-5.
Porting a custom CNN to the Edge TPU
On the Coral Mini board, a Python server using the pyaudio library receives the audio channels and forwards them to the neural network.
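A minimal sketch of the receiving side is shown below, using a plain TCP socket and the tflite_runtime interpreter; the model path, port, and chunk size are assumptions, and the spectrogram conversion step is omitted.

import socket
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Hypothetical sketch of the receiving side using a plain TCP socket.
# Model path, port number and chunk size are placeholder assumptions.
CHUNK_BYTES = 8 * 1024 * 4   # assumed: 8 channels x 1024 samples x 4 bytes

interpreter = Interpreter(
    model_path="lenet5_doa_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def run_inference(spectrogram_u8):
    # The quantized model expects a uint8 tensor with the model's input shape
    interpreter.set_tensor(input_detail["index"], spectrogram_u8)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("0.0.0.0", 5000))
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        data = b""
        while len(data) < CHUNK_BYTES:
            packet = conn.recv(CHUNK_BYTES - len(data))
            if not packet:
                break
            data += packet
        samples = np.frombuffer(data, dtype=np.int32).reshape(8, -1)
        # ... convert `samples` to spectrograms and call run_inference(...)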
The main steps to port a custom network to the Edge TPU are as follows:
1. First, install the Edge TPU compiler:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt-get update
sudo apt-get install edgetpu-compiler
To connect to the Coral Mini board, SSH keys have to be generated.
Issue:
mdt devices
then
mdt shell
2. Design the network in TensorFlow or PyTorch
The idea behind using LeNet-5 is to modify the last fully connected layer from 10 outputs to 8, matching the number of microphones on the board.
The model was originally defined for the MNIST dataset; for actual use, the input layer has to match the resolution of the spectrogram images above (240x240 in the snippet below).
import tensorflow as tf

# LeNet-5-style model architecture for spectrogram images
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(240, 240)),  # matches the shape of the spectrogram image
    tf.keras.layers.Reshape(target_shape=(240, 240, 1)),
    tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(84, activation='relu'),
    tf.keras.layers.Dense(8)  # one output per microphone direction
])
The network requires post-training quantization, so a representative dataset that reflects the real input data must be used to calibrate the quantized weights and activations.
In addition, it is critical to set the last line of the converter configuration below (converter.experimental_new_converter = False); otherwise the compiler will not map any instructions to the Edge TPU. This seems to be a bug in the compiler and took some time to figure out.
3. Post-training quantization and export of the network
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.int8]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.experimental_new_converter = False  # without this, the compiler maps no ops to the Edge TPU (see note above)
To run a floating-point model on the Edge TPU, it has to be converted with full-integer post-training quantization.
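As a sketch of the remaining steps (assuming a NumPy array of training spectrograms named train_images, which is not part of the original code), the representative dataset and the final conversion could look like:

# Sketch only: `train_images` is an assumed array of 240x240 training spectrograms
def representative_dataset():
    # Yield a few hundred samples covering the input distribution so the
    # converter can calibrate the integer quantization ranges
    for image in train_images[:200]:
        yield [image.astype(np.float32)[np.newaxis, ...]]  # add a batch dimension

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("lenet5_doa_quant.tflite", "wb") as f:
    f.write(tflite_model)

The quantized .tflite file is then compiled for the accelerator with edgetpu_compiler, which produces a *_edgetpu.tflite model that the runtime can load.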
At this point the custom model can run on the Coral Mini board. The next step is to generate the training and validation sets and replace the MNIST dataset with the actual spectrogram dataset.
Conclusion
This project presented experiments toward direction-of-arrival estimation with neural networks on a Coral Mini Edge TPU accelerator, using a custom circular microphone array.