Introduction
This project documents how to build a device that serves as an aid for individuals with auditory and visual impairments. Such individuals often develop advanced tactile sensory skills, so the idea is to build a gadget that uses a circular microphone array with beamforming algorithms to detect the direction of arrival of sound, coupled with a camera pipeline that provides depth/distance information. The direction of arrival is used to activate tiny motors on a collar that notify the user of the direction of the sound. This write-up focuses only on the audio algorithms. The device is envisioned as an assistive device with sound localization capabilities for people with reduced hearing and vision.
Hardware design
The system block diagram is as below:
A circular array of 8 I2S microphones was developed. The microphones are grouped in pairs. Each I2S microphone exposes the following signals:
- WS - word select (frame clock)
- LR - left/right channel select: HIGH (right channel) / LOW (left channel)
- SCK - serial clock
- SD - serial data output
The LR pin was hardwired for each microphone pair so that one microphone is allocated to the left channel and the other to the right. An FMC mezzanine board was used to interface the microphone board with the FPGA.
On the FPGA SoC side, PYNQ runs a Python script that reads the microphone channels and sends them via TCP to the Coral Mini Edge TPU. While the complete processing could be done on the FPGA SoC, the idea behind this project was to evaluate the Edge TPU accelerator.
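As a rough sketch of this streaming step (the address, port, and data format below are placeholder assumptions, not the project's actual values), the PS-side sender could look like:

import socket
import numpy as np

# Hypothetical sketch of the PS-side streaming step. The address, port and
# data format are placeholder assumptions, not the project's actual values.
CORAL_ADDR = ("192.168.1.50", 5000)

def send_chunk(chunk):
    # `chunk` is assumed to be a NumPy array holding one block of 8-channel audio
    with socket.create_connection(CORAL_ADDR) as sock:
        sock.sendall(np.ascontiguousarray(chunk, dtype=np.int32).tobytes())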
FPGA design
The FPGA block diagram is shown below. The FPGA is used to acquire the sound stream and camera frames. The design described here consists of the audio pipeline.
The audio pipeline makes use of the Xilinx I2S receiver and the Xilinx Audio Formatter.
The Xilinx I2S receiver is used to interface with the I2S microphones. It feeds the Xilinx Audio Formatter, which supports multiple channels. The output audio stream is sent to the Processing Subsystem (PS).
On the PS side, the data is split into chunks, converted to spectrograms, and sent over Ethernet to the TPU. PYNQ is used to receive the data from the audio pipeline.
The first problem to address is how to determine the direction of arrival (DOA) of a sound source. This requires more than one microphone, so a linear or circular microphone array has to be used. Circular arrays avoid some disadvantages of linear arrays, such as front-back ambiguity, so an 8-microphone circular array was constructed.
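To make the geometry concrete, the sketch below computes the expected inter-microphone time delays for a far-field source and a circular array; the 5 cm radius is an assumed value for illustration only, not the actual board dimension.

import numpy as np

# Illustrative sketch: expected inter-microphone time delays for a far-field
# source and a circular array. The 5 cm radius is an assumption.
c = 343.0                                   # speed of sound, m/s
radius = 0.05                               # assumed array radius, m
n_mics = 8
mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
mic_xy = radius * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)

def expected_delays(doa_deg):
    """Arrival-time offset of each microphone relative to the array centre
    for a plane wave arriving from doa_deg (degrees)."""
    doa = np.deg2rad(doa_deg)
    direction = np.array([np.cos(doa), np.sin(doa)])
    return mic_xy @ direction / c           # seconds, one value per microphone

print(expected_delays(45.0))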
The microphone sensors use the I2S protocol. They are grouped into four pairs, each pair providing the left and right channels of one I2S receiver input. The device tree has to be edited to add the correct instantiation.
The idea is to read the audio from the 8 channels in chunks, then use librosa to build mel spectrograms. These spectrograms are converted to a log scale and can then be used to train a neural network for direction of arrival.
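A minimal sketch of this per-chunk feature extraction, assuming a 16 kHz sample rate, one-second chunks and 64 mel bands (all illustrative values):

import librosa
import numpy as np

# Sketch only: `chunk` stands in for one second of a single microphone channel
sr = 16000
chunk = np.random.randn(sr).astype(np.float32)
mel = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)   # log-scaled mel spectrogram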
The plots obtained from the audio are both magnitude and phase spectrograms, and both have to be used to get an accurate estimation, since the phase differences between channels carry the directional information.
Software
A LeNet-5 network was built and ported to the Coral Mini.
DOA detection will obviously require a more sophisticated network, but this serves as a stepping stone.
Note that classical algorithms such as MUSIC and GCC-PHAT can be used to detect the direction of arrival, but the main intention here was to design a neural network to do it.
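For reference only, a minimal GCC-PHAT time-delay estimate between two channels can be sketched as follows; this is the classical baseline mentioned above, not code from the project:

import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (in seconds) of sig relative to ref."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # Phase transform: normalize the magnitude so only phase information remains
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)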
The librosa library was used to compute the spectrograms of the audio channels. For experimental purposes, the data was read from a WAV file.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
def plot_magnitude_and_phase(wav_file):
    # Load the audio file
    y, sr = librosa.load(wav_file)
    # Compute the Short-Time Fourier Transform
    stft_result = librosa.stft(y)
    magnitude, phase = librosa.magphase(stft_result)
    log_magnitude = librosa.amplitude_to_db(magnitude, ref=np.max)
    # Plot the magnitude spectrogram
    plt.figure(figsize=(12, 8))
    plt.subplot(2, 1, 1)
    librosa.display.specshow(log_magnitude, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Log Magnitude Spectrogram')
    # Plot the phase spectrogram (magphase returns the phase as unit-magnitude
    # complex values, so np.angle is used to extract the angle in radians)
    plt.subplot(2, 1, 2)
    librosa.display.specshow(np.angle(phase), sr=sr, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.2f rad')
    plt.title('Phase Spectrogram')
    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    wav_file_path = '0_03_1.wav'
    plot_magnitude_and_phase(wav_file_path)
Both the phase and the magnitude of the sound data chunks are needed to reconstruct the signal. The idea is to concatenate the magnitude and phase spectrogram images and pass them to a neural network to detect the direction of the sound.
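One way to build such a combined input is sketched below; stacking the raw arrays as channels (rather than concatenating rendered plot images) is an assumption made for illustration, reusing the WAV file from the example above.

import numpy as np
import librosa

# Sketch: stack log-magnitude and phase as two channels of one network input.
# Stacking the raw arrays is an assumption; the project describes concatenating
# the rendered plot images instead.
y, sr = librosa.load('0_03_1.wav')
magnitude, phase = librosa.magphase(librosa.stft(y))
features = np.stack(
    [librosa.amplitude_to_db(magnitude, ref=np.max), np.angle(phase)],
    axis=-1,
)   # shape: (n_freq_bins, n_frames, 2)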
Real-world sounds often originate from moving sources and are affected by Doppler shift, so a neural network that handles direction of arrival from moving sources needs more capacity than a simple LeNet-5.
Porting a custom CNN to the Edge TPU
On the Coral Mini board, a Python server using the pyaudio library receives the audio channels and forwards them to the neural network.
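A minimal sketch of the receiving side is shown below, using a plain TCP socket and the tflite_runtime interpreter; the model path, port, and chunk size are assumptions, and the spectrogram conversion step is omitted.

import socket
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Hypothetical sketch of the receiving side using a plain TCP socket.
# Model path, port number and chunk size are placeholder assumptions.
CHUNK_BYTES = 8 * 1024 * 4   # assumed: 8 channels x 1024 samples x 4 bytes

interpreter = Interpreter(
    model_path="lenet5_doa_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def run_inference(spectrogram_u8):
    # The quantized model expects a uint8 tensor with the model's input shape
    interpreter.set_tensor(input_detail["index"], spectrogram_u8)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("0.0.0.0", 5000))
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        data = b""
        while len(data) < CHUNK_BYTES:
            packet = conn.recv(CHUNK_BYTES - len(data))
            if not packet:
                break
            data += packet
        samples = np.frombuffer(data, dtype=np.int32).reshape(8, -1)
        # ... convert `samples` to spectrograms and call run_inference(...)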
The main steps to port a custom network to the Edge TPU are as follows:
1. First, install the Edge TPU compiler:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt-get update
sudo apt-get install edgetpu-compiler
To connect to the Coral Mini board, SSH keys have to be generated.
Issue:
mdt devices
then
mdt shell
2. Design the network in TensorFlow or PyTorch
The idea behind using LeNet-5 is to modify the last fully connected layer from 10 outputs to 8, matching the number of microphones on the board.
The model was originally defined for the MNIST dataset; for actual use, the input layer has to match the resolution of the spectrogram images above (240x240 in the snippet below).
import tensorflow as tf

# LeNet-5-style model architecture for spectrogram images
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(240, 240)),  # matches the shape of the spectrogram image
    tf.keras.layers.Reshape(target_shape=(240, 240, 1)),
    tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(84, activation='relu'),
    tf.keras.layers.Dense(8)  # one output per microphone direction
])
The network requires post-training quantization, so a representative dataset that reflects the real input data must be used to calibrate the quantized weights and activations.
In addition, it is critical to set the last line of the converter configuration below (converter.experimental_new_converter = False); otherwise the compiler will not map any instructions to the Edge TPU. This seems to be a bug in the compiler and took some time to figure out.
3. Post-training quantization and export of the network
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.int8]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.experimental_new_converter = False  # without this, the compiler maps no ops to the Edge TPU (see note above)
To run a floating-point model on the Edge TPU, it has to be converted with full-integer post-training quantization.
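As a sketch of the remaining steps (assuming a NumPy array of training spectrograms named train_images, which is not part of the original code), the representative dataset and the final conversion could look like:

# Sketch only: `train_images` is an assumed array of 240x240 training spectrograms
def representative_dataset():
    # Yield a few hundred samples covering the input distribution so the
    # converter can calibrate the integer quantization ranges
    for image in train_images[:200]:
        yield [image.astype(np.float32)[np.newaxis, ...]]  # add a batch dimension

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("lenet5_doa_quant.tflite", "wb") as f:
    f.write(tflite_model)

The quantized .tflite file is then compiled for the accelerator with edgetpu_compiler, which produces a *_edgetpu.tflite model that the runtime can load.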
At this point the custom model can run on the Coral Mini board. The next step is to generate the training and validation sets and replace the MNIST dataset with the actual spectrogram dataset.
Conclusion
This project presented experiments toward direction-of-arrival estimation with neural networks on a Coral Mini Edge TPU accelerator, using a custom circular microphone array.