Due to major hardware failure, I didn't finish in time. First, the Blues Swan board stopped accepting new firmware. I switched to the Blackpill board. Same problem. I switched to the Seeed Xiao. Same problem. At that point I realized the problem was on my laptop. Despite reinstalling a lot of software, I couldn't identify the exact problem. So I switched to another laptop, but by then I had no time left. So the rest of this project description will be about my unfinished project, which I will indeed finish at some point. After all, it's a great idea and the support from the community has been great.
Basic idea
We recognise direction and distance to a sound source by perceiving the sound with two ears. The sense of direction emerges from two factors. 1) If one ear hears the sound louder, the source of the sound is on that side. 2) If one ear hears the sound a fraction of a millisecond earlier, the source of the sound is on that side. The sense of distance emerges from the ratio between the sound level of the direct sound and the sound level of the reverberation. In a strongly simplified model, the sound level of the reverberation is the same no matter how far away the sound source is. Simply put, in a large, echoing hall you hear the same reverberation of the sound no matter where you are or where the sound source is.
So, to create a sound which appears to be 10 m away at direction 10 o'clock (60 degrees to the left from straight ahead), a stereo sound is created out of a single monophonic sound. The left channel is amplified slightly more than the right channel, and the right channel is time shifted by the extra time it takes for the sound to reach the right ear compared to the left ear. To create the illusion of a 10 m distance to the sound source, the direct sound must be reduced, while the reverb component may remain at the same level.
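To make that concrete, here is a minimal, illustrative C++ sketch (not the project's actual code) that turns a mono buffer into a stereo pair by applying a per-channel gain, an integer-sample delay and a simple distance attenuation. The head width, the panning law and the 1/distance law are assumptions chosen for readability, not measured values.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Very simplified spatialisation: a gain difference (level cue), a delay
// (time cue) and a crude 1/distance attenuation on the direct sound only.
// A reverb component, which would stay roughly constant with distance, is
// deliberately left out.
struct StereoBuffer {
    std::vector<float> left;
    std::vector<float> right;
};

StereoBuffer spatialise(const std::vector<float>& mono,
                        float azimuthDeg,          // 0 = straight ahead, +90 = fully right
                        float distanceM,           // distance to the virtual source
                        float sampleRate = 44100.0f,
                        float headWidthM = 0.18f)  // assumed ear spacing
{
    const float pi = 3.14159265f;
    const float az = azimuthDeg * pi / 180.0f;
    const float speedOfSound = 343.0f;             // m/s

    // Time difference between the ears: extra path length ~ headWidth * sin(azimuth).
    const float itdSeconds = headWidthM * std::sin(az) / speedOfSound;
    const int delaySamples = int(std::lround(std::fabs(itdSeconds) * sampleRate));

    // Level difference between the ears: crude linear panning law.
    const float pan   = std::sin(az);               // -1 .. +1
    const float gainL = 0.5f * (1.0f - pan);
    const float gainR = 0.5f * (1.0f + pan);

    // Direct sound falls off with distance (assumed 1/d law).
    const float distGain = 1.0f / std::max(distanceM, 1.0f);

    StereoBuffer out;
    out.left.assign(mono.size() + delaySamples, 0.0f);
    out.right.assign(mono.size() + delaySamples, 0.0f);

    // The ear farther from the source gets the delayed copy.
    const std::size_t delayL = (pan > 0.0f) ? std::size_t(delaySamples) : 0;
    const std::size_t delayR = (pan > 0.0f) ? 0 : std::size_t(delaySamples);

    for (std::size_t i = 0; i < mono.size(); ++i) {
        out.left[i + delayL]  += mono[i] * gainL * distGain;
        out.right[i + delayR] += mono[i] * gainR * distGain;
    }
    return out;
}
```

For the 10 o'clock example, azimuthDeg would be -60 and distanceM would be 10, so the left channel ends up louder and the right channel arrives about 20 samples late.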
Short description
Blues Swan MCU
The Swan has, among other pins, two DAC pins, which can output audio signals. Together, the two signals form a stereophonic sound with cues of direction and distance to a virtual sound source.
XIAO ESP32S3 Sense
The XIAO and its camera can provide visual info about the environment.
DFRobot BMX160 9-axis Sensor
The BMX160 is an inertial measurement unit (IMU), which can detect acceleration along three axes, rotation about three axes, and the Earth's magnetic field in three dimensions.
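For reference, reading all three sensors with DFRobot's Arduino library might look roughly like the sketch below. The class, struct and function names follow the DFRobot_BMX160 example sketches as I recall them, so verify them against the library before relying on this.

```cpp
#include <Wire.h>
#include <DFRobot_BMX160.h>

DFRobot_BMX160 bmx160;

void setup() {
  Serial.begin(115200);
  if (!bmx160.begin()) {
    Serial.println("BMX160 init failed, check wiring");
    while (true) { delay(100); }
  }
}

void loop() {
  // One struct per sensor: magnetometer (uT), gyroscope (deg/s), accelerometer (m/s^2).
  sBmx160SensorData_t magn, gyro, accel;
  bmx160.getAllData(&magn, &gyro, &accel);

  Serial.print("accel x/y/z: ");
  Serial.print(accel.x); Serial.print(' ');
  Serial.print(accel.y); Serial.print(' ');
  Serial.println(accel.z);

  delay(100);
}
```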
Other
The Swan connects to a set of headphones. On top of the headband, the XIAO and its camera are mounted, as well as the BMX160. Using the equipment requires a soccer field or similar. The field boundary is marked with flags.
An example of a program flow (a minimal state-machine sketch follows the list):
- The user starts in the middle of a soccer field.
- The Swan produces sounds as cues about the environment. At the start, the Swan orients itself with the help of the BMX160. It also randomly places a few collectable virtual objects on the field.
- Two adjacent beep sounds tell the user where the goal ends of the field are.
- The objects on the field have their own beeps, which reveal to which goal they belong.
- Only objects not more than 10 m from the user will sound in the headphones.
- The task is to find all objects by walking or running on the field, listening for objects.
- When an object is heard, the user tries to walk over it. Succeeding at that triggers a distinct sound. The user then has to walk or run to the correct goal, and after that it's time to search for the next object.
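As promised above, here is a rough, self-contained C++ sketch of that flow as a small state machine. The positions, distances and thresholds are invented for illustration; none of this is the project's actual code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical data model for the game flow described above.
struct Vec2 { float x, y; };

struct VirtualObject {
    Vec2 pos;
    int  goal;              // 0 or 1: which goal this object belongs to
    bool collected = false;
};

enum class Phase { Searching, CarryingToGoal, Finished };

struct Game {
    std::vector<VirtualObject> objects;
    Vec2  user{0.0f, 0.0f};
    Phase phase = Phase::Searching;
    int   carriedGoal = -1;

    static float dist(Vec2 a, Vec2 b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }

    // One step of the flow: search, pick up, deliver, repeat.
    void step() {
        if (phase == Phase::Searching) {
            for (auto& o : objects) {
                if (o.collected) continue;
                // Here: beep for objects within 10 m, spatialised towards o.pos.
                if (dist(user, o.pos) < 0.5f) {      // "walk over the object"
                    o.collected = true;
                    carriedGoal = o.goal;
                    phase = Phase::CarryingToGoal;
                    // Here: play the pickup sound.
                }
            }
        } else if (phase == Phase::CarryingToGoal) {
            Vec2 goalPos = (carriedGoal == 0) ? Vec2{0.0f, 50.0f} : Vec2{0.0f, -50.0f};
            // Here: beep for the target goal.
            if (dist(user, goalPos) < 2.0f) {
                bool allDone = true;
                for (const auto& o : objects) if (!o.collected) allDone = false;
                phase = allDone ? Phase::Finished : Phase::Searching;
            }
        }
    }
};
```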
What the Swan does
The Swan creates a model of the field. The sound "sources" are the two goals, the collectable objects and the boundary flags. As the user moves, the BMX160 provides information about the movement. The Swan calculates its own new position and orientation, and hence the position and orientation of the user's head. So the user can stand still but turn their head and still hear audible objects from the right direction.
The Swan corrects its own positional "awareness" by comparing its dead reckoning with clues from the actual environment. These clues consist of what the XIAO camera can recognise, as well as the magnetic field.
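A heavily simplified dead-reckoning step might look like the sketch below. A real implementation would need proper sensor fusion (for example a complementary or Kalman filter); the structure and names here are my own illustration rather than the project's code.

```cpp
#include <cmath>

// Minimal 2D dead-reckoning state: position, velocity and heading on the field.
struct PoseState {
    float x = 0.0f, y = 0.0f;       // metres
    float vx = 0.0f, vy = 0.0f;     // m/s
    float headingRad = 0.0f;        // 0 = towards one goal (arbitrary choice)
};

// One integration step from gyro yaw rate (rad/s) and forward acceleration (m/s^2).
void deadReckonStep(PoseState& s, float yawRate, float accelForward, float dt) {
    s.headingRad += yawRate * dt;

    // Rotate the body-frame forward acceleration into field coordinates.
    const float ax = accelForward * std::cos(s.headingRad);
    const float ay = accelForward * std::sin(s.headingRad);

    s.vx += ax * dt;
    s.vy += ay * dt;
    s.x  += s.vx * dt;
    s.y  += s.vy * dt;
}

// Absolute correction, e.g. from the magnetometer or a camera-recognised flag:
// blend the estimate towards the measurement to keep drift in check.
void correctHeading(PoseState& s, float measuredHeadingRad, float weight = 0.05f) {
    s.headingRad += weight * (measuredHeadingRad - s.headingRad);
}
```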
First acoustic test
I created a simple monophonic sound of a piano playing the note G4. The sampling rate is 44.1 kHz. A quick calculation shows that sound travels just under 8 mm per sample. So the time shift between the ears, when the sound comes directly from one side, is about 20 samples.
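The numbers behind that estimate, assuming roughly 343 m/s for the speed of sound and about 0.16 m between the ears, can be checked with a few lines of code:

```cpp
#include <cstdio>

int main() {
    const float speedOfSound = 343.0f;    // m/s, assumed
    const float sampleRate   = 44100.0f;  // Hz
    const float earSpacing   = 0.16f;     // m, rough assumption

    const float mmPerSample   = speedOfSound / sampleRate * 1000.0f;  // ~7.8 mm
    const float maxItdSeconds = earSpacing / speedOfSound;            // ~0.47 ms
    const float maxItdSamples = maxItdSeconds * sampleRate;           // ~20 samples

    std::printf("%.1f mm per sample, %.0f us max delay, %.0f samples\n",
                mmPerSample, maxItdSeconds * 1e6f, maxItdSamples);
    return 0;
}
```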
From the single monophonic sound snippet I made a copy. Now I have a left channel and a right channel. In Audacity it looks like this:
The upper track is panned to the left and the lower to the right. But the waves are completely identical. When listening with the headphones, it's just like a monophonic sound right in the middle. Either in front of you or behind you.
Now, without changing the sound levels, just by shifting one track a few samples, I can create a sense of direction. I zoom in to actually view the individual samples:
I use the Time Shift Tool to move the lower track 15 samples. This gives the sense of the sound source moving to the left.
The following sound wave is produced by playing this sound sample 15 times. The first sound is completely in the middle. After that, the right channel is shifted 2, -4, 6, -8, 10, -12... samples, creating the sensation of the sound source jumping to the left, to the right, back and forth a few times, before returning to the middle for the last three repetitions.
For each of us, a given delay in microseconds probably corresponds to a slightly different direction, because our head shapes and ear distances differ. Some form of calibration is needed for the user. Calibration might also happen while using the equipment: when the sound comes from the side, the user turns to that side until the sound comes from straight ahead. Straight ahead is unambiguous in this concept; it's simply when both channels are identical.
Sound volume and other factors
In the video, one can notice that the sound volume is constant. The only thing causing the sensation of direction is the delay. In the real world, the difference in sound volume between the left ear and the right ear also contributes to the sensation of direction. A third factor is how the sound is filtered as it passes around the listener's head.
A shift in the approach
Due to time constraints, I decided to switch from real-time audio processing to pre-recorded sounds. The real-time audio processing, where the produced audio would be calculated accurately according to exact measurements from the accelerometer, the gyroscope and the magnetometer, turned out to be a huge task, to which I'll definitely return in the future. Right now I'm instead going for pre-produced sound samples. I'll have an audio editor create enough sound samples to be played from an SD card. The sound samples will cover 32 directions and 10 distances.
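If the pre-produced samples end up organised as 32 directions by 10 distances, picking the right file could be as simple as the function below. The track numbering scheme (1..320 on the SD card, one track per direction/distance pair) is my assumption, not a decision that has been made.

```cpp
#include <cmath>

// Map a direction and distance to a track number on the SD card, assuming
// tracks are numbered 1..320 as (distanceIndex * 32 + directionIndex + 1).
int trackForSource(float azimuthDeg, float distanceM) {
    // Wrap the azimuth into [0, 360) and quantise into 32 sectors of 11.25 degrees.
    const float az = std::fmod(std::fmod(azimuthDeg, 360.0f) + 360.0f, 360.0f);
    const int dirIndex = int(az / 360.0f * 32.0f) % 32;

    // Quantise the distance into 10 bins of 1 m, clamped to 0..9.
    int distIndex = int(distanceM);
    if (distIndex < 0) distIndex = 0;
    if (distIndex > 9) distIndex = 9;

    return distIndex * 32 + dirIndex + 1;
}
```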
The state of the project at the time of the deadline
Due to what I think was a hardware problem, I abandoned the Blues Swan board. The board is probably just fine, but my laptop and its drivers aren't. After switching laptops, I never returned to the Blues Swan, which is a shame, because I really wanted to get the two-channel DAC to work as a stereo audio output. I'll return to that.
Right now I'm still working on getting the Seeed Xiao to work with DFRobot's BMX160, a 9-DOF IMU, and DFRobot's DFPlayer. The DFPlayer works nicely on its own and my pre-recorded sound samples work great. The serial connection between the Xiao and the DFPlayer still doesn't work properly.
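For what it's worth, the pattern I would expect to work with DFRobot's DFRobotDFPlayerMini library on the ESP32-S3 is sketched below. The RX/TX pin numbers are placeholders for whatever pins the DFPlayer is actually wired to, and 9600 baud is the DFPlayer's default rate.

```cpp
#include <Arduino.h>
#include <DFRobotDFPlayerMini.h>

// Placeholders: replace with the GPIO numbers the DFPlayer is actually wired to.
// The DFPlayer's TX goes to the Xiao's RX pin and vice versa.
static const int PIN_RX = 4;
static const int PIN_TX = 5;

DFRobotDFPlayerMini player;

void setup() {
  Serial.begin(115200);

  // The ESP32 Arduino core allows mapping a hardware UART to chosen pins.
  Serial1.begin(9600, SERIAL_8N1, PIN_RX, PIN_TX);

  if (!player.begin(Serial1)) {
    Serial.println("DFPlayer not responding - check wiring and the SD card");
    while (true) { delay(100); }
  }
  player.volume(20);   // 0..30
  player.play(1);      // play track 0001 from the SD card
}

void loop() {
  // Later: select and play the track matching the nearest audible object.
  delay(1000);
}
```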