Introduction: The goal of our project was to build a robot that listens to you hum a melody and plays it back on a recorder in real time.
Abstract: The combination of artificial intelligence with musical expression in human-robot interaction has opened up a fascinating new field of study. To close the creativity gap between humans and robots, this project presents the creation of a robot that can turn hummed tunes into recorder tunes. The main goal is to develop an engaging interface that allows people to easily convey their musical ideas to a robot counterpart, which will provide an immediate and pleasing replay.
By combining signal processing algorithms with robotic control mechanisms, the proposed system enables the robot to understand and mimic the subtleties of a user's humming input.
This paper outlines the design, implementation, and testing phases of the Real-Time Hum-to-Play Recorder robot. We delve into the technical intricacies of the signal processing pipeline and the robotic control mechanisms that collectively contribute to the execution of the project. Additionally, we discuss the potential applications of this project including interactive performances, educational settings, and artistic collaborations.
As the intersection of robotics and music continues to evolve, this project stands as a testament to the possibilities of human-robot collaboration in the realm of artistic expression.
Inspiration:
We took inspiration from the following self-playing piano, which pitches itself in a similar spirit: "If you ever dreamt of waking up one day and just be able to play piano fluently without having to practice the Grandiola Piano Player is probably your best option. You only need to pedal the pedals and voila, music."
We also took inspiration from this self-playing guitar robot playing the Pirates of the Caribbean theme.
Related work:
Sing2Notes by Klangio transcribes your singing into notes: you upload a vocal recording as an MP3 file or import a YouTube vocal cover, and it gives you PDF, MIDI, and MusicXML transcriptions right away.
Blitz City DIY made a Recorder Musical Robot, where they manually controlled the solenoids to block and release a hole in the recorder.
Implementation: We completed the project in three stages:
Milestone 1: We initially thought of using a harmonica as our instrument but quickly realized the complexity of repeatedly sucking air out, since a harmonica requires drawing air as well as blowing it.
This was our system design at the time, and we hoped to tackle the following: curating the dataset, learning the mechanism behind our chosen instrument, mitigating the effects of multipath fading and environmental noise, and completing the mechanical design of our robot.
Milestone 2: At this stage, we changed our instrument to a recorder due to its simplicity: producing sound only requires blowing air through it, which an air pump can do.
We switched from an ESP32 to a Sony Spresense board due to its capability to process and store audio data.
We used an ICS-40180 MEMS microphone to capture audio data.
Our next task was to build a robust algorithm to identify the musical notes in a hummed melody. To do this, we mapped the frequencies of the hummed sound to their corresponding musical notes. Using Python, we wrote code that converts an MP3 file into raw PCM data.
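As a sketch of that conversion step (not our exact code), the pydub library can decode an MP3 into a normalized PCM array; pydub and the file name "hum.mp3" are assumptions here:

```python
# Minimal sketch of the MP3-to-PCM step, assuming the pydub library
# (an ffmpeg wrapper) and a hypothetical recording named "hum.mp3".
import numpy as np
from pydub import AudioSegment

audio = AudioSegment.from_mp3("hum.mp3")              # decode the MP3
audio = audio.set_channels(1).set_frame_rate(16000)   # mono, 16 kHz to match the board
pcm = np.array(audio.get_array_of_samples(), dtype=np.float32)
pcm /= np.iinfo(audio.array_type).max                 # normalize to [-1, 1]
```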
We then applied the Short-Time Fourier Transform (STFT), a standard tool for analyzing time-varying frequency content in audio signals, to obtain a spectrogram of the recorded data. The spectrogram is a graphical representation showing which frequencies dominate at each point in time during the humming sequence.
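A minimal sketch of this analysis using scipy, with a synthetic 440 Hz tone standing in for a real recording (our actual window parameters may have differed):

```python
# Sketch of the STFT step: compute a spectrogram and read off the
# dominant frequency in each analysis frame. scipy and the synthetic
# 440 Hz test signal are assumptions for illustration.
import numpy as np
from scipy.signal import stft

fs = 16000
t_sig = np.arange(fs) / fs                    # one second of samples
pcm = np.sin(2 * np.pi * 440.0 * t_sig)       # stand-in for the hummed input (A4)

freqs, times, Zxx = stft(pcm, fs=fs, nperseg=1024)   # ~64 ms frames
magnitude = np.abs(Zxx)                              # freq bins x time frames
dominant = freqs[np.argmax(magnitude, axis=0)]       # dominant frequency per frame (Hz)
```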
The final step was the mapping itself: we correlated the dominant frequencies in the spectrogram with their corresponding musical notes, bridging the acoustic characteristics of the sound and the notes they represent. The result was a Python codebase capable of translating hummed melodies into a recognizable musical format.
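The mapping follows the standard equal-temperament relation, where a frequency f corresponds to MIDI note number 69 + 12*log2(f/440), with 69 being A4 at 440 Hz. A minimal sketch:

```python
# Sketch of the frequency-to-note mapping using the equal-temperament
# relation n = 69 + 12*log2(f/440), where MIDI note 69 is A4 (440 Hz).
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz):
    """Map a frequency in Hz to the nearest note name, e.g. 392.0 -> 'G4'."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NAMES[midi % 12] + str(midi // 12 - 1)

print(freq_to_note(261.6))  # "C4" (middle C)
```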
Build: We finalized the build of our project. Here we see a side view of the build around the recorder. An air pump is positioned near the mouthpiece to blow air into the recorder, and both the recorder and the pump fit into a 3D-printed mount that holds them firmly. Each hole on the recorder has a solenoid attached to it; the solenoids press down or lift as notes are detected, with each note mapped to a solenoid.
Our next steps were to design the 3D model of the chassis that would hold the entire assembly together, including the recorder, 6 solenoids, the air pump, batteries, the Spresense board, and the circuit wiring, and then to 3D print the model and iterate until all fittings were precise.
Circuit:
The control circuits for the solenoids and the vacuum pump share the same design: a straightforward NPN-transistor switching configuration. When base current is applied, the transistor conducts, allowing current to flow from the collector to the emitter, which is tied to ground. The solenoid and the air pump, each wired in series with its transistor's collector and powered by an external voltage source, then switch on.
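As a worked example of sizing such a switch (all component values below are illustrative assumptions, not measurements from our build), the base resistor must pass enough current to keep the transistor saturated under the load current:

```python
# Worked example of the base-resistor calculation for the NPN switch.
# All component values here are illustrative assumptions.
v_gpio = 3.3      # GPIO high level (V)
v_be   = 0.7      # base-emitter drop (V)
i_load = 0.5      # assumed solenoid current (A)
h_fe   = 100      # assumed transistor current gain
margin = 5        # overdrive factor to keep the transistor saturated

i_base = margin * i_load / h_fe        # required base current (A)
r_base = (v_gpio - v_be) / i_base      # maximum base resistor (ohms)
print(f"Base resistor <= {r_base:.0f} ohms")   # ~104 ohms
```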
Milestone 3:
We modeled the chassis that holds the recorder and the air pump in Fusion 360. It includes a mount where the air pump is placed, with a smaller mount where the recorder is fixed.
Each solenoid is fixed in place using two attachments we designed in Fusion 360.
The bottom attachment raises the recorder so it sits level and also fastens the solenoid with M2 screws.
Together the attachment looks like the figure below.
The two attachments are fixed to the solenoid (green) using M2 screws.
With the recorder and a solenoid in place, the chassis looks like this. The attachment pair is repeated 7 times along the length of the recorder to cover the 7 holes.
This is the front view of the 7 solenoids attached to the recorder using the 3D-printed attachments. We used a hot glue gun to secure the recorder.
Circuit: Next, we connected everything as we described in the Circuit section above.
Code: In parallel, we worked on setting up our code for audio detection, audio recording, note detection, and manipulating the solenoids and the LEDs.
Audio Recording and Note Detection: The Spresense platform can capture audio input from four analog channels. In our application, we used one channel to sample raw audio data at 16 kHz. The circuit configuration is straightforward: the output, ground, and VDD of the MEMS microphone connect to the corresponding mic-bias, ground, and VDD terminals on Spresense's extension board.
Upon obtaining raw PCM data from the Spresense board (16 kHz sampling rate, 12-bit depth), we divided the data into chunks of 0.063 seconds (1,008 samples), each corresponding to one frame. We then used the KissFFT library to run a Fast Fourier Transform (FFT) on each chunk and identify the predominant frequency within that time interval.
Following the FFT analysis, we extracted the dominant frequency value and mapped it to a predefined note table, converting frequency values into their corresponding musical notes.
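The on-board code does this in C with KissFFT; the sketch below mirrors the same per-frame logic in Python, with the frame length from above and an abbreviated note table as illustrative assumptions:

```python
# Python mirror of the on-board per-frame pipeline: chunk the PCM
# stream, take an FFT per chunk, pick the peak bin, look up the note.
# The abbreviated note table here is an illustrative assumption.
import numpy as np

FS = 16000
FRAME = 1008                      # ~0.063 s per frame at 16 kHz
NOTE_TABLE = {392: "G", 440: "A", 494: "B", 523: "C", 587: "D"}  # Hz -> note

def detect_notes(pcm):
    notes = []
    for start in range(0, len(pcm) - FRAME, FRAME):
        frame = pcm[start:start + FRAME]
        spectrum = np.abs(np.fft.rfft(frame))
        freq = np.fft.rfftfreq(FRAME, d=1.0 / FS)[np.argmax(spectrum)]
        # snap the peak frequency to the nearest entry in the note table
        nearest = min(NOTE_TABLE, key=lambda f0: abs(f0 - freq))
        notes.append(NOTE_TABLE[nearest])
    return notes
```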
LED and Solenoid Manipulation: Once we had the numerical notes, we mapped each note to a digital pin on the Spresense board, with one additional digital pin controlling the air pump. When a note is detected, the air pump's pin and the pins corresponding to that note go high, and the detected note is shown on the LED setup, producing a synchronized response between the air pump and the LED output.
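A schematic sketch of that control flow; the gpio helper, pin numbers, and one-solenoid-per-note simplification are hypothetical placeholders (the real firmware does the equivalent with Arduino-style digital writes on the Spresense):

```python
# Illustrative sketch of note-to-pin control. The `gpio` object, pin
# numbers, and one-solenoid-per-note mapping are hypothetical; the
# real firmware does the equivalent with digital writes on Spresense.
NOTE_PINS = {"G": 2, "A": 3, "B": 4, "C": 5, "D": 6}   # note -> solenoid pin
PUMP_PIN = 7                                           # air pump control pin

def play_note(note, gpio):
    gpio.write(PUMP_PIN, True)        # turn on the air pump
    for name, pin in NOTE_PINS.items():
        # drive high the solenoid for the detected note, release the rest
        gpio.write(pin, name == note)
```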
Video Demonstration:
Applications:
Music teaching: By giving students a hands-on and engaging experience, the robot can serve as an innovative tool in music education, helping to teach musical ideas, melody construction, and the principles of musical expression.
Assistive Technology: The robot may be customized to provide people with impairments or restricted mobility with a way to express themselves musically that goes beyond conventional boundaries.
Live Concerts and Events: Incorporating the robot into live musical performances can add a captivating element to concerts, festivals, and events. The robot's ability to instantly respond to hummed melodies can create dynamic and interactive musical experiences for the audience.
Conclusion: While we didn't fully achieve what we set out to do in Milestone 1, we believe what we built works well as a proof of concept. The robot we present takes humming and sound data and turns it into instructions for playing the recorder.