A Sound Plan
The Blind Camera uses generative artificial intelligence to translate sounds into an image representing the surrounding scene.
The ability of people who are blind to mentally reconstruct images from sounds is a remarkable phenomenon that highlights the brain's extraordinary adaptability. Although they lack the sense of sight, blind individuals often develop an exceptional capacity to perceive and interpret the world around them through their remaining senses, particularly hearing.
Through years of experience and practice, blind individuals learn to differentiate between sound patterns, allowing them to recognize familiar objects and even people by the unique sounds they emit. For example, they can tell the footsteps of a friend from those of a stranger, or identify the distinct sounds produced by different types of vehicles. This ability to extract meaningful information from sound and translate it into mental imagery enables them to build a rich understanding of their surroundings and navigate the world with a great deal of independence.
An artist by the name of Diego Trujillo Pisanty has built a device that he calls the Blind Camera. Rather than sensing light like a traditional camera, the Blind Camera listens to sounds, then translates them into an image representing the scene in front of the device. To be sure, this is by no means the same process an individual with a visual impairment goes through, but perhaps this work of art can give us some appreciation for how these individuals perceive the world around them.
The physical device is reminiscent of the instant cameras of a couple of decades past, with one significant difference: a large 3D-printed horn is attached to amplify environmental sounds. There is also a small display screen on top that lets the user preview the image the device produces.
Pisanty is not numbered among the naysayers declaring that generative artificial intelligence is going to destroy art; rather, he recognizes that it can enhance his creativity. He embraced the technology and leveraged it to great effect in building the Blind Camera, developing a custom artificial neural network that accepts an audio recording, then generates an image consistent with those sounds.
To train the model, a number of videos were recorded around Mexico City (if you live in another part of the world, your mileage may vary using the device). Each video frame was paired with the preceding second of captured audio. This dataset was used to train a model capable of encoding the sound, then decoding it back into the matching image. A second network was developed to judge the first network's output: if it was not convinced that a generated image was an actual photograph, the first network had to try again, gradually getting better at producing realistic scenes. This is the adversarial training scheme behind generative adversarial networks (GANs).
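Pisanty's exact architecture isn't spelled out, but the description maps onto an audio-conditioned adversarial model. Here is a minimal sketch in TensorFlow 2 (the stack he used) of what the pieces might look like; the audio length, image size, layer shapes, and helper names are all illustrative assumptions, not his published code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical dimensions: one second of 16 kHz mono audio conditioning
# a 128x128 RGB frame. The project's actual sizes are not published here.
AUDIO_SAMPLES = 16_000
IMG_SIZE = 128

def pair_frames_with_audio(frames, audio, fps=30, sample_rate=16_000):
    """Pair each video frame with the preceding second of audio."""
    pairs = []
    for i, frame in enumerate(frames):
        end = int(i / fps * sample_rate)
        if end >= sample_rate:  # skip frames inside the first second
            pairs.append((audio[end - sample_rate:end], frame))
    return pairs

def build_generator():
    """Encode an audio clip to a latent vector, then decode it to an image."""
    audio_in = layers.Input(shape=(AUDIO_SAMPLES, 1))
    x = layers.Conv1D(32, 64, strides=4, activation="relu")(audio_in)
    x = layers.Conv1D(64, 32, strides=4, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(8 * 8 * 128, activation="relu")(x)
    x = layers.Reshape((8, 8, 128))(x)
    for filters in (128, 64, 32, 16):  # upsample 8x8 -> 128x128
        x = layers.Conv2DTranspose(filters, 4, strides=2,
                                   padding="same", activation="relu")(x)
    img_out = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model(audio_in, img_out)

def build_discriminator():
    """Score how much an image looks like a real video frame."""
    img_in = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = img_in
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    return tf.keras.Model(img_in, layers.Dense(1)(x))  # real-vs-fake logit

generator, discriminator = build_generator(), build_discriminator()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(audio, real_frames):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_frames = generator(audio, training=True)
        real_logits = discriminator(real_frames, training=True)
        fake_logits = discriminator(fake_frames, training=True)
        # The judge learns to call real frames real and generated ones fake...
        d_loss = (bce(tf.ones_like(real_logits), real_logits) +
                  bce(tf.zeros_like(fake_logits), fake_logits))
        # ...while the generator "tries again" until the judge is fooled.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```

Calling train_step repeatedly over batches of (audio, frame) pairs plays the two networks against each other, which is exactly the try-again loop described above.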
The models were trained on a computer with an NVIDIA GeForce RTX 3080 graphics card using Python 3 and TensorFlow 2. The trained network was then optimized with TensorFlow Lite to run on a Raspberry Pi 3B single-board computer embedded within the Blind Camera. The result is a lightweight, standalone “camera” that generates an image representing the sounds it hears at the push of a button.
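The conversion step isn't detailed in the writeup, but with a trained Keras model it typically looks something like this sketch (the file name and quantization choice are assumptions):

```python
import tensorflow as tf

# Convert the trained generator (the only network needed at capture time)
# to TensorFlow Lite. Default optimizations enable weight quantization,
# which shrinks the model enough for a Raspberry Pi 3B.
converter = tf.lite.TFLiteConverter.from_keras_model(generator)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("blind_camera.tflite", "wb") as f:
    f.write(converter.convert())
```

On the Pi itself, inference would then run through the lightweight interpreter, roughly like so:

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # slim runtime for the Pi

interpreter = tflite.Interpreter(model_path="blind_camera.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed the last second of microphone audio; read back the generated frame.
audio = np.zeros(inp["shape"], dtype=np.float32)  # placeholder mic capture
interpreter.set_tensor(inp["index"], audio)
interpreter.invoke()
image = interpreter.get_tensor(out["index"])  # frame for the preview screen
```

The results, as you might expect, are far from a perfect copy of the actual imagery, but hey, this is art! It is a very interesting and thought-provoking project, and well worth checking out.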