Using multiple Spresense devices (at least two), each equipped with a camera and an ambient microphone, surrounding a volume of space and looking in from its boundary, a scene can be segmented in 3D, both visually and acoustically. With the devices performing edge sensor fusion and smart vision on their individual local streams, and forwarding the geotagged streams for remote integration, the scene can be comprehended in multiple sensor dimensions in real time. The devices can be either static or freely mobile. The end user can dynamically choose their point of view on the scene and examine it in as much detail as the multiplicity of integrated device streams allows.
This technology, lightweight and power-efficient, can be used in many ways: casual observation by passers-by of a captivating street juggling act, crime-scene documentation with provenance for attribution of blame in court, traffic observation at critical intersections or complex transportation hubs, a compelling new genre of POV-controlled video shorts, and so on. In every application where a single camera and microphone provide a flat, single-POV perspective on an object or scene of interest, a set of cooperating multi-sensor devices can provide a much richer multi-perspective experience of the scene. Certainly, this is not novel technology (it has been employed to create rich viewer experiences in many sports), but the power-efficient yet high-quality Spresense platform can bring it out of the stadium and put it right out on the sidewalk.
The technologies for stitching streams already exist and support multi-million-dollar industries. This is a democratization of that technology into the mainstream. It is impossible to predict all the uses it may be put to if made widely accessible and affordable, just as it was with good old flat video; better to implement it, put it in the hands of creative humans, sit back, and enjoy the emergence of novel applications.
Main features

The main idea of the solution is to enrich, integrate, process, and forward multiple streams of sensor data describing the same volume of space at the same moment in time. So, the main features are:
1. Multiple sensors (at least camera, microphone, and GPS) or arrays of sensors creating individual streams at the same time. (Spresense main board, vanilla extension board, and camera board.)
2. Software- or hardware-based sensor fusion (the HW path might require augmenting the computing stack with a lightweight, power-efficient FPGA). (Design of a custom extension for offloading HW-based sensor fusion to an FPGA.)
3. For full mobility, a custom extension board running on battery power needs to be designed and fabricated. (Design of custom extension to support mobile power in a wearable form factor.)
4. Machine vision models for real-time segmentation of the "flat" local streams and embedding of metadata (a.k.a. second-stage sensor fusion); a frame-packaging sketch follows this list.
5. LTE streaming of individual streams. (LTE extension and SIM card with global IoT plan.)
6. Post-processing, stitching, and serving of POV-controlled streamed video. (Development of multimedia protocol extensions for accepting the rich data.) At this point, all the data can be tagged with proof-of-work and other blockchain provenance, and/or encrypted, depending on the application.
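To make the intended output of features 4 and 5 concrete, here is a minimal C++ sketch of how a locally segmented frame could be tagged and packaged before streaming. The FrameMeta fields, the packFrame helper, and the raw memcpy wire format are illustrative assumptions, not an existing Spresense API or an established protocol; a real implementation would define an explicit, endianness-safe wire format.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-frame metadata record; field names are assumptions for
// illustration, not an existing Spresense or streaming-protocol definition.
struct FrameMeta {
    uint64_t timestampUs;   // capture time, microseconds (e.g. GPS time)
    double   latitudeDeg;   // GNSS fix at capture time
    double   longitudeDeg;
    float    headingDeg;    // device orientation, if an IMU/compass is present
    float    audioRmsDb;    // ambient sound level around the frame
    uint32_t jpegSize;      // size of the JPEG payload that follows
};

// Prepends a fixed-size metadata header to the JPEG buffer so the remote
// integrator can align frames from several devices in time and space.
// Note: copying the struct directly assumes identical layout and endianness
// on both ends, which is only acceptable for a proof of concept.
std::vector<uint8_t> packFrame(const FrameMeta& meta,
                               const uint8_t* jpeg, uint32_t jpegSize) {
    std::vector<uint8_t> packet(sizeof(FrameMeta) + jpegSize);
    FrameMeta header = meta;
    header.jpegSize = jpegSize;
    std::memcpy(packet.data(), &header, sizeof(header));          // header
    std::memcpy(packet.data() + sizeof(header), jpeg, jpegSize);  // payload
    return packet;
}
```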
Development roadmap

There are four stages of development:
1. Building the sensor-fusion and 2D segmentation pipeline for the local sensor stream of a single Spresense device (main, extension, camera).
2. Geotagging and CORDIC processing of the stream to embed maximum metadata for downstream integration of the streams (main, extension, camera); see the CORDIC sketch after this roadmap.
3. Simultaneous streaming from at least two devices over LTE to a scene-specific cloud server (after the proof of concept, this service can be developed to serve real-time scenes and events on demand); see the upload sketch after this roadmap.
4. Stitching and integration of the multiple simultaneous streams, and real-time streaming of the rich stream to both browser viewers (no POV control) and custom mobile apps (full POV control).
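For roadmap step 2, the CORDIC processing is meant to turn raw positional offsets into bearing and magnitude metadata using only shifts and adds, which suits the Spresense CPU (and a possible FPGA offload) well. Below is a minimal fixed-point, vectoring-mode CORDIC sketch in C++; the Q16.16 format, the iteration count, and the function name are illustrative choices, not part of any existing SDK.

```cpp
#include <cstdint>

// Fixed-point CORDIC (vectoring mode): computes atan2(y, x) and |(x, y)|
// with shifts and adds only. Angles are in Q16.16 radians.
static const int kIters = 16;
static const int32_t kAtanTable[kIters] = {
    // round(atan(2^-i) * 65536) for i = 0..15
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

// Inputs are plain integers (e.g. pixel or position offsets); keep them well
// below 2^29 so the intermediate values cannot overflow int32.
void cordicVectoring(int32_t x, int32_t y, int32_t* angleQ16, int32_t* magScaled) {
    int32_t z = 0;
    // Pre-rotate into the right half-plane so the iterations converge.
    if (x < 0) {
        int32_t t = x;
        if (y >= 0) { x = y;  y = -t; z =  102944; }   // +pi/2 in Q16.16
        else        { x = -y; y =  t; z = -102944; }   // -pi/2 in Q16.16
    }
    // Drive y toward zero, accumulating the total rotation angle in z.
    for (int i = 0; i < kIters; ++i) {
        int32_t xs = x >> i, ys = y >> i;
        if (y >= 0) { x += ys; y -= xs; z += kAtanTable[i]; }
        else        { x -= ys; y += xs; z -= kAtanTable[i]; }
    }
    *angleQ16  = z;   // atan2 of the original (x, y), Q16.16 radians
    *magScaled = x;   // |(x, y)| times the CORDIC gain (~1.6468)
}
```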
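For roadmap step 3, here is a minimal sketch of forwarding packed frames to the scene-specific server, assuming a POSIX-like socket API is available once the LTE extension has attached to the network. The length-prefixed TCP framing and the one-connection-per-packet flow are simplifications for illustration; a real implementation would keep a persistent connection, add TLS, and likely use an established streaming protocol.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Pushes one packed frame to the scene server over plain TCP. The server
// address and port are placeholders supplied by the caller.
bool pushPacket(const std::vector<uint8_t>& packet,
                const char* serverIp, uint16_t port) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, serverIp, &addr.sin_addr);

    bool ok = false;
    if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        // A 4-byte length prefix lets the server frame each packet on the
        // otherwise unstructured byte stream.
        uint32_t len = htonl(static_cast<uint32_t>(packet.size()));
        ok = send(sock, &len, sizeof(len), 0) == sizeof(len) &&
             send(sock, packet.data(), packet.size(), 0) ==
                 static_cast<ssize_t>(packet.size());
    }
    close(sock);
    return ok;
}
```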
Outro

This project challenge came during a tough time for me, and after the development roadmap slipped several times because high-priority personal matters preempted the project, it ultimately wasn't realized. It will be taken up again shortly. This submission is meant only to meet my commitment to Hackster.io and the project sponsor Sony.