MonoEye Uses a Single Wide-Angle GoPro Chest Camera to Perform Accurate Human Motion Capture
A GoPro fitted with a 280-degree fisheye lens feeds three neural networks, delivering accuracy its creators say is comparable to third-person monocular motion capture.
A team of researchers from the Tokyo Institute of Technology (Tokyo Tech) and Carnegie Mellon University has unveiled a human motion capture system that, they say, can operate with nothing more than a single wearable camera: MonoEye.
"Existing optical motion capture systems use multiple cameras, which are synchronized and require camera calibration," the researchers explain of the problem they sought to solve. "These systems also have usability constraints that limit the user's movement and operating space. Since the MonoEye system is based on a wearable single RGB camera, the wearer's 3D body pose can be captured without space and environment limitations."
"The body pose, captured with our system, is aware of the camera orientation and therefore it is possible to recognize various motions that existing egocentric motion capture systems cannot recognize. Furthermore, the proposed system captures not only the wearer's body motion but also their viewport using the head pose estimation and an ultra-wide image."
The camera itself is a custom-built chest-mounted prototype that pairs a GoPro Hero 7 Black action camera with an Entaniya M12-280 fisheye lens, offering a 280-degree field of view that captures both the wearer's limbs and the surrounding environment. Data from the camera are then processed using three custom deep neural network models, BodyPoseNet, HeadPoseNet, and CameraPoseNet, each estimating the pose of its respective domain in real time.
"MonoEye’s simple hardware setup enables multimodal motion capture in everyday life without restrictions on location and time," the researchers note. "Different from conventional methods, our system uses the ultra-wide chest-mounted camera that captures not only the user’s 3D pose but also the user’s viewport and the surrounding environment in which there are various activity cues."
"Our method estimates accurate 3D pose that is comparable with third-person view-based monocular motion capture methods. The combination of RGB images and deep learning methods provide stable results even in outdoor environments. We can estimate 3D human pose with camera orientation information [...] by combining the prediction results of CameraPoseNet and BodyPoseNet. We can distinguish the motions that have the same position and different camera directions, that previous portable motion capture systems are not able to distinguish."
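To see why camera orientation matters, consider a minimal sketch of the combination step the researchers describe. The article does not give the networks' output formats, so this assumes (hypothetically) that BodyPoseNet yields joint positions relative to the chest camera and CameraPoseNet yields a 3x3 rotation matrix; the function names and joint coordinates are illustrative only, not the authors' implementation.

```python
import math

def rotation_about_x(angle_rad):
    """Return a 3x3 rotation matrix for a pitch of angle_rad about the x-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def to_world(camera_rotation, joints_cam):
    """Rotate camera-frame joint positions into a world frame whose y-axis is up."""
    return [
        tuple(sum(row[k] * joint[k] for k in range(3)) for row in camera_rotation)
        for joint in joints_cam
    ]

def vertical_extent(joints_world):
    """Height spanned by the joints along the world's up (y) axis."""
    ys = [joint[1] for joint in joints_world]
    return max(ys) - min(ys)

# Assumed camera-relative joints in metres: head above, feet below the chest camera.
# The same relative pose is used for both cases below.
joints_cam = [(0.0, 0.3, 0.0), (0.0, -1.2, 0.0)]

# Assumed camera orientations: level (standing) vs. pitched 90 degrees (lying down).
standing = to_world(rotation_about_x(0.0), joints_cam)
lying = to_world(rotation_about_x(math.pi / 2), joints_cam)

print(round(vertical_extent(standing), 3))
print(round(vertical_extent(lying), 3))
```

In this toy setup the camera-relative pose is identical in both cases, so a body-pose estimate alone could not tell them apart; only after applying the camera rotation does the standing pose span the vertical axis while the lying pose collapses onto the ground plane.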
The team's work has been published in the proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (UIST '20), and the paper is available under open access terms on the ACM Digital Library.