You Ain't Seen Nothin' Yet
A single wrist-mounted camera can estimate full-body poses — even of unseen areas — by using a deep neural network to fill in the details.
Human pose estimation seeks to predict the locations of human joints, or other body parts, in three-dimensional space. This has important applications in virtual reality, human-computer interface design, and more broadly, in teaching computers to have a more natural understanding of human behavior. Despite the utility of this technology, there are still many competing systems, each with significant drawbacks that prevent widespread adoption. One common approach uses tags attached to key points on the body, in conjunction with anchors positioned around the perimeter of the area. The anchors use wireless signals to accurately track the tags; however, this requires significant setup and only works within the perimeter of the anchors. Clearly, this is not exactly a portable, use-anywhere solution.
Many alternative solutions exist that use cameras, accelerometers, or other sensors to estimate body poses, but they are plagued by impracticality, inaccuracy, and high cost. For this technology to become relevant in people's daily lives, it needs to be transparent and portable. Ideally, it would be integrated into a device that we already carry with us all the time, like a smartphone or smartwatch. That sounds like quite a leap considering what most pose estimation systems look like today, but a team of engineers at Cornell University has designed a small device called BodyTrak that may just make it possible. BodyTrak uses a single, wrist-mounted camera to estimate the pose of the entire body, even unseen areas, with a high degree of accuracy.
The dime-sized camera worn on the wrist sends captured images into a deep neural network for processing. The model takes that partial view of the body and fills in the details: at present it can recognize a total of fourteen joints on the arms, legs, torso, and head in three-dimensional space, in real time. A small study of nine participants was conducted to validate the BodyTrak system. Participants were given the wrist-mounted camera, then asked to perform a variety of activities, like walking, sitting, or exercising. These activities were conducted under different conditions, both indoors and outdoors and while wearing different clothes, to demonstrate the portability and practicality of the system. When the predictions were compared with ground truth measurements, the predicted locations of the fourteen joints were found to deviate from reality by an average of only 6.9 centimeters. This is quite impressive considering the camera does not even have a view of all the joints whose locations it is predicting.
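To make that accuracy figure concrete, here is a minimal sketch of how a mean per-joint error like the reported 6.9 centimeters is typically computed: average the Euclidean distance between each predicted 3D joint position and its ground-truth counterpart across all joints and frames. The array shapes, toy data, and function name below are illustrative assumptions for this sketch, not details from the BodyTrak paper.

```python
import numpy as np

NUM_JOINTS = 14  # joints on the arms, legs, torso, and head

def mean_joint_error(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean Euclidean deviation over all joints and frames.

    Both arrays have shape (frames, NUM_JOINTS, 3): one (x, y, z)
    position per joint per frame. Result is in the input's units.
    """
    # Distance between each predicted joint and its ground-truth position
    distances = np.linalg.norm(predicted - ground_truth, axis=-1)
    return float(distances.mean())

# Toy data: ground truth plus a fixed 6.9 cm offset along one axis,
# so the mean error is exactly 6.9 cm by construction.
rng = np.random.default_rng(0)
gt = rng.uniform(-1.0, 1.0, size=(100, NUM_JOINTS, 3))  # metres
pred = gt.copy()
pred[..., 0] += 0.069  # systematic 6.9 cm offset in x

print(f"mean joint error: {mean_joint_error(pred, gt) * 100:.1f} cm")  # → 6.9 cm
```

This averaged-distance metric is a common way to score 3D pose estimators against motion-capture ground truth; the actual evaluation protocol used in the study may differ in details such as per-joint weighting or alignment.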
The natural platform for BodyTrak is the smartwatch, since it is already worn on the wrist. Many such devices already have a camera, which means the algorithm could potentially run without any changes to the existing hardware design. This will take some work to make happen, however. The algorithm is computationally intensive, and running it would drain the watch's battery very quickly, assuming the watch even had sufficient resources to run BodyTrak in the first place. But between software optimizations and advances in hardware, it should not be too long before this becomes possible.