The COVID-19 pandemic means that many of us are staying at home and sitting down more than we usually do. It's hard for a lot of us to do the sort of exercise we normally do, and even harder for people who don't usually do much physical exercise. Exercise is especially important now because it can reduce stress, prevent weight gain, boost the immune system, and improve sleep. We can safely stay active by exercising with family, using online fitness resources, or taking a virtual class. Many fitness experts and trainers have launched their own streaming services through apps and have seen unprecedented growth over the last five months.
The sedentary lifestyle that most working professionals and students lead has caused a rise in lifestyle diseases in recent years. Conditions like hypertension and high blood sugar are among the most common consequences of a lack of exercise. These diseases have several long-term effects that hamper our quality of life. This has, in turn, led to the rise of fitness and wellness apps that push an individual to exercise and make better lifestyle choices. Upwards of 38,000 such apps are available on the Play Store and the App Store.
Our pose detection model can be integrated into several such apps to check whether an individual is performing a given exercise correctly. It can also be used for self-monitoring when no external supervision is available.
This project is a human pose detector which, with further development, can be used in the streaming of online fitness services. It is inspired by the demo of Intel OpenVINO's pose detection model.
A fast stacked hourglass network for human pose estimation on OpenVINO
The stacked hourglass network proposed in [Stacked Hourglass Networks for Human Pose Estimation](https://arxiv.org/abs/1603.06937) offers a very good balance of speed and accuracy for single-person pose estimation. The paper introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. The authors show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. They refer to the architecture as a "stacked hourglass" network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
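To make the bottom-up/top-down idea concrete, here is a minimal PyTorch-style sketch of a single hourglass module: repeated pooling followed by upsampling, with a skip connection at every scale. The layer choices and names are illustrative assumptions, not the exact implementation exported to ONNX for this project.

```python
# Minimal sketch of one hourglass module (assumption: PyTorch; layers are
# illustrative, not the exact blocks used in this project's model).
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Basic conv -> batchnorm -> ReLU block used at every scale.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Hourglass(nn.Module):
    def __init__(self, depth, channels):
        super().__init__()
        self.skip = conv_block(channels, channels)        # skip branch at this scale
        self.pool = nn.MaxPool2d(kernel_size=2)           # bottom-up (downsample)
        self.down = conv_block(channels, channels)
        # Recurse to process the next (coarser) scale, or use a plain block at the bottom.
        self.inner = Hourglass(depth - 1, channels) if depth > 1 else conv_block(channels, channels)
        self.up = conv_block(channels, channels)
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')  # top-down (upsample)

    def forward(self, x):
        skip = self.skip(x)                  # features kept at the current resolution
        out = self.down(self.pool(x))        # downsample and process
        out = self.inner(out)                # process coarser scales recursively
        out = self.upsample(self.up(out))    # upsample back to the current resolution
        return out + skip                    # consolidate features across scales
```

Stacking several such modules, each followed by its own heatmap prediction head, is what enables the intermediate supervision described in the paper.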
Using the OpenVINO Toolkit
Initializing the OpenVINO environment
The Model Optimizer of OpenVINO was first used to generate the IR (Intermediate Representation) files (.xml and .bin) from the trained model (.onnx). The model was then deployed with the Inference Engine.
Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. The Model Optimizer process assumes you have a network model trained using a supported deep learning framework. Model Optimizer produces an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine. The IR consists of two files:
- .xml - Describes the network topology.
- .bin - Contains the binary data of the weights and biases.
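As a sketch of how these IR files are consumed, the snippet below reads and loads them with the Inference Engine Python API. The file paths and device name are assumptions matching the commands shown later; on older OpenVINO releases `net.inputs` is used instead of `net.input_info`.

```python
# Sketch: loading the generated IR with the Inference Engine Python API.
# Paths and device name are assumptions; adjust them to your setup.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="models/model_best.xml",
                      weights="models/model_best.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

# Query the input blob name and its expected (N, C, H, W) shape for preprocessing.
input_blob = next(iter(net.input_info))   # net.inputs on older releases
n, c, h, w = net.input_info[input_blob].input_data.shape
print("Input", input_blob, "expects shape", (n, c, h, w))
```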
Both a webcam feed and a static image were used as inputs; a rough sketch of the webcam path is shown after this paragraph. The video demonstration and the output images are given below.
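In this sketch, frames are captured with OpenCV, resized to the network's input resolution, reordered to NCHW, and passed through the loaded network. The heatmap post-processing is an illustrative assumption, since the exact output layout depends on the exported model.

```python
# Sketch: running the loaded network on webcam frames (assumed preprocessing;
# the exact resolution and post-processing depend on the trained model).
import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="models/model_best.xml", weights="models/model_best.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_blob = next(iter(net.input_info))   # net.inputs on older releases
output_blob = next(iter(net.outputs))     # assumption: first output holds the joint heatmaps
_, _, h, w = net.input_info[input_blob].input_data.shape

cap = cv2.VideoCapture(0)                 # 0 = default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize to the network input size and convert HWC (BGR) -> NCHW.
    blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]
    result = exec_net.infer(inputs={input_blob: blob})
    heatmaps = result[output_blob][0]      # one heatmap per body joint (assumed layout)
    # Each joint is taken as the argmax of its heatmap, scaled back to the frame size.
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        px = int(x * frame.shape[1] / hm.shape[1])
        py = int(y * frame.shape[0] / hm.shape[0])
        cv2.circle(frame, (px, py), 3, (0, 255, 0), -1)
    cv2.imshow("pose", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```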
Using OpenVINO Toolkit - Benchmark App
The Benchmark tool was used to estimate deep learning inference performance. Performance can be measured for two inference modes:
1) Synchronous (latency-oriented)
2) Asynchronous (throughput-oriented)
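The two modes map to different ways of driving the Inference Engine. Below is a minimal Python sketch of both, reusing the loading step from the earlier snippet; the request count and dummy input are illustrative assumptions.

```python
# Sketch: synchronous vs. asynchronous inference with the Inference Engine API.
# Blob names, request count, and the dummy input are illustrative assumptions.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="models/model_best.xml",
                      weights="models/model_best.bin")
input_blob = next(iter(net.input_info))   # net.inputs on older releases
_, c, h, w = net.input_info[input_blob].input_data.shape
dummy = np.zeros((1, c, h, w), dtype=np.float32)

# 1) Synchronous (latency-oriented): one blocking request at a time.
exec_sync = ie.load_network(network=net, device_name="CPU")
result = exec_sync.infer(inputs={input_blob: dummy})

# 2) Asynchronous (throughput-oriented): several requests kept in flight.
exec_async = ie.load_network(network=net, device_name="CPU", num_requests=4)
for i in range(len(exec_async.requests)):
    exec_async.start_async(request_id=i, inputs={input_blob: dummy})
for request in exec_async.requests:
    request.wait(-1)   # -1 = wait until the result is ready
```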
Various parameters of the Benchmark tool were changed and observations were made for each run.
The list of all the parameters and their values tested on the CPU, the GPU, and both inference modes is given in the image below.
Snapshots of the Outputs from Benchmark Tool
#Setting up environment variables
C:\Program Files (x86)\IntelSWTools\openvino\bin>setupvars.bat
Python 3.6.5
[setupvars.bat] OpenVINO environment initialized
#Benchmark app command - CPU
C:\Users\Satvik\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m D:\Human_pose\Fast_Stacked_Hourglass_Network_OpenVino\models\model_best.xml -i D:\Human_pose\sample.jpg -d CPU
#Benchmark app command - GPU
C:\Users\Satvik\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release>benchmark_app.exe -m D:\Human_pose\Fast_Stacked_Hourglass_Network_OpenVino\models\model_best.xml -i D:\Human_pose\sample.jpg -d GPU
Future Scope
1) Pose estimation also presents an opportunity to create more realistic and responsive augmented reality (AR) experiences. From pieces of paper to musical instruments to pretty much anything you can think of, rigid pose estimation allows us to determine a given object's primary key points and track them as they move through real-world spaces. AR, in its essence, allows us to place digital objects in real-world scenes. This could mean testing out a piece of furniture in your living room by placing a 3D rendering of it in the space, or trying on a pair of digitally rendered shoes. So, with pose detection, if we're able to locate and accurately track a physical object in real-world space, then we can also overlay a digital AR object onto the real object that's being tracked.
2) Traditionally, character animation has been a manual process that relied on bulky and expensive motion capture systems. However, with the advent of deep learning approaches to pose estimation, there is the distinct potential that these systems can be streamlined and, in many ways, automated. Recent advances in both pose estimation and motion capture technology are making this shift possible, allowing for character animation that doesn't rely on markers or specialized suits while still being able to capture motion in real time. Similarly, capturing animations for immersive video game experiences can also potentially be automated by deep learning-based pose estimation.
3) Traditionally, industrial robotics has employed 2D vision systems to enable robots to perform their various tasks. However, this 2D approach has several limitations. Namely, computing the position to which a robot should move, given a 2D representation of space, requires intensive calibration processes, and the resulting systems are inflexible to environmental changes unless reprogrammed. With the advent of 3D pose estimation, the opportunity exists to create more responsive, flexible, and accurate robotics systems.