- Install requirements:
pip install -r requirements.txt
- Build pose_extractor module:
python setup.py build_ext
- Add build folder to PYTHONPATH:
export PYTHONPATH=pose_extractor/build/:$PYTHONPATH
Running
To run the demo, pass the path to the pre-trained checkpoint and the camera ID (or a path to a video file):
python demo.py --model human-pose-estimation-3d.pth --video 0
The camera can capture the scene under different view angles, so for correct scene visualization, please pass the camera extrinsics and focal length with the --extrinsics and --fx options respectively (a sample of the extrinsics format can be found in the data folder). If no camera parameters are provided, the demo will use the default ones.
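For example (the extrinsics file name and focal length value below are placeholders, substitute your own calibration):
python demo.py --model human-pose-estimation-3d.pth --video 0 --extrinsics data/extrinsics.json --fx 1000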
Inference Pipeline
Like all bottom-up methods, the OpenPose pipeline consists of two parts:
- Inference of the neural network, which provides two tensors: keypoint heatmaps and their pairwise
relations (part affinity fields, PAFs). This output is downsampled 8 times.
- Grouping keypoints by person instance. This includes upsampling the tensors to the original image
size, extracting keypoints at the heatmap peaks, and grouping them by instance (see the sketch after this list).
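Schematically, the whole flow looks like this (a minimal sketch; infer_net, extract_keypoints and group_keypoints are hypothetical placeholder helpers, not the demo's actual API):

import cv2

def estimate_poses(net, img, stride=8):
    # Part 1: network inference. Heatmaps and PAFs come out
    # downsampled by a factor of `stride` (8).
    heatmaps, pafs = infer_net(net, img)  # hypothetical helper

    # Part 2: grouping. Upsample both tensors to the original image
    # size, take keypoints at heatmap peaks, group them by instance.
    heatmaps = cv2.resize(heatmaps, (img.shape[1], img.shape[0]))
    pafs = cv2.resize(pafs, (img.shape[1], img.shape[0]))
    keypoints = extract_keypoints(heatmaps)  # hypothetical helper
    return group_keypoints(keypoints, pafs)  # hypothetical helper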
Figure 1: OpenPose pipeline.
The network first extracts features, then performs an initial estimation of the heatmaps and PAFs, after which 5 refinement stages are performed. It is able to find 18 types of keypoints. The grouping procedure then searches for the best pair (by affinity) for each keypoint from a predefined list of keypoint pairs, e.g. left elbow and left wrist, right hip and right knee, left eye and left ear, and so on, 19 pairs overall. An illustrative encoding of such pairs is shown below.
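For illustration, such a pair list could be encoded as keypoint index pairs (the indices below follow the common 18-keypoint OpenPose ordering and are an assumption; this subset is not the full table used by the network):

# (keypoint_a, keypoint_b) index pairs; the full list has 19 entries.
KEYPOINT_PAIRS = [
    (6, 7),    # left elbow  -> left wrist
    (8, 9),    # right hip   -> right knee
    (15, 17),  # left eye    -> left ear
]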
The pipeline is illustrated in Fig. 1. During inference, the input image is resized to match the network input
size by height, the width is scaled to preserve the image aspect ratio and then padded to a multiple of 8.
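A minimal sketch of this preprocessing (assuming OpenCV and NumPy; the names are illustrative, not the demo's actual code):

import cv2
import numpy as np

def preprocess(img, net_input_height=256):
    # Resize so the image height matches the network input height;
    # the width is scaled by the same factor to keep the aspect ratio.
    scale = net_input_height / img.shape[0]
    resized = cv2.resize(img, None, fx=scale, fy=scale)
    # Pad the width on the right up to the next multiple of 8.
    pad_w = (8 - resized.shape[1] % 8) % 8
    padded = np.pad(resized, ((0, 0), (0, pad_w), (0, 0)), mode='constant')
    return padded, scale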
For the network inference we use the Intel® OpenVINO™ Toolkit R4 [1], which provides optimized
inference across different hardware, such as CPU, GPU, FPGA, etc. The final performance numbers are
shown in Table 6; they were measured on a challenging video with more than 20 estimated poses.
We used two devices: an Intel NUC6i7KYB, which performed inference on the integrated GPU Iris
Pro Graphics P580 in half-precision floating-point format (FP16), and a 6-core Core i7-6850K CPU,
which performed inference in single-precision floating-point format (FP32). The network input size was
set to 456x256, which is similar to 368x368 but with a 16:9 aspect ratio, suitable for processing video
streams.
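As a quick check of where 456 comes from (a worked calculation, not taken from the paper): 256 * 16/9 ≈ 455.1, and the next multiple of 8 is 456:

import math

height = 256
# Scale the width for a 16:9 aspect ratio, then round up to a multiple of 8.
width = math.ceil(height * 16 / 9 / 8) * 8
print(width)  # 456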
To run with OpenVINO, it is necessary to convert the checkpoint to the OpenVINO format:
- Set OpenVINO environment variables:
source <OpenVINO_INSTALL_DIR>/bin/setupvars.sh
- Convert checkpoint to ONNX:
python scripts/convert_to_onnx.py --checkpoint-path human-pose-estimation-3d.pth
- Convert to OpenVINO format:
python <OpenVINO_INSTALL_DIR>/deployment_tools/model_optimizer/mo.py --input_model human-pose-estimation-3d.onnx --input=data --mean_values=data[128.0,128.0,128.0] --scale_values=data[255.0,255.0,255.0] --output=features,heatmaps,pafs
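After conversion, the resulting model can be sanity-checked from Python (a minimal sketch assuming the IECore API available in OpenVINO 2020+ releases; older releases, including R4, used the IENetwork/IEPlugin API instead):

from openvino.inference_engine import IECore

# Read the converted IR model and load it onto a device.
ie = IECore()
net = ie.read_network(model='human-pose-estimation-3d.xml',
                      weights='human-pose-estimation-3d.bin')
exec_net = ie.load_network(network=net, device_name='CPU')
print(list(net.outputs))  # expect: features, heatmaps, pafs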
To run the demo with OpenVINO inference, pass the --use-openvino option and specify the device to infer on:
python demo.py --model human-pose-estimation-3d.xml --device CPU --use-openvino --video 0