- Install requirements:
pip install -r requirements.txt
- Build pose_extractor module:
python setup.py build_ext
- Add build folder to PYTHONPATH:
export PYTHONPATH=pose_extractor/build/:$PYTHONPATH
Running
To run the demo, pass the path to the pre-trained checkpoint and the camera ID (or a path to a video file):
python demo.py --model human-pose-estimation-3d.pth --video 0
The camera can capture the scene under different view angles, so for correct scene visualization, please pass the camera extrinsics and focal length with the --extrinsics and --fx options respectively (a sample of the extrinsics format can be found in the data folder). If no camera parameters are provided, the demo will use the default ones.
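For example (the extrinsics file name and focal length value below are placeholders, substitute your own calibration):
python demo.py --model human-pose-estimation-3d.pth --video 0 --extrinsics data/extrinsics.json --fx 1000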
Inference Pipeline
Like all bottom-up methods, the OpenPose pipeline consists of two parts:
- Inference of the neural network, which provides two tensors: keypoint heatmaps and their pairwise
relations (part affinity fields, PAFs). This output is downsampled 8 times.
- Grouping keypoints by person instance. This includes upsampling the tensors to the original image
size, extracting keypoints at the heatmap peaks, and grouping them by instance (see the sketch after this list).
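Schematically, the whole flow looks like this (a minimal sketch; infer_net, extract_keypoints and group_keypoints are hypothetical placeholder helpers, not the demo's actual API):

import cv2

def estimate_poses(net, img, stride=8):
    # Part 1: network inference. Heatmaps and PAFs come out
    # downsampled by a factor of `stride` (8).
    heatmaps, pafs = infer_net(net, img)  # hypothetical helper

    # Part 2: grouping. Upsample both tensors to the original image
    # size, take keypoints at heatmap peaks, group them by instance.
    heatmaps = cv2.resize(heatmaps, (img.shape[1], img.shape[0]))
    pafs = cv2.resize(pafs, (img.shape[1], img.shape[0]))
    keypoints = extract_keypoints(heatmaps)  # hypothetical helper
    return group_keypoints(keypoints, pafs)  # hypothetical helper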
Figure 1: OpenPose pipeline.
The network first extracts features, then performs an initial estimation of the heatmaps and PAFs, after which 5 refinement stages are performed. It is able to find 18 types of keypoints. The grouping procedure then searches for the best pair (by affinity) for each keypoint from a predefined list of keypoint pairs, e.g. left elbow and left wrist, right hip and right knee, left eye and left ear, and so on, 19 pairs overall. An illustrative encoding of such pairs is shown below.
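For illustration, such a pair list could be encoded as keypoint index pairs (the indices below follow the common 18-keypoint OpenPose ordering and are an assumption; this subset is not the full table used by the network):

# (keypoint_a, keypoint_b) index pairs; the full list has 19 entries.
KEYPOINT_PAIRS = [
    (6, 7),    # left elbow  -> left wrist
    (8, 9),    # right hip   -> right knee
    (15, 17),  # left eye    -> left ear
]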
The pipeline is illustrated in Fig. 1. During inference, the input image is resized to match the network input
size by height, the width is scaled to preserve the image aspect ratio and then padded to a multiple of 8.
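A minimal sketch of this preprocessing (assuming OpenCV and NumPy; the names are illustrative, not the demo's actual code):

import cv2
import numpy as np

def preprocess(img, net_input_height=256):
    # Resize so the image height matches the network input height;
    # the width is scaled by the same factor to keep the aspect ratio.
    scale = net_input_height / img.shape[0]
    resized = cv2.resize(img, None, fx=scale, fy=scale)
    # Pad the width on the right up to the next multiple of 8.
    pad_w = (8 - resized.shape[1] % 8) % 8
    padded = np.pad(resized, ((0, 0), (0, pad_w), (0, 0)), mode='constant')
    return padded, scale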
For the network inference we use the Intel® OpenVINO™ Toolkit R4 [1], which provides optimized
inference across different hardware, such as CPU, GPU, FPGA, etc. The final performance numbers are
shown in Table 6; they were measured on a challenging video with more than 20 estimated poses.
We used two devices: an Intel NUC6i7KYB, which performed inference on the integrated GPU Iris
Pro Graphics P580 in half-precision floating-point format (FP16), and a 6-core Core i7-6850K CPU,
which performed inference in single-precision floating-point format (FP32). The network input size was
set to 456x256, which is similar to 368x368 but with a 16:9 aspect ratio, suitable for processing video
streams.
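As a quick check of where 456 comes from (a worked calculation, not taken from the paper): 256 * 16/9 ≈ 455.1, and the next multiple of 8 is 456:

import math

height = 256
# Scale the width for a 16:9 aspect ratio, then round up to a multiple of 8.
width = math.ceil(height * 16 / 9 / 8) * 8
print(width)  # 456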
To run with OpenVINO, it is necessary to convert the checkpoint to the OpenVINO format:
- Set OpenVINO environment variables:
source <OpenVINO_INSTALL_DIR>/bin/setupvars.sh
- Convert checkpoint to ONNX:
python scripts/convert_to_onnx.py --checkpoint-path human-pose-estimation-3d.pth
- Convert to OpenVINO format:
python <OpenVINO_INSTALL_DIR>/deployment_tools/model_optimizer/mo.py --input_model human-pose-estimation-3d.onnx --input=data --mean_values=data[128.0,128.0,128.0] --scale_values=data[255.0,255.0,255.0] --output=features,heatmaps,pafs
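After conversion, the resulting model can be sanity-checked from Python (a minimal sketch assuming the IECore API available in OpenVINO 2020+ releases; older releases, including R4, used the IENetwork/IEPlugin API instead):

from openvino.inference_engine import IECore

# Read the converted IR model and load it onto a device.
ie = IECore()
net = ie.read_network(model='human-pose-estimation-3d.xml',
                      weights='human-pose-estimation-3d.bin')
exec_net = ie.load_network(network=net, device_name='CPU')
print(list(net.outputs))  # expect: features, heatmaps, pafs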
To run the demo with OpenVINO inference, pass the --use-openvino option and specify the device to infer on:
python demo.py --model human-pose-estimation-3d.xml --device CPU --use-openvino --video 0