

Published March 17, 2020 © GPL3+

MixPose

AI on the Edge streaming platform for yoga instructors and fitness coaches.

Advanced | Full instructions provided | Over 1 day | 8,323

Artificial Intelligence of Things (AIoT): 1st Place

AI at the Edge Challenge

Things used in this project

Hardware components

NVIDIA Jetson Nano Developer Kit

Seeed Studio 7 Inch LCD Cape for Beagle Bone Black - Touch Display

NVIDIA Shield TV Pro

Webcam, Logitech® HD Pro

Software apps and online services

NVIDIA JetPack SDK

TensorFlow

Android Studio

Hand tools and fabrication machines

3D Printer (generic)

Story

MixPose Demo

Note:

We originally named the project Fitstream, but found that fitstream.com was already registered to another app, so we renamed the project MixPose. We acquired mixpose.com on 02/10/2020, so some of this documentation still uses the name Fitstream :)

Introduction:

Streaming has taken off like a storm over the past few years. It has become part of everyday culture and has created an entire billion-dollar industry.

Gamers streaming while playing games

While the most popular streamers are mainly gamers, streaming is still pretty lacking in other areas; the best most people can do is post videos on YouTube. That misses much of the experience, because there is no real-time audience engagement like the kind you have on Twitch.

So we built MixPose, a streaming platform that empowers fitness professionals, yoga instructors and dance teachers through the power of AI. The idea is that fitness professionals and yoga teachers can stream from anywhere they feel comfortable, such as beaches, waterfalls, the outdoors or their own home, and users can watch the stream in comfort on their own TV.

Streaming yoga on the beach

With MixPose, yoga and fitness professionals can teach classes through a streaming platform, which generates income for them and benefits the well-being of all the viewers. It also opens the platform to places like India and Nepal, where yoga was born, and China, where Tai Chi originated and is still practiced. This way you could have yogis teaching live classes from India, dance teachers from Latin America, and so on, which benefits us all. We are building infrastructure and bringing streaming to their world. By doing so, we target UN Sustainable Development Goals 1, 3, 8, and 9, and also touch upon 5 and 10.

UN Sustainable GOALS

Requirement

To build this initial prototype, we will need the following items:

NVIDIA Jetson Nano
7 inch LCD HDMI screen
NVIDIA Shield TV Pro (to consume the stream content)
Camera for Streamer
Camera for the Viewer.

Step 1: Set up Jetson Nano

The equipment needed is a Jetson Nano and a camera. Make sure you have at least a 5V/2.5A power supply to cover both the board and the camera. Personally, I tried 2.1A and it was not enough. Also, use the power jack rather than micro-USB power; this has proven to be much more stable. You first need to place a jumper on J48 so that the power jack on J25 will work. I've tried up to 5V/6A and it was fine.

Jetson Diagram

The OS image and JetPack can be downloaded at

https://developer.nvidia.com/embedded/jetpack

NVIDIA has already written a pretty detailed guide on Jetson Nano setup, which can be found at

https://courses.nvidia.com/courses/course-v1%3ADLI%2BC-RX-02%2BV1/course/

NVIDIA AI Course

Step 2: Set up TensorFlow PoseNet on Jetson

There are currently a few AI-based pose projects on Jetson, some of them featured on Jetson Community Projects. We will be building a brand new one in this project that is not from there, based on TensorFlow PoseNet. We will be using Google's original PoseNet model at https://github.com/tensorflow/tfjs-models/tree/master/posenet

The other reason is that we can easily use PoseNet on Android devices for user consumption.

First, we will install the libraries needed

pip3 install tensorflow-gpu scipy pyyaml
pip3 install opencv-python==3.4.5.20

From here you can clone my project

git clone https://github.com/Nyceane/fitstream-jetson-nano.git
cd fitstream-jetson-nano
python3 posenet_tensor_test.py

Let's go through the code a little. We pass the image in, run it through the PoseNet model, call posenet.decode_multi.decode_multiple_poses to get the keypoint coordinates, then draw the overlay on the image and show it with cv2.imshow('posenet', overlay_image)

while True:
    input_image, display_image, output_scale = posenet.read_cap(
        cap, scale_factor=args.scale_factor, output_stride=output_stride)
    heatmaps_result, offsets_result, displacement_fwd_result, displacement_bwd_result = sess.run(
        model_outputs,
        feed_dict={'image:0': input_image}
    )
    pose_scores, keypoint_scores, keypoint_coords = posenet.decode_multi.decode_multiple_poses(
        heatmaps_result.squeeze(axis=0),
        offsets_result.squeeze(axis=0),
        displacement_fwd_result.squeeze(axis=0),
        displacement_bwd_result.squeeze(axis=0),
        output_stride=output_stride,
        max_pose_detections=10,
        min_pose_score=0.15)
    keypoint_coords *= output_scale
    # TODO this isn't particularly fast, use GL for drawing and display someday...
    overlay_image = posenet.draw_skel_and_kp(
        display_image, pose_scores, keypoint_scores, keypoint_coords,
        min_pose_score=0.15, min_part_score=0.1)
    cv2.imshow('posenet', overlay_image)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
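
To make the decoder's output a little more concrete, here is a minimal sketch that turns the three arrays returned by decode_multiple_poses into named keypoints for the highest-scoring pose. It uses plain Python sequences so it runs without TensorFlow. The PART_NAMES list (the usual 17 PoseNet parts) and the helper name best_pose_keypoints are assumptions for illustration, and coordinates are assumed to be in (y, x) order, as the drawing code in this project suggests.

```python
# Assumed PoseNet part order (17 keypoints); verify against your posenet package.
PART_NAMES = [
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
]


def best_pose_keypoints(pose_scores, keypoint_scores, keypoint_coords,
                        min_pose_score=0.15, min_part_score=0.1):
    """Return {part_name: (x, y, score)} for the highest-scoring pose.

    Hypothetical helper: the inputs mirror the shapes used in the loop
    above ([poses], [poses][parts], [poses][parts][2]) and the coordinates
    are assumed to be (y, x). Returns an empty dict when no pose reaches
    min_pose_score.
    """
    if len(pose_scores) == 0:
        return {}
    best = max(range(len(pose_scores)), key=lambda i: pose_scores[i])
    if pose_scores[best] < min_pose_score:
        return {}
    named = {}
    for name, score, (y, x) in zip(PART_NAMES, keypoint_scores[best], keypoint_coords[best]):
        if score >= min_part_score:
            named[name] = (float(x), float(y), float(score))
    return named
```

Inside the loop this could be called as best_pose_keypoints(pose_scores, keypoint_scores, keypoint_coords) right after keypoint_coords has been multiplied by output_scale.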

This portion tests the TensorFlow part of the project. If everything is successful, you should see something like this

Posenet test

Now that we have PoseNet running, we need a second inference to classify the pose with an object detection method. We do this by running all of our images through PoseNet to get the stick figures.

We can first test it out via

$ python3 image_test.py

which lets us see the skeleton figure of our yoga pose.

$ python3 image_convert.py

In posenet util.py, we've modified the file so that only the skeletons are drawn this way.

def draw_skel_and_kp_figureonly(
        img, instance_scores, keypoint_scores, keypoint_coords,
        min_pose_score=0.5, min_part_score=0.5):
    # Start from a black canvas with the same shape as the input image
    out_img = np.zeros((img.shape[0], img.shape[1], img.shape[2]), np.uint8)
    adjacent_keypoints = []
    cv_keypoints = []
    for ii, score in enumerate(instance_scores):
        if score < min_pose_score:
            continue
        new_keypoints = get_adjacent_keypoints(
            keypoint_scores[ii, :], keypoint_coords[ii, :, :], min_part_score)
        adjacent_keypoints.extend(new_keypoints)
        for ks, kc in zip(keypoint_scores[ii, :], keypoint_coords[ii, :, :]):
            if ks < min_part_score:
                continue
            cv_keypoints.append(cv2.KeyPoint(kc[1], kc[0], 10. * ks))
    out_img = cv2.drawKeypoints(
        out_img, cv_keypoints, outImage=np.array([]), color=(255, 255, 0),
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    out_img = cv2.polylines(out_img, adjacent_keypoints, isClosed=False, color=(255, 255, 0))
    return out_img

This gives us the direct skeleton figure we need to train our networks.

Now that we have skeleton-figure training files, we can use labelImg from https://github.com/tzutalin/labelImg to label our images

labelImg

Once the training and testing data are done, we can move on to the next step.
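
One way to split the labeled skeleton images into training and testing sets is sketched below. It is only an illustration: the helper name split_dataset, the 80/20 split, and the assumption that every image sits next to a labelImg .xml file with the same stem are not from the original project.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, images_dir, test_fraction=0.2, seed=42):
    """Copy labeled images (and their .xml files) into images/train and images/test.

    Hypothetical helper: assumes each image sits next to a labelImg
    annotation with the same stem, e.g. tree_01.jpg + tree_01.xml.
    Images without an annotation are skipped. Returns (train_count, test_count).
    """
    source = Path(source_dir)
    pairs = sorted(
        (img, img.with_suffix(".xml"))
        for img in source.iterdir()
        if img.suffix.lower() in {".jpg", ".jpeg", ".png"} and img.with_suffix(".xml").exists()
    )
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    test_count = int(len(pairs) * test_fraction)
    splits = {"test": pairs[:test_count], "train": pairs[test_count:]}
    for split_name, items in splits.items():
        target = Path(images_dir) / split_name
        target.mkdir(parents=True, exist_ok=True)
        for img, xml in items:
            shutil.copy2(img, target / img.name)
            shutil.copy2(xml, target / xml.name)
    return len(splits["train"]), len(splits["test"])
```

For example, split_dataset("labeled", "images") would fill images/train and images/test, the two folders used in the next step.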

Step 3: TensorFlow and training pose data

We will be dividing poses into the following 8 classes; more can be added in the future. For the demo, we will focus on the Tree and Warrior Two poses, as the others are somewhat difficult to train at the moment.

Tree
Warrior One
Warrior Two
Downward dog
Child
Plank
Triangle
Bridge

There are two ways to do this: we can train either on the Jetson Nano itself or on a desktop/server. The following guide works on both, but if you are going to train on a server, CUDA 10.0 and cuDNN will be needed. The JetPack SDK already includes the CUDA Toolkit and cuDNN on Jetson Nano, making it easier to deploy.

After JetPack is installed, we can follow the instructions at https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html

Install system packages required by TensorFlow:

$ sudo apt-get update
$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev

Install and upgrade pip3.

$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip testresources setuptools

Install the Python package dependencies.

$ sudo pip3 install -U numpy==1.16.1 future==0.17.1 mock==3.0.5 h5py==2.9.0 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 enum34 futures protobuf

After that you can install tensorflow via

$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v43 'tensorflow==1.15.0'

After that you can verify that TensorFlow was installed successfully via

$ python3
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, Tensorflow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

To train the object detection model, we need the TensorFlow models repository and the following dependencies

$ mkdir workspace
$ cd workspace
$ mkdir tensorflow1
$ cd tensorflow1
$ git clone https://github.com/tensorflow/models.git
$ sudo python3 -m pip install --upgrade pip
$ sudo pip3 install pillow 
$ sudo pip3 install lxml
$ sudo pip3 install Cython
$ sudo pip3 install contextlib2
$ sudo pip3 install jupyter
$ sudo pip3 install matplotlib
$ sudo pip3 install pandas
$ sudo pip3 install pycocotools
$ sudo pip3 install absl-py
$ sudo apt-get install python-opencv

Set the PYTHONPATH environment variable

$ export PYTHONPATH=$PYTHONPATH:~/workspace/tensorflow1/models:~/workspace/tensorflow1/models/research:~/workspace/tensorflow1/models/research/slim
$ export PATH=$PATH:$PYTHONPATH
$ cd ~/workspace/tensorflow1/models/research
$ python3 setup.py build
$ python3 setup.py install
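
Before moving on, it can help to confirm that Python actually finds the packages on the new PYTHONPATH. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the module names checked at the bottom (tensorflow, object_detection, nets) are assumptions based on the layout of the TensorFlow models repository.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the TensorFlow models repo layout used above.
    missing = missing_modules(["tensorflow", "object_detection", "nets"])
    print("Missing modules:", missing or "none")
```

If anything is reported as missing, the PYTHONPATH export above is the first thing to re-check, since it only applies to the terminal session in which it was run.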

We can test the default model by going into object_detection folder

$ cd ~/workspace/tensorflow1/models/research/object_detection
$ jupyter notebook object_detection_tutorial.ipynb

When all is done, you will have the following running.

Tensorflow running

ssd_mobilenet_v3_large_coco

We will also use ssd_mobilenet_v3_large_coco from the model zoo; it is small enough to load into memory on an edge device.

http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz

Extract that into the object_detection folder.

Once this is all done, we will generate the training data by entering commands under the object_detection folder. Put all the images you labeled in the last step into the /images/train and /images/test folders, then run

$ python3 xml_to_csv.py

This creates a CSV file listing all the bounding boxes for both the training and testing files.
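
For readers curious about what this conversion involves, here is a rough sketch of the idea using only the standard library. It is not the project's xml_to_csv.py: the function names are hypothetical, and the column order is assumed from the fields that generate_tfrecord.py reads (filename, class, xmin, ymin, xmax, ymax) plus the image width and height stored in labelImg's Pascal VOC output.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Turn one Pascal VOC annotation (labelImg's default output) into CSV rows."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize rows under the assumed header, one bounding box per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()
```

Each labeled box becomes one row, so an image with two labeled poses would contribute two lines to the CSV.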

Next we will change generate_tfrecord.py to use our own label classes. This is already included if you get the file from the repo. If you have more classes, you can add them starting from line 31.

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=images/test_labels.csv  --image_dir=images/test --output_path=test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
import os
import io
import pandas as pd
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('image_dir', '', 'Path to the image directory')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'tree':
        return 1
    elif row_label == 'warriortwo':
        return 2
    elif row_label == 'warriorone':
        return 3
    elif row_label == 'child':
        return 4
    elif row_label == 'plank':
        return 5
    elif row_label == 'triangle':
        return 6
    elif row_label == 'bridge':
        return 7
    elif row_label == 'downwarddog':
        return 8
    else:
        # Unknown label: print it so the mislabeled row can be found
        print(row_label)


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    # Box coordinates are normalized to [0, 1] by the image size
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), FLAGS.image_dir)
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

After that we can run the Python file and get the following message

$ python3 generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
Successfully created the TFRecords: /home/ai/workspace/tensorflow1/models/research/object_detection/train.record

$ python3 generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
Successfully created the TFRecords: /home/ai/workspace/tensorflow1/models/research/object_detection/test.record

Next we will create a labelmap.pbtxt under the training folder to map the labels to ids.

item {
  id: 1
  name: 'tree'
}
item {
  id: 2
  name: 'warriortwo'
}
item {
  id: 3
  name: 'warriorone'
}
item {
  id: 4
  name: 'child'
}
item {
  id: 5
  name: 'plank'
}
item {
  id: 6
  name: 'triangle'
}
item {
  id: 7
  name: 'bridge'
}
item {
  id: 8
  name: 'downwarddog'
}
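
Because the ids in labelmap.pbtxt must match the ones returned by class_text_to_int in generate_tfrecord.py, one option is to derive both from a single list. The sketch below is an illustration rather than part of the project: POSE_LABELS, LABEL_TO_ID and build_labelmap are hypothetical names, and this class_text_to_int raises an error on unknown labels instead of printing them.

```python
POSE_LABELS = [
    "tree", "warriortwo", "warriorone", "child",
    "plank", "triangle", "bridge", "downwarddog",
]

# ids start at 1 because 0 is reserved for the background class.
LABEL_TO_ID = {name: index for index, name in enumerate(POSE_LABELS, start=1)}


def class_text_to_int(row_label):
    """Look up a label id, failing loudly on labels missing from POSE_LABELS."""
    try:
        return LABEL_TO_ID[row_label]
    except KeyError:
        raise ValueError("Unknown label: {!r}".format(row_label))


def build_labelmap(labels=POSE_LABELS):
    """Render labelmap.pbtxt text with the same ids as class_text_to_int."""
    items = [
        "item {{\n  id: {}\n  name: '{}'\n}}".format(index, name)
        for index, name in enumerate(labels, start=1)
    ]
    return "\n".join(items) + "\n"
```

Writing build_labelmap() to training/labelmap.pbtxt would then keep the label map and the TFRecord ids in step whenever a pose is added to the list.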

Next we will adapt the sample config file and save it as training/ssdlite_mobilenet_v3_large_320x320_coco.config

Line 14. Change num_classes to the number of different objects you want the classifier to detect. For the eight yoga poses above, it would be num_classes : 8.

Line 164. Change fine_tune_checkpoint to:

fine_tune_checkpoint : "/home/ai/workspace/tensorflow1/models/research/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14/model.ckpt"

Lines 187 and 189. In the train_input_reader section, change input_path and label_map_path to:

input_path : "/home/ai/workspace/tensorflow1/models/research/object_detection/train.record"
label_map_path: "/home/ai/workspace/tensorflow1/models/research/object_detection/training/labelmap.pbtxt"

Line 198. Remove num_examples, as it checks the entire folder.

Lines 198 and 200. In the eval_input_reader section, change input_path and label_map_path to:

input_path : "/home/ai/workspace/tensorflow1/models/research/object_detection/test.record"
label_map_path: "/home/ai/workspace/tensorflow1/models/research/object_detection/training/labelmap.pbtxt"

# SSDLite with Mobilenet v3 large feature extractor.
# Trained on COCO14, initialized from scratch.
# 3.22M parameters, 1.02B FLOPs
# TPU-compatible.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 8
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        use_depthwise: true
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v3_large'
      min_depth: 16
      depth_multiplier: 1.0
      use_depthwise: true
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 10
        max_total_detections: 10
        use_static_shapes: true
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 3
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 32
  num_steps: 400000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  fine_tune_checkpoint: "/home/airig/workspace/models/research/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14/model.ckpt"
  fine_tune_checkpoint_type:  "detection"
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 0.4
          total_steps: 400000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 10
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/airig/workspace/models/research/object_detection/train.record"
  }
  label_map_path: "/home/airig/workspace/models/research/object_detection/training/labelmap.pbtxt"
}
eval_config: {
  num_examples: 47
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/airig/workspace/models/research/object_detection/test.record"
  }
  label_map_path: "/home/airig/workspace/models/research/object_detection/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
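
The cosine_decay_learning_rate block in the config is easier to reason about with numbers. The sketch below assumes the usual schedule for this setting (a linear warmup from warmup_learning_rate to learning_rate_base, followed by a half-cosine decay to zero at total_steps); it is an approximation written for illustration, not the TensorFlow implementation itself, and the function name is hypothetical.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.4, total_steps=400000,
                             warmup_learning_rate=0.13333, warmup_steps=2000):
    """Approximate learning rate at a training step (defaults from the config above)."""
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step > total_steps:
        return 0.0
    # Half-cosine decay from learning_rate_base down to zero.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1000, 2000, 100000, 200000, 400000):
        print(step, round(cosine_decay_with_warmup(step), 5))
```

Under these assumptions the rate peaks at learning_rate_base once warmup ends and falls to roughly half of it about midway through training, which is worth keeping in mind if you stop well before 400000 steps.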

When all is ready, we can start training our model with the following command. Since train.py is deprecated and runs into lots of issues, we will be using model_main.py instead.

This is just to show the capability of Jetson Nano. If you feel training on Jetson Nano is too slow, you can use a more powerful Linux GPU machine with almost exactly the same method. JetPack comes with CUDA and cuDNN pre-installed; if you want to train on a server, you just need to download CUDA 10.0 and cuDNN v7.6.4 for CUDA 10.0, as well as the drivers for your graphics card.

$ python3 model_main.py --logtostderr --model_dir=training --pipeline_config_path=training/ssdlite_mobilenet_v3_large_320x320_coco.config

Training

Additionally, you can check the training progress via TensorBoard by opening another terminal

$ cd ~/workspace/tensorflow1/models/research/object_detection
$ tensorboard --logdir=training
TensorBoard 1.15.0 at http://ai:6006/

Tensorboard

The graph PNG generated by TensorBoard for SSDLite is shown below.

When you check the Images tab, you should see something like the image below while training, which gives you a better idea of what is going on. Since we are only checking skeleton images with boxes drawn on top, it will be pretty clean.

Image train

Once the model is trained, we run the following command to get the inference graph, where XXX is the latest checkpoint number. I've personally trained the entire 400000 iterations.

$ python3 export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssdlite_mobilenet_v3_large_320x320_coco.config --trained_checkpoint_prefix training/model.ckpt-XXX --output_directory inference_graph
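
If you are unsure which number to put in place of XXX, a small script can look it up. This is a minimal sketch that assumes checkpoints in the training folder follow the usual model.ckpt-N.index naming; the helper name latest_checkpoint_step is hypothetical.

```python
import re
from pathlib import Path

CHECKPOINT_PATTERN = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(training_dir):
    """Return the highest N among model.ckpt-N.index files, or None if there are none."""
    steps = [
        int(match.group(1))
        for path in Path(training_dir).iterdir()
        for match in [CHECKPOINT_PATTERN.match(path.name)]
        if match
    ]
    return max(steps) if steps else None


if __name__ == "__main__":
    step = latest_checkpoint_step("training")
    print("Use --trained_checkpoint_prefix training/model.ckpt-{}".format(step))
```

The returned number is what goes after model.ckpt- in the --trained_checkpoint_prefix argument.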

The frozen model should now be saved in the inference_graph folder. We can test the model by running python3 Object_detection_image.py, and you should see the following.

Combining that with what we've done before, we can launch

$ python3 skeleton_tracker.py

Skeleton tracking and detection

Since the screen is relatively small, we can use 640x480 to get a consistent frame rate; 720p is a bit slow in this case. When all is done, we can use the following code.

We are doing double inference, so let me explain a little. We first run the image through PoseNet to get the skeleton, then draw that skeleton into its own image. We then run that skeleton image through SSDLite, which gives us the bounding frames for the detected pose. After that we merge the result on top of the original image to show both the PoseNet skeleton and the detected pose.

while(True):
    input_image, display_image, output_scale = posenet.read_cap(
        video, scale_factor=args.scale_factor, output_stride=output_stride)
    heatmaps_result, offsets_result, displacement_fwd_result, displacement_bwd_result = sess.run(
        model_outputs,
        feed_dict={'image:0': input_image}
    )
    pose_scores, keypoint_scores, keypoint_coords = posenet.decode_multi.decode_multiple_poses(
        heatmaps_result.squeeze(axis=0),
        offsets_result.squeeze(axis=0),
        displacement_fwd_result.squeeze(axis=0),
        displacement_bwd_result.squeeze(axis=0),
        output_stride=output_stride,
        max_pose_detections=10,
        min_pose_score=0.15)
    keypoint_coords *= output_scale
    # TODO this isn't particularly fast, use GL for drawing and display someday...
    skeleton_frame = posenet.draw_skel_and_kp_figureonly(
        display_image, pose_scores, keypoint_scores, keypoint_coords,
        min_pose_score=0.15, min_part_score=0.1)
    #cv2.imshow('posenet', overlay_image)
    # Acquire frame and expand frame dimensions to have shape: [1, None, None, 3]
    # i.e. a single-column array, where each item in the column has the pixel RGB value
    #ret, frame = video.read()
    frame_expanded = np.expand_dims(skeleton_frame, axis=0)
    # Perform the actual detection by running the model with the image as input
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: frame_expanded})
    # Draw the results of the detection (aka 'visualize the results')
    vis_util.visualize_boxes_and_labels_on_image_array(
        skeleton_frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=6,
        min_score_thresh=0.75)
    combined = cv2.add(display_image, skeleton_frame)
    # All the results have been drawn on the frame, so it's time to display it.
    cv2.imshow('Skeleton Tracker', combined)
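
The loop above only draws the detections. If you also want the detected pose as a value (for example, to log it or to score it later), a sketch like the following could be used. The helper name top_detection is hypothetical; it assumes scores and classes are the squeezed per-frame outputs and that category_index maps each class id to a dictionary with a 'name' entry.

```python
def top_detection(scores, classes, category_index, min_score_thresh=0.75):
    """Return (label, score) for the most confident detection, or None.

    Hypothetical helper: scores and classes are the squeezed per-frame
    outputs, and category_index maps class id -> {'id': ..., 'name': ...}.
    """
    best = None
    for score, class_id in zip(scores, classes):
        if score < min_score_thresh:
            continue
        if best is None or score > best[1]:
            name = category_index.get(int(class_id), {}).get("name", "unknown")
            best = (name, float(score))
    return best
```

Inside the loop it could be called as top_detection(np.squeeze(scores), np.squeeze(classes), category_index), reusing the same 0.75 threshold that the visualization uses.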

We can run the following command

python3 mixpose.py

Step 4: Setup WebRTC

Streamers need a WebRTC platform to stream from. In our case, we use TokBox to handle the traffic for the prototype. The full documentation is located at https://tokbox.com/developer/sdks/python/. We will go through it briefly.

We then need to install opentok>=2.10.0 and other libraries, which we do via

sudo pip3 install -r requirements.txt

Next we will set up WebRTC so that the instructor's stream can be watched remotely. In this article we will be using TokBox; you can create an account at https://www.tokbox.com. Create a custom project under OpenTok API to get an API key and secret.

To make this project easier, we can create a SessionId under the project. We will use the same SessionId for both parts of the project: broadcasting and receiving.

After that we can use the following code, which is based on TokBox's Python API

from flask import Flask, render_template
from opentok import OpenTok
import threading
import webbrowser

# Replace the placeholders with the values from your TokBox project.
api_key = "[key]"
api_secret = "[secret]"

app = Flask(__name__)
opentok = OpenTok(api_key, api_secret)


@app.route("/")
def hello():
    session_id = "[session_id_you_just_got]"
    token = opentok.generate_token(session_id)
    return render_template('index.html', api_key=api_key, session_id=session_id, token=token)


if __name__ == "__main__":
    url = "http://127.0.0.1:5000/"
    # app.run() blocks, so schedule the browser to open shortly after the server starts.
    threading.Timer(1.0, webbrowser.open, args=[url]).start()
    # The reloader is disabled so the script (and the browser tab) only starts once.
    app.run(debug=True, use_reloader=False)
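
If you would rather keep the API key and secret out of the source file, one option is to read them from environment variables. The variable names API_KEY and API_SECRET and the helper name load_opentok_credentials are assumptions for this sketch.

```python
import os


def load_opentok_credentials(environ=os.environ):
    """Read API_KEY and API_SECRET from the environment, failing with a clear error."""
    try:
        return environ["API_KEY"], environ["API_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "You must define API_KEY and API_SECRET environment variables "
            "(missing: {})".format(missing)
        )
```

The two returned values can then be passed to OpenTok(api_key, api_secret) in place of the hard-coded strings.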

Run the code and the camera feed will be streamed

$ python3 broadcast.py

Step 5: Building Android TV app to showcase

In this step we will build an Android TV app so that viewers can watch the streamed content. You can use an NVIDIA Shield TV for this. Since this article is focused on NVIDIA Jetson Nano, we will go through this part rather quickly.

The stock Android TV UI is fairly easy to build. For demo purposes we will just add some static data, which is subject to change once the app goes live.

Android USB camera can't work with the native library

Since the native Android camera API does not work with USB cameras, we need a separate library to get the USB camera running. AndroidUSBCamera provides a great workaround for this. You can find it at

https://github.com/jiangdongguo/AndroidUSBCamera

Add the JitPack repository to your build file. Add it in your root build.gradle at the end of repositories:

allprojects {
    repositories {
        ...
        maven { url 'http://raw.github.com/saki4510t/libcommon/master/repository/' }
        maven { url 'https://jitpack.io' }
    }
}

Add the dependency

dependencies {
	implementation 'com.github.jiangdongguo:AndroidUSBCamera:2.3.2'
}

This would give us UVCCameraTextureView that we can work with

<com.serenegiant.usb.widget.UVCCameraTextureView
    android:id="@+id/camera_view"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    />

This gives us the camera view on Android TV. Next is embedding TensorFlow AI on the Android device itself. In this case we are using fritz.ai to get this running, and we are working on getting a Java version of PoseNet working on Android in the near future. The easiest way is to clone the repo below and run the Android application; by default you should see the following

$ git clone https://github.com/Nyceane/mixpose_jetson

The connection to the TokBox session you created in the previous step can be implemented via the TokBox SDK.

public static final String API_KEY = "API_KEY";
public static final String SESSION_ID = "SessionId";
public static final String TOKEN = "Token";

Following these steps gives you a full streaming experience like the following

MixPose TV

Once those are done, we can use the poses we trained earlier to compare and score, so that users can engage with one another.

MixPose Leaderboard

Step 6: Printing out the case

Right now we have a viable prototype that works end to end. Next we need the product to look decent, so we first 3D print a case for our screen and anchor.

3D printing screen enclosure

After everything is printed, we should have the following items for the screen: the screen cover, bottom, and holder. We will make another case for the Jetson Nano.

3D printed case with screen.

When the screen part is done, we will have a nice-looking screen case, as shown below.

3D printed case for the LCD screen

We've also customized the case for Jetson Nano so it can fit in the back of the screen.

When printed, the back of the case would be modified so it can fit

Jetson Nano Case

When all is completed, it looks something like this.

MixPose

Step 7: Demo

Now that all is done, we found a real yoga teacher to do our demo and validate our ideas.

Serena doing demo

Code

mixpose.py

# Import packages
import os
import cv2
import numpy as np
import tensorflow as tf
import sys
import time
import argparse
import posenet

parser = argparse.ArgumentParser()
parser.add_argument('--model', type=int, default=101)
parser.add_argument('--cam_id', type=int, default=0)
parser.add_argument('--cam_width', type=int, default=640)
parser.add_argument('--cam_height', type=int, default=480)
parser.add_argument('--scale_factor', type=float, default=0.7125)
parser.add_argument('--file', type=str, default=None, help="Optionally use a video file instead of a live camera")
args = parser.parse_args()

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

# Import utilites
from utils import label_map_util
from utils import visualization_utils as vis_util

# Name of the directory containing the object detection module we're using
MODEL_NAME = 'inference_graph'

# Grab path to current working directory
CWD_PATH = os.getcwd()

# Path to frozen detection graph .pb file, which contains the model that is used
# for object detection.
PATH_TO_CKPT = os.path.join(CWD_PATH,MODEL_NAME,'frozen_inference_graph.pb')

# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH,'training','labelmap.pbtxt')

# Number of classes the object detector can identify (must cover every id in labelmap.pbtxt)
NUM_CLASSES = 8

## Load the label map.
# Label maps map indices to category names, so that when our convolution
# network predicts `5`, we know that this corresponds to `plank`.
# Here we use internal utility functions, but anything that returns a
# dictionary mapping integers to appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Load the Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

    sess = tf.Session(graph=detection_graph)
    model_cfg, model_outputs = posenet.load_model(args.model, sess)
    output_stride = model_cfg['output_stride']
    start = time.time()
    frame_count = 0
# Define input and output tensors (i.e. data) for the object detection classifier

# Input tensor is the image
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

# Output tensors are the detection boxes, scores, and classes
# Each box represents a part of the image where a particular object was detected
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

# Each score represents level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')

# Number of objects detected
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

# Initialize webcam feed
video = cv2.VideoCapture(0)
ret = video.set(3,640)
ret = video.set(4,480)

while(True):

    input_image, display_image, output_scale = posenet.read_cap(
        video, scale_factor=args.scale_factor, output_stride=output_stride)

    heatmaps_result, offsets_result, displacement_fwd_result, displacement_bwd_result = sess.run(
            model_outputs,
            feed_dict={'image:0': input_image}
    )

    pose_scores, keypoint_scores, keypoint_coords = posenet.decode_multi.decode_multiple_poses(
            heatmaps_result.squeeze(axis=0),
            offsets_result.squeeze(axis=0),
            displacement_fwd_result.squeeze(axis=0),
            displacement_bwd_result.squeeze(axis=0),
            output_stride=output_stride,
            max_pose_detections=10,
            min_pose_score=0.15)

    keypoint_coords *= output_scale

    # TODO this isn't particularly fast, use GL for drawing and display someday...
    skeleton_frame = posenet.draw_skel_and_kp_figureonly(
            display_image, pose_scores, keypoint_scores, keypoint_coords,
            min_pose_score=0.15, min_part_score=0.1)

    #cv2.imshow('posenet', overlay_image)
    
    # Acquire frame and expand frame dimensions to have shape: [1, None, None, 3]
    # i.e. a single-column array, where each item in the column has the pixel RGB value
    #ret, frame = video.read()
    frame_expanded = np.expand_dims(skeleton_frame, axis=0)

    # Perform the actual detection by running the model with the image as input
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: frame_expanded})

    # Draw the results of the detection (aka 'visulaize the results')
    vis_util.visualize_boxes_and_labels_on_image_array(
        skeleton_frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=6,
        min_score_thresh=0.75)

    combined = cv2.add(display_image, skeleton_frame)


    # All the results have been drawn on the frame, so it's time to display it.
    cv2.imshow('Skeleton Tracker', combined)

    
    frame_count += 1
    # Press 'q' to quit
    if cv2.waitKey(1) == ord('q'):
        break
    print('Average FPS: ', frame_count / (time.time() - start))

# Clean up
video.release()
cv2.destroyAllWindows()