I.R.I.S. : Immersive Region Image Segmentation

An image semantic segmentation model for finding unique features, objects in a given scene

Intermediate. Full instructions provided. 12 hours.

Early Submission Prizes

Deep Learning Superhero Challenge

Things used in this project

Hardware components

Intel(R) Core (TM) i7-9750H CPU @ 2.60GHz, 2600 MHz, 4 Core(s), 8 Logical Processor(s)

Intel(R) UHD Graphics 630 GPU

Web Camera

Software apps and online services

Intel OpenVINO™ toolkit

TensorFlow

OpenCV

Python 3.6.5

Story

ADE20K spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. It contains 25k images of complex everyday scenes with a variety of objects in their natural spatial context. On average there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation.

A semantic understanding of visual scenes is one of the holy grails of computer vision. The emergence of large-scale image datasets like ImageNet, COCO, and Places, along with the rapid development of deep convolutional neural network (CNN) approaches, has brought great advancements to visual scene understanding.

Given a visual scene of a living room, a robot equipped with a trained CNN can accurately predict the scene category. However, to freely navigate in the scene and manipulate the objects inside, the robot has far more information to digest from the input image: it has to recognize and localize not only the objects like the sofa, table, cup, and TV but also their parts, e.g., a seat of a sofa or a handle of a cup, to allow proper manipulation, as well as to segment the stuff like floor, wall, and ceiling for spatial navigation.

A living room with semantic segmentation

Using the OpenVINO Toolkit

Initializing the OpenVINO environment

Setting up the Environment variables

The OpenVINO Model Optimizer was first used to generate the IR (Intermediate Representation) files (.bin and .xml) from the model (.onnx). The model was then deployed.

Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. The Model Optimizer process assumes you have a network model trained using a supported deep learning framework.

Model Optimizer produces an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine.
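The conversion step described above can be sketched as a small Python helper that assembles the Model Optimizer command line. This is only an illustration: the file name iris_segmentation.onnx, the output folder ir, and the FP16 precision are assumptions, not values taken from the project. The --data_type flag applies to the older toolkit releases that match Python 3.6; newer releases may use a different option.

```python
import shlex
from pathlib import Path


def build_mo_command(onnx_model, output_dir, mo_script="mo.py", data_type="FP16"):
    """Assemble a Model Optimizer call that converts an ONNX model to IR.

    mo_script is assumed to be reachable from the current directory; in a
    real install it lives inside the toolkit's model_optimizer folder.
    """
    return [
        "python", mo_script,
        "--input_model", str(Path(onnx_model)),
        "--output_dir", str(Path(output_dir)),
        "--data_type", data_type,  # assumption: precision is not stated in the article
    ]


def expected_ir_files(onnx_model, output_dir):
    """Return the .xml and .bin paths the conversion is expected to produce.

    By default the IR files are named after the input model.
    """
    stem = Path(onnx_model).stem
    out = Path(output_dir)
    return out / (stem + ".xml"), out / (stem + ".bin")


# Hypothetical file names, used only for illustration.
cmd = build_mo_command("iris_segmentation.onnx", "ir")
print(" ".join(shlex.quote(part) for part in cmd))
print(expected_ir_files("iris_segmentation.onnx", "ir"))
```

The helper only builds the command. Running it (for example with subprocess.run(cmd, check=True)) requires the OpenVINO environment variables to be set first, as described in the section above.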

Generation of IR Files

.xml and .bin type files are obtained from the Model Optimizer.

.xml - Describes the network topology
.bin - Contains the weights and biases binary data.
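Because the .xml file is plain XML, the topology it describes can be inspected with Python's standard library. The sketch below parses a tiny, hand-written stand-in for an IR file (the layer names and types are hypothetical, and a real IR file carries many more attributes and layers) and lists its layers and the connections between them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed IR topology used only for illustration.
SAMPLE_IR_XML = """<net name="iris_segmentation" version="10">
  <layers>
    <layer id="0" name="input" type="Parameter"/>
    <layer id="1" name="conv1" type="Convolution"/>
    <layer id="2" name="output" type="Result"/>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="1" to-port="0"/>
    <edge from-layer="1" from-port="1" to-layer="2" to-port="0"/>
  </edges>
</net>
"""


def summarize_ir(xml_text):
    """Return the network name, its layers, and its layer-to-layer edges."""
    root = ET.fromstring(xml_text)
    layers = [
        (layer.get("id"), layer.get("name"), layer.get("type"))
        for layer in root.iter("layer")
    ]
    edges = [
        (edge.get("from-layer"), edge.get("to-layer"))
        for edge in root.iter("edge")
    ]
    return root.get("name"), layers, edges


name, layers, edges = summarize_ir(SAMPLE_IR_XML)
print("network:", name)
for layer_id, layer_name, layer_type in layers:
    print("layer", layer_id, layer_name, layer_type)
for src, dst in edges:
    print("edge", src, "->", dst)
```

The weights themselves are not in this file; they live in the matching .bin file, which is why both files are needed at load time.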

The webcam was used as the input for two scenes. Video demonstrations of both scenes are attached below for reference.

Street View Segmentation

Street View - INPUT Video Clip

Street View - OUTPUT Video Clip

College Corridor Segmentation

College Corridor Segmentation - Laptop Web Camera

Command prompt : Output

Using the OpenVINO Toolkit - Benchmark App

The Benchmark tool was used to estimate deep learning inference performance. Performance can be measured in two inference modes:

1) Synchronous (latency-oriented)

2) Asynchronous (throughput-oriented)
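The difference between the two modes can be illustrated without the toolkit at all. In the sketch below, fake_infer is a stand-in that simply sleeps instead of running a model, and the request count of 4 is an arbitrary assumption. The synchronous loop times each request on its own (latency), while the asynchronous version keeps several requests in flight and times the whole run (throughput).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame, delay_s=0.01):
    """Stand-in for one inference request; sleeps instead of running a model."""
    time.sleep(delay_s)
    return frame


def run_sync(frames):
    """Latency-oriented: issue one request at a time and time each one."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        fake_infer(frame)
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)  # mean latency in seconds


def run_async(frames, num_requests=4):
    """Throughput-oriented: keep several requests in flight, time the whole run."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        results = list(pool.map(fake_infer, frames))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed  # frames per second


frames = list(range(16))
print("sync mean latency (ms):", round(run_sync(frames) * 1000, 2))
print("async throughput (FPS):", round(run_async(frames), 1))
```

The real Benchmark tool reports comparable figures for the actual model; the numbers printed here only reflect the artificial sleep and say nothing about the project's results.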

Various parameters were changed and observations were made for the Benchmark tool.

The full list of parameters and the values tested on the CPU and GPU in both modes is given in the image below.
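A sweep over devices and modes like the one described above could be scripted roughly as follows. The model path ir/iris_segmentation.xml and the iteration count are hypothetical, and the script name benchmark_app.py is assumed to be reachable from the working directory. The parameter values actually tested are the ones in the image and are not reproduced here.

```python
from itertools import product

DEVICES = ["CPU", "GPU"]       # the two devices named in the article
API_MODES = ["sync", "async"]  # latency-oriented and throughput-oriented modes


def build_benchmark_commands(model_xml, niter=100, app="benchmark_app.py"):
    """Build one Benchmark tool command per (device, mode) combination."""
    commands = []
    for device, api in product(DEVICES, API_MODES):
        commands.append([
            "python", app,
            "-m", model_xml,       # path to the IR .xml file
            "-d", device,          # target device
            "-api", api,           # sync or async
            "-niter", str(niter),  # assumption: iteration count is illustrative
        ])
    return commands


# Hypothetical model path, used only for illustration.
for command in build_benchmark_commands("ir/iris_segmentation.xml"):
    print(" ".join(command))
```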

Parameters of the Output

Snapshots of the Outputs from Benchmark Tool

Benchmark Values

Future Scope

A model trained on ADE20K has potential applications in the medical industry. With the rise in robot-assisted surgeries, such a model could be incorporated to improve the efficiency and accuracy of surgical procedures.
The model could also be used by household cleaning robots to make their operation more effective.