ADE20K spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. It contains 25k images of complex everyday scenes with a variety of objects in their natural spatial context. On average there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation.
A semantic understanding of visual scenes is one of the holy grails of computer vision. The emergence of large-scale image datasets like ImageNet, COCO, and Places, along with the rapid development of deep convolutional neural network (CNN) approaches, has brought great advancements to visual scene understanding.
Given a visual scene of a living room, a robot equipped with a trained CNN can accurately predict the scene category. However, to freely navigate in the scene and manipulate the objects inside, the robot has far more information to digest from the input image: it has to recognize and localize not only the objects like the sofa, table, cup, and TV but also their parts, e.g., a seat of a sofa or a handle of a cup, to allow proper manipulation, as well as to segment the stuff like floor, wall, and ceiling for spatial navigation.
Using the OpenVINO Toolkit
Initializing the OpenVINO environment
The OpenVINO Model Optimizer was first used to generate the Intermediate Representation (IR) files (.xml and .bin) from the ONNX model (.onnx). The model was then deployed.
Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. The Model Optimizer process assumes you have a network model trained using a supported deep learning framework.
Model Optimizer produces an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine.
Generation of IR Files
Two files, of type .xml and .bin, are obtained from the Model Optimizer:
- .xml - Describes the network topology.
- .bin - Contains the weights and biases binary data.
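The conversion step described above can be sketched in Python as follows. This is a minimal illustration, not the exact command used here: it assumes the Model Optimizer is available on the PATH as `mo` (older releases ship it as a `mo.py` script instead), that the output files take the base name of the input model (the tool's default), and the file name `ade20k_model.onnx` and output directory `ir` are hypothetical.

```python
from pathlib import Path
import shutil
import subprocess


def build_mo_command(onnx_path, output_dir):
    """Return the Model Optimizer command line as a list of arguments."""
    return [
        "mo",  # assumed entry point; older releases use mo.py
        "--input_model", str(onnx_path),
        "--output_dir", str(output_dir),
    ]


def expected_ir_files(onnx_path, output_dir):
    """Return the (.xml, .bin) paths expected after conversion."""
    stem = Path(onnx_path).stem  # IR files reuse the input model's base name
    out = Path(output_dir)
    return out / f"{stem}.xml", out / f"{stem}.bin"


if __name__ == "__main__":
    model = Path("ade20k_model.onnx")  # hypothetical file name
    cmd = build_mo_command(model, "ir")
    print(" ".join(cmd))
    # Run the conversion only when both the tool and the model are present.
    if shutil.which("mo") and model.exists():
        subprocess.run(cmd, check=True)
    xml_path, bin_path = expected_ir_files(model, "ir")
    print(xml_path, bin_path)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.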
Street View Segmentation
College Corridor Segmentation
Using the OpenVINO Toolkit - Benchmark App
The Benchmark tool was used to estimate deep learning inference performance. Performance can be measured for two inference modes:
1) Synchronous (latency-oriented)
2) Asynchronous (throughput-oriented)
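The difference between the two modes can be illustrated with a small timing sketch. It does not call the Benchmark tool or OpenVINO: `fake_infer` is a hypothetical stand-in that sleeps for a few milliseconds, and a thread pool is used only to imitate several requests being in flight at once. Synchronous mode reports the mean time per request, while asynchronous mode reports how many requests finish per second.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(_frame):
    """Hypothetical placeholder for a real inference call."""
    time.sleep(0.005)  # simulate roughly 5 ms of work
    return 0


def measure_sync(infer, frames):
    """Run one request at a time; return mean latency in milliseconds."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    return 1000.0 * sum(latencies) / len(latencies)


def measure_async(infer, frames, in_flight=4):
    """Keep several requests in flight; return throughput in frames/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        list(pool.map(infer, frames))  # wait for every request to finish
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


if __name__ == "__main__":
    frames = list(range(40))
    print(f"sync latency: {measure_sync(fake_infer, frames):.2f} ms")
    print(f"async throughput: {measure_async(fake_infer, frames):.1f} FPS")
```

With real hardware the gap between the two numbers depends on how many requests the device can process in parallel, so the figures printed by this sketch say nothing about actual model performance.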
Several parameters of the Benchmark tool were varied and the resulting performance was recorded.
The parameters and the values tested on the CPU and GPU, in both modes, are listed in the image below.
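A parameter sweep of this kind can be scripted by generating one `benchmark_app` command per combination. The sketch below is only an assumption about how such a sweep might look: the flags `-m`, `-d`, `-api`, and `-niter` are standard Benchmark tool options, but the model path and the values shown are hypothetical and are not the ones tested here.

```python
from itertools import product


def benchmark_commands(model_xml, devices, api_modes, niters):
    """Build one benchmark_app command per (device, mode, iterations) combo."""
    commands = []
    for device, api, niter in product(devices, api_modes, niters):
        commands.append([
            "benchmark_app",
            "-m", model_xml,       # path to the IR .xml file
            "-d", device,          # target device, e.g. CPU or GPU
            "-api", api,           # sync (latency) or async (throughput)
            "-niter", str(niter),  # number of inference iterations
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    for cmd in benchmark_commands(
        "ir/ade20k_model.xml", ["CPU", "GPU"], ["sync", "async"], [100]
    ):
        print(" ".join(cmd))
```

Each printed line can be run in a shell where the OpenVINO environment has been initialized.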
Snapshots of the Outputs from Benchmark Tool
Future Scope
- The ADE20K model has applications in the medical industry. With the rise in robot-performed surgeries, the model could be incorporated to improve their efficiency and accuracy.
- The model could also be used by household cleaning robots, making them more effective in operation.