The main goal of this project is to guide an operator through an assembly process and alert them when an error occurs. YWIL works as a Poka-yoke mechanism, preventing, avoiding, detecting and correcting human mistakes.
In the following sections, I will describe step by step how YWIL uses the power of Computer Vision to carry out its mission with the help of the Intel Distribution of OpenVINO Toolkit.
Step 1: Making a simple assembly kit
The journey begins with the simulation of an industrial manufacturing environment. In this case, I used 3D modeling software to design a kind of plastic machine. From now on, I will refer to it as the assembly kit.
The assembly kit consists of several pieces of different shapes and sizes. Each part will be placed in an exact position at a certain point in the assembly process.
After designing the whole mechanism, it is time to export the 3D models to files in .STL format. Then, these files are converted to G-code. Finally, we are ready for 3D printing.
For this demo, I have used randomly colored filaments for the different parts, but it would be a good challenge for the Deep Learning model if all the objects were the same color.
You can find a link to the ready-to-use files in the Custom parts section at the bottom of this page.
Step 2: Capturing images for training
The camera model I have used includes a depth sensor, but only its RGB camera is needed here. You can use any type of standard webcam for this project.
To make training easier, all images are captured from the same point of view. Therefore, it is important to keep the camera fixed in an exact position. I have used a mini tripod about 15 centimeters high, placed very close to the base of the assembly kit, and I have positioned the camera at an angle of approximately 60 degrees to frame the entire mechanism.
You can find an image of the camera and base setup in the Schematics section at the bottom of this page.
In the project code, which will be explained later, there is a flag to indicate that we are going to capture and save images to train our Machine Learning model:
private const bool capturingFlag = true;
This C# method is in charge of converting the camera stream into an image file and saving it on our system:
private void SaveStreamAsPngFile(Stream inputStream)
While the YWIL application is running in training mode, the captured images are saved in a folder called YWIL_YouWorkItLooks within the Pictures directory. Then, it only remains to carefully arrange the images in different folders. Each folder corresponds to a step in the process or to a label that we want to detect.
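If you prefer to script that housekeeping step, here is a minimal sketch of how the captured images could be moved into one folder per label; the helper and the file and label names are only illustrative, they are not part of the YWIL code:
import shutil
from pathlib import Path

# Hypothetical helper: move each reviewed capture into the folder of its label,
# creating that folder the first time it is used (all names are examples).
def sort_captures(pictures_dir, assignments):
    for filename, label in assignments.items():
        target = Path(pictures_dir) / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(Path(pictures_dir) / filename), str(target / filename))

sort_captures(Path.home() / 'Pictures' / 'YWIL_YouWorkItLooks',
              {'capture_001.png': 'Step0', 'capture_002.png': 'Step1'})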
Step 3: Training the Custom Vision models
Since YWIL must detect custom objects, we have to train custom Computer Vision models. For this, I have used a Cognitive Service from Microsoft Azure called Custom Vision.
This platform offers three types of image recognition or labeling models:
- Multiclass Classification: each image corresponds to only one label.
- Multilabel Classification: multiple labels can be assigned to each image.
- Object Detection: each label is delimited by a region or bounding box.
For the YWIL use case, I have chosen the General (compact) domain with the Basic platforms export capability. These options allow us to export the model to ONNX or TensorFlow format and deploy it on edge devices.
All Custom Vision projects are built the same way: first, the model is chosen; then the images are uploaded and tagged or labeled; after that, the model is trained; next, it is tested and evaluated with no-labeled images; and finally, the model is exported to a ready-to-use file format.
I used the Multilabel Classification model for steps 0 to 4. This model allows YWIL to verify whether or not the part is correctly positioned, as well as to identify if the step is correct.
For steps 5 and 6 I used an Object Detection model. This model allows YWIL to count holes and parts and identify objects in different positions, even incorrectly placed parts.
Finally, I used a Multiclass Classification model to identify different situations in step 7. The result of this model is a single label that allows YWIL to identify whether or not the last part is positioned correctly.
I captured about 1500 images of the assembly process and selected and tagged more than 1200 of them in the three Custom Vision projects. The more images used to train the models, the more accurate the results will be. Keep in mind that you can add more training images and update the models by repeating these processes at any time.
Step 4: Installing the OpenVINO toolkit
We need to download and install the Intel Distribution of OpenVINO Toolkit to infer new classifications and scores on our Intel device. In my case, I have used my laptop with an Intel CPU and Windows 10.
First, go to the official Intel website. You just have to choose the operating system and fill in and submit the form.
Then, you can download one of the versions available for your operating system and choose between the Customizable Package or the Full Package.
Also, you will receive an email with a serial number, installation instructions, links to documentation and to a Getting Started Guide, etc.
After downloading the package, I followed the installation instructions for Windows. It is very easy if you do everything carefully and step by step.
At the end of the installation guide there are instructions to run the Image Classification Verification Script and the Inference Pipeline Verification Script. These two tests are used to verify that everything works fine.
If you followed all the instructions and tried other samples and demos, you will have noticed that the system environment variables must be configured every time a new CLI session is opened:
C:\Program Files (x86)\IntelSWTools\openvino\bin\setupvars.bat
To avoid having to do that every time we want to use OpenVINO, we can permanently set OpenVINO environment variables for Windows 10 in Control Panel > System and Security > System > Advanced System Settings > Environment Variables.
Step 5: Using the Model Optimizer
To use the OpenVINO Toolkit Inference Engine with our custom models, we first need to convert them to Intermediate Representation using the Model Optimizer tool. To learn how to use this tool, you can follow the Model Optimizer Development Guide on the Intel website.
We can find the Model Optimizer tool inside the OpenVINO installation directory:
C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer
To get the Intermediate Representation of the YWIL Custom Vision models, I used the exported TensorFlow versions. Each framework (ONNX, TensorFlow, Caffe, etc.) has its own conversion script, so use the appropriate one and check on this page that all the layers of your model are supported. After running the Model Optimizer from the Windows CLI, you will get an output similar to this:
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: YWIL_Classification_Multilabel.xml
[ SUCCESS ] BIN file: YWIL_Classification_Multilabel.bin
[ SUCCESS ] Total execution time: 19.16 seconds.
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: YWIL_Object_Detection.xml
[ SUCCESS ] BIN file: YWIL_Object_Detection.bin
[ SUCCESS ] Total execution time: 25.20 seconds.
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: YWIL_Classification_Multiclass.xml
[ SUCCESS ] BIN file: YWIL_Classification_Multiclass.bin
[ SUCCESS ] Total execution time: 17.48 seconds.
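For reference, the kind of command that produces this output looks roughly like the one below. The frozen graph name, input shape, and output directory are illustrative and must be adapted to your exported model, as explained in the TensorFlow section of the Development Guide:
python "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo_tf.py" --input_model model.pb --input_shape [1,224,224,3] --output_dir YWIL_IR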
It may happen that you get some errors during the optimization process. To solve most of them, you can refer to the Model Optimizer FAQ.
Step 6: Using the Inference Engine
In this section I will explain how to use the OpenVINO Inference Engine. YWIL uses the Python API, but the steps to follow are very similar to the Common Workflow instructions for C++.
- Create an Inference Engine Core object:
IECore()
- Read the Intermediate Representation:
read_network()
- Compile and load the network to the device:
load_network()
- Prepare input and output formats:
input_info
outputs
- Set the image input data:
inputs={input: image}
- Execute the inference process:
infer()
- Get the output of the model:
result[output]
from openvino.inference_engine import IECore
inference_engine = IECore()
network = inference_engine.read_network(model=model_xml, weights=model_bin)
exec_network = inference_engine.load_network(network=network, device_name='CPU')
input = next(iter(network.input_info))
output = next(iter(network.outputs))
result = exec_network.infer(inputs={input: image})
result = result[output]
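The image passed to infer() must already match the network input layout. The exact preprocessing depends on the exported model, but a minimal sketch (assuming a fixed NCHW input and an OpenCV BGR frame; the helper name is mine, not part of YWIL) could look like this:
import cv2

def prepare_input(frame, network, input_name):
    # Read the expected (N, C, H, W) shape from the network definition
    n, c, h, w = network.input_info[input_name].input_data.shape
    resized = cv2.resize(frame, (w, h))        # resize to the model's spatial size
    transposed = resized.transpose((2, 0, 1))  # HWC -> CHW
    return transposed.reshape((n, c, h, w))    # add the batch dimension

# frame is a BGR image captured from the camera
image = prepare_input(frame, network, input)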
Here you can see the JSON-formatted result of an inference on the Multilabel Classification model using the Python script above:
[
{
"label": "Step4",
"probability": "99.8"
},
{
"label": "front",
"probability": "85.2"
}
]
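The raw output of the classification networks is just a vector of probabilities. Turning it into the JSON above only requires mapping each score to the corresponding entry of the labels.txt file included in the Custom Vision export; a rough sketch, where the 0.5 threshold is my own assumption:
# Map the raw probability vector to the exported labels and keep confident ones
with open('labels.txt') as f:
    labels = [line.strip() for line in f]

probabilities = result.squeeze()
predictions = [{'label': label, 'probability': '{:.1f}'.format(float(p) * 100)}
               for label, p in zip(labels, probabilities) if p > 0.5]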
You can find a more detailed description on the Introduction to Inference Engine page.
Step 7: Using ONNX Runtime with OpenVINO
I had some compatibility issues using the Azure Custom Vision Object Detection model with the Model Optimizer and the Inference Engine. Therefore, I decided to go to plan B: using the ONNX Runtime from Microsoft with OpenVINO as the Execution Provider.
The installation process is very simple and all the instructions can be found on the ONNX Runtime GitHub page. To use OpenVINO and our Intel CPU we just have to add this argument:
--use_openvino CPU_FP32
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
.\build.bat --config RelWithDebInfo --use_openvino CPU_FP32 --cmake_generator "Visual Studio 16 2019"
The inference process in ONNX Runtime with Python is very similar to OpenVINO's; instead of the Intermediate Representation, we use the ONNX model file directly:
import onnx
import onnxruntime

# Load the exported Custom Vision model and save a copy to the working path
model = onnx.load(model_filename)
onnx.save(model, dirpath)

# Create an inference session with default options
options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession(dirpath, options)

# Get the input layer name and run the inference on the image
input = session.get_inputs()[0].name
output = session.run(None, {input: image})
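A quick way to confirm that the custom build actually exposes OpenVINO, although this check is not part of the YWIL code, is to list the available and active execution providers:
import onnxruntime

# The OpenVINO Execution Provider should appear alongside the default CPU provider
print(onnxruntime.get_available_providers())
print(session.get_providers())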
Step 8: Building a desktop application
The main part of YWIL is an application that distributes each image of the assembly process to the corresponding model to infer the Computer Vision results.
string result = await ModelInference(CommonSteps.ModelFolders[currentModel]);
switch (currentModel)
{
case 0:
List<Classification> multilabel =
JsonConvert.DeserializeObject<List<Classification>>(result);
ProcessMultilabelResult(multilabel);
break;
case 1:
List<Detection> detection =
JsonConvert.DeserializeObject<List<Detection>>(result);
ProcessDetectionResult(detection);
break;
case 2:
List<Classification> multiclass =
JsonConvert.DeserializeObject<List<Classification>>(result);
ProcessMulticlassResult(multiclass);
break;
default:
break;
}
Then, different methods process the results, verify the steps, and raise alerts when errors occur. Finally, all the information is displayed in a graphical interface:
- 1. Real time image from the camera
- 2. Assembly instructions and alerts
- 3. Visual guide and reference pictures
- 4. Detected objects and probabilities
- 5. Progress bar of the entire process
You can find a link to the YWIL GitHub page with the complete Visual Studio solution in the Code section at the bottom of this page.
***** Step 8: Python Bonus Track *****
This project was originally created for the Deep Learning Superhero Challenge. The main application was written in C# for the early submission contest but after that, I had time to fully convert it to Python.
The fundamental difference between the two applications is that the inference process is much more straightforward from Python code. In this way, the power of OpenVINO is fully exploited and better results are obtained in terms of inference times and frames per second.
The Python project has more or less the same structure as the C# application, and the well-known Tkinter library has been used to build the same graphical interface. There is a section that chooses which model the image is sent to for analysis:
predictions = []
if current_model == 0:
    predictions = multilabel.Infer(image)
elif current_model == 1:
    predictions = detection.Infer(image)
elif current_model == 2:
    predictions = multiclass.Infer(image)
Then, specific methods are in charge of processing the results and displaying the information on the screen:
print_currently(len(predictions))
print_detections(predictions)
print_inference(start, end)
if current_model == 0:
    process_multilabel(predictions)
elif current_model == 1:
    detections = process_detection(predictions)
    image = draw_detections(image, detections)
elif current_model == 2:
    process_multiclass(predictions)
The C# code has no equivalent of the following method, which calculates the median inference time over the last few frames:
from statistics import median  # required for the median() call below

def print_inference(start, end):
    # Keep a sliding window of the most recent inference times
    if len(infer_times) == waiting_frames:
        infer_times.pop(0)
    infer_times.append((end - start) / 1000)
    ms = median(infer_times)
    text = 'Last inference time: {:.3f}ms'.format(ms)
    window.inference.config(text=text)
You can find the YWIL GitHub page link with the Python application in the Code section at the end of this project.
Step 9: Testing the YWIL app in action
And finally, here comes the fun part: testing the YWIL application in real time during the assembly process of a kit.
***** Step 9: Python Bonus Track *****
Having a new Python-with-OpenVINO application, the plan is simple: first, see it working; then, analyze the results; and finally, write some conclusions. Let's go for it!
In real time we can see that the classification models take about 7 milliseconds on average to process each image with OpenVINO while the object detection model takes about 33 milliseconds on average using ONNX Runtime with OpenVINO as execution provider.
These numbers are really good when you consider that the time includes the image pre-processing, the inference itself, and the output post-processing. With those inference times we could reach processing speeds of up to about 140 frames per second for the classification models and around 30 fps for object detection. OpenVINO is really worth it, and I recommend you give it a try.
It is also very important to note that the average confidence of all the models is above 95 percent. And all of this has been accomplished with a fairly small training dataset. Imagine the possibilities of these Custom Vision models.
I hope you have enjoyed the YWIL project!