AI is a hot topic and has applications in many domains, but applying it in practice still runs into obstacles. One is the shortage of AI talent: building an AI solution usually requires at least one data scientist, if not a whole team, working over an extended period, and most of these specialists have been absorbed by large technology and financial services companies, leaving smaller companies struggling to adopt the technology. Another barrier is cost: hiring AI experts is expensive, data preparation and labeling are time-consuming, and model training requires powerful servers. Small and medium-sized companies usually cannot afford these costs and therefore cannot apply AI in their businesses.
No-code AI aims to democratize AI by removing these barriers. Using no-code development platforms, users can experiment with different AI models and apply them to specific use cases without extensive technical or programming skills. In no-code AI, users interact with a visual, code-free platform, often through a drag-and-drop interface, to deploy AI and machine learning models.
No-code AI enables non-technical users, such as those in SME companies, to try their ideas quickly using AI/ML technology.
Our OpenVI solution provides a no-code tool and reference software/hardware designs for software engineers to build better AI solutions without expert knowledge.
Our project focuses on three main features:
- AI Model Trainer: AI trainer for computer vision machine learning.
- AI Pipeline Builder: Build computer vision flows with drag-and-drop and no code.
- VI Open Hardware: Edge device with a Raspberry Pi Zero and a Raspberry Pi HQ Camera.
Architecture Overview
Our architecture consists of four main components: an HD Camera, an edge device (Raspberry Pi Zero), an AI model trainer, and an Image Processing Flow.
- AI Trainer: The AI Trainer block offers two prototypes: image classification (with an automatic heatmap feature) and object detection (which requires labeling). In this phase, we implement a basic training pipeline for classification and detection models suitable for edge devices, such as NanoDet and ResNet, using the PyTorch framework. In the future, we plan to open our solution for contributions from the community. As part of this project, we are developing a user-friendly Training UI. This interface lets end users verify dataset information (the number of train/val/test images and their labels), record training logs, visualize graphs, and manage experiments. It also supports checking data formats and converting data labels between input formats, such as COCO to YOLO and vice versa. The labeling tool is part of the AI Trainer block: we modified the AnyLabeling tool with an improved UI based on DearPyGui, focusing on solving the labeling task efficiently and with a straightforward workflow. The model conversion feature allows users to convert their AI models into ONNX/TFLite format, making deployment more flexible and compatible by enabling models to run across different frameworks and platforms (see the sketch after this list). By integrating DearPyGui as the preferred GUI framework, the model conversion feature not only enables seamless conversion of AI models but also provides a user-friendly experience.
- Image Processing Flow Builder: To enhance the user experience, we will develop a Flow Executor built on OpenCV and integrated with the popular DearPyGui library, a modern, fast, and powerful GUI framework for Python. This component provides a user-friendly interface for designing and customizing image processing flows. With the help of OpenCV, a powerful computer vision library, users can effortlessly perform tasks such as image filtering, feature extraction, and object tracking. The integration with DearPyGui ensures a seamless and visually appealing user experience, allowing users to interact with the platform's functionality conveniently.
- Open Hardware: Edge Device: The edge device inference block includes inference code that runs classification and detection models on a Raspberry Pi Zero W, along with an inference UI. Users can pick a high-performing model from the AI Trainer's model conversion step and deploy it on the Raspberry Pi Zero W. A simple OpenCV-based inference UI will also be released to support non-technical users.
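To illustrate the model conversion feature mentioned above, here is a minimal sketch of how a trained PyTorch classifier could be exported to ONNX. The model, checkpoint path, and output file name are placeholders, not the exact code used in OpenVI.

import torch
import torchvision

# Hypothetical example: export a trained ResNet18 classifier to ONNX.
model = torchvision.models.resnet18(num_classes=2)  # placeholder: use your own trained model
model.load_state_dict(torch.load("best_model.pth", map_location="cpu"))  # assumed checkpoint path
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # batch of one 224x224 RGB image
torch.onnx.export(
    model,
    dummy_input,
    "classifier.onnx",  # assumed output path
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)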
OpenVI Auto Trainer currently supports two models:
- Image classification: https://github.com/openvi-team/image-classification
- Object detection with Nanodet: https://github.com/openvi-team/object-detection
We use Docker to run the training jobs. The training can be triggered from the AI Auto Train tab in the UI.
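As a rough sketch of how such a Docker-based training job might be launched from the UI backend (the image name, mount points, and entry point below are placeholders, not the exact ones used by OpenVI):

import subprocess

# Hypothetical example: start a containerized training job and stream its logs.
cmd = [
    "docker", "run", "--rm",
    "-v", "/path/to/dataset:/data",        # placeholder host dataset path
    "-v", "/path/to/project:/workspace",   # placeholder project/output path
    "openvi/image-classification:latest",  # placeholder image name
    "python", "train.py", "--epochs", "10", "--batch-size", "32",
]
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
    for line in proc.stdout:
        print(line, end="")  # the UI could parse these lines into a training log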
Let's start to train a classification model:
Step 1: Define your project
- 1.1 Create a "Project Name" and type some project information into "Description".
- 1.2 Click on "Open Project" to set your project path.
- 1.3 Confirm by clicking the "OK" button before moving to the settings interface.
Step 2: Model selection
You can choose the type of model you want to train by clicking the box under the "Model" item. Currently, the AI Auto Train tab supports classification and object detection tasks.
Step 3: Dataset
- 3.1 Set the path to your data by clicking the "Browse" button under "Dataset".
- 3.2 Click "OK" to confirm your selection.
Step 4: Training configs
We show default values for Num class, Epoch, Batch size, Learning rate, and Image size. You can click the relevant box and adjust each parameter to suit your needs.
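For reference, here is a minimal sketch of what such a training configuration might look like once collected from the UI; the field names and default values are illustrative, not the exact ones OpenVI stores.

# Hypothetical example of the training settings gathered from the UI.
train_config = {
    "num_classes": 2,        # Num class
    "epochs": 50,            # Epoch
    "batch_size": 32,        # Batch size
    "learning_rate": 1e-3,   # Learning rate
    "image_size": 224,       # Image size (square input, in pixels)
}

# The values could then be passed to the training job, e.g. as CLI arguments.
args = [f"--{key.replace('_', '-')}={value}" for key, value in train_config.items()]
print(args)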
Step 5: Training
Double-check your selections to ensure they are correct, then click "Start Training" and wait for the results.
Note: While training is in progress, we display accuracy and training information beneath the "Training" item. We also offer a "Stop Training" option so you can halt the process early if you need to make changes.
Step 6: Results
When the training log reports "Training finished", the training is complete. Click the "Refresh" button below "Trained models" to view the summary of results. The trained models will be used in the next module of the OpenVI tools, the Computer Vision Flow Builder.
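Before moving on, you may want to sanity-check an exported model outside the UI. Here is a minimal sketch using onnxruntime, assuming the model was exported to ONNX with an input tensor named "input"; the file name and shape are placeholders.

import numpy as np
import onnxruntime as ort

# Hypothetical example: run one dummy inference on the exported classifier.
session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape must match the export
outputs = session.run(None, {"input": dummy})
print("Output shape:", outputs[0].shape)  # e.g. (1, num_classes)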
The video below explains how users interact with the AI Auto Train tab to train an image classification model.
III. Computer vision flow builder
The computer vision flow builder of the OpenVI project is based on the Image Processing Node Editor project (https://github.com/Kazuhito00/Image-Processing-Node-Editor) by Kazuhito Takahashi. This editor leverages the node editor in DearPyGui (the UI library) and the power of OpenCV to build a studio environment for image processing. Besides the basic image processing components, we also provide deep learning components for model integration.
In the flow builder, corresponding to the workflow of a deep learning project, we have input nodes -> image processing nodes -> deep learning models -> result display, and then we export the flow for later use. With this tool, users can easily get familiar and hands-on with the deep learning process without any coding knowledge, which addresses the no-code AI goal.
In OpenVI, we developed a flow executor for Raspberry Pi (embedded machine). Once the flow is built and tested with the flow builder on a PC or laptop, we can use the deployment feature to push and execute the flow on a Raspberry Pi.
After successfully training the AI model, the user wants to deploy it to the device. OpenVI offers a Pipeline Builder that streamlines and simplifies this process.
Step 1: Pipeline Builder
Click on "Pipeline Builder
" to switch interface
Step 2: Input Image
2.1 Create an input by clicking on InputNode, then choose the Image option.
2.2 Upload the input image: click the "Select Image" button, add the image path, then select "OK" to load the image.
Step 3: (Optional) Process Node
This node is used to preprocess the image before it enters the deep learning model. Some supported functions include Crop, Flip, Grayscale, Blur, and Resize. Note: the input image of this node is in BGR format.
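For reference, here is a minimal OpenCV sketch of the kind of preprocessing these nodes perform; the function choices, file paths, and parameters are illustrative only.

import cv2

# Hypothetical example: chain a few preprocessing steps on a BGR image.
image = cv2.imread("input.jpg")                      # OpenCV loads images in BGR order
cropped = image[0:240, 0:320]                        # Crop: keep a 320x240 region
flipped = cv2.flip(cropped, 1)                       # Flip horizontally
gray = cv2.cvtColor(flipped, cv2.COLOR_BGR2GRAY)     # Grayscale
blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # Blur
resized = cv2.resize(blurred, (224, 224))            # Resize to the model's input size
cv2.imwrite("preprocessed.jpg", resized)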
Step 4: Deep Learning Node
This is the node containing the AI models; you can choose Object Detection or Classification.
4.1 Here I will choose the Classification model that I just finished training in part II, Auto Train. After clicking Classification, a box representing the classification block appears on the interface, as shown in the Pipeline Builder image above; you can drag and drop it anywhere for convenience.
4.2 Some classification models supported by OpenVI include ResNet18, ResNet34, ResNet50, and MobileNetV2. I choose the default ResNet18 model with the CPU setting. You can select GPU if your computer has one.
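As an illustration of what the CPU/GPU choice means at inference time, here is a minimal sketch using OpenCV's DNN module to load an ONNX classifier and pick a compute target. The model path and input size are placeholders, and the CUDA target only works if your OpenCV build has CUDA support.

import cv2
import numpy as np

# Hypothetical example: load an exported ONNX model and choose the compute target.
net = cv2.dnn.readNetFromONNX("classifier.onnx")  # placeholder model path
use_gpu = False
if use_gpu:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
else:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, 1.0 / 255, (224, 224), swapRB=True)
net.setInput(blob)
scores = net.forward()
print("Predicted class:", int(np.argmax(scores)))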
Step 5: Connect Node
Connect nodes in flow order and see visual results on the interface.
IV. Open hardware: Edge device with Raspberry Pi
To install the Raspberry Pi OS, download the Raspberry Pi Imager and flash your SD card or USB drive. Follow this blog post to set up your Raspberry Pi: https://raspberrytips.com/install-raspbian-raspberry-pi/. Then connect the Raspberry Pi to an LCD display.
To set up the Pi HD Camera as in the figure above, please follow the post here. To check that the Pi Camera is working, run this command in the terminal:
raspivid -o ~/Desktop/video.h264 -t 10000
In the above section, we played with the Image Processing Flow (input image, image processing, deep learning nodes, etc.) on the laptop and then set up the open hardware with a Raspberry Pi Zero and a Pi HD Camera. Now, let's connect them: we want to deploy the exported image processing flow to the Raspberry Pi Zero (the laptop and the Pi must be on the same network).
Step 1: To do this, open a Project, select the Deployment tab, and fill out the required fields.
- IP address: the IP address of the Raspberry Pi (e.g. 172.20.10.3)
- Port: the port of the socket server running on the Pi, which handles deployment requests (e.g. 12345)
- Flow file: the exported image processing flow file (e.g. export.json)
Step 2: Click on "Try connection" to check whether the socket server is running on the Pi.
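A minimal sketch of what such a connection check could look like on the client side (the host and port values are examples); opening and immediately closing the socket matches the empty-payload "test connection signal" that the server code below treats specially.

import socket

# Hypothetical example: verify that the Pi's socket server is reachable.
def try_connection(host="172.20.10.3", port=12345, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # connected and closed without sending any data
    except OSError:
        return False

print("Server reachable:", try_connection())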
Step 3: Click on "Deployment" to deploy the exported file.
- By clicking on "Deployment", our laptop sends the exported flow (packed as a zip archive) to the Pi.
import socket  # Import socket module


def socket_client(host, port, zip_path):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # host = socket.gethostname()  # Get local machine name
    # host = '172.20.10.3'
    # port = 12345  # Reserve a port for your service.
    logs = []
    s.connect((host, port))
    f = open(zip_path, 'rb')
    print('Sending...')
    logs.append("Sending ...")
    l = f.read(1024)
    while l:
        print('Sending...')
        logs.append("Sending ...")
        s.send(l)
        l = f.read(1024)
    f.close()
    print("Done Sending")
    logs.append("Done Sending")
    s.close()  # Close the socket when done
    return logs
- The socket server running on the Pi receives the archive, extracts it, and replaces the old export.json file on the Pi.
import socket  # Import socket module
import os
import zipfile
import shutil

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # Create a socket object
# host = socket.gethostname()  # Get local machine name
host = '0.0.0.0'
port = 12345  # Reserve a port for your service.
s.bind((host, port))  # Bind to the port
s.listen(5)  # Now wait for client connection.
zip_name = 'socket_server.zip'

while True:
    c, addr = s.accept()  # Establish connection with client.
    with c:
        f = open(zip_name, 'wb')
        print('Got connection from', addr)
        l = c.recv(1024)
        if l == b'':
            print("Receive test connection signal!")
            f.close()
            continue
        while l:
            print("Receiving...")
            f.write(l)
            l = c.recv(1024)
        f.close()
        print("Done Receiving")
        # c.send('Thank you for connecting')
        c.close()  # Close the connection

    # Handle the received .zip file
    base_path = os.path.abspath(__file__)
    directory_to_extract_to = os.path.join(base_path, '..', '..', 'node_editor', 'setting')
    with zipfile.ZipFile(zip_name, 'r') as zip_ref:
        zip_ref.extractall(directory_to_extract_to)
    # shutil.unpack_archive(zip_name, directory_to_extract_to)
    print("Done extract received file!")
- On the Raspberry Pi, a program runs in a loop: it loads the export.json file, captures an image from the Pi Camera, and processes the captured image according to the flow described in export.json.
import json
import traceback
from time import sleep

import cv2
import numpy as np
from picamera import PiCamera

from pi.image_processing_handler import image_processing
from pi.deep_learning_handler import deep_learning

# export_file = 'node_editor/setting/export.json'
# with open(export_file) as fp:
#     work_flow = json.load(fp)

input_nodes = ['WebCam']
image_processing_nodes = [
    'ApplyColorMap',
    'Blur',
    'Brightness',
    'Canny',
    'Contrast',
    'Crop',
    'EqualizeHist',
    'Flip',
    'GammaCorrection',
    'Grayscale',
    'OmnidirectionalViewer',
    'Resize',
    'SimpleFilter',
    'Threshold',
]
deep_learning_nodes = ['Classification', 'ObjectDetection']

camera = PiCamera()
camera.resolution = (320, 320)
camera.framerate = 24
camera.rotation = 180
sleep(2)
image = np.empty((320, 320, 3), dtype=np.uint8)

while True:  # Loop to capture frames from the Pi Camera.
    try:
        camera.capture(image, 'bgr')
        # Reload the flow on every frame so a new deployment takes effect immediately.
        export_file = 'node_editor/setting/export.json'
        with open(export_file) as fp:
            work_flow = json.load(fp)
        link_list = work_flow["link_list"]
        for node in link_list:
            start_node_idx = node[0].split(":")[0]
            start_node_name = node[0].split(":")[1]
            end_node_idx = node[1].split(":")[0]
            end_node_name = node[1].split(":")[1]
            print(f'{start_node_name}-{end_node_name}')
            # Handle start nodes
            if start_node_name in input_nodes:
                start_node_img = image
            elif start_node_name in image_processing_nodes:
                start_node_cfg = work_flow[f"{start_node_idx}:{start_node_name}"]["setting"]
                start_node_img = image_processing(image, start_node_name, start_node_cfg)
            elif start_node_name in deep_learning_nodes:
                start_node_cfg = work_flow[f"{start_node_idx}:{start_node_name}"]["setting"]
                img = deep_learning(image, start_node_name, start_node_cfg)
            # Handle end nodes
            if end_node_name in image_processing_nodes:
                end_node_cfg = work_flow[f"{end_node_idx}:{end_node_name}"]["setting"]
                img = image_processing(start_node_img, end_node_name, end_node_cfg)
                # cv2.imshow(f'{end_node_name}', img)
                # cv2.waitKey()
                # break
            elif end_node_name in deep_learning_nodes:
                end_node_cfg = work_flow[f"{end_node_idx}:{end_node_name}"]["setting"]
                img = deep_learning(start_node_img, end_node_name, end_node_cfg)
                cv2.imshow(f'{end_node_name}', img)
                cv2.waitKey()
    except Exception:
        traceback.print_exc()

cv2.destroyAllWindows()
- export.json sample
{
  "node_list": [
    "1:WebCam",
    "2:Blur",
    "3:Canny",
    "4:Flip",
    "8:ObjectDetection"
  ],
  "link_list": [
    ["1:WebCam:Image:Output01", "2:Blur:Image:Input01"],
    ["1:WebCam:Image:Output01", "3:Canny:Image:Input01"],
    ["1:WebCam:Image:Output01", "4:Flip:Image:Input01"],
    ["4:Flip:Image:Output01", "8:ObjectDetection:Image:Input01"]
  ],
  "1:WebCam": {
    "id": "1",
    "name": "WebCam",
    "setting": {
      "ver": "0.0.1",
      "pos": [0, 0]
    }
  },
  "2:Blur": {
    "id": "2",
    "name": "Blur",
    "setting": {
      "ver": "0.0.1",
      "pos": [294, 25],
      "2:Blur:Int:Input02Value": 1
    }
  },
  "3:Canny": {
    "id": "3",
    "name": "Canny",
    "setting": {
      "ver": "0.0.1",
      "pos": [321, 285],
      "3:Canny:Int:Input02Value": 100
    }
  },
  "4:Flip": {
    "id": "4",
    "name": "Flip",
    "setting": {
      "ver": "0.0.1",
      "pos": [17, 300],
      "4:Flip:Text:Input02Value": true,
      "4:Flip:Text:Input03Value": false
    }
  },
  "8:ObjectDetection": {
    "id": "8",
    "name": "ObjectDetection",
    "setting": {
      "ver": "0.0.1",
      "pos": [630, 101],
      "8:ObjectDetection:Text:Input02Value": "NanoDet-Plus-m (416x416)",
      "8:ObjectDetection:Float:Input03Value": 0.3
    }
  }
}
- For example, the above export.json file describes a graph of the processing flow. The program running on the Raspberry Pi captures an image from the Pi camera, applies the image processing functions (Blur, Canny edge detection, Flip), and sends the flipped image to an object detector (OpenCV DNN). The detected objects are then plotted using OpenCV imshow().
https://github.com/openvi-team/openvi
https://github.com/Kazuhito00/Image-Processing-Node-Editor