For model generation, a laptop with a Xeon E3-1505M processor, a Quadro M2200 GPU with 4 GB of memory, and 32 GB of system RAM running Windows 10 is used as the training computer. In addition, TensorFlow-GPU 1.12 is used in an Anaconda environment.
The inference computer is a desktop with a Ryzen 7 5800X on an X570 motherboard, the VCK5000 (with its 16 GB of off-chip memory) in the x16 PCIEX16_1 slot, a FirePro W2100 in PCIEX16_3 for the graphical interface, and an SSD. The computer has 32 GB of system RAM and runs Ubuntu 20.04.
The Seed Card
Acquiring information to analyze and train AI algorithms is the first challenge. In the present development, 60 seed images of 6 kinds of plants were taken on a grid card for a UNet architecture. The card consists of 1.5 cm squares, with one seed placed in each square. Each row has 10 cells for one kind of seed and is identified with a letter from A to F for labeling purposes.
The image above was acquired with an Arducam C lens with manual focus and aperture adjustment, mounted on a camera that stores images at a resolution of 1920x1080 pixels. In addition, the seed card was placed in a box to minimize the influence of external light. The light source was a polarized ring, and the optics include a polarizing crystal to adjust the angle.
Given the constraints of the training computer, the acquired images must be resized; consequently, the seeds could be deformed and details would be lost in the preprocessing stage. To reduce this kind of issue on a low-resource computing system, an image was acquired for each seed on the card with a 100x lens.
Consider the facedetect demo: this application uses the DenseBox architecture with a 320x320 pixel resolution.
Although the reduced resolution could lose detail, the seeds acquired with the 100x lens preserve this information in the images.
In order to scale the project in the future, a small assumption is made about the image resolution: the camera's form factor is 16:9, so any scale that preserves this form factor can be used. A scale factor that yields an integer width and height should be used so the image can be subsampled without modifying pixels. In addition, the scale adjusts both the memory required for training and the detail level present in the training process. For the present work, a scale of 6 was selected, giving 320x180 pixels.
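As an illustrative sketch (this helper is not part of the original toolchain), the valid scale factors for a 1920x1080 frame can be enumerated in Python:

# List the integer scales that keep both dimensions integer,
# so the image can be subsampled without modifying pixels.
WIDTH, HEIGHT = 1920, 1080  # camera resolution, 16:9 form factor

for scale in range(1, HEIGHT + 1):
    if WIDTH % scale == 0 and HEIGHT % scale == 0:
        print(f"scale {scale}: {WIDTH // scale}x{HEIGHT // scale}")
# scale 6 yields 320x180, the size selected for this work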
Image Segmentation (Inference)
The architecture used is UNet. The original base architecture is the following:
CNN-based architectures present some challenges. The first is the computational power required. The second is accelerating the training process, which commonly requires a GPU with enough memory to store the information. Consequently, the original architecture is modified to accept 320x180 pixel (scale 6) images and uses a padding parameter so the output has the same dimensions as the input.
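As an illustration only, and not the exact model from the Code section, a minimal Keras sketch of a UNet-style network with 'same' padding and the 320x180 input could look like this:

import tensorflow as tf
from tensorflow.keras import layers, Model

def minimal_unet(h=180, w=320, ch=3):
    inputs = layers.Input((h, w, ch))
    # Encoder: two downsampling stages (180x320 -> 90x160 -> 45x80)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    b = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    # Decoder with skip connections back to the input resolution
    u2 = layers.UpSampling2D()(b)
    c3 = layers.Conv2D(32, 3, activation='relu', padding='same')(
        layers.concatenate([u2, c2]))
    u1 = layers.UpSampling2D()(c3)
    c4 = layers.Conv2D(16, 3, activation='relu', padding='same')(
        layers.concatenate([u1, c1]))
    # 1x1 convolution produces a per-pixel seed probability mask
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c4)
    return Model(inputs, outputs)

Thanks to the 'same' padding, the output mask has exactly the same 320x180 dimensions as the input image.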
AI is a powerful tool, but if the training process does not have adequate data, performance tends to degrade. Moreover, the architecture is adjusted to the available GPU/CPU computing resources, which reduces the inference metrics and can isolate zones that do not belong to the seed or to another Region of Interest Under Test (RUT). Look at the next example:
As a final product, we obtain a .png file that encodes the segmentation in the alpha channel, while the information in the other channels remains identical to the original image.
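A minimal sketch of this encoding, assuming a Pillow/NumPy workflow that may differ from the original script:

import numpy as np
from PIL import Image

def save_segmentation_png(rgb_path, mask, out_path):
    # mask: 2-D array in [0, 1] produced by the UNet
    rgb = np.array(Image.open(rgb_path).convert('RGB'))
    alpha = (mask * 255).astype(np.uint8)  # segmentation -> alpha channel
    rgba = np.dstack([rgb, alpha])         # RGB channels stay untouched
    Image.fromarray(rgba, 'RGBA').save(out_path)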
ROI isolation and characterization (Post-processing)
Some regions in the image are not valid for the instance of interest. To remove those invalid regions, each one is characterized by statistical features. This allows detecting the region nearest to the image center and extracting its features:
In addition, the algorithms reorient and trim the seed to reduce the information and isolate the corresponding seed in the acquired image. In the next example, the isolated RUT was resized to the original image dimensions, the seed was isolated, and the image was annotated with the statistical features.
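A hedged sketch of this region-selection idea, using OpenCV connected components (the original algorithm may differ in its features and selection rule):

import cv2
import numpy as np

def keep_center_region(mask):
    # mask: binary uint8 image (0/255) from the thresholded segmentation
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n < 2:  # label 0 is the background; no foreground region found
        return None, {}
    h, w = mask.shape
    # Choose the region whose centroid is nearest to the image center
    dists = np.linalg.norm(centroids[1:] - np.array([w / 2, h / 2]), axis=1)
    best = 1 + int(np.argmin(dists))
    x, y, bw, bh, area = stats[best]
    features = {'area': int(area),
                'bbox': (int(x), int(y), int(bw), int(bh)),
                'centroid': tuple(centroids[best])}
    roi = np.where(labels == best, mask, 0)[y:y + bh, x:x + bw]
    return roi, features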
As the final product, we obtain two .png files, with the annotated image and the isolated seed respectively. Optionally, an auxiliary file is saved with the seed features.
The process
The first step is to take each acquired image and manually segment the instances that are valid for the study of interest. The GIMP software was used for this purpose. It is necessary to teach the algorithms what our objective is, and the seed segmentation is the language that the UNet understands. Originally, the images carry no information about their meaning, so a user must provide it. The segmentation must be done for every acquired image to obtain the largest amount of information for the algorithm.
The second stage is to define the UNet; you can consult the model implemented in the Code section. This model has an input size of 320x180 pixels for color images to avoid GPU memory overflow. On the training computer, the model requires about 35 seconds per epoch. The product of this stage is a .hdf5 file containing all the hyper-parameters of the designed model. The file is stored as archA_epE_hxwD.hdf5, where arch, ep, h, and w are codified as:
- arch: the architecture model implemented; 2 for the minimal UNet, 1 for the medium UNet, and 0 for the unmodified architecture.
- ep: the number of epochs used in the training process.
- h, w: the model input size.
Although the inference computer is more powerful, the training process is performed on the laptop because it has an NVIDIA board as the accelerator. Furthermore, the freeze process is performed on that computer too.
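For reference, the standard TensorFlow 1.x freeze flow used by the Vitis-AI tutorials looks roughly like this (the .hdf5 filename is hypothetical):

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.framework import graph_util, graph_io

K.set_learning_phase(0)  # force inference mode before loading the model
model = tf.keras.models.load_model('arch2_ep100_180x320.hdf5')  # hypothetical name
sess = K.get_session()
# Convert variables to constants, keeping only the inference subgraph
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in model.outputs])
graph_io.write_graph(frozen, '.', 'frozen_model.pb', as_text=False)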
The acquisition, manual segmentation, model quantization, model compiling, and application are performed on the inference computer. The last three stages are executed using the Vitis-AI toolset in Docker. Before starting, the environment needs to be adapted to support all the frameworks. This work is based on the FCN8 and UNET Semantic Segmentation with Keras and Xilinx Vitis AI and PyTorch flow for Vitis AI tutorials. Remember: Vitis-AI uses TensorFlow 1.15 as one of its environments.
If you are using a graphics card with a DisplayPort connection, you may need to execute the following:
export DISPLAY=:0.0
You may wonder how to compile the model for your board. The examples have information for the ZCU boards or the VCK190, but what about Kria, VCK5000, and others? You can execute the command below to see all the supported DPUs for the available boards:
tree /opt/vitis_ai/compiler/arch/
If you do not remember the hyper-parameters of the model, you can use vai_q_tensorflow to obtain this information:
vai_q_tensorflow inspect --input_frozen_graph MODEL.pb
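With the node names reported by the inspection and the arch.json located with the tree command above, the quantize and compile steps might look roughly like the following; the node names, shapes, and calibration function are illustrative and must be replaced with the values of your own model:

vai_q_tensorflow quantize \
    --input_frozen_graph frozen_model.pb \
    --input_nodes input_1 \
    --input_shapes ?,180,320,3 \
    --output_nodes conv2d_19/Sigmoid \
    --input_fn input_fn.calib_input \
    --calib_iter 10

vai_c_tensorflow \
    --frozen_pb quantize_results/quantize_eval_model.pb \
    --arch /opt/vitis_ai/compiler/arch/DPUCVDX8H/VCK5000/arch.json \
    --output_dir compiled \
    --net_name unet_seed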
Performance
On the training computer, the inference time for a single image with the trained model is about 153 ms at a 160x360 pixel image resolution. It is important to compare both model variants, the frozen TensorFlow model and the compiled model for VCK5000, under the same conditions. In addition, it is convenient to use the same language, Python, and a single thread.
The TensorFlow script for multiple-image input to the model can be consulted in the Code section. Note that the measured time covers only the tensor processing on the GPU.
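A minimal sketch of this measurement idea, again with a hypothetical model filename (the actual script is in the Code section):

import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('arch2_ep100_180x320.hdf5')  # hypothetical name
batch = np.random.rand(13, 180, 320, 3).astype(np.float32)  # 13-image bundle
model.predict(batch[:1])   # warm-up so graph setup is excluded from the timing
start = time.perf_counter()
model.predict(batch)       # measure only the tensor processing
elapsed = time.perf_counter() - start
print(f"{len(batch) / elapsed:.1f} FPS")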
The test uses two trimmed datasets, one with 13 images and the other with 47 images; together they make up the full dataset. From the table above, an improvement is observed for Vitis AI compared with TensorFlow. Unfortunately, the GPU in the training computer does not have enough memory to allocate the 47 images, so single-thread inference over that whole bundle at once is not possible. On the other hand, the inference computer can handle those images, which allows a throughput of 629 FPS. To compare both computers, the 13-image bundle can be allocated on both systems: TensorFlow achieved a throughput of 7 FPS, while Vitis-AI improved the processing to 480.05 FPS.
Conclusions
A reduction in processing time is observed in the graph above. The data center inference approach requires about 1.5% of the time consumed on the laptop, which represents an improvement of about 67x.
Both systems require a delay to load the architecture before starting to process. Vitis-AI requires more time to load the .xmodel than TensorFlow does for the .pb, but its inference computing time is far lower than that required for the .pb.
For continuous work and use of an inference service in data centers, we can afford more time to load the model in exchange for heavy acceleration. In production environments, users are not tolerant of long computing times; consequently, I believe the VCK5000 is a great product to improve the throughput of our AI implementations.
What's next?
There are more inference challenges in this project. Segmentation is the most popular stage of the processing chain. Now we know where the seed is, how it is visualized, and what the meaningful information is. But other branches are involved:
- What kind of seed am I seeing?
- Does the seed have an adequate form factor?
- According to the World Intellectual Property Organization (WIPO), does the seed meet the distinctness, uniformity, and stability criteria?
Seeds need to meet quality standards, and if the vegetative material is to be protected by organizations for commercial purposes, it must satisfy a registration rubric. The digital approach enables information registration and traceability without the damage of aging in long-term storage media, sharing results over a network, and post-processing for new evaluation methods.
Xilinx, as a cloud solutions provider, improves information processing by reducing the processing time and centralizing an organization's inference requests. On-premise products such as the VCK5000 or C1100, and cloud provider services like the F1 instances of Amazon Web Services (AWS), represent a continuous improvement in high-performance computing (HPC).