- The main goal of YOLOv5 is a fast-running object detector for production systems, optimized for parallel computation, rather than a low theoretical compute indicator (BFLOP).
- YOLOv4 is implemented in Darknet, while YOLOv5 is implemented in PyTorch; hence v5 may be easier to bring to production, while v4 is where top-accuracy research may continue to progress.
- It is a natural extension of the YOLOv3 PyTorch repository.
- After fully replicating the model architecture and training procedure of v3, ultralytics began to make research improvements alongside repository design changes.
[Update] We have released the code and sources of this hackster.io project here: https://github.com/LogicTronix/Vitis-AI-Reference-Tutorials/tree/main/Quantizing-Compiling-Yolov5-Hackster-Tutorial
Updates in Yolov5:
- PANet updates: new heads, reduced parameters, faster inference, and improved mAP.
- FP16 as the new default, which leads to smaller checkpoints and faster inference.
- CSP updates
It uses the same architecture as that of Yolov4.
- Involves creating features from input images. These features are then fed through a prediction system that draws boxes around objects and predicts their classes.
- The YOLO network consists of three main pieces.
Backbone: A convolutional neural network that aggregates and forms image features at different granularities.
Neck: A series of layers to mix and combine image features and pass them forward to prediction.
Head: Consumes features from the neck and takes the box and class prediction steps.
Main training procedures:
- Data Augmentation: transformations of the base training data to expose the model to a wider range of semantic variation than the training set in isolation, e.g. scaling, color space adjustment, and mosaic augmentation.
- Loss calculation: GIoU, object, and class loss.
Similar to Yolov3, the v5 network predicts the bounding boxes as deviations from a list of anchor box dimensions.
- Conversion from 32-bit precision to 16-bit precision which helps to speed up the inference time of models.
Both YOLOv4 and YOLOv5 implement the CSP Bottleneck to formulate image features. Research credit for this architecture is directed to WongKinYiu and their paper on Cross Stage Partial Networks for the convolutional neural network backbone.
The CSP addresses duplicate gradient problems in other, larger ConvNet backbones, resulting in fewer parameters and fewer FLOPS for comparable performance. This is extremely important to the YOLO family, where inference speed and small model size are paramount.
The CSP models are based on DenseNet. DenseNet was designed to connect layers in convolutional neural networks with the following motivations:
- to alleviate the vanishing gradient problem (it is hard to backprop loss signals through a very deep network);
- to bolster feature propagation;
- to encourage the network to reuse features and;
- to reduce the number of network parameters.
It uses residual and dense blocks to overcome the vanishing gradient problem. However, this introduces redundant gradients, which CSPNet tackles by truncating the gradient flow.
PA-Net Neck
- Both Yolov4 and v5 implement the PA-Net neck for feature aggregation.
- Each P_i represents a feature layer in the CSP backbone.
- Improves the information flow and helps in the proper localization of pixels in the task of mask prediction.
- In v5, the network has been modified by applying the CSPNet strategy.
Spatial Pyramid Pooling
The SPP block performs an aggregation of the information received from its inputs and returns a fixed-length output. It thus has the advantage of significantly increasing the receptive field and segregating the most relevant context features without lowering the speed of the network. This block was used in previous versions of YOLO (yolov3 and yolov4) to separate the most important features from the backbone; however, in YOLOv5 (6.0/6.1), SPPF, a faster variant of the SPP block, is used to improve the speed of the network.
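Below is a minimal PyTorch sketch of the SPPF idea, assuming the commonly described structure (a 1x1 convolution, three stacked 5x5 max-pools, a concatenation, and another 1x1 convolution); the actual Ultralytics module additionally wraps its convolutions with batch norm and an activation:

import torch
import torch.nn as nn

class SPPFSketch(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)   # effective 5x5 receptive field
        y2 = self.pool(y1)  # stacked pools reproduce SPP's 9x9 ...
        y3 = self.pool(y2)  # ... and 13x13 kernels at lower cost
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))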
Different configuration files
v5 formulates model configuration in .yaml, as opposed to the .cfg files used in Darknet.
The main difference between these two formats is that the .yaml file is condensed: it specifies only the network's distinct blocks and the number of times each block is repeated, rather than listing every layer as the .cfg format does.
Activation function used
YOLOv5 uses the SiLU and Sigmoid activation functions: SiLU in the hidden layers and Sigmoid in the output (detection) layer.
Major Improvement
- The Focus Layer: replaced the first three layers of the network. It reduced the number of parameters, the number of FLOPS, and the CUDA memory required, while improving the speed of the forward and backward passes with only minor effects on the mAP (mean Average Precision).
- Eliminating Grid Sensitivity: It was hard for previous versions of YOLO to detect bounding boxes on image corners, mainly because of the equations used to predict the bounding boxes. The revised equations (sketched after this list) expand the range of the center-point offset from (0, 1) to (-0.5, 1.5), so the offset can easily reach 0 or 1 and box centers can sit on the image's edge. Also, the height and width scaling ratios were unbounded in the previous equations, which could lead to training instabilities; the new squared-sigmoid form bounds the scaling and reduces this problem.
- The running environment: The previous versions of YOLO were implemented in the Darknet framework, which is written in C; YOLOv5 is implemented in PyTorch, giving more flexibility to control the encoded operations.
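As a rough illustration of those revised box equations (the same form reappears in the post-processing code later in this tutorial), here is a minimal Python sketch; the decode_box helper is illustrative and not part of the yolov5 codebase:

import torch

def decode_box(t, grid_xy, anchor_wh, stride):
    # t: raw (tx, ty, tw, th) outputs for one scale; grid_xy: integer cell indices;
    # anchor_wh: anchor sizes in pixels; stride: feature-map stride
    xy, wh = t.sigmoid().split(2, dim=-1)
    xy = (xy * 2 - 0.5 + grid_xy) * stride  # center offset in (-0.5, 1.5): corner pixels become reachable
    wh = (wh * 2) ** 2 * anchor_wh          # scale bounded to (0, 4) x anchor: avoids runaway width/height
    return torch.cat((xy, wh), dim=-1)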
Note: we used Vitis AI 3.0 for the steps below.
Steps:
- Download the pre-trained model from the Ultralytics repo (https://github.com/ultralytics/yolov5).
- Clone the code from the given GitHub repo.
- Arrange the dataset and create a .yaml for the custom dataset.
This .yaml is customized for the BDD dataset. Here, the path to the dataset root directory is given as path, while the image folders for training, validation, and testing are given as train, val, and test. Also, the number of classes in the custom dataset should be specified as nc and the class names as names, in the format sketched below.
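Here is a minimal sketch of such a dataset .yaml, written out from Python; the paths and class list below are hypothetical placeholders, so set nc and names to match your own annotations:

import yaml  # PyYAML

data = {
    "path": "../datasets/bdd",   # hypothetical dataset root directory
    "train": "images/train",     # image folders, relative to 'path'
    "val": "images/val",
    "test": "images/test",
    "nc": 10,                    # number of classes in the custom dataset
    "names": ["car", "bus", "truck", "person", "rider",
              "bike", "motor", "train", "traffic light", "traffic sign"],
}

with open("bdd.yaml", "w") as f:
    yaml.safe_dump(data, f, sort_keys=False)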
- Change the activation function from SiLU to LeakyRelu (use negative slope as 26/256).
Vitis AI does not support the SiLU activation function (the list of PyTorch operators supported by Vitis AI is given HERE). Out of all the supported activations, LeakyReLU with a negative slope of 26/256 gives the best results.
In models/common.py (lines 66 and 147) and experimental.py (line 55):
self.act = nn.LeakyReLU(26/256, inplace=True) # in place of nn.SiLU
- Train the model using the given code.
python train.py
- Since permute and view are not supported in Vitis-AI 3.0, remove the last layer in the detection head and reimplement that last layer in the post-processing section.
Inside yolo.py:
class Detect(nn.Module):
    # YOLOv5 Detect head for detection models
    stride = None  # strides computed during build
    dynamic = False  # force grid reconstruction
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        ...  # original __init__ code, unchanged

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
        return x  # return raw conv outputs; the decoding now happens in post-processing
In detect.py:
# replace the original "pred = model(im, augment=augment, visualize=visualize)" call with:
predi = model(im, augment=augment, visualize=visualize)
pred = postprocessing(predi)
def postprocessing(x):
    # Re-implements the decoding that was removed from the Detect head.
    grid = [torch.empty(0) for _ in range(3)]
    z = []
    anchor_grid = [torch.empty(0) for _ in range(3)]
    stride = torch.tensor([8., 16., 32.], device='cuda:0')
    # pixel-space anchors:
    # anchors = torch.tensor([[10.,13., 16.,30., 33.,23.],[30.,61., 62.,45., 59.,119.],[116.,90., 156.,198., 373.,326.]], device='cuda:0')
    # grid-space anchors (pixel anchors already divided by the stride of each level):
    anchors = torch.tensor([[1.25000, 1.62500, 2.00000, 3.75000, 4.12500, 2.87500],
                            [1.87500, 3.81250, 3.87500, 2.81250, 3.68750, 7.43750],
                            [3.62500, 2.81250, 4.87500, 6.18750, 11.65625, 10.18750]], device='cuda:0')
    anchors = anchors.float().view(3, -1, 2)
    # anchors[0] = anchors[0] / stride[0]
    # anchors[1] = anchors[1] / stride[1]
    # anchors[2] = anchors[2] / stride[2]
    for i in range(3):
        bs, _, ny, nx = x[i].shape  # x(bs,54,ny,nx) to x(bs,3,ny,nx,18) for 13 classes
        x[i] = x[i].view(bs, 3, 18, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
        if grid[i].shape[2:4] != x[i].shape[2:4]:
            grid[i], anchor_grid[i] = make_grid(nx, ny, i, anchors, stride)
        xy, wh, conf = x[i].sigmoid().split((2, 2, 13 + 1), 4)
        xy = (xy * 2 + grid[i]) * stride[i]  # xy
        wh = (wh * 2) ** 2 * anchor_grid[i]  # wh
        y = torch.cat((xy, wh, conf), 4)
        z.append(y.view(bs, 3 * nx * ny, 18))
    return (torch.cat(z, 1), x)

def make_grid(nx=20, ny=20, i=0, anchors=None, stride=None, torch_1_10=check_version(torch.__version__, '1.10.0')):
    # check_version comes from the yolov5 utils (utils.general)
    d = anchors[i].device
    t = anchors[i].dtype
    shape = 1, 3, ny, nx, 2  # grid shape
    y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)
    yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # indexing='ij' requires torch>=1.10
    grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
    anchor_grid = (anchors[i] * stride[i]).view((1, 3, 1, 1, 2)).expand(shape)
    return grid, anchor_grid
The postprocessing function is implemented based on the original Detect class present in the Ultralytics repository inside models/yolo.py.
Quantizing Yolov5 Pytorch with Vitis AI 3.0
Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types, like an 8-bit integer (int8), instead of the usual 32-bit floating point (float32).
Reducing the number of bits means the resulting model requires less memory storage, consumes less energy (in theory), and operations like matrix multiplication can be performed much faster with integer arithmetic. It also makes it possible to run models on embedded devices, which sometimes only support integer data types.
We will be using Vitis AI 3.0 (GPU) for quantization; the quantization can also be performed in the CPU docker of Vitis AI. The GPU docker needs to be built locally, while the CPU docker can be pulled and used directly. Performing quantization in the GPU docker is faster than in the CPU docker.
The following inputs should be given for the quantization process. The build directory gives the path to the build folder, i.e. where the quantized xmodel is saved. The quant_mode allows us to specify calib or test mode, both of which are required. The weights argument specifies which .pt model is to be quantized, and the dataset argument gives the root of the small dataset used during the forward pass.
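A hedged sketch of how these inputs might be exposed as command-line arguments is shown below; the exact argument names in the tutorial's Quant.py may differ:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--build", default="build", help="build folder where the quantized xmodel is saved")
parser.add_argument("--quant_mode", default="calib", choices=["calib", "test"], help="calibration or test pass")
parser.add_argument("--weights", default="best.pt", help="float .pt model to quantize")
parser.add_argument("--dataset", default="datasets/bdd_subset", help="root of the small dataset used for the forward pass")
args = parser.parse_args()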
Steps for quantization:
- Load the model
- Quantization → quant model
- Forward pass
- Handle the quant model
To quantize and compile the model using Vitis AI, first load the float model (the .pt or .pth model). For YOLOv5, this can be done with a snippet like the one below.
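A minimal sketch of loading the float model, assuming the standard yolov5 helper attempt_load from models/experimental.py (the keyword for the device differs slightly across yolov5 versions):

import torch
from models.experimental import attempt_load  # from the yolov5 repository

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = attempt_load(args.weights, device=device)  # older yolov5 versions use map_location=device
model.eval()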
The quantization is performed using the torch_quantizer function from the pytorch_nndct library which is provided by Vitis AI.
A quantizer object is created by passing the required parameters to the torch_quantizer function of the pytorch_nndct library.
The torch_quantizer function takes in:
● quant_mode which can be either “calib” or “test”
● model which is the float model that we loaded.
● rand_in which is the dummy input required by the float model: a batch of images, each with 3 color channels (RGB) and a resolution of 640 pixels in height and 640 pixels in width.
● output_dir which is the location where the quantized xmodel is to be saved.
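Put together, the quantizer creation looks roughly like the following sketch (it assumes the standard pytorch_nndct API and the model and arguments loaded earlier):

import torch
from pytorch_nndct.apis import torch_quantizer

rand_in = torch.randn(1, 3, 640, 640)  # dummy input: batch of one RGB image at 640 x 640
quantizer = torch_quantizer(quant_mode, model, (rand_in,), output_dir=output_dir)  # quant_mode and output_dir come from the script's arguments
quant_model = quantizer.quant_model    # quantized model used for the forward pass below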
A forward pass is the process of propagating input data through the network's layers to compute an output. During this forward pass, the input data flows through the layers of the model to produce a prediction or output.
A dry run is performed on the dataset that was passed as an argument, followed by non-maximum suppression to get the predictions; a rough sketch is shown below.
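A hedged sketch of that dry run; the calibration_images iterable is an assumed stand-in for the dataloader built from the dataset argument:

from utils.general import non_max_suppression  # from the yolov5 repository

# run the images through the quantized model so activation statistics can be
# collected (calib mode) or the accuracy checked (test mode)
for im in calibration_images:      # assumed iterable of preprocessed 1x3x640x640 tensors
    predi = quant_model(im)        # dry run through the quantized model
    pred = postprocessing(predi)   # decode boxes as re-implemented earlier
    pred = non_max_suppression(pred[0], conf_thres=0.25, iou_thres=0.45)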
Handle the quant model
The quantization result is handled based on the two different modes of quantization: calibration (quant_mode == 'calib') and testing (quant_mode == 'test').
- Calibration Mode (quant_mode == 'calib'): In this mode, the script configures the quantizer to perform model calibration and export the quantization configuration. Model calibration is a process where you collect statistics about the model's inputs (e.g., min and max values of activations) to later quantize the model effectively.
- quantizer.export_quant_config(): This function exports the quantization configuration obtained during calibration. This configuration is crucial for quantizing the model correctly during deployment.
- Testing Mode (quant_mode == 'test'): In this mode, the script performs quantization testing and exports the quantized model in different formats. This mode can also be used to validate the quantized model's performance before deploying it in a production environment.
- quantizer.export_xmodel(deploy_check=True, dynamic_batch=True): This function exports the quantized model in "xmodel" format. The deploy_check parameter suggests that this export includes checks for deployment readiness. The quantized xmodel is necessary to generate a compiled model for a given target.
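In code, the handling described above amounts to something like this sketch:

if quant_mode == "calib":
    quantizer.export_quant_config()  # write the calibration configuration for the later test pass
elif quant_mode == "test":
    quantizer.export_xmodel(deploy_check=True, dynamic_batch=True)  # export the deployable xmodel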
Here is the Quant.py (updated Quantization script with the above mentioned changes).
Compiling a quantized model
Compiling a quantized model refers to preparing a quantized neural network model for deployment on a target platform or hardware. The quantized model is compiled for the target hardware or accelerator. This involves adapting the model to take full advantage of the hardware's capabilities, such as vectorized instructions, hardware accelerators, or specialized memory layouts. The compiled model is integrated into the inference pipeline of the target application.
Once compiled and integrated, the quantized model is ready for deployment on the target platform or device. It can be used to make predictions or perform computations efficiently, taking advantage of the reduced precision and hardware optimizations.
For PyTorch, the NNDCT quantizer outputs the quantized model directly in the XIR format, i.e. an xmodel ready to be compiled.
Use vai_c_xir to compile the quantized model:
vai_c_xir --xmodel /PATH/TO/quantized.xmodel --arch /PATH/TO/arch.json --output_dir /OUTPUTPATH --net_name netname
Example:
vai_c_xir --xmodel quantized_model/YOLOv5_quantized.xmodel --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json --net_name yolov5_kv260 --output_dir ./KV260
Final layer view in Netron, below is the view of the compiled model for the KV260 board.
A similar approach is also discussed in this Vitis AI Forum thread: quantizing-ultralytics-yolov5-vitis-ai-v35-modifying-forward-function
Reference:
[1]. YOLO v5 model architecture [Explained]: https://iq.opengenus.org/yolov5/
[2]. Object Detection Algorithm — YOLO v5 Architecture: https://medium.com/analytics-vidhya/object-detection-algorithm-yolo-v5-architecture-89e0a35472ef
[3]. What is YOLOv5? A Guide for Beginners.: https://blog.roboflow.com/yolov5-improvements-and-evaluation/
[4]. https://github.com/ultralytics/yolov5/issues/
[5]. Vitis AI - Yolov5 Tutorial: https://xilinx.eetrend.com/blog/2022/100565582.html
Kudos,
Kudos to Anupam@LogicTronix.com for writing this detailed and insightful article/tutorial on "Yolov5 Quantization and Compilation". In the next tutorial we will cover deploying the compiled model on the KV260 FPGA board. Kudos to Dikesh@Logictronix.com for planning this tutorial!
You can find Quant.py and the compiled model in the attachments below!
And you can also check the github repo of this tutorial: https://github.com/LogicTronix/Vitis-AI-Reference-Tutorials/tree/main/Quantizing-Compiling-Yolov5-Hackster-Tutorial
For queries, you can write to above email or at info@logictronix.com!