In this project I use the FOMO (Faster Objects, More Objects) algorithm, which is fast and efficient at object detection. The algorithm is suited to recognizing the different types of items placed on the cashier table without barcodes, and it can output the total price of the items. A 96x96 pixel, grayscale input provides enough data to make this project work. The model is exported into a Python program deployed to a Raspberry Pi, so it runs locally. By running the machine learning model on the edge, the device uses less energy, requires less human labour, and cuts overall hardware cost. The concept can be developed further with more data variation, different cameras, and different cashier environments and lighting conditions to improve its accuracy in real-world applications.
STEPS:
Preparation: Prepare the Raspberry Pi 4, connect to it via SSH, and install the dependencies and the Edge Impulse for Linux CLI. Follow this guide: https://docs.edgeimpulse.com/docs/development-platforms/officially-supported-cpu-gpu-targets/raspberry-pi-4
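Once the CLI and the Python SDK are installed, a quick sanity check can confirm that Python sees both the SDK and the USB webcam. This is my own addition rather than part of the linked guide; the probed port range is arbitrary:

# sanity_check.py - verify the Edge Impulse Python SDK and the webcam are usable
import cv2
import edge_impulse_linux  # installed with: pip3 install edge_impulse_linux

print("Edge Impulse Linux SDK imported OK")

# Probe the first few video device IDs to find the USB webcam
for port in range(3):
    cap = cv2.VideoCapture(port)
    if cap.isOpened():
        ok, frame = cap.read()
        if ok:
            print("Camera found on port %d (%dx%d)" % (port, frame.shape[1], frame.shape[0]))
        cap.release()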
Data collection: For the image collection I took some pictures with a USB webcam attached to the Raspberry Pi, which was connected to Edge Impulse Studio, and other pictures with a smartphone camera. The position and orientation of the items are shifted between pictures to help the ML model recognize the objects later in the process.
The photos are taken using a tripod so that the apparent size of the objects placed on the table does not change too much between images. (This is especially important because FOMO does not perform well with varying object sizes.) The dataset contains 408 labeled items covering 8 different objects (snacks).
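Most images can be captured straight from Edge Impulse Studio, but a small OpenCV script is a handy alternative for collecting extra dataset images on the Pi itself. This is a sketch of my own, not part of the original project; the output folder and filename pattern are arbitrary, and the saved images can be uploaded to the Studio afterwards:

# capture.py - save webcam frames for the dataset (SPACE saves a frame, q quits)
import cv2
import os

out_dir = "dataset"  # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)
cap = cv2.VideoCapture(0)  # USB webcam
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1)
    if key == ord(' '):  # SPACE saves the current frame
        path = os.path.join(out_dir, "item_%03d.jpg" % count)
        cv2.imwrite(path, frame)
        print("saved", path)
        count += 1
    elif key == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()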
Data labelling: Set the labelling method to bounding boxes (object detection) and choose Raspberry Pi 4 as the target device for latency calculations.
Then upload the images, drag a box around each object, and label it. Split all data into training and test sets, manually or with auto-split, at a ratio of around 80/20.
Create impulse: Create an impulse with a 160x160 pixel image size and the Grayscale color depth parameter, and choose the Image and Object Detection blocks. Choose FOMO (MobileNet V2 0.35), which will produce 8 output classes (cadbury, mentos, indomie, kitkat, etc.). In this example we achieved pretty good accuracy. After testing is done, we can check the live video stream in a browser by running edge-impulse-linux-runner on the Pi, which prints a URL to open. If the results perform as expected, the model is ready to be deployed to the Raspberry Pi 4.
Deploy on Raspberry Pi 4 with a Python program, output to 16x2 LCD: The Python program I created uses the .eim file from the training result and turns the detected objects into a price and item-count output. The program also displays the output on a 16x2 LCD.
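The detection loop below follows the image example from the Edge Impulse Linux Python SDK. A minimal sketch of the setup it relies on looks like this; the model path is a placeholder for wherever the downloaded .eim file lives:

import time
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

model_path = "/home/pi/modelfile.eim"  # placeholder path to the trained .eim file
videoCaptureDeviceId = 0               # first USB webcam
show_camera = True                     # set to False when running headless

runner = ImageImpulseRunner(model_path)
model_info = runner.init()             # loads the model; call runner.stop() when done
labels = model_info['model_parameters']['labels']
print("Loaded model for project:", model_info['project']['name'])

def now():
    # current time in milliseconds, used to throttle the loop
    return round(time.time() * 1000)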
prices = {"cadbury_DM": 1.1, "indomie_goreng": 0.4, "kitkat": 0.6,
          "kitkat_gold": 0.8, "mentos": 0.7, "milo_nuggets": 1.0,
          "pocky_chocolate": 1.2, "toblerone": 2.0}  # set item prices

next_frame = 0  # timestamp used to throttle the classification rate
for res, img in runner.classifier(videoCaptureDeviceId):
    if next_frame > now():
        time.sleep((next_frame - now()) / 1000)

    # print('classification runner response', res)
    if "classification" in res["result"].keys():
        print('Result (%d ms.) ' % (res['timing']['dsp'] + res['timing']['classification']), end='')
        for label in labels:
            score = res['result']['classification'][label]
            print('%s: %.2f\t' % (label, score), end='')
        print('', flush=True)
    elif "bounding_boxes" in res["result"].keys():
        print('Found %d bounding boxes (%d ms.)' % (len(res["result"]["bounding_boxes"]),
              res['timing']['dsp'] + res['timing']['classification']))
        total = 0
        for bb in res["result"]["bounding_boxes"]:
            print('\t%s (%.2f): x=%d y=%d w=%d h=%d' % (bb['label'], bb['value'],
                  bb['x'], bb['y'], bb['width'], bb['height']))
            img = cv2.rectangle(img, (bb['x'], bb['y']),
                                (bb['x'] + bb['width'], bb['y'] + bb['height']), (255, 0, 0), 1)
            total += prices[bb['label']]  # add this item's price to the bill

        print("Writing to display")  # write to the 16x2 LCD
        display.lcd_display_string("Items: " + str(len(res["result"]["bounding_boxes"])), 1)  # item count
        display.lcd_display_string("Total: $" + "{:.2f}".format(total), 2)  # total price

    if show_camera:
        cv2.imshow('edgeimpulse', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) == ord('q'):
            break

    next_frame = now() + 100  # classify roughly every 100 ms
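The display object used above is the 16x2 LCD helper. A minimal sketch of its initialization, assuming the widely used lcddriver I2C helper script for HD44780 character LCDs (the exact module name depends on which driver script is copied to the Pi):

import lcddriver  # common I2C helper script for 16x2 HD44780 LCDs (assumption)

display = lcddriver.lcd()  # opens the LCD on its default I2C address
display.lcd_clear()        # start from a blank screen
display.lcd_display_string("Smart Cashier", 1)  # write a string to line 1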
This project is an example of how embedded object detection can solve a real-world problem. The Smart Cashier identifies the objects, their quantity, and their total price almost instantly, running locally on a single-board computer.