In this project I use the FOMO (Faster Objects, More Objects) algorithm, which is fast and efficient at object detection. The algorithm is suited to recognizing the different types of items placed on the cashier table without barcodes, and it can output the total price of the items. A 96x96 pixel, grayscale input provides enough data to make this project work. The model is exported into a Python program deployed to a Raspberry Pi, so it runs locally. By running the machine learning model on the edge, the device uses less energy, requires less human labour, and cuts overall hardware cost. The concept can be developed further with more data variation, different cameras, and different cashier environments and lighting conditions to improve its accuracy in real-world applications.
STEPS:
Preparation: Prepare the Raspberry Pi 4, connect to it via SSH, and install the dependencies and the Edge Impulse for Linux CLI. Follow this guide: https://docs.edgeimpulse.com/docs/development-platforms/officially-supported-cpu-gpu-targets/raspberry-pi-4
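Once the CLI and the Python SDK are installed, a quick sanity check can confirm that Python sees both the SDK and the USB webcam. This is my own addition rather than part of the linked guide; the probed port range is arbitrary:

# sanity_check.py - verify the Edge Impulse Python SDK and the webcam are usable
import cv2
import edge_impulse_linux  # installed with: pip3 install edge_impulse_linux

print("Edge Impulse Linux SDK imported OK")

# Probe the first few video device IDs to find the USB webcam
for port in range(3):
    cap = cv2.VideoCapture(port)
    if cap.isOpened():
        ok, frame = cap.read()
        if ok:
            print("Camera found on port %d (%dx%d)" % (port, frame.shape[1], frame.shape[0]))
        cap.release()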
Data collection: For the image collection I took some pictures with a USB webcam attached to the Raspberry Pi, which was connected to Edge Impulse Studio, and other pictures with a smartphone camera. The position and orientation of the items are shifted between pictures to help the ML model recognize the objects later in the process.
The photos are taken using a tripod so that the apparent size of the objects placed on the table does not change too much between images. (This is especially important because FOMO does not perform well with varying object sizes.) The dataset contains 408 labeled items covering 8 different objects (snacks).
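Most images can be captured straight from Edge Impulse Studio, but a small OpenCV script is a handy alternative for collecting extra dataset images on the Pi itself. This is a sketch of my own, not part of the original project; the output folder and filename pattern are arbitrary, and the saved images can be uploaded to the Studio afterwards:

# capture.py - save webcam frames for the dataset (SPACE saves a frame, q quits)
import cv2
import os

out_dir = "dataset"  # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)
cap = cv2.VideoCapture(0)  # USB webcam
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1)
    if key == ord(' '):  # SPACE saves the current frame
        path = os.path.join(out_dir, "item_%03d.jpg" % count)
        cv2.imwrite(path, frame)
        print("saved", path)
        count += 1
    elif key == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()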
Data labelling: Set the labelling method to bounding boxes (object detection) and choose Raspberry Pi 4 as the target device for latency calculations.
Then upload the images, drag a box around each object, and label it. Split all data into training and test sets, manually or with auto-split, at a ratio of around 80/20.
Create impulse: Create an impulse with a 160x160 pixel image size and the Grayscale color depth parameter, and choose the Image and Object Detection blocks. Choose FOMO (MobileNet V2 0.35), which will produce 8 output classes (cadbury, mentos, indomie, kitkat, etc.). In this example we achieved pretty good accuracy. After testing is done, we can check the live video stream in a browser by running edge-impulse-linux-runner on the Pi, which prints a URL to open. If the results perform as expected, the model is ready to be deployed to the Raspberry Pi 4.
Deploy on Raspberry Pi 4 with a Python program, output to 16x2 LCD: The Python program I created uses the .eim file from the training result and turns the detected objects into a price and item-count output. The program also displays the output on a 16x2 LCD.
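The detection loop below follows the image example from the Edge Impulse Linux Python SDK. A minimal sketch of the setup it relies on looks like this; the model path is a placeholder for wherever the downloaded .eim file lives:

import time
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

model_path = "/home/pi/modelfile.eim"  # placeholder path to the trained .eim file
videoCaptureDeviceId = 0               # first USB webcam
show_camera = True                     # set to False when running headless

runner = ImageImpulseRunner(model_path)
model_info = runner.init()             # loads the model; call runner.stop() when done
labels = model_info['model_parameters']['labels']
print("Loaded model for project:", model_info['project']['name'])

def now():
    # current time in milliseconds, used to throttle the loop
    return round(time.time() * 1000)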
prices = {"cadbury_DM": 1.1, "indomie_goreng": 0.4, "kitkat": 0.6,
          "kitkat_gold": 0.8, "mentos": 0.7, "milo_nuggets": 1.0,
          "pocky_chocolate": 1.2, "toblerone": 2.0}  # set item prices

next_frame = 0  # timestamp used to throttle the classification rate
for res, img in runner.classifier(videoCaptureDeviceId):
    if next_frame > now():
        time.sleep((next_frame - now()) / 1000)

    # print('classification runner response', res)
    if "classification" in res["result"].keys():
        print('Result (%d ms.) ' % (res['timing']['dsp'] + res['timing']['classification']), end='')
        for label in labels:
            score = res['result']['classification'][label]
            print('%s: %.2f\t' % (label, score), end='')
        print('', flush=True)
    elif "bounding_boxes" in res["result"].keys():
        print('Found %d bounding boxes (%d ms.)' % (len(res["result"]["bounding_boxes"]),
              res['timing']['dsp'] + res['timing']['classification']))
        total = 0
        for bb in res["result"]["bounding_boxes"]:
            print('\t%s (%.2f): x=%d y=%d w=%d h=%d' % (bb['label'], bb['value'],
                  bb['x'], bb['y'], bb['width'], bb['height']))
            img = cv2.rectangle(img, (bb['x'], bb['y']),
                                (bb['x'] + bb['width'], bb['y'] + bb['height']), (255, 0, 0), 1)
            total += prices[bb['label']]  # add this item's price to the bill

        print("Writing to display")  # write to the 16x2 LCD
        display.lcd_display_string("Items: " + str(len(res["result"]["bounding_boxes"])), 1)  # item count
        display.lcd_display_string("Total: $" + "{:.2f}".format(total), 2)  # total price

    if show_camera:
        cv2.imshow('edgeimpulse', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) == ord('q'):
            break

    next_frame = now() + 100  # classify roughly every 100 ms
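The display object used above is the 16x2 LCD helper. A minimal sketch of its initialization, assuming the widely used lcddriver I2C helper script for HD44780 character LCDs (the exact module name depends on which driver script is copied to the Pi):

import lcddriver  # common I2C helper script for 16x2 HD44780 LCDs (assumption)

display = lcddriver.lcd()  # opens the LCD on its default I2C address
display.lcd_clear()        # start from a blank screen
display.lcd_display_string("Smart Cashier", 1)  # write a string to line 1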
This project is an example of how embedded object detection can solve a real-world problem. The Smart Cashier identifies the objects, their quantity, and their total price almost instantly, running locally on a single-board computer.