A vending machine offers users a diverse range of products such as snacks, beverages, pizzas, cupcakes, soda, etc. The interface of the machine usually includes a number pad where the user selects the desired product, enters its product id and makes the purchase. During the COVID-19 outbreak, a frequently touched button pad or console can become contaminated, so a number-pad interface is not a good choice for a vending machine. That's why it's better to opt for a gesture-controlled system that can replace the regular number-pad console.
How It Works?
The deep learning model is built using datasets of some very common gestures: Palm, Okay, Peace, Fist & L. A Kaggle dataset enriched with these gestures can be used for training and testing the model | source
Rather than building a sequential model from scratch, it is better to use a pre-trained model; that's why I have used the VGG-16 neural network. Besides its capability of classifying objects in photographs, its advantage is that the model weights are freely available and can be loaded and used in our own models and applications. It is a heavy model to work with, but the prediction accuracy is very satisfying. My trained model can be found here. It was generated using the Keras API with a TensorFlow backend.
What Happens Inside
What we see in the basic model layout is that it has around 4 layers -> an input layer, an output layer and, in between them, two hidden layers.
- Information is fed into the input layer, which transfers it to the hidden layer
- The interconnections between the two layers assign weights to each input randomly (a weight is a numerical parameter that is multiplied with the input and largely transforms the input into the output)
- A bias is added to every input after the weights are multiplied with the inputs individually (a bias is just a numerical parameter added to adjust the output)
- The weighted sum is transferred to the activation function
- The activation function determines which nodes it should fire for feature extraction
- The model applies an activation function to the output layer to deliver the output
- Weights are adjusted, and the output is back-propagated to minimize error
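As a tiny illustrative sketch of the weighted sum, bias and activation steps above (made-up numbers, not part of the actual model):
# Toy forward pass for one layer (illustrative only)
import numpy as np
x = np.array([0.5, 0.2, 0.1])        # inputs
W = np.random.randn(4, 3)            # randomly assigned weights (4 nodes x 3 inputs)
b = np.zeros(4)                      # bias added to each weighted sum
z = W @ x + b                        # weighted sum plus bias
a = np.maximum(0, z)                 # ReLU activation decides which nodes fire
print(a)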
VGG-16 is nothing but a stack of 16 weight layers. On those layers, different operations are performed, like convolution, ReLU, max pooling, etc. A source to find out more about the VGG-16 architecture and how it can be implemented.
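As a rough sketch (not the exact notebook code), the pre-trained VGG-16 base can be loaded with Keras and extended with a small classification head for the five gesture classes; the head sizes below are illustrative:
# Sketch: VGG-16 base with a small custom head (layer sizes are illustrative)
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pre-trained convolutional weights frozen
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation='relu'),   # illustrative head size
    layers.Dropout(0.5),
    layers.Dense(5, activation='softmax')   # Palm, Okay, Peace, Fist & L
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])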
As we will be using a Raspberry Pi Zero W, we need the lite version of TensorFlow to integrate TinyML into our project. This way, the backend processing will run as smoothly as possible. That's why a TF Lite model is necessary. My TensorFlow Lite model can be found here. To know more about converting a TensorFlow model into a TensorFlow Lite model, we can go here.
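The conversion itself only takes a few lines; a sketch, assuming the trained Keras model has been saved to a file (the file names here are hypothetical):
# Sketch: convert the trained Keras model to a TensorFlow Lite model
import tensorflow as tf
model = tf.keras.models.load_model('gesture_vgg16.h5')     # hypothetical file name
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # optional: shrink the model for the Pi Zero
tflite_model = converter.convert()
with open('gesture_vgg16.tflite', 'wb') as f:
    f.write(tflite_model)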
I have used Jupyter Notebook to structure the code and generate the model. For this reason, the dataset had to be stored locally. How I built, trained, tuned and tested my model, as well as the dataset I used, can be found here. I tried to make the notebooks as simple as possible, annotating the code with useful comments :) .
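For reference, a condensed sketch of how such a locally stored dataset can be fed into training (the directory name and hyper-parameters are placeholders; the notebook has the real ones):
# Sketch: training from a locally stored dataset (placeholder path and settings)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
train_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
                                        batch_size=32, class_mode='categorical',
                                        subset='training')
val_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
                                      batch_size=32, class_mode='categorical',
                                      subset='validation')
# 'model' is the compiled VGG-16 model from the previous sketch
model.fit(train_gen, validation_data=val_gen, epochs=10)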
Overview of the image processing task in four stages
First, the camera starts to capture video, which gets decomposed into frames. OpenCV helps to fetch one frame per iteration and performs the following operations:
- When the frame is fetched, a smoothing filter is applied using OpenCV's bilateral-filter function. The smoothing filter removes high spatial frequency noise from the frame.
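In OpenCV terms, this step looks roughly like the following (the filter parameters are typical values, not necessarily the exact ones used):
# Sketch: fetch one frame and smooth it with a bilateral filter
import cv2
cap = cv2.VideoCapture(0)                          # camera stream
ret, frame = cap.read()                            # one frame per iteration
if ret:
    smooth = cv2.bilateralFilter(frame, 9, 75, 75) # d, sigmaColor, sigmaSpace (typical values)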
To separate foreground from background to get the gestures only
- OpenCV's background_model_mog2 function is used, which extracts the moving objects (gestures) from the static background to get the foreground objects alone.
- After creating the background model, a morphological operation, erosion, is applied with a [3*3] kernel and one iteration. In the output, small objects are removed so that only substantive objects remain. Then a bit-wise 'AND' operation is performed, which keeps only the relevant parts of the output while the other pixels become dark.
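A sketch of this stage (the kernel matches the [3*3] size above; other parameters are defaults):
# Sketch: foreground extraction with MOG2, erosion and a bit-wise AND
import cv2
import numpy as np
bg_model = cv2.createBackgroundSubtractorMOG2()
fg_mask = bg_model.apply(smooth)                    # 'smooth' is the filtered frame from the previous step
kernel = np.ones((3, 3), np.uint8)
fg_mask = cv2.erode(fg_mask, kernel, iterations=1)  # remove small objects from the mask
foreground = cv2.bitwise_and(smooth, smooth, mask=fg_mask)  # keep only the pixels under the mask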
To get the binary image of extracted gestures
- First, the image extracted from the background model is converted to grayscale.
- Then a Gaussian-blur filter is applied to reduce noise. A mask whose size is generally three times the standard deviation is chosen.
- The gray image is converted into a binary image by applying a certain threshold. Gray tones, or a subspace of the color space, create complexity in classification; that's why it is better to use a binary image.
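Roughly, in code (the blur kernel and threshold value are illustrative):
# Sketch: grayscale -> Gaussian blur -> binary threshold
import cv2
gray = cv2.cvtColor(foreground, cv2.COLOR_BGR2GRAY)          # 'foreground' comes from the previous step
blur = cv2.GaussianBlur(gray, (41, 41), 0)                    # kernel size is illustrative
_, binary = cv2.threshold(blur, 60, 255, cv2.THRESH_BINARY)   # threshold value is illustrative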
To set up the target image for prediction
- Here, a stack operation joins the sequence of image arrays along a new axis.
- The target image needs to be resized so that the trained model can predict on it without any conflict over the image size.
- The next and last step before prediction is to reshape the target image into 224*224*3 (width*height*number of color channels).
The target image is then fed into the predict_rgb_image function, which returns the prediction score and the predicted class of the image.
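Put together, the last stage looks roughly like this; predict_rgb_image below is a simplified stand-in for the function in the notebook, and the class order and scaling are assumptions:
# Sketch: prepare the target image and predict its gesture class
import cv2
import numpy as np
CLASSES = ['Fist', 'L', 'Okay', 'Palm', 'Peace']        # class order is an assumption

def predict_rgb_image(img, model):
    score = model.predict(img)                           # 'model' is the trained gesture model
    return np.max(score), CLASSES[np.argmax(score)]

target = np.stack((binary,) * 3, axis=-1)                # stack the binary image into 3 channels
target = cv2.resize(target, (224, 224))                  # match the model's expected input size
target = target.reshape(1, 224, 224, 3) / 255.0          # add batch dimension; same scaling as training (assumption)
score, gesture = predict_rgb_image(target, model)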
The hardware part is divided into two sections:
- Capture the stream, detect the gesture and send the product id to the Arduino touch-free console via Bluetooth: done by the Raspberry Pi Zero W.
- Receive the product id from the Pi and dispatch the purchased item: done by the Arduino console.
In the hardware part, we will simulate only the gesture detection, the display console and the functionality of AutoVend. We will not focus on the mechanical part, because the mechanical parts function the same as in any vending machine.
To acquire image frames from the camera stream and detect gestures, we need to power up the Raspberry Pi Zero W. For this purpose, we may use an 1100 mAh 11.1 V LiPo battery and step the 11.1 V down to 5 V with a buck converter.
Raspberry-pi Zero W with Camera Setup
Live Stream
Communication
We have an integrated Bluetooth module on our Raspberry Pi Zero W; alternatively, we can use an external Bluetooth module connected via the serial port. We will use it to communicate with the Arduino. On the Arduino side, we have an HC-05 Bluetooth module. To establish communication between them, we will use a COM/serial port, which listens for any external device trying to connect - like here on COM8, where the Bluetooth module integrated with the Arduino is trying to connect with the Raspberry Pi.
# Sample Code Snippet
serialPort = serial.Serial(port="COM8", baudrate=9600, bytesize=8,
                           timeout=2, stopbits=serial.STOPBITS_ONE)
We need to maintain a specific baud rate, otherwise the Bluetooth module integrated with the Arduino can't sync. We need the following library for this job.
import serial
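Once the port is open, the detected product id can be pushed to the Arduino with a simple write (a sketch; the exact message format expected by the Arduino side may differ):
# Sketch: send the selected product id to the Arduino console
product_id = "15"
serialPort.write((product_id + "\n").encode())   # newline used as a simple delimiter (assumption)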
Display
To simulate the job of purchasing, ordering and showing different prompts to the user, a 16*2 LCD panel will be integrated with the Arduino.
Toggle Switch
A toggle switch will be used to wake up the machine. This starts the whole camera process from the beginning.
Power
A 9 V battery will power the circuit; its voltage will be converted to 5 V using a 7805 linear regulator and then fed into the breadboard, which simulates the vending machine.
The touch-free console
Let's assume this is the product mapping of AutoVend, where 11, 12, 13 ... ... 64, 65, 66 represent product ids.
Now, we have these two gesture maps to select from
Other Digits (except 1 to 3): When we want to select other digits that are not available here, we need to go to the other gesture map. The last gesture pattern helps to break out of this map.
Other Digits (except 4 to 6): Works the same as described earlier. Its purpose is to break out of this map.
Trick: As we can see, all the digits are built up using a specific pattern consisting of three signs. If we need to choose a single-digit number (like 4), then we have to gesture all three signs. But if we want to gesture a multi-digit number (like 45), then we don't have to gesture the palm sign both times. We only need to gesture the palm sign once, and then we can gesture:
L + Okay + Fist + Okay = Digit(45)
Say we want to purchase the product with Id-15.
Using both maps, we need to show the following gestures sequentially in front of the camera.
Finally, we need to show gesture 'L'
Peace + L + Okay + Peace + Okay + Palm + Fist + Okay + Palm + Okay + L = Digit(15)
Let's Purchase Product with Id-15:
We will simulate the gesture steps from the Raspberry Pi and see how the touch-free Arduino console functions.
Conclusion:
I have used my local machine (e.g. a laptop) and Jupyter Notebook to train, test and evaluate the model. Finally, I have generated the TensorFlow Lite model. Real-time gesture recognition is performed on the Raspberry Pi Zero W, and the feed is viewed on the laptop with the VNC Viewer application. The touch-free console is built with an Arduino Nano.
With AutoVend, it is not only possible to automate the whole process of purchasing grocery items; it can also serve as a good example of how our regular gadgets can be smart enough to keep us safe during the Covid-19 pandemic :)