A vending machine offers users a diverse range of products such as snacks, beverages, pizzas, cupcakes, soda, etc. The interface of the machine usually includes a number pad where the user selects the desired product, enters its product id and makes the purchase. During the COVID-19 outbreak, a frequently touched button pad or console can become contaminated, so a number-pad interface is not a good choice for a vending machine. That's why it's better to opt for a gesture-controlled system that can replace the regular number-pad console.
How It Works?
The deep learning model is built using datasets of some very common gestures: Palm, Okay, Peace, Fist & L. A Kaggle dataset enriched with these gestures can be used for training and testing the model | source
Rather than building a sequential model from scratch, it is better to use a pre-trained model; that's why I have used the VGG-16 neural network. Besides its capability of classifying objects in photographs, its advantage is that the model weights are freely available and can be loaded and used in our own models and applications. It is a heavy model to work with, but the prediction accuracy is very satisfying. My trained model can be found here. It was generated using the Keras API with a TensorFlow backend.
What Happens Inside
What we see in the basic model layout is that it has around 4 layers -> an input layer, an output layer and, in between them, two hidden layers.
- Information is fed into the input layer, which transfers it to the hidden layer
- The interconnections between the two layers assign weights to each input randomly (a weight is a numerical parameter that is multiplied with the input and largely transforms the input into the output)
- A bias is added to every input after the weights are multiplied with the inputs individually (a bias is just a numerical parameter added to adjust the output)
- The weighted sum is transferred to the activation function
- The activation function determines which nodes it should fire for feature extraction
- The model applies an activation function to the output layer to deliver the output
- Weights are adjusted, and the output is back-propagated to minimize error
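As a tiny illustrative sketch of the weighted sum, bias and activation steps above (made-up numbers, not part of the actual model):
# Toy forward pass for one layer (illustrative only)
import numpy as np
x = np.array([0.5, 0.2, 0.1])        # inputs
W = np.random.randn(4, 3)            # randomly assigned weights (4 nodes x 3 inputs)
b = np.zeros(4)                      # bias added to each weighted sum
z = W @ x + b                        # weighted sum plus bias
a = np.maximum(0, z)                 # ReLU activation decides which nodes fire
print(a)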
VGG-16 is nothing but a stack of 16 weight layers. On those layers, different operations are performed, like convolution, ReLU, max pooling, etc. A source to find out more about the VGG-16 architecture and how it can be implemented.
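As a rough sketch (not the exact notebook code), the pre-trained VGG-16 base can be loaded with Keras and extended with a small classification head for the five gesture classes; the head sizes below are illustrative:
# Sketch: VGG-16 base with a small custom head (layer sizes are illustrative)
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pre-trained convolutional weights frozen
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation='relu'),   # illustrative head size
    layers.Dropout(0.5),
    layers.Dense(5, activation='softmax')   # Palm, Okay, Peace, Fist & L
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])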
As we will be using a Raspberry Pi Zero W, we need the lite version of TensorFlow to integrate TinyML into our project. This way, the backend processing will run as smoothly as possible. That's why a TF Lite model is necessary. My TensorFlow Lite model can be found here. To know more about converting a TensorFlow model into a TensorFlow Lite model, we can go here.
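The conversion itself only takes a few lines; a sketch, assuming the trained Keras model has been saved to a file (the file names here are hypothetical):
# Sketch: convert the trained Keras model to a TensorFlow Lite model
import tensorflow as tf
model = tf.keras.models.load_model('gesture_vgg16.h5')     # hypothetical file name
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # optional: shrink the model for the Pi Zero
tflite_model = converter.convert()
with open('gesture_vgg16.tflite', 'wb') as f:
    f.write(tflite_model)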
I have used Jupyter Notebook to structure the code and generate the model. For this reason, the dataset had to be stored locally. How I built, trained, tuned and tested my model, as well as the dataset I used, can be found here. I tried to make the notebooks as simple as possible, annotating the code with useful comments :) .
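For reference, a condensed sketch of how such a locally stored dataset can be fed into training (the directory name and hyper-parameters are placeholders; the notebook has the real ones):
# Sketch: training from a locally stored dataset (placeholder path and settings)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
train_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
                                        batch_size=32, class_mode='categorical',
                                        subset='training')
val_gen = datagen.flow_from_directory('dataset/', target_size=(224, 224),
                                      batch_size=32, class_mode='categorical',
                                      subset='validation')
# 'model' is the compiled VGG-16 model from the previous sketch
model.fit(train_gen, validation_data=val_gen, epochs=10)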
Overview of the image processing task in four stages
First, the camera starts to capture video, which gets decomposed into frames. OpenCV helps to fetch one frame per iteration and performs the following operations:
- When the frame is fetched, a smoothing filter is applied using OpenCV's bilateral-filter function. The smoothing filter removes high spatial frequency noise from the frame.
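In OpenCV terms, this step looks roughly like the following (the filter parameters are typical values, not necessarily the exact ones used):
# Sketch: fetch one frame and smooth it with a bilateral filter
import cv2
cap = cv2.VideoCapture(0)                          # camera stream
ret, frame = cap.read()                            # one frame per iteration
if ret:
    smooth = cv2.bilateralFilter(frame, 9, 75, 75) # d, sigmaColor, sigmaSpace (typical values)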
To separate foreground from background to get the gestures only
- OpenCV's background_model_mog2 function is used, which extracts the moving objects (gestures) from the static background to get the foreground objects alone.
- After creating the background model, a morphological operation, erosion, is applied with a [3*3] kernel and one iteration. In the output, small objects are removed so that only substantive objects remain. Then a bit-wise 'AND' operation is performed, which keeps only the relevant parts of the output while the other pixels become dark.
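A sketch of this stage (the kernel matches the [3*3] size above; other parameters are defaults):
# Sketch: foreground extraction with MOG2, erosion and a bit-wise AND
import cv2
import numpy as np
bg_model = cv2.createBackgroundSubtractorMOG2()
fg_mask = bg_model.apply(smooth)                    # 'smooth' is the filtered frame from the previous step
kernel = np.ones((3, 3), np.uint8)
fg_mask = cv2.erode(fg_mask, kernel, iterations=1)  # remove small objects from the mask
foreground = cv2.bitwise_and(smooth, smooth, mask=fg_mask)  # keep only the pixels under the mask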
To get the binary image of extracted gestures
- First, the image extracted from the background model is converted to grayscale.
- Then a Gaussian-blur filter is applied to reduce noise. A mask whose size is generally three times the standard deviation is chosen.
- The gray image is converted into a binary image by applying a certain threshold. Gray tones, or a subspace of the color space, create complexity in classification; that's why it is better to use a binary image.
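Roughly, in code (the blur kernel and threshold value are illustrative):
# Sketch: grayscale -> Gaussian blur -> binary threshold
import cv2
gray = cv2.cvtColor(foreground, cv2.COLOR_BGR2GRAY)          # 'foreground' comes from the previous step
blur = cv2.GaussianBlur(gray, (41, 41), 0)                    # kernel size is illustrative
_, binary = cv2.threshold(blur, 60, 255, cv2.THRESH_BINARY)   # threshold value is illustrative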
To set up the target image for prediction
- Here, a stack operation joins the sequence of image arrays along a new axis.
- The target image needs to be resized so that the trained model can predict on it without any conflict over the image size.
- The next and last step before prediction is to reshape the target image into 224*224*3 (width*height*number of color channels).
The target image is then fed into the predict_rgb_image function, which returns the prediction score and the predicted class of the image.
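Put together, the last stage looks roughly like this; predict_rgb_image below is a simplified stand-in for the function in the notebook, and the class order and scaling are assumptions:
# Sketch: prepare the target image and predict its gesture class
import cv2
import numpy as np
CLASSES = ['Fist', 'L', 'Okay', 'Palm', 'Peace']        # class order is an assumption

def predict_rgb_image(img, model):
    score = model.predict(img)                           # 'model' is the trained gesture model
    return np.max(score), CLASSES[np.argmax(score)]

target = np.stack((binary,) * 3, axis=-1)                # stack the binary image into 3 channels
target = cv2.resize(target, (224, 224))                  # match the model's expected input size
target = target.reshape(1, 224, 224, 3) / 255.0          # add batch dimension; same scaling as training (assumption)
score, gesture = predict_rgb_image(target, model)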
The hardware part is divided into two sections:
- Capture the stream, detect the gesture and send the product id to the Arduino touch-free console via Bluetooth: done by the Raspberry Pi Zero W.
- Receive the product id from the Pi and dispatch the purchased item: done by the Arduino console.
In the hardware part, we will simulate only the gesture detection, the display console and the functionality of AutoVend. We will not focus on the mechanical part, because the mechanical parts function the same as in any vending machine.
To acquire image frames from the camera stream and detect gestures, we need to power up the Raspberry Pi Zero W. For this purpose, we may use an 1100 mAh 11.1 V LiPo battery and step the 11.1 V down to 5 V with a buck converter.
Raspberry-pi Zero W with Camera Setup
Live Stream
Communication
We have an integrated Bluetooth module on our Raspberry Pi Zero W; alternatively, we can use an external Bluetooth module connected via the serial port. We will use it to communicate with the Arduino. On the Arduino side, we have an HC-05 Bluetooth module. To establish communication between them, we will use a COM/serial port, which listens for any external device trying to connect - like here on COM8, where the Bluetooth module integrated with the Arduino is trying to connect with the Raspberry Pi.
# Sample Code Snippet
serialPort = serial.Serial(port="COM8", baudrate=9600, bytesize=8,
                           timeout=2, stopbits=serial.STOPBITS_ONE)
We need to maintain a specific baud rate, otherwise the Bluetooth module integrated with the Arduino can't sync. We need the following library for this job.
import serial
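Once the port is open, the detected product id can be pushed to the Arduino with a simple write (a sketch; the exact message format expected by the Arduino side may differ):
# Sketch: send the selected product id to the Arduino console
product_id = "15"
serialPort.write((product_id + "\n").encode())   # newline used as a simple delimiter (assumption)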
Display
To simulate the job of purchasing, ordering and showing different prompts to the user, a 16*2 LCD panel will be integrated with the Arduino.
Toggle Switch
A toggle switch will be used to wake up the machine. This starts the whole camera process from the beginning.
Power
A 9 V battery will power the circuit; its voltage will be converted to 5 V using a 7805 linear regulator and then fed into the breadboard, which simulates the vending machine.
The touch-free console
Let's assume this is the product mapping of AutoVend, where 11, 12, 13 ... ... 64, 65, 66 represent product ids.
Now, we have these two gesture maps to select from
Other Digits (except 1 to 3): When we want to select other digits that are not available here, we need to go to the other gesture map. The last gesture pattern helps to break out of this map.
Other Digits (except 4 to 6): Works the same as described earlier. Its purpose is to break out of this map.
Trick: As we can see, all the digits are built up using a specific pattern consisting of three signs. If we need to choose a single-digit number (like 4), then we have to gesture all three signs. But if we want to gesture a multi-digit number (like 45), then we don't have to gesture the palm sign both times. We only need to gesture the palm sign once, and then we can gesture:
L + Okay + Fist + Okay = Digit(45)
Say we want to purchase the product with Id-15.
Using both maps, we need to show the following gestures sequentially in front of the camera.
Finally, we need to show gesture 'L'
Peace + L + Okay + Peace + Okay + Palm + Fist + Okay + Palm + Okay + L = Digit(15)
Let's Purchase Product with Id-15:
We will simulate the gesture steps from the Raspberry Pi and see how the touch-free Arduino console functions.
Conclusion:
I have used my local machine (e.g. a laptop) and Jupyter Notebook to train, test and evaluate the model. Finally, I have generated the TensorFlow Lite model. Real-time gesture recognition is performed on the Raspberry Pi Zero W, and the feed is viewed on the laptop with the VNC Viewer application. The touch-free console is built with an Arduino Nano.
With AutoVend, it is not only possible to automate the whole process of purchasing grocery items; it can also serve as a good example of how our regular gadgets can be smart enough to keep us safe during the Covid-19 pandemic :)