I think everyone who has ever been to a cafeteria knows the following problem: at the buffet you can usually pick up the prepared meals quickly, whereas on the way to the checkout, especially at rush hour, you get stuck in a queue. By the time you have paid, the food is often already cold.
In our opinion, this should change! The waiting time must be reduced to a minimum.
Hiring several cashiers for several cash registers is not an option for cost reasons. Instead, long queues have to be split into as many smaller ones as possible without additional employees, so that the company's investment does not explode.
Idea
A possible solution to this problem is a self-scanning checkout. Of course, such a checkout should be safe and user-friendly, which includes automatic scanning of the selected food and calculation of the price.
Another solution would be QR codes, but we found that they come with several problems. For example, they are difficult to handle, because the employees would have to do additional work to prepare the QR codes. They also create the risk that a customer could copy these tickets or take the QR code of a cheaper dish although he ordered an expensive meal.
As you can see, this idea comes with many issues. That is why we chose to set up a camera system that automatically detects the meals and calculates the price.
Therefore we would use a camera (such as an ELMO document camera) to take photos of the food tray. These images are then analyzed by an AI, which also calculates the price and collects the money from the visitors.
That's why we would like to build an API system that can be used to run such a setup. A customer could simply install our API on their own hardware to create a self-scanning checkout.
Our API provides many advantages:
- the checkout process takes less time
- more checkout points are available
- employees can devote themselves to other tasks
- long queues are distributed into smaller ones
- higher customer satisfaction
For our API system, we grouped the meals into the following categories, on which the price calculation is based:
- Appetizers
- Main dishes
- Desserts
- Salads
- Side dishes
- Drinks
This way, the user of our API system can adjust the price of each category individually.
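To give an idea, here is a minimal Python sketch of such a configuration; the category keys and prices are made up for illustration, the real values are set by the admin:

```python
# Illustrative category-to-price mapping; the actual values are
# configured individually by the admin of the system.
PRICES = {
    "appetizer": 1.50,
    "main_dish": 3.80,
    "dessert": 1.20,
    "salad": 2.00,
    "side_dish": 0.90,
    "drink": 1.10,
}

def calculate_tray_price(detected_categories):
    """Sum the prices of all categories detected on one tray."""
    return sum(PRICES[category] for category in detected_categories)

# Example: a tray holding a main dish, a side dish and a drink
print(calculate_tray_price(["main_dish", "side_dish", "drink"]))  # 5.80
```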
Project diary
29.05.2020
While working on the project, we found out that the pictures in our dataset must fulfill specific criteria: every object that the algorithm shall detect has to be surrounded by a bounding box, stored together with its class name and x/y coordinates. Due to the number of categories, we figured out that the number of pictures we would have to edit comes close to 3,000, which corresponds to approximately 20 working hours. In order to save time, we would like to automate this process or reduce the number of categories.
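For illustration, this is how such an annotation can be read in Python, assuming it is stored as a Pascal VOC style XML file (the format LabelImg writes and VoTT can export; the exact layout of our files may differ):

```python
import xml.etree.ElementTree as ET

def read_annotation(xml_path):
    """Return class name and bounding box coordinates for every
    tagged object in a Pascal VOC style annotation file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "class": obj.find("name").text,
            "xmin": int(float(bndbox.find("xmin").text)),
            "ymin": int(float(bndbox.find("ymin").text)),
            "xmax": int(float(bndbox.find("xmax").text)),
            "ymax": int(float(bndbox.find("ymax").text)),
        })
    return boxes
```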
Possibilities to label images:
- by hand
- by VoTT (Visual Object Tagging Tool)
- by LabelImg
- by Labelbox
01.06.2020
In order to solve our picture labeling problem, we decided to use the Visual Object Tagging Tool. Our decision was based on the fact that we want to minimize the chance of our project failing: if we used an automatically pre-labeled picture set, we couldn't control the input for our model. In other words, for quality reasons, every member of our team had to label approximately 1,000 pictures by hand. Our assumption of 16 working hours for completing the labeling process was absolutely wrong: it only took about 6 hours for all 3,000 pictures.
04.06.2020
Over the last few days we have been setting up the environment for training our model. For this purpose we followed the steps explained on this page.
During this process we encountered many problems, and solving them took a few hours in total. The most difficult ones are shown here:
#1 Pycocotools
Since pycocotools is only developed for Linux, we needed a workaround to install it on Windows. We found a solution on this website.
#2 Pathlib
A simple mistake was that pathlib was not defined, so we only had to add an import for pathlib, as described here.
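In code, the fix was literally one additional line at the top of the script:

```python
import pathlib  # pathlib.Path was used but never imported, causing the NameError
```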
#3 Load function
Since the load function did not work with the given arguments, we had to change it to load_v2, as you can see at this link. The link also describes another problem we had, which was solved by enabling eager execution.
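A minimal sketch of both fixes, assuming TensorFlow 1.x (the model path is a placeholder):

```python
import tensorflow as tf

# Fix 2: eager execution must be enabled before any other TF calls
tf.enable_eager_execution()

# Fix 1: in TF 1.x the TF2-style SavedModel loader is exposed as load_v2,
# so we call it instead of tf.saved_model.load(...)
model_dir = "path/to/saved_model"  # placeholder
model = tf.saved_model.load_v2(model_dir)
```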
09.06.2020
After setting up everything, we started to train our model today. For this purpose we decided to use a pre-trained Faster R-CNN Inception V2 model. Since a little test run couldn't hurt, we took about 50 labeled pictures per category and trained for about 10,000 steps.
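For context, in the TF Object Detection API such a test run is mainly controlled by two settings in the training pipeline config; the checkpoint path below is a placeholder for the downloaded pre-trained model, not necessarily our exact setup:

```
# Excerpt from the training pipeline .config (other settings omitted)
train_config {
  # start from the pre-trained Faster R-CNN Inception V2 weights
  fine_tune_checkpoint: "faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  # length of our little test run
  num_steps: 10000
}
```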
16.06.2020
After completing the 10,000 steps yesterday, we evaluated our model today.
As expected, our model is not working perfectly yet, but the first steps in the right direction have been made. Some categories are evaluated well, for example appetizers, desserts, main dishes (mainly if pasta is involved) and drinks (only from certain perspectives).
As you can see in the third figure, a minor problem is that some categories are sometimes not detected at all, just like the salad here.
Another problem is shown in the fifth figure: some categories are mixed up, such as side dishes and main dishes. We assume that our training data wasn't accurate enough to differentiate between the two categories.
While labeling the training data, we already suspected that this problem could occur, because some categories in the data set were highly similar.
To solve this issue, we will retrain the model with a more specific data set. While labeling the new data set, we will draw harder boundaries between the categories and try to avoid intersections.
19.06.2020
After relabeling all 3,000 images, we started to train our model from the ground up. During this process our model ran into an error that occurred multiple times and crashed the training at different steps. Since the error didn't always occur at the same step, we were able to continue the training from the last checkpoint each time.
The error that showed up was: "Invalid Argument: Loss Tensor is inf or nan. : Tensor had NaN values"
21.06.2020
As you can see in figure 6, our model worked very well at first glance. Nonetheless, when we tested it with other pictures, the output wasn't always as good as this one.
Compared to our first model, the category boundaries in our training data are more precise. This led to a new model which attaches the right categories to the food. Our biggest remaining problem is that some dishes aren't detected at all. We expect that continuing the training will solve this.
For correctness: figure 7 shows a food tray that won't be supported in the final product. Our API will only detect food if a plate is placed right under the camera, which covers the majority of European cafeterias. Support for American-style trays could be a future side project.
To run more steps in the same amount of time, we installed TensorFlow Object Detection and all its dependencies on a computer with an Nvidia GPU, because TensorFlow uses CUDA and cuDNN, which only work with Nvidia GPUs. Otherwise only the CPU is used, which in our case is roughly 12 times slower than the GPU.
Since the most recent version of CUDA was installed, which is not compatible with our TensorFlow version, an error occurred telling us that the cudart64_100.dll library could not be loaded. We solved this issue by completing the steps shown at this link.
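After installing the matching CUDA version, a one-liner is enough to verify that TensorFlow actually sees the GPU (TF 1.x API):

```python
import tensorflow as tf

# Prints True only if CUDA, cuDNN and the driver are set up correctly;
# if it prints False, training silently falls back to the slow CPU path.
print(tf.test.is_gpu_available())
```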
22.06.2020 / 23.06.2020
On these two days we checked for errors by testing with a smaller number of TFRecord files. We don't know exactly why those files failed. After doing some research, we found that the following points could lead to problems (the sketch after this list shows how most of them can be checked):
- xmax < xmin
- bounding boxes reaching out of image borders
- polygons
- overlaps
- double tagged boxes
- too small bounding boxes (must cover more than 1% of the picture area)
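The following Python sketch checks for these problems automatically, assuming the TFRecords use the standard TF Object Detection API feature keys with normalized coordinates and the TF 1.x API (the file name is a placeholder):

```python
import tensorflow as tf

def check_record(path, min_area=0.01):
    """Scan a TFRecord file for the bounding-box problems listed above."""
    for i, raw in enumerate(tf.python_io.tf_record_iterator(path)):
        feature = tf.train.Example.FromString(raw).features.feature
        xmins = feature["image/object/bbox/xmin"].float_list.value
        xmaxs = feature["image/object/bbox/xmax"].float_list.value
        ymins = feature["image/object/bbox/ymin"].float_list.value
        ymaxs = feature["image/object/bbox/ymax"].float_list.value
        for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
            if xmax < xmin or ymax < ymin:
                print(f"record {i}: flipped coordinates")
            if min(xmin, ymin) < 0 or max(xmax, ymax) > 1:
                print(f"record {i}: box reaches out of the image borders")
            if (xmax - xmin) * (ymax - ymin) < min_area:
                print(f"record {i}: box covers less than 1% of the image")

check_record("train.record")  # placeholder file name
```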
Since the first two points couldn't even be produced with our tagging tool, we can exclude those problems. We used polygons and overlaps in files that were processed correctly, which made us believe that these aren't the major error either. While looking over our files, we noticed that some boxes were accidentally tagged twice. We expected this to be the error, but fixing it didn't prevent the error message from showing up again, even though we believe it was one of the causes.
Because of all the problems mentioned above, we relabeled the affected pictures and restarted the training.
Finally, we were able to run the training for more steps.
24.06.2020
After running the training for 48,000 steps, we were able to evaluate our model again. Additionally, we added the function that calculates the price of the meals detected on the tray.
As you can see in figure 9, some categories are labelled with a higher score now.
But we ran into problems again, as you can see in figure 10: the burger is labelled as a dessert, which is obviously wrong.
Due to this problem, we contacted our local cafeteria to ask if they could provide us with a better data set. The data would come from their homepage, which shows daily pictures of the food offered on the corresponding day; we expect them to have an archive with all pictures from the last few years. As we don't know if they will respond, we continue training our model with the current data set.
27.06.2020
The last few days were spent implementing an interface for admins and users of our API/system. In order to make a simple system that can be used by almost everybody, we decided to use an HTML frontend.
It works the following way:
The camera of our system saves the taken images in a specific folder, which currently must be on the local disk. An automated trigger recognizes every new image in this folder. In the next step, a batch file passes the path of the image to a Python script, which triggers the scanning process for the image data. As return values, a labeled picture and a text file are saved to another local folder, which our HTML page can access to display the result.
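A stripped-down Python version of this trigger looks roughly like this (folder path and entry point are placeholders; in our system the polling is done by a batch file):

```python
import os
import time

UPLOAD_DIR = "C:/selfscan/upload"  # placeholder for the local upload folder

def scan_tray(image_path):
    """Placeholder for the detection and pricing step described above."""
    print("scanning", image_path)

# Poll the upload folder and hand every newly appearing image to the AI.
seen = set(os.listdir(UPLOAD_DIR))
while True:
    for name in os.listdir(UPLOAD_DIR):
        if name not in seen:
            seen.add(name)
            scan_tray(os.path.join(UPLOAD_DIR, name))
    time.sleep(1)
```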
30.06.2020
Figure 11 shows our frontend for admins. Here you can customize the prices, which are then used to calculate the cost of the detected foods on the tray.
As explained on 27.06.2020, the first prototype of the HTML frontend is finished. Figure 12 shows how the selected image is displayed; this way the user gets a chance to identify their personal tray. We also connected frontend and backend in order to send the data from the HTML page to our Python script. The data is analyzed as expected and the return files are saved locally.
In the next few days we will work on the frontend to display the prices and the labeled image.
05.07.2020
Our local cafeteria replied to our request about 4 days ago. After some texting back and forth, we got some pictures we may use exclusively for the purpose of this project. We are very happy to get such support. Nonetheless, we only received about 24 pictures, which isn't enough to retrain our model. That's why we only use them for testing purposes and for our hackster.io blog.
Still, we are very thankful that they replied and granted us access to their images.
09.07.2020
The last few days we completely focused on the frontend to display all necessary information to the user. This way the user is able to check whether the correct tray was scanned and to verify the calculated price for the food. Our project is slowly coming to an end: everything is implemented and works pretty well. Our goal is to finish the project completely; in order to reach it, we need to create a readme file for buyers of our system, so that they can set up their very own system without the help of our team.
13.07.2020
Today we were able to finish our project. All requirements are fulfilled and the system runs on every Windows server/PC. Additionally, we continued the training and reached about 180,000 training steps. All in all, we are very proud of what was accomplished in the last few weeks.
Finished Product
Here you can see the workflow of our AI. Due to the current corona situation, we weren't able to test the system with a real ELMO. In order to show the whole process anyway, we used some shortcuts, such as placing the taken image into the folder by hand.
Explanation of the content in the video:
Settings
Only admins are able to change the pricing of the food: by accessing the settings page, they can change the currency and the values. After saving their settings, these prices are applied to the calculation of every tray.
Database
If you want, you can use the database features. There you can create new customers, view or recharge credit, withdraw money from credit and display the tables of the database.
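A minimal sketch of what such a customer/credit table could look like (the real schema of our database is not shown here; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect("canteen.db")  # placeholder database file
conn.execute("""CREATE TABLE IF NOT EXISTS customers (
                    id INTEGER PRIMARY KEY,
                    name TEXT NOT NULL,
                    credit REAL NOT NULL DEFAULT 0.0)""")

def recharge(customer_id, amount):
    """Add money to a customer's credit (negative amounts withdraw)."""
    conn.execute("UPDATE customers SET credit = credit + ? WHERE id = ?",
                 (amount, customer_id))
    conn.commit()
```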
Run AI
To automatically detect a new picture in the upload directory, you only have to run the batch file once when booting the system.
In order to run the AI, the taken picture of the tray must be saved locally in the upload directory. In reality, this step is carried out by a camera (e.g. an ELMO). The trigger, which was created with the batch file, automatically detects every new picture in this directory and runs our AI to scan the tray.
The analyzed picture is displayed on our website together with the calculated price of the tray. Users are now able to review the quality and accuracy of the labeling and pricing and can pay for the food with their credit.
The labeled images and associated bills are saved in directories.
This means that all customer information and activities are documented, which ensures a good overview, so that administrators can easily read in the data at any time.
Quality notice:
As you can see in the three following figures, our API works properly, but we want to point out clearly that these results are only guaranteed when using high-quality pictures. If you ever try to use our project for personal purposes, keep in mind that not every image from e.g. Google Images will work properly.
Our selling strategy mainly focuses on selling different packages with various services. This way, the customer can choose what's best for their use case.
Some customers may prefer to order only AI access and implement the code into their own hardware; for them, the so-called "Start Package" is the least expensive product they can buy. With increasing price, the customer gets the chance to buy several additional services and hardware to achieve the best results.
Description of services:
Full AI Access:
Software application that uses an AI to detect food on a tray
Online Support:
Customer help desk for troubleshooting, accessible via mail/chat
Necessary Hardware:
Raspberry Pi, ELMO & display to run the system independently
Guarantee of Working:
We will implement and test the hardware/system before shipping
Updates:
AI model updates via the internet to deliver even more precise results in the future
On-Site Support:
Our team will set up the whole system at your canteen