Hi Hacksters! Allow myself to introduce.. myself.
I mentioned it briefly in my introduction video, but my idea for this project came a few weeks ago when I moved into my new house (after the apartment I'd lived in for 6 years). I finally have my own back yard with a clothes line, and I thought it would be a great test of AWS machine learning and image recognition, along with Lambda and an Arduino, to see if I could get something working where my code can tell, just from a photo, if I left the washing on my clothes line.
Why?
Well, I live in Australia, and we get a lot of spiders at night... and I don't want spiders in my clothes. So the ability to detect whether I left clothes on the line is a useful one!
I actually initially made this code runnable on a schedule (a cron job), so you can be reminded at the end of the day if your clothes are still on the line. But this project required a little Alexa, so I decided to have Alexa query my camera, send the image to Rekognition, determine the result, and then tell me right away if there's anything on the line.
Design

Software

This is the breakdown of the design:
I'll walk through it here just in case my diagram is as bad as I think it is:
- The user asks Alexa to check the washing
- The Alexa skill calls its own Lambda function
- The Alexa Lambda function calls my custom Lambda function to do the work
- The custom Lambda function polls the Arduino Feather HUZZAH for an image
- Once the Lambda function receives the image, it sends the image to AWS Rekognition, which returns a list of labels
- The Lambda function compares the labels returned against our existing list of known labels to determine if anything new was found in the image
- The response is returned to the Alexa Lambda function
- Alexa responds to the user with the result
Pretty simple, eh?
Now I'll talk a little about what I'll build it with.
Hardware

First up is the awesome Adafruit Feather HUZZAH with ESP8266 WiFi:
This thing is tiny (weighs only 9.7 grams!), has built-in WiFi and is generally awesome. And they're only $16.95!
Next up I need a camera, so I went with one I found locally: the ArduCam OV5642, with a 5-megapixel sensor. It looks like this:
In true proof-of-concept style, I didn't want to solder anything for this project, so I also had a breadboard, a power source (a micro-USB cable) and some jumper wires. The camera came with 8 female-to-male wires, and I used 2 more male-to-male with the breadboard to power the device.
Wiring was relatively straightforward, using the third column of the pinout as we're using the ESP8266 GPIO.
It's not fabulous, but all wired up it looks like this:
I actually had a problem with this wiring and had to move the red and brown power wires a lot closer to the board to make the images work properly; I'll talk a bit more about that in a moment.
So, that's pretty much all the hardware we need!
Implementation

Arduino camera

First up, we need our camera to be taking photos, and a way for us to access these photos from the Lambda function.
The ArduCam comes with some sample projects, and I ended up using one of these to achieve exactly what I needed: a URL endpoint to grab the image from. I won't paste the code here, as it's available right here on GitHub.
The only configuration required was to get the device onto my own WiFi, and to set the default image size at boot (instead of having to change it in the HTML web UI the sample provides, which uses some JavaScript to achieve the same thing).
This was as simple as changing:
myCAM.OV5642_set_JPEG_size(OV5642_320x240);
to:
myCAM.OV5642_set_JPEG_size(OV5642_1600x1200);
And for WiFi I simply populated these:
const char* ssid = "SSID"; // Put your SSID here
const char* password = "PASSWORD"; // Put your PASSWORD here
And assuming your wiring is correct and the stars align, you should be able to compile and run your Arduino program! This connects to my WiFi and makes images available on a URL. Let's take a look at http://192.168.0.116/capture:
Hurrah! (or Huzzah!) It works! That looks like... me... kinda upside down, but it works!
Next up, let's point it at the clothes line.
Looks good! Now, I did have one problem here: I couldn't pull any images over 640x480 without them cutting off. After a visit to the ArduCam GitHub, I found out it was most likely the wires on my breadboard being too long. I moved them up next to where the camera was plugged in, and that pretty much resolved the issue. Yay! So my final wiring and installed position looked something like this:
And a quick visit to http://192.168.0.116/capture after a swim in my pool looks like this:
And just to confirm I'm not cheating, both in the same pic:
Alright, so the camera is working, woohoo!
Lambda & Rekognition

The next step is to write our Lambda code. I'll give you a warning straight up: I'm not a programmer, I just keep poking things until they work. (I'm a sysadmin with a Computer Science degree, so I can usually figure something out if I need to.)
First up, for anyone who doesn't know: Lambda is the part of Amazon Web Services where you upload your programming code and AWS runs it for you. You don't have to run your own servers or anything; it just runs your code on demand. It's pretty awesome and basically the basis of serverless computing. It's also SUPER cheap, with a free tier of 1 million requests per month, which we will never exceed.
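To make that concrete, here's a minimal sketch of a Node.js Lambda handler (purely illustrative, not the project code; the handler contents and return text are made up):
// Lambda invokes this function with the triggering event and a context object.
// context.succeed() ends execution and returns a result to whoever called it.
exports.handler = function(event, context) {
    console.log('Received event:', JSON.stringify(event));
    context.succeed('Hello from Lambda!');
};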
Next up is Amazon Rekognition. Rekognition is a deep-learning-based image and video analysis service. It's pretty simple to use: you upload an image to it, and it gives you back a list of things it thinks it found in that image. The first thing we need to do is train it.
Training the detection

So as I just mentioned, Rekognition returns a list of items in your image. To train my model, we want to feed it lots of pictures of the backyard and clothes line with NO clothes on it. This gives us a large list of the items that are normally in our image, and when something new is introduced into the image, we know something that isn't normally there has appeared.
You may ask "why don't you just detect clothes?". Well, Rekognition isn't that great at detecting clothes just yet, but it's great at telling me if something new is in the image compared to what's normally there.
So, as I said, I fed it a lot of images, and this is the list of items it recognised in about 10-15 pictures of my backyard:
'Backyard', 'Outdoors', 'Yard', 'Bench', 'Park Bench', 'Flora', 'Plant', 'Tree', 'Park', 'Forest', 'Grove', 'Land', 'Nature', 'Vegetation', 'Pond', 'Water', 'Blossom', 'Path', 'Pavement', 'Cherry Blossom', 'Flora', 'Flower', 'Plant', 'Fence', 'Lilac', 'Hedge', 'Harbor', 'Port', 'Waterfront', 'Nature', 'Conifer', 'Urban', 'Flower Arrangement', 'Grove', 'Wilderness', 'Ornament', 'Jar', 'Potted Plant', 'Pottery', 'Vase', 'Vine', 'Sidewalk', 'Ivy', 'Yew', 'Oak', 'Sycamore', 'Moss', 'Grass', 'Lupin', 'Resort', 'Hotel', 'Building', 'Flagstone', 'Bonsai', 'Tarmac', 'Walkway', 'Trail', 'Office Building', 'Aisle', 'Indoors', 'Human', 'People', 'Person', 'Intersection', 'Road', 'Garden', 'Architecture', 'Asphalt', 'Soil', 'Patio', 'Alley', 'Alleyway'
That's quite a list! There are some strange ones, but if that's what Rekognition is detecting, who am I to argue? You'll see how I use this in the code shortly.
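If you want to build a list like this for your own scene, here's a rough sketch of how you might collect it by looping detectLabels over a folder of "empty" photos. This is an assumption on my part rather than my actual training script, and the filenames are hypothetical:
var AWS = require('aws-sdk');
var fs = require('fs');

var rekognition = new AWS.Rekognition();
// hypothetical local photos of the empty backyard
var baselinePhotos = ['backyard1.jpg', 'backyard2.jpg', 'backyard3.jpg'];
var knownItems = [];

baselinePhotos.forEach(function(file) {
    var params = {
        Image: { Bytes: fs.readFileSync(file) }, // raw JPEG bytes as a Buffer
        MaxLabels: 20,
        MinConfidence: 50.0
    };
    rekognition.detectLabels(params, function(err, data) {
        if (err) return console.log(err);
        data.Labels.forEach(function(label) {
            // only record labels we haven't already seen
            if (knownItems.indexOf(label.Name) === -1) {
                knownItems.push(label.Name);
            }
        });
        console.log(knownItems);
    });
});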
Lambda

Alright, it's time for the main meaty code that makes this thing work. The repository is here if you want to see the file in its entirety.
I'll split it up a bit and explain the important parts.
At the start of the file we define a few variables, and we create an array with all of the items found above in a variable called knownItems. Nothing too obscure here.
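It looks roughly like this (a sketch of the shape of it; see the repo for the real thing):
var AWS = require('aws-sdk');
var request = require('request');
var rekognition = new AWS.Rekognition();

// everything Rekognition normally finds in my empty backyard
var knownItems = ['Backyard', 'Outdoors', 'Yard', /* ...the rest of the list above... */];
var items = '';  // the new labels we'll read out
var notify = 0;  // set to 1 if we find anything new
var found = 0;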
Then we grab the image from my camera. I do have a public URL where a camera image can be directly accessed by Lambda, but I don't want to advertise it here. Instead, I uploaded an image of my line with some clothes on it to imgur, and I define that as the URL to test:
var url = 'https://i.imgur.com/cUGEYbl.jpg';
Which looks like:
Then we use request, a simplified HTTP request client that makes HTTP processing reallllly easy in Node.js, and we call request.get on the URL, storing the result in 'body'. One gotcha: we pass encoding: null so the image comes back as a raw binary Buffer (the default string encoding would mangle the JPEG bytes):
request.get({ url: url, encoding: null }, function(error, response, body) {
Then, if there's no error, we set up the parameters to send the file straight to Rekognition. We specify that the image comes from the body parameter, and set the maximum number of labels we want Rekognition to send back to 20. We also set the minimum confidence to 50. Confidence is a rating Rekognition gives to each label it finds: if it's really sure it found something, the confidence will be 80-100; below that it's a little unsure, and below 50 it's really just guessing, so I'm happy to only get results above 50.
if (!error && response.statusCode == 200) {
var params = {
Image: {
Bytes: body
},
MaxLabels: 20,
MinConfidence: 50.0
};
Next, we send our image to Rekognition's "detectLabels" function, passing in those params we just defined.
rekognition.detectLabels(params, function(err, data) {
Rekognition then returns all the labels it found, and we iterate through them. The method I'm using here is to loop through each of the returned labels, and for each one loop through my knownItems to see if any of them match. If one does, we set a "found" flag to 1. If the found flag is still 0 after looping through all our known items, then we've found a label that doesn't exist in knownItems. We add it to a list of items for Alexa to read out, and set a "notify" flag to 1, recording that we found at least one item for Alexa to mention.
var labels = data.Labels;
// labels is an array, so 'key' here is each label's index
for (var key in labels) {
found = 0;
// loop through knownItems to see if anything matches
for (var x = 0; x < knownItems.length; x++) {
// if something matches, lets record it with the found variable
if (knownItems[x] == labels[key].Name) {
found = 1;
}
}
// if we get here, the loop had no matching knownItem for the key we're checking. this means something was found.
if (found == 0) {
if (notify == 1) // formatting
items += ',';
// record what was found so we can email it to the user
items += " " + labels[key].Name;
// confirm we're going to send a notification
notify = 1;
}
}
And then, if that notify flag is set to 1, we let Alexa know what to say. First we replace the last comma with an "and" so it sounds nice, then return the list of items on the line; otherwise, we say we didn't find anything.
You'll also notice here that I'm using c.succeed. c is the Lambda context object, and context.succeed ends the function and returns this result to the Lambda function that called it. More on this in a moment.
if (notify == 1) {
// replace last comma with 'and' to make it sound nice.
items = items.replace(/,\s([^,]+)$/, ' and $1');
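// e.g. " Person, Clothing, Towel" becomes " Person, Clothing and Towel"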
c.succeed("Yes you have something on your clothes line! I found" + items);
} else {
c.succeed("I didn't find any washing on the line");
}
And that's it for my Lambda function! All up, it was 113 lines. That we can achieve so much (image recognition, object detection and more) in so little code is just awesome, and shows the true power of AWS.
The Alexa app

There's quite a lot that goes into making an Alexa app, and I'll leave that tutorial for another time. Here I'll just show you what I need to call the Lambda function I just spoke about.
var lambda = new AWS.Lambda();
var params = {
FunctionName: 'bringinthewashing',
InvocationType: 'RequestResponse',
LogType: 'Tail',
Payload: ''
};
lambda.invoke(params, function(err, data) {
if (err) {
console.log(err);
} else {
console.log(data.Payload);
emit(':tell', data.Payload);
}
});
In here I define the lambda variable, then define my parameters. The parameters specify the Lambda function I'm going to call, and I set the InvocationType to "RequestResponse", meaning I want a response back from the Lambda function I'm calling. This is where the c.succeed in the previous function comes into play: that is the bringinthewashing function sending its response back to this Alexa Lambda function.
I then invoke the Lambda function. If there's an error, I print it to the console; otherwise I log the response to the console and also call my emit function, which tells Alexa what to say back to the user.
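One note here, and this is my own assumption about handling the response rather than something from the original code: data.Payload is the JSON-serialised return value of the invoked function, so a plain string comes back wrapped in quotes. Parsing it first gives Alexa clean text to speak:
// strip the JSON quoting before handing the text to Alexa
var speech = JSON.parse(data.Payload);
emit(':tell', speech);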
And that's pretty much it!
Demo

Videos are worth a billion words, so here it is in action!
Pretty cool, hey? Sorry about the jump cut, but I promise the demo worked as shown; I just cut out some unnecessary silences!
This is the actual image that got processed by rekognition:
And if you feed this into the rekognition console yourself, you can see the three labels that are detected:
Why didn't Alexa mention "Outdoors" or "Yard"? Because they were already part of my list of items regularly detected in my empty backyard.
Issues

I like to mention issues I encountered while developing, just in case anyone runs into something similar:
- 1. The first was getting the USB serial cable working in order to upload code to the Arduino device. It was throwing this error:
warning: espcomm_sync failed
error: espcomm_open failed
error: espcomm_upload_mem failed
macOS High Sierra had some major issues with this. The solution was found here, where I had to install a different esptool binary.
- 2. As I mentioned earlier, the issue with downloading large images from the Arduino device. It seems the power to the camera is the main culprit here, and directly soldering the camera to the device or using shorter jumper cables would help in this regard.
- 3. Speed. Again, this is down to the Arduino device. Downloading an image from the device can sometimes take 5-10 seconds, which really slows down the Alexa processing and response. The main workaround I found was creating a scheduled (cron) job to pull the image from the camera every 5 minutes and upload it to Amazon's Simple Storage Service (S3), then having the Alexa skill query the stored image instead of the Arduino device itself. This returned near-instant responses from Alexa, but I felt keeping it real time would be a bit more "real" for the demo I recorded. A sketch of that scheduled approach is below.
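For reference, here's a rough sketch of what that scheduled function could look like. The bucket name and camera URL are hypothetical placeholders, and this isn't code from my repo:
// Hypothetical scheduled Lambda: triggered every 5 minutes (e.g. by a
// CloudWatch Events rule), it pulls the latest image from the camera
// and caches it in S3 for the Alexa skill to read instead.
var AWS = require('aws-sdk');
var request = require('request');
var s3 = new AWS.S3();

exports.handler = function(event, context) {
    // encoding: null so the JPEG arrives as a binary Buffer
    request.get({ url: 'http://CAMERA_URL/capture', encoding: null }, function(error, response, body) {
        if (error || response.statusCode != 200) {
            return context.fail(error || 'Bad response from camera');
        }
        s3.putObject({
            Bucket: 'my-washing-images', // hypothetical bucket name
            Key: 'latest.jpg',
            Body: body,
            ContentType: 'image/jpeg'
        }, function(err) {
            if (err) return context.fail(err);
            context.succeed('Cached latest camera image to S3');
        });
    });
};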
This project was a lot of fun for me, and my first foray into image recognition and machine learning! I had also never used Lambda this much before, especially with the RequestResponse model. I could have put all of the image downloading and calls to Rekognition straight into the Alexa code, but I felt it was neater to split up the functionality.
I hope you learned something, and please let me know if you have any questions!
- Nick