This project uses the Soracom Wio LTE running an Arduino sketch to send data to AWS IoT Core. There are prerequisites to complete before we can connect and program our cellular device and send data to AWS IoT Core. Please follow the guide and videos in the order listed.
Step 1
Start with this guide, which covers the Arduino demo LED sketch and the Wio LTE DFU and enable button sequences necessary to proceed.
https://docs.google.com/document/d/1gkRrCF4lvVy_spg2ew1EJV8t7RgtL6Es6zZQoezIZXM/edit?ts=5caa6fe0
Step 2
This is my video on creating device certificates and an IoT policy on AWS IoT Core. You need to do this to complete the project and connect through Soracom Beam. Almost all embedded IoT devices connecting to AWS through MQTT+TLS 1.2 will need to use this device certificate creation and IoT policy process.
https://www.youtube.com/watch?v=sdgXt7Sq2dM&t=34s
Step 3
This is my video on customizing the provided Soracom Arduino sketch examples, based on Soracom's "MQTT client" example and "Grove DHT" example, to connect your board to AWS IoT Core. I have included two modified Arduino sketches to help you connect to AWS, one for use with the Grove DHT11 sensor and one that works without it. The sketch that needs no sensor generates random, fake sensor data and dispatches it via MQTT.publish.
The video also shows how to set up the Soracom Beam application from the Soracom.io dashboard. The Beam service is needed as a back-end intermediary to accept our incoming MQTT data on port 1883 and transform it to outgoing MQTT+TLS 1.2 on port 8883. We must paste our three newly created security certificates from AWS IoT into Soracom Beam, and then provide Beam with our custom AWS IoT MQTT endpoint.
The Arduino sketches are included in this tutorial but can also be found at this link to my AWS IoT GitHub:
This Arduino sketch sends DHT11 sensor values as JSON to Soracom Beam, which processes and forwards them to AWS IoT Core.
https://github.com/sborsay/AWS-IoT/blob/master/mqtt_dht11.ino
This sketch functions without any sensor and generates random values.
https://github.com/sborsay/AWS-IoT/blob/master/mqtt-client2.ino
The rest of this project will take place on the AWS cloud platform and will involve analyzing and visualizing near-real-time sensor data using inexpensive AWS services. Hopefully this initial project and the associated videos were helpful in beginning your own projects and prototyping with the cool Soracom Wio LTE development board, the Arduino IDE, and AWS IoT.
Now that our data is on AWS through the IoT gateway, we have all kinds of options for how to store and retrieve it. I've worked extensively with AWS IoT over the last few years and I have some strong opinions on the best methods depending on the client's needs. However, most IoT data analytics options revolve around either extracting and analyzing data from a database or ingesting and parsing data from a data lake. Both have their uses, but many of the final architectural decisions are going to be contingent on whether the AWS service meets your performance requirements and which services are the most inexpensive over time. Let's review just some of the many options for handling the IoT data transmitted from our Soracom device.
1. Elasticsearch with Kibana: Somewhat of a one-off for IoT, but the service is well integrated with AWS IoT Core and is easy to set up outside of composing the index table over cURL. Kibana is free, but this tends to be a more expensive "non-serverless" solution because the service uses a VM "instance" while running.
2. Kinesis Firehose to S3: A fairly common data lake solution, but it has some built-in complications. The output is not valid JSON, so the data will need to be transformed via Lambda or extracted using a non-JSON ingestion or parsing method. The biggest limitation in my experience is that the Firehose data dump to the S3 data partition is time or size limited. Thus, for data streams over time you are going to have your data split over multiple partitions within an S3 bucket, which then forces the client to use AWS Glue with AWS Athena to crawl, classify the schema, and amalgamate the data back into a single partition for a usable data lake. If we don't use this more expensive technique, we run into another problem: we don't know the name of our data partition object before it is written to S3. While we can designate a prefix, our exact, timestamped S3 object name will only be known after the partition is written. This is problematic when retrieving data with hard-coded methods.
3. Lambda to S3: We can use AWS Lambda to keep records of our data in an array, concatenate the array, and then dump the data into a single partition on S3 at any desired interval. This will incur many wake-ups with "cold start" issues, but it is a clever workaround if you want to avoid a database or data stream duration and size limits. An excellent example of this technique can be found in the Python Lambda function demonstrated here: https://itywik.org/2018/04/17/build-a-5-iot-thing-with-esp8266-mongoose-os-and-aws-iot-core/
4. DynamoDB to S3: via AWS Data Pipeline, API Gateway, and/or Lambda. DynamoDB has generous quotas on the AWS free tier and it's integrated directly with IoT Core. Making a simple data table is fairly trivial. From DynamoDB we can extract all IoT data held in the database via Data Pipeline directly to S3. From there we can create a manifest and import it into AWS QuickSight for visualizations. These days I usually prefer working with a database for IoT over a data lake, and DynamoDB is my database of choice for serverless models.
5. SQS with an XML parser using AJAX requests: An inexpensive technique I've developed, but SQS FIFO queues are not available for IoT Core, and composing an XML parser to extract data is not an optimal solution for time-critical or fast-streaming data applications. On the plus side, it does tend to be a very cheap solution.
We also always have the option of hosting a dynamic website on EC2 and extracting any information we want directly from DynamoDB or S3. This "non-serverless" solution is more expensive and non-trivial to develop, but offers complete flexibility. For this Soracom project I'm going to focus on a relatively new AWS service I've been extremely impressed with: IoT Analytics. IoT Analytics offers several huge advantages over traditional methods. First, it's an "ad-hoc" managed solution with very little overhead to set up and get started with. The extremely flat learning curve has helped my students on Udemy become productive very quickly with the normally difficult task of near-real-time data analysis on AWS. It is also the first AWS IoT solution that facilitates quick analysis and visualization, so I no longer have to direct new students to easy-to-use but limited IoT-centric third-party "cloudish" services like Losant or ThingSpeak.
Specifically, for this project we will use AWS IoT Analytics, with Lambda for enrichment, and built-in S3 data delivery to place the data into an S3 data partition with a determinable name. Using the object URL of our data in S3, we will then code a simple JavaScript static website, also hosted on S3, to parse our data using regular expressions, place our variables into arrays, and then use Google Charts to visualize our environmental data. It should be noted that this method is totally flexible: other free CDN-based packages like D3, Plotly, or Chart.js can easily be implemented instead of Google Charts. Also, I will briefly show how to use the built-in IoT Analytics connector in AWS QuickSight to visualize data if this "code-free" method is preferred for business analytics. I will also link a video showing how to perform analysis and visualizations with AWS SageMaker in Python, utilizing a data set produced from IoT Analytics.
A key benefit of IoT Analytics is how well it integrates with Lambda for data enrichment. I will demonstrate a very simple Lambda function that we can implement in the IoT Analytics pipeline to perform useful data enrichment. I chose to write the function in Node.js because it doesn't require bundling and uploading any dependent modules, and can be completed directly from the Lambda editor.
What I'm most excited about in IoT Analytics is a new feature added in May 2019 designated "Data Set content delivery to S3". This announcement went under the radar, but it is a HUGE deal. Why? Well, it has always been problematic to use S3 with IoT streaming data because S3 is not a file store; it is a data partition which is written in one shot. Thus we can't concatenate or append data without writing a completely new object in S3. While this limitation hasn't changed, by using IoT Analytics we have a non-database solution to write our data set, indexed to any duration, to a known S3 partition. The fact that we can specify a name for our S3 partition before coding is a useful advantage over most other data lake techniques. Specifically, by allowing us to designate a defined name, and thus a known object URL in S3, we can ingest that data via our code in a static or dynamic instance without having to use an AWS Glue-generated schema, index crawling, and AWS Athena queries to categorize our data back into an S3 partition.
Getting Started on AWS
Hands on with AWS IoT Analytics
To start, let's use the new "one-click" creation for IoT Analytics directly from AWS IoT Core, as it is a built-in service. First go to IoT Core; from there, click on the "Act" tab in the column on the left.
Once in the Actions menu, click "Create" in the upper right of your screen. This creates a new action to apply to our incoming IoT data from our Soracom cellular device. Give your new action any name you like. Here I call my AWS action the unforgettably catchy moniker of Analy530.
Next we will use the rules query statement to ingest our incoming data package under a user-provided topic name. Remember, you can delimit your incoming data to any field you like, or set up conditional processing based on the incoming variable values. However, for most purposes it's useful to grab all the IoT data and then parse it later as needed, which is what we will do here.
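For reference, a rule query statement that grabs the entire incoming payload looks like this (here using 'mytopic', the topic name we subscribe to later; substitute your own):

SELECT * FROM 'mytopic'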
Finally, select the "Add action" button where we can add in our IoT Analytics service to process our IoT data coming through the IoT Gateway.
Now click "Configure action." We are now going to use the new AWS Quick create process for IoT Analytics.
Previously you had to individually create the Channel, Pipeline, and Data Store but now you can conveniently create all three with one click and then edit them separately later (which we will do for the two options when we enrich and store our data). Add your own prefix, then select "Quick Create". All your components will then be automatically created.
Now you can Select "Add action."
Now select "Create Rule". Congratulations now you have linked the AWS IoT gateway for your incoming data to your own AWS IoT Analytics service. You should see your new action at the bottom of your actions menu. Make sure your new action is enabled.
At this point I want to make my first modification to IoT Analytics before I start to transmit data from my Soracom device. I'm going to create a Lambda function to be used in the "Pipeline" section of the IoT Analytics service. To do this, I'm going to exit IoT Core and go directly into the Lambda service to compose my Node.js function. Once the function is complete, we can add it to our pipeline to modify our data.
In the Lambda console go to Functions --> Create function. Select a name of your choice. For my function I'm using Node.js 8.10. (I'll provide this function code, and all the code I use, in the code section of this project, as well as a reference to my GitHub for AWS IoT.)
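As a rough sketch of what this kind of enrichment function can look like (a minimal example, not necessarily identical to the code I provide; the "datetime" field name is just an illustrative choice), an IoT Analytics pipeline Lambda receives the batch of messages as an array and must return the transformed array:

// Minimal sketch: IoT Analytics "Transform message with Lambda function" activity (Node.js 8.10).
// The event is the array of pipeline messages; we return the transformed array.
exports.handler = async (event) => {
    return event.map((message) => {
        // Add a server-side timestamp field to each incoming message.
        // "datetime" is an illustrative field name; use whatever suits your data set.
        message.datetime = new Date().toISOString();
        return message;
    });
};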
Normally we would handle "Permissions" here with a pre-made role so that IoT Analytics and our Lambda function can work together. Instead, we will grant this permission in the next step from the AWS CLI, because at the time of writing you cannot set up Lambda access for IoT Analytics through a role created in AWS IAM. I assume the more standard role-based method will be implemented soon so that we don't have to use the AWS CLI; if it is integrated by the time you read this, just create the previously described role and attach it to your Lambda function. In any case, select "Create function" now.
Normally we would want to test our Lambda function at this point. However, since we are acting on data from our pipeline, a console test will return an error regardless of the correctness of the code, so we will skip testing. Don't worry, we will test our Lambda function from the IoT Analytics pipeline implementation explained soon. If you haven't installed the AWS CLI, please do so now. Again, we will check that your permissions and Lambda code work once we get to the pipeline step.
From the AWS CLI we will grant permission for our Lambda to work with IoT Analytics. You must add a function policy that allows AWS IoT Analytics to invoke your Lambda function (for more detail see https://docs.aws.amazon.com/iotanalytics/latest/userguide/pipeline-activities.html).
Use the following command by pasting it into your AWS CLI with your own naming specifics:
aws lambda add-permission --function-name <lambda-function-name> --statement-id <your-statement> --principal iotanalytics.amazonaws.com --action lambda:InvokeFunction
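To illustrate, a filled-in version of that command might look like the following (the function name and statement ID here are placeholders; substitute your own):

aws lambda add-permission --function-name myEnrichmentLambda --statement-id iotanalytics-invoke --principal iotanalytics.amazonaws.com --action lambda:InvokeFunction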
As a reference, here is my implementation with the Lambda function name I created in the previous step.
Upon input, it will return your specific JSON parameters (escaped with slashes) if it executes successfully. Ignore the "Python File" message.
Now that we have granted our Lambda function permission to interact with our IoT Analytics action, let's add it to our pipeline. To do this, go to the "IoT Analytics" service in AWS. Once in IoT Analytics, select "Pipelines" in the left side panel. Select the pipeline you just created as part of your IoT Analytics "Quick create" process. You will probably only have the pipeline you just created available as an option.
In the "Activities" section click "Edit".
Click "Next" as we don't have any incoming data yet to auto-detect. In "Pipeline Activities" we are now going to add our lambda function, select "Add activity". Scroll down to the "Transform message with Lambda function" option.
Select the lambda function you just created from the drop down menu, and test it with the "update preview" option.
If you have correctly entered the Lambda function AND given it the appropriate permissions with the AWS CLI, you will then see the server-side timestamp returned. If you get an error message, either your Lambda function has an error or you haven't provided sufficient permissions with the AWS CLI command we used previously. If that is the case, go back now and review those portions of this walk-through. You can't use your Lambda function unless it returns the appropriate result. Once it is verified to work, go ahead and click "Save changes". Click "Cancel" on the next screen where it shows a calendar, as you don't need to reprocess any old messages whether you have them or not. We are also not going to process messages on a schedule, but this is a very useful option for regular reporting.
Now we want to make one more modification to our IoT Analytics service before we implement it. Normally the IoT data would just be sent to the data store and then organized into a data set. However, since we want to analyze our IoT data with our custom JavaScript code hosted on our static website on S3, we will need to store our data in our own S3 bucket and make a data lake. The ability to do this is the brand new, revolutionary feature discussed earlier, so let's enable this powerful option. Again, if you haven't worked with S3 and IoT extensively it may be hard to appreciate how awesome this new feature is, but trust me, it solves many problems in creating an inexpensive and flexible data lake.
Go back to IoT Analytics and select "Data Set", then select "Edit" within the "Data set content delivery rules" section.
Select to "Deliver result to S3"
I suggest you create a public bucket in S3 in which to send your data if you don't already have one. Here I'm using one of my old public buckets called mykbucket14. I created a bucket "key expression" to direct the transfer of my data set to a new partition in S3 and named it "enviro/dataSet530". Finally, you will create a new role that allows IoT Analytics to write data to S3. I named the new role dataSetRule530. This role simply allows the s3:PutObject permission from IoT Analytics. You can always examine your newly created role in IAM if you want more information.
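For reference, the permission document behind that role boils down to something like the following minimal sketch (the bucket name is my example, and the console-generated role may include a few additional permissions):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mykbucket14/*"
    }
  ]
}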
Now click "Save" to enable content delivery to your specified partition in your designated S3 Bucket.
We are finally done modifying our IoT Analytics service with our custom Lambda and S3 delivery options. We could now run our IoT Analytics process from our enabled action in IoT Core; however, we don't have any data to process yet. Now is the time to start sending data from our Soracom device to the IoT gateway. To do this, go back to the IoT Core gateway, go to "Test", and enter your incoming topic name. If you remember, we named our incoming topic 'mytopic'. Go ahead and subscribe to your topic name, activate your Soracom device, and start sending your data to AWS as explained in my earlier videos.
Now, if you want to cheat a bit, or you don't have a device transmitting data, you can simply use the built-in MQTT client within the IoT gateway and publish directly to your topic with a fake data package. In this way you are constructing your own "shadow state", which I won't explain further. My recommendation is to use your Soracom device to send data, if it's available, as explained in my previous videos. The IoT gateway is agnostic as to data sources.
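If you go the MQTT test client route, the fake data package is just a small JSON object shaped like whatever your sketch publishes; for example (field names here are purely illustrative):

{
  "temperature": 23.4,
  "humidity": 41.0
}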
Now, if your previously created IoT Analytics action is enabled, all your incoming data should be flowing into the IoT Analytics process with your attached Lambda enrichment function, and it will be sent to S3 when you run your data set. So let's run our data set to process our IoT data. Go back to the IoT Analytics service console and select your newly created data set. Go to "Actions" in the upper right corner and select "Run now" from the drop-down menu.
After a couple of minutes (or more depending on how much data you have), your IoT data will be processed and the service will display "Succeeded". Now click on the content tab to view your data.
Voila! Your data is now ready. Also notice that AWS has automatically added a "__dt" field. This is just date time followed by delta time; however, it isn't overly useful for indexing as it has little granularity. This is why our server-side timestamp is a more useful field. As you remember, this timestamp was added by our custom Lambda function.
Next, let's check that our data was successfully sent to our S3 bucket as configured by our data set content delivery rules. I see the partition holding my data was indeed created, but wait... there's a problem: I can't access it! The reason is that the data is automatically AWS-KMS encrypted. Even with a permissive CORS policy and a public bucket and partition, we still need to remove the automatic KMS encryption that occurs when data is transferred from AWS IoT Analytics to S3. In my opinion, this is our second minor "growing pains" issue with this new service (the first being the mandatory AWS CLI permission step for our Lambda). AWS-KMS encryption should not be automatic if it can't be configured from the content delivery option in IoT Analytics. I've contacted AWS about fixing this, so I expect most of you won't even have this issue going forward. However, if your data is encrypted, we will now simply turn it off manually. Go to your partition's "Properties" tab and select "Encryption".
Select "none" and "Save". To turn off the encryption.
Next select to "Make Public" to make your data partition public so that you can download your data and verify that it is valid and none of the values have been transmuted.
Now click your "Object URL" to download your data. Notice it's in CSV style format. You can actually save it to .csv format if you want to use a built in csv parser in your web code. However I'm just going to use regular expressions in JavaScript to parse it, so I'll leave it as an ambiguous data blob.
Creating our static website to parse and graph our data on S3
If you made it this far, well done! We are almost finished. We are next going to create a static website hosted in an S3 bucket. If you don't know how to create a static website on S3, it is very easy; there are a lot of videos and documentation on how to do this, so I won't waste time documenting it here. Once you create your S3 static website, simply paste my provided HTML+JavaScript code into your index page on S3 (a sketch of the approach follows below). The only modification you need to make is to point your AJAX GET request to your own data partition on S3. The RegEx I used for this project parses the data into arrays based on variable position, so you shouldn't have to change any of the variable names to align with your data names from IoT Analytics. If you want to see an example doing the same thing with differently formatted data using Kinesis Firehose, I have that on my GitHub as well. As I said before, I believe IoT Analytics is now a superior service to Kinesis Firehose for creating IoT data lakes, due to determinable naming and unrestricted data duration and size. However, it probably can't match Kinesis Firehose for data throughput in massive IoT operations with fast streaming.
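As a minimal sketch of this approach (not the exact code I provide; the object URL, regular expression, and field labels are placeholders you will need to adapt to your own data set), the page fetches the data set object from S3, pulls the numeric values into arrays with a regular expression, and hands them to Google Charts:

<!DOCTYPE html>
<html>
<head>
  <script src="https://www.gstatic.com/charts/loader.js"></script>
</head>
<body>
  <button onclick="loadData()">Graph my IoT data</button>
  <div id="chart" style="width:900px;height:500px;"></div>
  <script>
    // Placeholder: replace with the Object URL of your own data set partition on S3.
    var DATA_URL = 'https://mykbucket14.s3.amazonaws.com/enviro/dataSet530';

    google.charts.load('current', { packages: ['corechart'] });

    function loadData() {
      // Fetch the CSV-style data blob written by IoT Analytics content delivery.
      fetch(DATA_URL)
        .then(function (response) { return response.text(); })
        .then(function (text) {
          // Draw once the Google Charts library has finished loading.
          google.charts.setOnLoadCallback(function () { drawChart(parseValues(text)); });
        });
    }

    function parseValues(text) {
      // Illustrative regex: grab pairs of numbers (e.g. temperature,humidity) by position.
      // Adapt the pattern to however your data set rows are actually laid out.
      var rows = [];
      var re = /([\d.]+),([\d.]+)/g;
      var match;
      var index = 0;
      while ((match = re.exec(text)) !== null) {
        rows.push([index++, parseFloat(match[1]), parseFloat(match[2])]);
      }
      return rows;
    }

    function drawChart(rows) {
      var data = google.visualization.arrayToDataTable(
        [['Reading', 'Temperature', 'Humidity']].concat(rows));
      var chart = new google.visualization.LineChart(document.getElementById('chart'));
      chart.draw(data, { title: 'Soracom Wio LTE environmental data' });
    }
  </script>
</body>
</html>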
One last thing: if you try to access your website now, you will get a "restricted access" message. This is because any resource on S3 needs to be granted explicit permission to be accessed by any requester. So make sure your data set is publicly accessible. To do this, go to your partition, select "Permissions", and under "Public access" select "Everyone" and give "read" access. You may also need to make the bucket and partition "public" depending on your prior settings.
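Alternatively, if you would rather grant read access with a bucket policy than click "Make public" on each object, a permissive public-read policy scoped to the data prefix looks roughly like this (bucket name and prefix are my examples; tighten the Resource path to taste):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadForDataSet",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mykbucket14/enviro/*"
    }
  ]
}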
If you have done everything correctly, you should now simply be able to click the button on the website you just created and it will automatically graph your data. If you still see a restricted message when you point to your webpage, remember to make your HTML file public via "Make public" in the S3 options; otherwise you can alter your CORS policy to be more permissive as needed if you are trying to access your data between buckets or remotely. As a note for those wishing to go further with AWS web hosting and requests: the correct way to handle all this is to configure your website with AWS API Gateway. This service will give you a world of access options; however, it is beyond the scope of this project, as it can be a very involved process depending on how secure and robust you want to make your website and data objects.
Here is what my data looks like graphed on my web page.
Congratulations on working so hard through many different AWS services.
Going Further
AWS IoT Analytics has an automatic connector from the Data Set in IoT Analytics to AWS QuickSight. AWS QuickSight is a no-coding service to easily produce visualizations for business analytics. To integrate your data set into QuickSight go to AWS QuickSight and then: New Analysis-->New Data Set
Select the "AWS IoT Analytics" connector. Now we can choose a line chart and simply drag and drop our variables from our data set to create visualizations.
To go further, if you want to explore ML, AI, or other graphing methods with your data, I have created a video on using AWS SageMaker with Python and Pandas to visualize your IoT Analytics data. That video can be found here:
https://youtu.be/n68_AmhtKU0
Conclusion
Working with the new AWS IoT-related services is always fun, and as mentioned in my instructional videos, a great cellular IoT device like the ones Soracom produces is a pleasure to work with. It's especially refreshing that they are not dependent on abstruse and nebulous "AT" commands and non-abstracted support libraries when developing for these devices. This ease of use is a nice change of pace from many of the radio-dependent devices I have worked with over the years. Another great advantage of these devices is their low price point and inexpensive cellular service plans, combined with the enhanced functionality provided by arbitrating transmissions through their very capable website.
This concludes the Soracom Cellular IoT on AWS project, showing you one possible way to analyze and visualize near-real-time IoT data from your Soracom device utilizing AWS. Give yourself a hand for sticking with it, and please post questions if something is unclear or if you have any suggested enhancements.