Sometimes we need to capture data from sensors quickly and get a first look at the general behaviour and trends of the variables we are measuring.
In this project we describe how we use a USB sensor dongle to store data from four sensors in a CSV-formatted file without installing any extra software, and then import the file into a spreadsheet (Google Sheets in our case) for quick analysis and visualisation of the data.
VOCs
Volatile organic compounds (VOCs) are emitted as gases from certain solids or liquids. VOCs include a variety of chemicals, some of which may have short- and long-term adverse health effects. Concentrations of many VOCs are consistently higher indoors (up to ten times higher) than outdoors. VOCs are emitted by a wide array of products numbering in the thousands.
Organic chemicals are widely used as ingredients in household products. Paints, varnishes and wax all contain organic solvents, as do many cleaning, disinfecting, cosmetic, degreasing and hobby products, and since many people spend much of their time indoors, long-term exposure to VOCs in the indoor environment can contribute to serious health issues.
USB sensor dongle
The first step is to find a way to measure the indoor environment. Bosch Sensortec manufactures the tiny BME680, a MEMS sensor module designed to measure VOCs through the change in the resistance of a sensitive layer exposed to the ambient air. This module also integrates temperature, humidity and air-pressure sensors.
For this project, we use the uThing::VOC, a small USB dongle that integrates the BME680, a voltage regulator and an MCU that fetches data from the sensor and continuously runs the Bosch BSEC algorithm to calculate the indoor air quality (IAQ) index that we use to analyse the condition of the indoor air. This IAQ index, along with the other sensor data, is then printed at a configurable interval over the virtual serial port to the host PC.
The uThing::VOC can output the data in JSON or CSV format, which makes it convenient to import into a spreadsheet or to analyse with languages such as Python or R. More information and documentation can be found here.
Step #1: Configure CSV output format
First, we need to find the virtual serial port name that our system automatically assigns to the uThing::VOC when we plug it in. This name depends on the operating system (detailed information can be found in the dongle's datasheet).
For convenience, we've plugged the dongle into a Raspberry Pi, since we can leave the Pi powered on for several days while it captures data. The Raspbian distribution assigns the name /dev/ttyACM0 to the first connected VCP device.
On Unix-based systems (macOS, Ubuntu, Debian, Raspbian, etc.) we can use the console directly to send a command to the dongle. On our RPi it is as simple as:
echo C > /dev/ttyACM0
Where "C" is the command to make the uThing::VOC output the data in CSV format.
Each line should look like:
24.37, 1012.04, 38.03, 721620, 129.8, 3
Where the column order is:
["temperature", "pressure", "humidity", "raw-resistance", "IAQ", "accuracy"]
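If you later want to process the file with Python instead of a spreadsheet, the mapping between a line and the column names can be sketched as below. This is only a minimal illustration: parse_line is a name made up for this example (it is not part of any uThing tooling), and the sample line is the one shown above.

```python
# Column order as printed by the dongle in CSV mode (taken from the list above).
COLUMNS = ["temperature", "pressure", "humidity", "raw-resistance", "IAQ", "accuracy"]


def parse_line(line):
    """Split one CSV line into a dict of column name -> float.

    parse_line is a hypothetical helper written for this example only.
    """
    values = [float(field) for field in line.strip().split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


sample = parse_line("24.37, 1012.04, 38.03, 721620, 129.8, 3")
print(sample["temperature"], sample["IAQ"])
```

Rejecting lines with the wrong number of fields is a deliberate choice here: a partially received line at the start of a log would otherwise end up silently shifted into the wrong columns.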
Note: Depending on the Linux distribution, the user may not have access to the serial port by default. This can be worked around by running the commands with "sudo" or by changing the permissions of the serial device (e.g. "sudo chmod 666 /dev/ttyACM0").
Step #2: Define your time frame
By default, the uThing::VOC prints the sensor data every 3 seconds. That's 1200 data-points per hour and 28800 per day! The size of a "convenient" dataset depends on the tools you will use to process the data (and on your PC's resources). I'm using Google Sheets for quick visualisation and basic statistics of the data I get in CSV, and I've found that around 10K-50K rows is a reasonable limit for processing the data and creating some basic charts. So if you want to capture data for, let's say, 1 week, you will need either a different tool or a lower sampling rate to reduce the amount of data.
In the uThing::VOC, the data output period can be configured between 3 seconds (the minimum defined by the internal operation of the BSEC algorithm) and 1 hour with the following commands:
- “1”: 3 seconds
- “2”: 10 seconds
- “3”: 30 seconds
- “4”: 1 minute
- “5”: 10 minutes
- “6”: 30 minutes
- “7”: 1 hour
So, coming back to the example of 1 week's worth of data, a reasonable sampling rate could be 1 sample/minute (that is, 60 samples/hour * 24 hours/day * 7 days = 10080 data-points). Let's configure the dongle for this rate with the following command:
echo 4 > /dev/ttyACM0
Where "4" is the configuration for 1 sample/minute and ttyACM0 the default assigned port. The dongle should start printing 1 data-point per minute now.
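To compare the settings before committing to a long capture, the same arithmetic can be run for every output period. The sketch below is only an illustration of that calculation; the names PERIODS and rows_for are made up for this example, and the periods are the ones from the command list above.

```python
# Output periods in seconds, keyed by the command character listed above.
PERIODS = {"1": 3, "2": 10, "3": 30, "4": 60, "5": 600, "6": 1800, "7": 3600}


def rows_for(duration_hours, period_seconds):
    """Number of data-points produced over duration_hours at the given period."""
    return duration_hours * 3600 // period_seconds


week = 7 * 24  # capture length in hours
for command, period in PERIODS.items():
    print(f"command {command}: every {period} s -> {rows_for(week, period)} rows per week")
```

For one week, command "4" gives the 10080 rows calculated above, while the default 3-second period would produce 201600 rows, well beyond what I found comfortable in a spreadsheet.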
Step #3: Start logging the data
In Linux and macOS, saving the input from the serial port into a file can be as simple as:
cat /dev/ttyACM0 >> file.log
This has a caveat though: the command runs in the foreground, so if you close the shell window (for example, when connected over ssh) it ends and the data logging stops.
The easiest way to avoid this is to run the process in the background by simply adding the "&" character at the end of the command:
cat /dev/ttyACM0 >> IAQlog-04_01_2019.csv &
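As long as the Raspberry Pi is not reset and the dongle is not unplugged, the process should keep appending the incoming data to the file until it is stopped. For instance, the command "killall cat" will do the job (please don't kill any kitty for real :P). Depending on the shell configuration, a background job may still receive a hang-up signal when the ssh session closes; if the logging stops after you disconnect, prefixing the command with "nohup" avoids that.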
There are more "robust" (proper) ways to log the data for longer periods (such as setting up a service, as in this other project), but if your RPi is not rebooting too often this is a quick way of logging some data.
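A small variation on the "cat" approach is to let the host add the current time to each line as it arrives, since the dongle itself has no real-time clock. The sketch below is one possible way to do it in Python, not the method used for the data in this project. It assumes the serial device can be read like a regular text file (as "cat" does); the script name logger.py, the function name log_with_timestamps and the output file name are made up for this example.

```python
import sys
from datetime import datetime


def log_with_timestamps(source, sink, now=datetime.now):
    """Copy non-empty lines from source to sink, prefixed with the host time."""
    for line in source:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp = now().isoformat(sep=" ", timespec="seconds")
        sink.write(f"{stamp},{line}\n")
        sink.flush()  # keep the file up to date if the process is killed


if __name__ == "__main__":
    # Assumed usage: python3 logger.py /dev/ttyACM0 IAQlog.csv
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as port, open(sys.argv[2], "a") as out:
            log_with_timestamps(port, out)
```

The clock is passed in as the now parameter only so that the function can be tried out with a fixed time; in normal use the default is enough.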
Note: If the dongle is attached to a Windows machine, you can use any terminal application that lets you save the session log. For example, in PuTTY, use the session logging option to set the file name.
Step #4: Import in a spreadsheet
In most spreadsheet apps, just click on "File->Import...". Google Sheets, at least, imports the CSV file directly with the default configuration.
It's not in the scope of this project to discuss which variables to analyse and which processing algorithms to apply to the dataset, since that's very dependent on the specific use-case, so I will only briefly comment on the observations I've made in my case:
Timestamp: The uThing::VOC doesn't maintain a real-time clock by itself (i.e. for timestamping), but as the dongle is meant to be plugged into systems that always track the real time and date, a timestamp can be added to every sample. In this example, I simply noted the time when I started collecting data into the file and the sampling period (10 seconds for the graph below), and added the time to the spreadsheet as a new column (each row is incremented by 10 seconds).
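The same reconstruction can be done outside the spreadsheet. Here is a minimal Python sketch of the idea: add_timestamps is a name made up for this example, and the start time and rows are placeholders rather than real measurements. It assumes no samples were dropped, since each row is simply given the start time plus its index times the sampling period.

```python
from datetime import datetime, timedelta


def add_timestamps(rows, start, period_seconds):
    """Prefix each CSV row with start + n * period, where n is the row index."""
    step = timedelta(seconds=period_seconds)
    return [
        f"{(start + i * step).isoformat(sep=' ')},{row}"
        for i, row in enumerate(rows)
    ]


# Hypothetical start time and rows, for illustration only.
start = datetime(2019, 1, 4, 9, 0, 0)
rows = ["24.37, 1012.04, 38.03, 721620, 129.8, 3"] * 3
for line in add_timestamps(rows, start, period_seconds=10):
    print(line)
```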
IAQ range: A note from the BME680 BSEC algorithm documentation states: "Indoor-air-quality (IAQ) gives an indication of the relative change in ambient TVOCs detected by BME680. The IAQ scale ranges from 0 (clean air) to 500 (heavily polluted air). During operation, algorithms automatically calibrate and adapt themselves to the typical environments where the sensor is operated (e.g., home, workplace, inside a car, etc.). This automatic background calibration ensures that users experience consistent IAQ performance. The calibration process considers the recent measurement history (typ. up to four days) to ensure that IAQ=25 corresponds to typical good air and IAQ=250 indicates typical polluted air."
So, for our analysis of the IAQ output, we need to keep in mind that the IAQ value is relative to the environment where the sensor operates (it initially needs about 4 days to establish a baseline level).
Below is a chart of around 60 hours of data from my office (with the sampling period set to 10 seconds), with some annotations of the events that affected the quality and circulation of the air.
Note: More information could probably be deduced from the raw sensor resistance measurements, but we need to keep in mind that this value is also affected by changes in ambient temperature and humidity (which the BSEC algorithm also uses to generate the IAQ values).
------
Please let us know what you think in the comments, and you can follow us on Twitter for more air quality sensing stuff.