Bees are critical to the health of our ecosystems, but unfortunately, bee populations have decreased by 30% in the past decade [1]. In addition to the ecological impact, this instability poses an economic threat to the commercial honey bee pollination industry, which is valued at over $10 billion annually in the U.S. alone [2]. Much of the decline is attributed to a complicated phenomenon known as Colony Collapse Disorder (CCD), in which a hive's worker bees rapidly abandon their queen, but the causes of CCD are not well understood. Whether they are individual hobbyists or commercial farmers relying on large-scale pollination, beekeepers can use simple sensor data to detect problematic trends in colony health. Our project, known as "LongHive", is a full-service infrastructure for beehive maintenance, enabled by the Helium Network and Deep Learning (DL). Data-driven beekeepers can install our LongHive system, which fits underneath standard beehives and includes a suite of relevant sensors, a pre-trained convolutional neural network (CNN) for classifying the hive's acoustic signatures, and a web-based dashboard for easy visualization of the transient signals.
The LongHive infrastructure is designed to facilitate a collaborative community forum (where users share data to improve DL models and general beekeeping insights), but the hardware operates within a modular framework to meet the specific needs of each beekeeper. Our goal is to help beekeepers make the most of their time and reduce the frequency of intrusive hive inspections while still detecting problems within the hive. This is done by flagging problematic trends in the data, which could represent major issues like CCD, hive robbery, or missing queens. Our competition uses WiFi for connectivity, which has limited range and consumes a significant amount of power, but the Helium Network enables low-power devices that can operate in much more remote environments. We combine edge computing and a pre-trained network to circumvent the most glaring constraint of LoRaWAN – low transaction throughput – in our DL classifier. A Raspberry Pi bears the computational burden locally, so only the network output (the classification) needs to be transmitted over LongFi.
LongHive in Action

The LongHive Sensor Suite

In our review of relevant literature and existing commercial solutions, we found a slew of passive sensors that have been shown to give some indication of hive health. First and foremost, we want to provide beekeepers with real-time data that they can use to augment their existing heuristics and improve productivity. Variation in hive weight is a sign of honey production and population. Temperature is a simple but critical source of information; bees keep very precise thermal conditions for optimal hive development. In fact, they have fascinating mechanisms in place for maintaining this delicate homeostasis: when the hive is too hot, they fan their wings to increase convective cooling; when it is too cool, they generate heat by vibrating their flight muscles. Similarly, beekeepers must keep an eye on the relative humidity in the hive: eggs cannot hatch when it is too dry, but damp conditions can be a sign of mold or disease. Carbon dioxide is released into the hive as a byproduct of honey production, so a lack of proper ventilation can result in CO2 poisoning and other maladies. Beekeepers are responsible for making it as easy as possible for their hives to maintain this balance by tweaking airflow and insulation. We also found substantial literature suggesting that the acoustic signals emitted by a hive can be a rich source of information, but making sense of them takes a more complex processing pipeline (more on this later).
With this careful ecosystem in mind, we want to be as unobtrusive as possible. The good news is that beehives have standard dimensions, which means that we can potentially design a "one-size-fits-all" solution. If you've ever seen a hive in person, you're probably familiar with this stackable assembly:
After discussing with a friend who keeps bees, we decided that the empty Hive Stand would house the electronics, batteries, and load cells. Wired sensors are threaded up into the hive itself to take the relevant measurements. A CAD rendering of the enclosure is shown below (a wooden frame with 3D-printed housings for the components).
While you can tell a lot about a hive's health from first-degree data sources like temperature and humidity probes, researchers have shown that you can also extract useful information by listening to the bees themselves [3]. As a proof of concept, we have implemented a CNN that classifies a hive based on whether or not it has a queen by encoding the spectral content of its acoustic signals. Once a robust, labeled dataset is collected (hopefully through the LongHive community), we suspect that a similar pipeline can support other classifications. The training dataset was compiled from an open-source publication, where beekeepers recorded their hives and labeled the audio files according to whether or not they had a queen. Because it represents a variety of geographic locations, recording techniques, and background noise, the data is robust and generalizable. We split the WAV files into 4.5-second segments, resulting in about 2,000 training samples per class (queen or no queen). In the purely temporal domain, the acoustic signals are not easily separable, as it is difficult (for a DL model) to differentiate audio of differing amplitudes and background noise. Mel spectrograms are commonly used for audio classification tasks, as they condense the relevant spectral content of a time-series signal into an image, allowing us to take advantage of mature CNN-based techniques. The x-axis is time, the y-axis is frequency, and the color encodes the power of the signal in that frequency band.
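A minimal sketch of this preprocessing is shown below. The sample rate, FFT window, and hop size are our assumptions (the originals aren't stated), and the real pipeline additionally maps frequency bins onto the mel scale before saving the image:

```python
import numpy as np

SR = 22050               # assumed sample rate
SEG_LEN = int(4.5 * SR)  # 4.5-second segments, as described above
N_FFT, HOP = 2048, 512   # assumed STFT window and hop sizes

def split_segments(samples):
    """Split a mono recording into full 4.5 s segments (trailing remainder dropped)."""
    n = len(samples) // SEG_LEN
    return [samples[i * SEG_LEN:(i + 1) * SEG_LEN] for i in range(n)]

def power_spectrogram(segment):
    """Short-time power spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(N_FFT)
    frames = [segment[i:i + N_FFT] * window
              for i in range(0, len(segment) - N_FFT, HOP)]
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

audio = np.random.randn(10 * SR)   # stand-in for a 10 s WAV recording
segments = split_segments(audio)   # yields two full 4.5 s segments
spec = power_spectrogram(segments[0])
```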
Once the mel spectrograms were cropped and resized to 256x256x3 inputs, they were fed into the CNN. We found network training to be somewhat unstable, which is likely due to the small and noisy dataset. The architecture contains about 144,000 trainable parameters (for reference, the groundbreaking AlexNet architecture has over 60 million parameters!) and consists of descending convolutional and max pooling layers with Leaky ReLU activations for feature extraction, followed by two fully connected layers for classification (see the figure at the beginning of this section). We kept the network as small as possible so that it runs efficiently on the Raspberry Pi. We used binary cross-entropy for the loss function, a batch size of 32, and a fairly low learning rate of 1e-4. To contend with the aforementioned network instability, we implemented early stopping if training accuracy did not improve for 3 consecutive epochs. The final accuracy for this binary classification was 89% on a test set, and as the LongHive community grows, the model will only improve.
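A compact Keras model in this spirit might look like the following sketch. The layer widths and the global-average-pooling head are illustrative assumptions (so the parameter count will not match 144,000 exactly); only the training settings (binary cross-entropy, batch size 32, learning rate 1e-4, 3-epoch early stopping) come from the description above:

```python
import tensorflow as tf

def build_model():
    # Descending conv/max-pool blocks with Leaky ReLU, then a small classifier head
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(256, 256, 3)),
        tf.keras.layers.Conv2D(16, 3), tf.keras.layers.LeakyReLU(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3), tf.keras.layers.LeakyReLU(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3), tf.keras.layers.LeakyReLU(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(32), tf.keras.layers.LeakyReLU(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # queen / no queen
    ])

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Early stopping as described: halt if training accuracy stalls for 3 epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="accuracy", patience=3)
# model.fit(train_images, train_labels, batch_size=32, callbacks=[early_stop])
```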
For real-time model evaluation on the Raspberry Pi, it's computationally inefficient to run a full-blown TensorFlow implementation on the Pi's processor, so we're using ARM-friendly TensorFlow Lite for the classification task. The pre-trained TF model was exported, the architecture and weights were converted into .TFLITE format, and the file was copied to the Pi's local memory. To collect the audio signals, we're using the ReSpeaker 2-Mics Pi HAT, which has a well-documented Python library. We also use the exact same pre-processing pipeline to generate the mel spectrogram test images as we used for the training data. Saving the recording, calculating the Fourier transforms, and evaluating the model takes about 10 seconds on the Pi. The classification label (1: queen detected, 0: no queen detected) is transmitted to the STM board via the serial port at pre-defined intervals. This is edge computing at its finest: we distilled several gigabytes of training data down into a pre-trained 500KB TFLITE model that can be loaded into the Pi's RAM. Upon evaluation, all of this knowledge is characterized in the classification by a single byte. LongFi may be known for its low throughput, but that does not mean it cannot represent vast amounts of information.
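The export-then-evaluate loop can be sketched end to end with TensorFlow's own converter and interpreter. The tiny model below is a stand-in (the real CNN is trained as described above), and the random input takes the place of a mel spectrogram:

```python
import numpy as np
import tensorflow as tf

# Stand-in Keras model; the real LongHive CNN is trained offline
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Export step: convert the trained model into a TFLite flatbuffer
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# On the Pi: load the flatbuffer into RAM and run inference
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

spectrogram = np.random.rand(1, 256, 256, 3).astype(np.float32)  # stand-in input
interpreter.set_tensor(inp["index"], spectrogram)
interpreter.invoke()
prob = float(interpreter.get_tensor(out["index"])[0][0])
label = 1 if prob >= 0.5 else 0  # the single byte sent to the STM board
```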
For further documentation of code usage, please refer to the README file in the LongHive GitHub repository.
Programming the LoRaWAN Microcontroller

The atomic unit for the entire LongHive system is the Helium payload. The LoRa board from the Developer Kit compiles data from the sensors and the Raspberry Pi into a Cayenne Low-Power Payload (LPP) and transmits it over Helium's LoRaWAN-based protocol. Our LoRa board needs to get readings from the load cells, temperature probe, CO2 and air quality sensors, and microphone. These values are encoded as separate channels in a CayenneLPP packet so that they can be decoded and parsed by our integrations later. After including the required libraries and specifying your DevEUI, AppEUI, and AppKey (as documented in the Arduino Quickstart Guide), the heart of the code is the do_send function.
void do_send(osjob_t *j) {
  // Check whether a TX/RX job is already running
  if (LMIC.opmode & OP_TXRXPEND) {
    Serial.println(F("OP_TXRXPEND, not sending"));
  }
  else {
    float serialOut = 0.0;  // queen classification from the Pi
    float weight = 0.0;     // must be initialized before accumulating samples
    int eCO2 = 0;
    int TVOC = 0;
    float tempC = 0.0;

    // Temperature probe
    sensors.requestTemperatures();
    tempC = sensors.getTempCByIndex(0);

    // Air quality (CCS811)
    if (ccs.available()) {
      if (!ccs.readData()) {
        eCO2 = ccs.geteCO2();
        TVOC = ccs.getTVOC();
      }
      else {
        Serial.println("ERROR!");
        while (1);  // halt on sensor failure
      }
    }

    // Queen classification (single byte) from the Raspberry Pi
    if (Serial.available() > 0) {
      int serialData = Serial.read();
      if (serialData > 0) {
        serialOut = serialData - '0';  // convert the ASCII digit to its value
        Serial.println("Serial Data received: ");
        Serial.println(serialOut);
      }
    }

    // Load cells: average 15 readings to smooth out noise
    for (int i = 0; i < 15; i++) {
      weight += scale.get_units();
    }
    weight = weight / 15;

    // Echo sensor data to the serial monitor for debugging
    Serial.println("Sensor Data: ");
    Serial.println("Temp: " + String(tempC));
    Serial.println("eCO2: " + String(eCO2));
    Serial.println("TVOC: " + String(TVOC));
    Serial.println("Queen: " + String(serialOut));
    Serial.println("Weight: " + String(weight));
    Serial.println();

    // Encode each reading on its own CayenneLPP channel and queue the packet
    lpp.reset();
    lpp.addTemperature(1, tempC);
    lpp.addAnalogOutput(2, eCO2);
    lpp.addAnalogOutput(3, TVOC);
    lpp.addAnalogOutput(4, serialOut);
    lpp.addAnalogOutput(5, weight);
    LMIC_setTxData2(1, lpp.getBuffer(), lpp.getSize(), 0);
    Serial.println(F("Packet queued"));
  }
  // Next TX is scheduled after TX_COMPLETE event.
}
Here, we take the readings and allocate a separate channel for each sensor value (temperature, CO2, TVOC, queen/no-queen classification, and weight). Our resulting payload – which is transmitted every 60 seconds – carries just 10 bytes of sensor data (2 bytes per value), which fits comfortably within the 24-byte-per-data-credit limit on the Helium Network. In other words, coverage for transmitting the packets every minute costs just $5.26 per hive per year. Switching to hourly uplinks drops the price to less than $0.09 annually!
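These cost figures follow from Helium's pricing of $0.00001 per data credit, with one credit covering an uplink of up to 24 bytes. A quick sanity check:

```python
DC_PRICE_USD = 0.00001     # one Helium data credit
MAX_DC_PAYLOAD_BYTES = 24  # bytes covered by a single credit

def yearly_cost(seconds_between_uplinks):
    """Annual cost in USD for one sub-24-byte uplink per interval."""
    uplinks_per_year = 365 * 24 * 3600 / seconds_between_uplinks
    return uplinks_per_year * DC_PRICE_USD

print(round(yearly_cost(60), 2))    # every minute: $5.26
print(round(yearly_cost(3600), 2))  # hourly: $0.09
```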
Sending and Collecting LongHive Sensor Suite Data

The exact details of setting up Helium Console to accept the payloads, decode the data, and route it through an HTTP Integration are covered more thoroughly in a previous post by one of the project authors. We will refer you to that documentation up to the point where we set up the Pipedream endpoint, as briefly outlined in this section. However, rather than entering the sensor outputs into a Google Sheet, we will be storing the data in a Postgres database.
After formatting and encoding the CayenneLPP payload in the do_send function, the encoded data is transmitted by the LoRa board to a hotspot, allowing it to be accessed in the Helium Console. Once a packet arrives in the console, it is passed through a Decoder Function, which arranges the sensor data into human-readable JSON format. Finally, an HTTP Integration routes this decoded payload to a Pipedream endpoint.
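Helium Console Decoder Functions are written in JavaScript, but the unpacking they perform is straightforward to sketch. Here is the same logic in Python for a payload built from our temperature and analog-output channels, using the type codes and resolutions from the Cayenne LPP specification:

```python
import struct

def decode_lpp(buf):
    """Unpack a CayenneLPP buffer into {channel: value} pairs.

    Handles the two record types our payload uses: 0x67 (temperature,
    0.1 degC resolution) and 0x03 (analog output, 0.01 resolution).
    Both carry a signed big-endian 16-bit value after a 2-byte header.
    """
    values, i = {}, 0
    while i < len(buf):
        channel, lpp_type = buf[i], buf[i + 1]
        raw = struct.unpack(">h", buf[i + 2:i + 4])[0]
        if lpp_type == 0x67:    # temperature
            values[channel] = raw / 10.0
        elif lpp_type == 0x03:  # analog output
            values[channel] = raw / 100.0
        i += 4
    return values

# Example: channel 1 = 25.5 degC, channel 2 = analog value 1.0
sample = bytes([0x01, 0x67, 0x00, 0xFF, 0x02, 0x03, 0x00, 0x64])
decoded = decode_lpp(sample)
```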
Evan's blog post shows how simple it is to use Pipedream Workflows to transfer data from our endpoint to a plethora of other websites and applications. Unfortunately, Pipedream offered no free and simple database workflow actions for this project, which ultimately led us to build a solution around a local Postgres server. The following Node.js script pulls the payloads from our endpoint (via Pipedream's REST API) and inserts them into a Postgres table:
const https = require("https")
const { Client } = require("pg")

// Postgres connection settings are environment-specific
const client = new Client({ connectionString: process.env.DATABASE_URL })

// Convert the hotspot's reported_at timestamp into a Date for the datetime column
function getTime(reported_at) {
  return new Date(reported_at)
}

const options = {
  hostname: "api.pipedream.com",
  port: 443,
  path: "/v1/sources/<source id>/event_summaries?expand=event&limit=1",
  headers: {
    "Authorization": "Bearer <Bearer code>",
  },
}

const sql_query = "INSERT INTO hivedata_table(datetime, temperature, eC02, TVOC, queen, weight) VALUES($1, $2, $3, $4, $5, $6)"

let getData = new Promise(function (resolve, reject) {
  setTimeout(() => {
    const req = https.request(options, resp => {
      let data = ""
      resp.on("data", chunk => {
        data += chunk
      })
      resp.on("end", () => {
        const obj = JSON.parse(data)
        const values = [getTime(obj.data[0].event.body.hotspots[0].reported_at)]
        // Channels 1-5: temperature, eC02, TVOC, queen, weight
        for (let i = 0; i < 5; i++) {
          values.push(obj.data[0].event.body.decoded.payload[i])
        }
        client.connect()
        client.query(sql_query, values, (err, res) => {
          if (err) {
            console.log(err.stack)
            reject(err)
          } else {
            console.log("Inserted " + res.rowCount + " row")
            resolve(res)
          }
        })
      })
    }).on("error", err => {
      console.error("[error] " + err.message)
      reject(err)
    })
    req.end()
  }, 2000)
})
With Node.js we were able to call the REST API to retrieve the JSON data from the most recent POST request and parse the fields for the relevant sensor data. The formatted data is arranged into a SQL query and inserted into the Postgres table.
Grafana Dashboard

Grafana is a simple and elegant way to visualize various types of data. From the inception of this project, we knew that whatever data we collected would be best visualized on a dashboard with all the sensor data easily visible to the user. We considered creating and maintaining a complete website, but once we discovered Grafana, we knew it was exactly what we needed.
After downloading Grafana and logging in for the first time, the next step was to add a data source and connect our Postgres database. Grafana makes connecting data sources very simple and offers a wide range of databases and other connections. Once the data is accessible to Grafana, creating a dashboard and adding panels is just as easy.
The nature of our data required mostly graphs, but Grafana has plenty of different visualizations to display diverse data sets. In the end, Grafana provides a stylish and professional looking medium for displaying our data.
Performance & Future Work

We are quite pleased with this working prototype, but there are three main technical and infrastructural aspects we would focus on as the project develops:
- Hardware/Software Efficiency Improvements
This iteration is a viable proof of concept of the LongHive system's potential, but it is far from optimal. The Raspberry Pi and LRWAN-1 board would eventually be replaced by specialized hardware with deep-sleep capabilities for improved battery life. In this case, we used a USB battery pack to power the system for three days, but we believe we can extend that to several weeks by refining the electronics and code.
- Growing the LongHive Community
While preparing this project, we had our first real exposure to the beekeeping community, which we found to be extremely friendly and collaborative. One of our future goals is to provide a platform where beekeepers can pool their knowledge and data to improve analytics and share insights. Ideally, the passive sensor data would be gathered in a cloud database to draw broader geographic and seasonal trends.
- Issuing a Challenge to the Helium Network
The Helium Network has already shown its ability to rapidly expand in major metropolitan areas. While we understand that these population centers are a necessary first target when it comes to scaling a peer-to-peer network (no small task), we hope we've added new evidence to the argument that every farm should have access to a hotspot. Modern agriculture is one of the most innovative and technologically driven fields in the U.S., but unfortunately, expensive, closed-source solutions are too often reserved for high-yield corporate facilities. As "The People's Network", we would like to challenge Helium to make it part of their mission to make connectivity affordable for smaller local farms. LongHive and other LoRaWAN-enabled agriculture projects (some of which may be our competitors in this contest) show the potential for providing low-cost, long-range networking capabilities to this user base. We hope that small farms still get a seat at the table when it comes time to reap the benefits.
- Special thanks to our beekeeping advisor, Trey, from Two Gander Farm in Downingtown, PA!
- We would also like to thank Nathan's sister, Kathryn, for our incredible logo design!
[1] https://ento.psu.edu/pollinators/resources-and-outreach/globally-pollinators-are-in-decline