Thermal imaging cameras (TICs) have been shown to be invaluable tools for firefighters. They help firefighters locate victims more quickly, navigate out of burning buildings more reliably and reduce the time required to complete a search. Some issues, however, remain. When searching for victims, it takes time to pull out the TIC, set it up and read the display. Hand-held TIC displays can be difficult to see clearly in dense smoke, slowing victim location. Using a display also requires the operator to take their eyes and attention off their immediate surroundings, leading to a loss of situational awareness and an increased risk of tunnel vision.
A sensory substitution device which uses machine learning to give firefighters a sense of infrared (IR) would be a useful tool. Because important IR information would be fed to them continuously, firefighters would no longer need to constantly monitor the TIC display, reducing how often they look away from their surroundings and, with it, the risk of tunnel vision. Responding to a felt stimulus is quicker than referring to a display, which should shorten victim rescue times, and streaming key information directly to the firefighter means the poor visibility of a display in thick smoke matters far less. Overall, firefighters with a sense of IR would be more effective in search-and-rescue operations.
How To Set The Project Up
We connected the Qwiic Cable (Breadboard Jumper (4-pin)) to the MLX90640 SparkFun IR Array Breakout (MLX). The four wires (black, red, yellow and blue) represent GND, VIN (3.3V), SCL and SDA respectively.
Because our Qwiic Cable was not a Female Jumper, we used four female-to-female (f-f) breadboard wires to connect the Qwiic Cable pins to the Raspberry Pi (Pi). In the image below you can see that the black, red, yellow and blue Qwiic pins were connected to brown, red, yellow and orange f-f breadboard wires respectively.
The colours of the f-f breadboard wires aren’t important; what is crucial is that the Qwiic cable’s GND, VIN, SCL and SDA pins are connected to the appropriate Pi pins (physical pins 6, 1, 5 and 3 respectively). The Pi pinout guide [1] can be seen below, along with our hookup to our Pi.
With that done, we cut a side off a pizza box to make a small cardboard holder to prop up the camera. This step isn’t necessary, so feel free to skip it or make your own.
We then began the process of preparing the Pi to receive data from the MLX. If you’re unfamiliar with using I2C or the Adafruit CircuitPython MLX90640 library, a useful guide by Joshua Hrisko is available on Maker Portal.
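Before going further, it is worth confirming that the Pi can actually see the MLX on the I2C bus. The short check below is our suggestion rather than part of that guide; it assumes I2C has already been enabled (e.g. via raspi-config) and that the CircuitPython Blinka layer is installed. The MLX90640 should appear at its default address, 0x33.
import board, busio

# Scan the I2C bus; the MLX90640 defaults to address 0x33
i2c = busio.I2C(board.SCL, board.SDA)
while not i2c.try_lock():
    pass
print([hex(address) for address in i2c.scan()])  # expect ['0x33'] if wired correctly
i2c.unlock()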
Implementation Overview
Connecting the camera to the Raspberry Pi
Using the Adafruit library we were able to read the data from our MLX90640 thermal camera.
import adafruit_mlx90640
import time,board,busio
i2c = busio.I2C(board.SCL, board.SDA, frequency=1000000) # setup I2C
mlx = adafruit_mlx90640.MLX90640(i2c) # begin MLX90640 with I2C comm
mlx.refresh_rate = adafruit_mlx90640.RefreshRate.REFRESH_2_HZ # set refresh rate
frame = [0]*768 # setup array for storing all 768 temperatures
mlx.getFrame(frame)
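One practical note: the Adafruit examples wrap getFrame in a retry, because the sensor occasionally raises a ValueError mid-read. In a continuous loop, something along the lines of the sketch below (ours, not part of the original code) avoids a crash on a bad frame.
while True:
    try:
        mlx.getFrame(frame)  # populate the 768-element array in place
    except ValueError:
        continue  # the MLX occasionally returns a corrupted frame; just retry
    # ...hand the frame off to preprocessing here...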
At this point we have the temperature data stored in an array ready for preprocessing.
Preprocessing the raw temperature data and feeding it to the classification service
Our preprocessing step turns the temperature data into an image that can later be fed into our Edge Impulse model. This is done using the Matplotlib library; the image is then saved to an in-memory buffer and encoded as a base64 string.
import io, base64
import numpy as np
import matplotlib.pyplot as plt

mlx.getFrame(frame)  # read a single 24x32 frame of temperatures

mlx_shape = (24, 32)
fig = plt.figure(frameon=False)
ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.set_axis_off()
fig.add_axes(ax)
thermal_image = ax.imshow(np.zeros(mlx_shape), aspect='auto')

MIN = 18.67  # lowest temperature seen in our training dataset
MAX = 43.68  # highest temperature seen in our training dataset

data_array = np.reshape(frame, mlx_shape)      # reshape to 24x32
thermal_image.set_data(np.fliplr(data_array))  # flip left to right
thermal_image.set_clim(vmin=MIN, vmax=MAX)     # set colour bounds

buf = io.BytesIO()
fig.savefig(buf, format='jpg', facecolor='#FCFCFC', bbox_inches='tight')
img_b64 = base64.b64encode(buf.getvalue()).decode()
buf.close()
plt.close(fig)
The base64 string and the raw frame data are sent to our classification microservice in an HTTP POST request. The microservice runs the image through the Edge Impulse model and, after some post-processing (explained below), returns a flag stating whether a person was detected in the frame and, if so, which direction they are in, i.e. left, centre or right.
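As a rough sketch of that request from the Pi side (the endpoint URL is a placeholder of ours; the "image" and "frame" field names match what the service reads from the request body):
import requests

# Placeholder URL; point this at wherever the classification service is running
CLASSIFIER_URL = "http://localhost:3000/classify"

payload = {"image": img_b64, "frame": [float(t) for t in frame]}
response = requests.post(CLASSIFIER_URL, json=payload).json()
print(response)  # e.g. {"hasPerson": True, "direction": 2}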
The Classification Service
The classification service accepts a POST request expecting two pieces of data: the raw temperature array and the image represented in base64. The image needs preprocessing before it can be passed to the classifier. First we must extract the raw features of the image: the image is decoded into an image buffer, which is then cast to a hexadecimal representation. This hexadecimal string is sliced into the individual RGB values, which are converted into integers ready to be passed into the classifier.
let raw_features = [];
let img_buf = Buffer.from(request.body.image, 'base64');
try {
    let buf_string = img_buf.toString('hex');
    // store each RGB pixel value and convert it to an integer
    for (let i = 0; i < buf_string.length; i += 6) {
        raw_features.push(parseInt(buf_string.slice(i, i + 6), 16));
    }
} catch (error) {
    throw new Error("Error Processing Incoming Image");
}
The raw features are fed into the classifier, which returns an object containing two labels: a confidence rating that a person is in the image and a confidence rating that no person is in the image.
let result = {"hasPerson": false};
let classifier_result = classifier.classify(raw_features);
let no_person_value = 0;
let person_value = 0;
// with two labels the results array has entries at indices 0 and 1
if (classifier_result["results"][0]["label"] === "no person") {
    no_person_value = classifier_result["results"][0]["value"];
} else {
    throw new Error("Invalid Model Classification Post Processing");
}
if (classifier_result["results"][1]["label"] === "person") {
    person_value = classifier_result["results"][1]["value"];
} else {
    throw new Error("Invalid Model Classification Post Processing");
}
The two label values are then compared against our confidence thresholds to determine whether or not a person has been seen in the frame. If not, the classification service responds to the POST request with an object consisting of one field:
- "hasPerson" = false.
If, however, the confidence values satisfy the threshold criteria, the raw temperature data is used to ascertain where in the frame the heat source is coming from.
if (person_value > person_threshold
        && no_person_value < no_person_threshold) {
    result["hasPerson"] = true;

    // If a person is present, find the bright spot in the image
    // by summing each of the 32 columns of the 24x32 frame
    let frame_data = request.body.frame;
    let column_average = new Array(32);
    let index_count = 0;
    for (let j = 0; j < 24; j++) {
        for (let i = 0; i < 32; i++) {
            column_average[i] = (column_average[i] || 0)
                + parseFloat(frame_data[index_count]);
            index_count++;
        }
    }

    // Sum the column averages over three (overlapping) windows
    let left_avg = 0;
    let centre_avg = 0;
    let right_avg = 0;
    for (let i = 0; i < 16; i++) {
        left_avg = left_avg + column_average[i];
    }
    for (let i = 8; i < 24; i++) {
        centre_avg = centre_avg + column_average[i];
    }
    for (let i = 17; i < 32; i++) {
        right_avg = right_avg + column_average[i];
    }

    // 1, 2, 3 = left, centre or right window of the raw frame is hottest; 4 = inconclusive
    let direction;
    if (left_avg > centre_avg && left_avg > right_avg) {
        direction = 1;
    } else if (centre_avg > left_avg && centre_avg > right_avg) {
        direction = 2;
    } else if (right_avg > left_avg && right_avg > centre_avg) {
        direction = 3;
    } else {
        direction = 4;
    }
    result["direction"] = direction;
}
A response object is then sent back to the POST request with two values:
- "hasPerson" = true
- "direction" = <direction value>
We used the Neosensory SDK for Python to send motor commands to the Buzz after pairing the Buzz with the Pi. We chose spatiotemporal sweeps (“patterns that are encoded in both space and time”) because a study by Novich and Eagleman [2] found them to be an optimal method of encoding data on the skin compared with spatial patterns and with patterns consisting of a single motor vibrating one area of skin. The higher identification performance of spatiotemporal sweeps allows for greater information transfer (IT) through the skin, meaning the firefighter can receive more useful information (and a gained IR sense has greater potential effectiveness). Because we are only concerned with three direction values, three sweep arrays were created to describe a person being on the left, on the right or in the centre of the frame, as can be seen below.
sweep_left = [255,0,0,0,0,255,0,0,0,0,255,0,0,0,0,255,0,0,0,0]
sweep_right = [0,0,0,255,0,0,255,0,0,255,0,0,255,0,0,0,0,0,0,0]
sweep_centre = [255,0,0,0,0,255,0,0,0,0,0,255,0,0,255,0,0,0,0,0]
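As an aside, if we read each group of four values as one frame of intensities (0-255) for the Buzz's four motors, the left and right arrays above follow a simple pattern: one motor at full intensity per frame, ending with an all-off frame. The helper below is hypothetical (not part of our code) but reproduces sweep_left and sweep_right exactly:
def build_sweep(motor_order, intensity=255):
    """Build a spatiotemporal sweep: one frame per motor in the given order,
    followed by an all-off frame. Hypothetical helper for illustration."""
    frames = []
    for motor in motor_order:
        frame = [0, 0, 0, 0]
        frame[motor] = intensity
        frames.extend(frame)
    return frames + [0, 0, 0, 0]  # finish with all motors off

assert build_sweep([0, 1, 2, 3]) == sweep_left
assert build_sweep([3, 2, 1, 0]) == sweep_right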
When the Pi receives a response object which conveys that there is a person in the frame and where in the frame they are, a vibrate motor command is sent to the Buzz.
if(response['hasPerson'] == True):
    print("has person")
    if(response['direction']):
        print(response['direction'])
        if response['direction'] == 1:
            await my_buzz.vibrate_motors(sweep_right)
            print("Right")
        elif response['direction'] == 2:
            await my_buzz.vibrate_motors(sweep_centre)
            print("Centre")
        elif response['direction'] == 3:
            await my_buzz.vibrate_motors(sweep_left)
            print("Left")
        else:
            print("inconclusive")
else:
    print("no person")
We found it was important to put the Buzz into pairing mode every time we ran the code to minimise the possibility of the Buzz not vibrating when sent a command.
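For context, connecting to the Buzz from the Pi looks roughly like the outline below. This is a sketch based on the Neosensory Python SDK's examples rather than our exact script: the BLE address is a placeholder, the module and class names should be checked against the SDK you have installed, and the developer-authorisation handshake (covered in the SDK's README) is left as a comment.
import asyncio
from bleak import BleakClient
from neosensory_python import NeoDevice  # names as in the SDK's example scripts

BUZZ_ADDRESS = "XX:XX:XX:XX:XX:XX"  # placeholder: your Buzz's BLE address

async def main():
    async with BleakClient(BUZZ_ADDRESS) as client:
        my_buzz = NeoDevice(client)
        # ...developer authorisation handshake per the SDK README goes here...
        await my_buzz.vibrate_motors(sweep_centre)  # quick test vibration

asyncio.run(main())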
Edge Impulse
Data Set Collection
While the MLX was pointing at someone, something or nothing, the array of 768 temperature values was saved in a .txt file. The files were organised for ease of upload to Edge Impulse according to which had people in the frame, which had objects (such as a radiator or a dog) in the frame and which had nothing.
Our method for preprocessing the data for training the model was similar to how we processed the live camera feed illustrated above, except that the data source was the text files containing the temperature values. We wrote a Python script to iterate over all of these files and output them as images (a simplified sketch is shown below). We needed min and max values to assign to the colour range when converting the temperatures to an image; to find them, we took the lowest and highest temperatures in our dataset of ~400 temperature arrays.
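A simplified version of that conversion script might look like the following. The directory layout, the file naming and the assumption that each .txt file holds 768 whitespace-separated temperatures are ours, for illustration only:
import glob
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: dataset/<label>/<capture>.txt, 768 temperatures per file
files = glob.glob("dataset/**/*.txt", recursive=True)
frames = [np.loadtxt(f).reshape(24, 32) for f in files]

# Use global colour bounds so every image shares the same temperature scale
MIN = min(f.min() for f in frames)
MAX = max(f.max() for f in frames)

for path, frame in zip(files, frames):
    fig = plt.figure(frameon=False)
    ax = plt.Axes(fig, [0., 0., 1., 1.])
    ax.set_axis_off()
    fig.add_axes(ax)
    ax.imshow(np.fliplr(frame), aspect='auto', vmin=MIN, vmax=MAX)
    fig.savefig(path.replace(".txt", ".jpg"), facecolor='#FCFCFC',
                bbox_inches='tight')
    plt.close(fig)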
Creating the Model
Using Edge Impulse we uploaded the collected data with the labels ‘Person’ and ‘No Person’ and allowed the data to be split automatically between training and testing. For the impulse design we tried different combinations of processing and learning blocks (such as Image and Neural Network (Keras)), but we found that with our data an Image processing block and a Transfer Learning block performed best.
We used RGB for the colour depth parameter and then generated the features.
We set the number of training cycles to 20, the learning rate to 0.0005 and the minimum confidence rating to 0.6.
We deployed the impulse as a WebAssembly library with the default optimisations.
Model in Action
The model worked fairly well but had some problems, occasionally outputting false positives as well as false negatives. Below are some screenshots of the output of the model running in our Node.js server and the Python code on the Pi when someone sat directly in front of the MLX and the model correctly identified that they were in the centre of the frame.
Although the MLX is a brilliant TIC for hobbyists at home, we feel it just won’t cut it when it comes to helping firefighters save lives. A ‘high performance’ TIC like the FLIR K53, visible here, packs an impressive 320x240 resolution (76800 total pixels compared to the MLX’s 768) and a refresh rate of 60 Hz. A more sophisticated TIC (with higher resolution and refresh rate) would give models an easier time in accurately detecting a human body shape and would be necessary to turn this project into a product.
Further Training of Model
We also feel a more sophisticated model would be necessary to turn this project into a product. The next steps in the project’s journey will be to develop a model which can detect multiple people in the same frame and output not only whether a person is detected but also where in the frame they are, in terms of both the x and y axes. If possible, information about the person’s distance from the camera should be conveyed, or even how ‘visible’ the person is (e.g. if only an arm is detected, the model should convey that a person is ‘partially visible’ due to an obstruction such as a bed or rubble).
For this project we had a very simple haptic language which only had to convey three locations. Moving forward, as the model outputs more information, a more refined haptic language will need to be designed. This could be supported by a larger array of actuators on the firefighter’s body, allowing a richer variety of spatiotemporal sweeps. Rather than a single Buzz, the final product could have firefighters wearing a haptic sleeve or vest.
References
[1] Raspberry Pi Pinout.
[2] Novich, S. D. and Eagleman, D. M. Using space and time to encode vibrotactile information: toward an estimate of the skin’s achievable throughput. Experimental Brain Research, Vol. 233, No. 10, pp. 2777-2788. Springer Nature, 2015.