A Device for Sensing and Alerting Students to Poor Ergonomics

An ergonomics monitoring gadget for students that uses computer vision to track posture, alerting and guiding them towards healthier habits.

Things used in this project

Hardware components

DFRobot FireBeetle ESP32 IOT Microcontroller (Supports Wi-Fi & Bluetooth)

DFRobot Fermion: DFPlayer Pro - A mini MP3 Player with On-board 128MB Storage (Breakout)

Pimoroni MLX90640 Thermal Camera Breakout

Software apps and online services

Arduino IDE

Arduino IoT Cloud

TensorFlow

Story

The Project:

The project uses a DFRobot FireBeetle ESP32 board with a camera module to monitor sitting posture in real time and give voice feedback when a bad posture is detected. It is based primarily on the official TensorFlow Lite pose estimation example, which uses the MoveNet model. On top of that, a simple fully connected neural network classifies the person's posture into categories such as "standard sitting posture," "cross-legged," or "forward head and hunched back." This classification network is trained on a dataset of labeled images, from which MoveNet extracts the landmark (keypoint) coordinates used as training inputs.
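To make the classifier's input concrete, here is a minimal NumPy sketch, assuming MoveNet's 17-keypoint output and a single hidden layer. It flattens the keypoints into the 51-value vector (x, y, score per keypoint) that the server code also builds, then passes it through a small fully connected network. The hidden-layer size, the random weights, and the helper names `keypoints_to_features`, `init_classifier` and `classify` are hypothetical; the project's real classifier is a trained TensorFlow Lite model, so this only illustrates the shape of the computation.

```python
import numpy as np

# MoveNet's 17 keypoints, in the order the model outputs them
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def keypoints_to_features(keypoints):
    """Flatten 17 (x, y, score) triples into a 51-value float32 vector."""
    arr = np.asarray(keypoints, dtype=np.float32)
    if arr.shape != (len(KEYPOINT_NAMES), 3):
        raise ValueError(f"expected shape (17, 3), got {arr.shape}")
    return arr.reshape(-1)


def init_classifier(n_in=51, n_hidden=64, n_classes=3, seed=0):
    """Random (untrained) weights for a one-hidden-layer fully connected net."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (n_in, n_hidden)).astype(np.float32),
        "b1": np.zeros(n_hidden, dtype=np.float32),
        "w2": rng.normal(0.0, 0.1, (n_hidden, n_classes)).astype(np.float32),
        "b2": np.zeros(n_classes, dtype=np.float32),
    }


def classify(params, features):
    """Dense -> ReLU -> Dense -> softmax; returns class probabilities."""
    hidden = np.maximum(0.0, features @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()


# Usage with fake keypoints (the weights are untrained, so the output is arbitrary)
rng = np.random.default_rng(1)
probs = classify(init_classifier(), keypoints_to_features(rng.uniform(0, 1, (17, 3))))
print(probs)  # three probabilities that sum to 1
```

The three output classes correspond to the three posture categories named above; in a trained model the index with the highest probability is the predicted posture.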

Motivation:

The project aims to enable real-time pose estimation using the DFRobot FireBeetle ESP32 and a camera module, coupled with a server built with FastAPI. The FireBeetle ESP32 only captures and uploads images, while the server provides the computational power for pose estimation and classification. Splitting the work this way keeps the device itself simple and makes real-time pose estimation possible at low cost.

Functionality:

The DFRobot FireBeetle ESP32 with the camera module captures an image.
The captured image is sent to the FastAPI server via an HTTP POST request.
The FastAPI server receives the image and performs pose estimation using the MoveNet model.
The detected keypoints are used as input to the TensorFlow Lite model on the server to predict the pose class.
The predicted pose class is sent back to the FireBeetle ESP32 via an HTTP response.
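The FireBeetle ESP32 receives the pose class prediction. The server also publishes the same prediction over MQTT, on a topic named after the client's IP address (esp32-cam/<client IP>).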
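The request the FireBeetle sends in the first two steps is a hand-built multipart/form-data POST. The Python sketch below builds the same body (the `BoundaryString` boundary, the form field named `file`, and the `esp32-cam.jpg` filename all come from the firmware) so the server can be exercised from a PC without the hardware. The helper names `build_multipart` and `post_image` are hypothetical, and the server URL is a placeholder.

```python
import json
import urllib.request

BOUNDARY = "BoundaryString"  # same boundary string the ESP32 firmware uses


def build_multipart(image: bytes, boundary: str = BOUNDARY,
                    field: str = "file", filename: str = "esp32-cam.jpg"):
    """Return (headers, body) for a single-file multipart/form-data upload."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    body = head + image + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Content-Length covers head + image + tail, like totalLen in the firmware
        "Content-Length": str(len(body)),
    }
    return headers, body


def post_image(url: str, image: bytes, timeout: float = 10.0) -> dict:
    """POST a JPEG to the FastAPI endpoint and decode its JSON reply."""
    headers, body = build_multipart(image)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, with the server running, `post_image("http://<server-ip>:8000/uploadfile/", open("test.jpg", "rb").read())` should return a dict of the form `{"predicted_label": ...}`. The field name must be `file` because that is the name of the `UploadFile` parameter in the FastAPI endpoint.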

Data collection:

Training the machine learning model requires data, but suitable data for a posture detection system proved hard to find: existing sources such as Kaggle and Google Images did not have relevant images of people sitting in chairs. The data therefore had to be collected manually. A Google Form was created, and friends were asked to submit images following specific guidelines; gaming chair advertisements and illustrations of good and bad postures were also used. Despite the limitations imposed by the pandemic, around 90 images were gathered. Although this is a small dataset, image augmentation can be used to increase its size and variability. The dataset was also reasonably balanced across labels (for example, 40 images with good posture and 50 with bad posture).
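Because the two labels are not exactly equal in size, one common option (an assumption here, not something the project describes) is to weight each class inversely to its frequency during training. A minimal sketch of the usual "balanced" heuristic, applied to the approximate counts above; the label names `good` and `bad` and the helper `balanced_class_weights` are illustrative:

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight(c) = n_samples / (n_classes * count(c)), the usual 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Approximate label counts from the text: 40 good-posture and 50 bad-posture images
labels = ["good"] * 40 + ["bad"] * 50
weights = balanced_class_weights(labels)
print(weights)  # {'good': 1.125, 'bad': 0.9}
```

With these weights each class contributes equally to the loss (40 × 1.125 = 50 × 0.9 = 45). Keras's `Model.fit` accepts such a mapping through its `class_weight` argument, keyed by integer class index rather than by label name.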

Preprocessing:

The dataset was augmented using techniques such as resizing, cropping, and rotation to increase variability. Preprocessing steps included normalization, landmark extraction, and encoding of target labels for training the machine learning model.
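Landmark normalization is what makes the classifier insensitive to where the student sits in the frame and how far they are from the camera. The sketch below assumes one common scheme, similar in spirit to the TensorFlow Lite pose classification tutorial: center the keypoints on the midpoint of the hips and divide by a body size derived from the torso length (the 2.5 multiplier is an assumption). It also shows one-hot encoding of the target labels. The function names are hypothetical, and this is not the project's exact preprocessing code.

```python
import numpy as np

# MoveNet keypoint indices used for normalization
LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_HIP, RIGHT_HIP = 5, 6, 11, 12


def normalize_landmarks(landmarks, torso_multiplier=2.5):
    """Center (17, 3) landmarks on the hip midpoint and scale by body size.

    Returns a (17, 2) array of normalized x, y coordinates; scores are dropped.
    """
    xy = np.asarray(landmarks, dtype=np.float32)[:, :2]
    hip_center = (xy[LEFT_HIP] + xy[RIGHT_HIP]) / 2
    shoulder_center = (xy[LEFT_SHOULDER] + xy[RIGHT_SHOULDER]) / 2
    torso_size = np.linalg.norm(shoulder_center - hip_center)
    # Use whichever is larger: the scaled torso or the farthest keypoint
    max_dist = np.linalg.norm(xy - hip_center, axis=1).max()
    pose_size = max(torso_size * torso_multiplier, max_dist)
    if pose_size == 0:
        raise ValueError("degenerate pose: all keypoints coincide")
    return (xy - hip_center) / pose_size


def one_hot(labels, classes):
    """Encode string labels as one-hot float32 rows, in the order of `classes`."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, name in enumerate(labels):
        encoded[row, index[name]] = 1.0
    return encoded
```

Because both the center and the scale move with the pose, shifting or uniformly scaling all keypoints leaves the normalized output unchanged, and every normalized keypoint lies within distance 1 of the hip center.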

Model Training and Evaluation:

The classifier is a feed-forward (fully connected) neural network. It was trained on the compiled dataset with the categorical cross-entropy loss and the Adam optimizer, and accuracy was monitored during training. The trained model was then evaluated on a separate test set to assess how well it generalizes and how accurately it classifies sitting postures.
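For reference, categorical cross-entropy compares the network's softmax output with the one-hot target; for a single sample it reduces to the negative log of the probability assigned to the true class. A small NumPy sketch (the clipping constant `eps` is an assumption; framework implementations may clip slightly differently):

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Batch mean of -sum_k y_true[k] * log(y_prob[k]) for one-hot y_true."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))


y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
uniform = softmax(np.zeros((2, 3)))
confident = softmax(np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]))
print(categorical_cross_entropy(y_true, uniform))    # ln 3, about 1.0986
print(categorical_cross_entropy(y_true, confident))  # smaller: confident and correct
```

With all-zero logits over three classes the prediction is uniform and the loss equals ln 3; more confident correct predictions push it toward 0, which is the quantity the Adam optimizer minimizes during training.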

Confusion matrix used to better understand the model's performance.
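A confusion matrix counts, for each true class, how often each class was predicted, so the off-diagonal cells show which postures get mixed up. A minimal NumPy sketch with toy labels (these are not the project's results, and the helper names are hypothetical):

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def summarize(cm):
    """Overall accuracy and per-class recall (correct / total for each true class)."""
    accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)  # guard against empty rows
    return accuracy, recall


# Toy example with three posture classes (0, 1, 2)
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy, recall = summarize(cm)
print(cm)
print(accuracy, recall)
```

For these toy labels the matrix is [[1, 1, 0], [0, 2, 0], [0, 0, 1]], the overall accuracy is 0.8, and the per-class recall is [0.5, 1.0, 1.0]: one class-0 sample was mistaken for class 1.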

Results and Discussion:

The experimental results indicate that the proposed system classifies sitting postures effectively. The trained model achieved high classification accuracy on the test set, suggesting potential for real-time posture classification. Real-time feedback gives users the opportunity to correct their sitting posture promptly, which can help build better ergonomic habits and may reduce the risk of musculoskeletal disorders.

Code

ESP32 camera firmware (Arduino):

#include <Arduino.h>
#include <WiFi.h>
#include "soc/soc.h"
#include "soc/rtc_cntl_reg.h"
#include "esp_camera.h"
#include <PubSubClient.h>

const char* ssid = "********";
const char* password = "********";  // Replace with your Wi-Fi password

String serverName = "";   // REPLACE WITH YOUR IP ADDRESS or domain name
String serverPath = "/uploadfile/";     // Must match the FastAPI route @app.post("/uploadfile/")

const int serverPort = 8000;
const char* mqtt_server = ""; // Replace with your MQTT server IP address

WiFiClient client;
PubSubClient mqttClient(client);

// CAMERA_MODEL_AI_THINKER
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27

#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

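const int timerInterval = 30000;    // time between each HTTP POST image (ms)
unsigned long previousMillis = 0;   // last time image was sent

void captureAndUpload();  // forward declaration (needed when compiling as a .cpp file outside the Arduino IDE)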

void setup() {
  WRITE_PERI_REG(RTC_CNTL_BROWN_OUT_REG, 0);  // disable the brownout detector
  Serial.begin(115200);

  WiFi.mode(WIFI_STA);
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);
  WiFi.begin(ssid, password);  
  while (WiFi.status() != WL_CONNECTED) {
    Serial.print(".");
    delay(500);
  }
  Serial.println();
  Serial.print("ESP32-CAM IP Address: ");
  Serial.println(WiFi.localIP());

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  if(psramFound()){
    config.frame_size = FRAMESIZE_UXGA;
    config.jpeg_quality = 10;  //0-63 lower number means higher quality
    config.fb_count = 2;
  } else {
    config.frame_size = FRAMESIZE_CIF;
    config.jpeg_quality = 12;  //0-63 lower number means higher quality
    config.fb_count = 1;
  }
  
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x", err);
    delay(1000);
    ESP.restart();
  }
}

void loop() {
  unsigned long currentMillis = millis();
  if (currentMillis - previousMillis >= timerInterval) {
    previousMillis = currentMillis;
    captureAndUpload();
  }
}

void captureAndUpload() {
  camera_fb_t* fb = esp_camera_fb_get();

  if (!fb) {
    Serial.println("Camera capture failed");
    return;
  }

  String getAll;   // header line currently being assembled
  String getBody;  // response body (JSON with the predicted label)

  WiFiClient client;

  if (client.connect(serverName.c_str(), serverPort)) {
    String head = "--BoundaryString\r\nContent-Disposition: form-data; name=\"file\"; filename=\"esp32-cam.jpg\"\r\nContent-Type: image/jpeg\r\n\r\n";
    String tail = "\r\n--BoundaryString--\r\n";

    // 32-bit sizes: a UXGA JPEG can exceed the 65535-byte limit of uint16_t
    uint32_t imageLen = fb->len;
    uint32_t extraLen = head.length() + tail.length();
    uint32_t totalLen = imageLen + extraLen;

    client.println("POST " + serverPath + " HTTP/1.1");
    client.println("Host: " + serverName);
    client.println("Content-Length: " + String(totalLen));
    client.println("Content-Type: multipart/form-data; boundary=BoundaryString");
    client.println();
    client.print(head);

    // Send the JPEG in 1024-byte chunks; the last chunk carries whatever is left,
    // including a full 1024 bytes when the length is an exact multiple of 1024
    const size_t chunkSize = 1024;
    for (size_t n = 0; n < fb->len; n += chunkSize) {
      size_t len = (fb->len - n < chunkSize) ? (fb->len - n) : chunkSize;
      client.write(fb->buf + n, len);
    }

    client.print(tail);

    // The frame buffer is no longer needed once the image has been sent
    esp_camera_fb_return(fb);

    // Wait up to 10 s for the reply; the body starts after the first empty line
    const unsigned long timeoutMs = 10000;
    unsigned long startTimer = millis();
    bool state = false;

    while (millis() - startTimer < timeoutMs) {
      Serial.print(".");
      delay(100);
      while (client.available()) {
        char c = client.read();
        Serial.print(c);  // print all incoming data
        if (c == '\n') {
          if (getAll.length() == 0) { state = true; }
          getAll = "";
        }
        else if (c != '\r') { getAll += String(c); }
        if (state) { getBody += String(c); }
        startTimer = millis();
      }
      if (getBody.length() > 0) { break; }
    }
    Serial.println();
    Serial.println(getBody);
    client.stop();
  }
  else {
    // Return the frame buffer on failure too, otherwise later captures fail
    esp_camera_fb_return(fb);
    getBody = "Connection to " + serverName + " failed.";
    Serial.println(getBody);
  }
}

FastAPI server (Python):

from fastapi import FastAPI, UploadFile, Request, File
from fastapi_mqtt import FastMQTT, MQTTConfig
import tensorflow as tf
import numpy as np
from PIL import Image
import io
import os
import sys

# Add the path to the pose estimation example
pose_sample_rpi_path = os.path.join(os.getcwd(), '../examples/lite/examples/pose_estimation/raspberry_pi')
sys.path.append(pose_sample_rpi_path)

# Import the necessary modules
import utils
from data import BodyPart
from ml import Movenet

app = FastAPI()

mqtt_config = MQTTConfig()

fast_mqtt = FastMQTT(
    config=mqtt_config
)

@app.on_event("startup")
async def startup_event():
    await fast_mqtt.connection()

# Load the MoveNet model
movenet = Movenet('movenet_thunder_fp16.tflite')

# Load the TFLite classifier model
interpreter = tf.lite.Interpreter(model_path="pose_classifier.tflite")
interpreter.allocate_tensors()

# Define function to run pose estimation using MoveNet Thunder.
# You'll apply MoveNet's cropping algorithm and run inference multiple times on
# the input image to improve pose estimation accuracy.
def detect(input_tensor, inference_count=3):
    # Detect pose using the full input image
    person = movenet.detect(input_tensor.numpy(), reset_crop_region=True)

    # Repeatedly use the previous detection result to identify the region of
    # interest and crop only that region to improve detection accuracy
    for _ in range(inference_count - 1):
        person = movenet.detect(input_tensor.numpy(),
                                reset_crop_region=False)

    return person

def predict_pose(interpreter, keypoints):
    """Predicts the pose class for the given keypoints using the TFLite model."""
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    keypoints = np.expand_dims(keypoints, axis=0).astype('float32')
    interpreter.set_tensor(input_index, keypoints)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the class with highest
    # probability.
    output = interpreter.tensor(output_index)
    predicted_label = np.argmax(output()[0])

    return predicted_label

@app.post("/uploadfile/")
async def upload_file(request: Request, file: UploadFile = File(...)):
    # Load the image as an RGB uint8 array. The Movenet wrapper crops and
    # resizes it to the model's input size (256x256 for Thunder) internally.
    image = Image.open(io.BytesIO(await file.read())).convert('RGB')
    image = tf.convert_to_tensor(np.array(image))

    # Run the pose estimation
    person = detect(image)

    # Collect the 17 keypoints as (x, y, score) rows
    pose_landmarks = np.array(
        [[keypoint.coordinate.x, keypoint.coordinate.y, keypoint.score]
         for keypoint in person.keypoints],
        dtype=np.float32)

    # Flatten to the 51-value float32 vector the classifier expects
    coordinates = pose_landmarks.flatten()

    # Predict the pose class for the keypoints
    predicted_label = predict_pose(interpreter, coordinates)

    # Read the labels from the text file
    with open('pose_labels.txt', 'r') as f:
        labels = [line.strip() for line in f]
    
    # Get the client's IP address
    client_host = request.client.host

    # After processing the image and getting the result, publish it to the MQTT topic
    topic = "esp32-cam/" + client_host  # Use the IP address of the ESP32-CAM as the topic
    message = {"predicted_label": labels[predicted_label]}
    fast_mqtt.publish(topic, message)

    # Return the predicted label
    return {"predicted_label": labels[predicted_label]}
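On the device, the firmware treats everything after the first empty line of the HTTP reply as the body, which the server fills with `{"predicted_label": ...}`. The same parsing in Python can be handy for checking what the FireBeetle will receive. This is a sketch assuming a non-chunked response with a `200` status; `parse_prediction` and the sample label `standard` are hypothetical (real labels come from `pose_labels.txt`).

```python
import json


def parse_prediction(raw_response: bytes) -> str:
    """Return predicted_label from a raw HTTP response (status line, headers, JSON body)."""
    # Headers end at the first empty line, as in the firmware's read loop
    head, sep, body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line between headers and body")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "200":
        raise ValueError(f"unexpected status line: {status_line!r}")
    return json.loads(body.decode("utf-8"))["predicted_label"]


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"content-type: application/json\r\n"
    b"\r\n"
    b'{"predicted_label":"standard"}'
)
print(parse_prediction(sample))  # standard
```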

Credits

aztech

Mr.Fred

Thanks to .