Kennedy Saine Banda
Published © GPL3+

Dual-Control Wheelchair

Control your wheelchair with a smile! FaceMesh tech + voice commands = effortless mobility. Independence redefined. #assistivetech #innovation

Intermediate · Full instructions provided · 3 days

Things used in this project

Hardware components

DFRobot UNIHIKER - IoT Python Programming Single Board Computer with Touchscreen
×1
Seeed Studio XIAO ESP32S3 Sense
×1
Blues Notecarrier A
×1
Blues Notecard (Cellular)
×1
Espressif ESP32
×1
Webcam, Logitech® HD Pro
×1
Hoverboard motor
×1

Software apps and online services

Arduino IDE
Microsoft VS Code
Mind+
Blues Notehub.io
OpenCV
MediaPipe
Edge Impulse Studio
MQTT
cvzone
SpeechRecognition

Hand tools and fabrication machines

Soldering iron (generic)
Solder Wire, Lead Free
Digilent Mastech MS8217 Autorange Digital Multimeter

Story


Schematics

System Diagram

This is the system diagram.

Code

UNIHIKER

Python
This code runs on the UNIHIKER.
"""
This Python script combines facial recognition and voice command recognition with an MQTT client.
It allows real-time face tracking and voice command processing, where the results are published 
to an MQTT broker. The script uses threading to run both face recognition and voice command 
recognition simultaneously. The GUI interface enables starting or stopping these processes.

Main components:
1. **MQTT Client**: Connects to an MQTT broker and publishes face and voice recognition results.
2. **Face Recognition**: Uses MediaPipe's face mesh model to track facial landmarks. Detects mouth 
   status (open/closed) and gaze direction (left/right/center).
3. **Voice Command Recognition**: Captures voice commands via a microphone and recognizes commands 
   such as "move forward", "move left", "move right", and "stop".
4. **GUI Interface**: Provides buttons to start the voice and face recognition processes (stop helpers are defined below but are not bound to GUI buttons).

Prerequisites:
- Install the required libraries: paho-mqtt, cvzone, opencv-python, mediapipe, speechrecognition, unihiker.

Usage:
- Press "START VOICE COM" to start voice command recognition.
- Press "START FACE RECOG" to start face recognition.
- The system will display detected statuses on the screen and publish corresponding messages to the MQTT broker.
"""

import paho.mqtt.client as mqtt
from unihiker import GUI
import time
import cvzone
import cv2 as cv 
import threading
import mediapipe as mp
import speech_recognition as sr

# Initialize the GUI interface for Unihiker
gui = GUI()

# MQTT settings
mqtt_broker = "broker.hivemq.com"  # Replace with your MQTT broker address
mqtt_port = 1883
mqtt_topic = "esp32/face_control"

# Initialize MQTT Client
client = mqtt.Client()
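# Note: this constructor call matches paho-mqtt 1.x. With paho-mqtt 2.x the callback API
# version must be passed explicitly, e.g. mqtt.Client(mqtt.CallbackAPIVersion.VERSION1),
# which preserves the on_connect(client, userdata, flags, rc) signature used below.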

# Define callback for when the client connects to the MQTT broker
def on_connect(client, userdata, flags, rc):
    print(f"Connected with result code {rc}")
    client.subscribe(mqtt_topic)  # Subscribe to the topic to listen for incoming messages

client.on_connect = on_connect
client.connect(mqtt_broker, mqtt_port, 60)

# Start the MQTT client loop in a separate thread
client.loop_start()

# Global flags to control the running state of face recognition and voice command recognition
face_recognition_running = False
voice_command_running = False

def faceRecognition():
    """ 
    This function uses MediaPipe to perform real-time face recognition. It detects mouth 
    status (open/closed) and head orientation (looking left, right, or center), and publishes 
    these statuses to the MQTT broker.
    """
    global face_recognition_running

    # Initialize MediaPipe face mesh and drawing tools
    mp_drawing = mp.solutions.drawing_utils  
    mp_drawing_styles = mp.solutions.drawing_styles  
    mp_face_mesh = mp.solutions.face_mesh  

    # Set drawing specifications for face landmarks
    drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)

    # Open the camera for video capture
    cap = cv.VideoCapture(0)
    cap.set(cv.CAP_PROP_FRAME_WIDTH, 320)
    cap.set(cv.CAP_PROP_FRAME_HEIGHT, 240)
    cap.set(cv.CAP_PROP_BUFFERSIZE, 1)

    # Create a full-screen window to display the face mesh
    cv.namedWindow('MediaPipe Face Mesh', cv.WND_PROP_FULLSCREEN)
    cv.setWindowProperty('MediaPipe Face Mesh', cv.WND_PROP_FULLSCREEN, cv.WINDOW_FULLSCREEN)

    # Start face mesh detection
    with mp_face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True, 
                               min_detection_confidence=0.5, min_tracking_confidence=0.5) as face_mesh:
        while face_recognition_running and cap.isOpened():
            success, image = cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                continue

            # Convert the image to RGB and process it with the face mesh
            image.flags.writeable = False
            image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
            results = face_mesh.process(image)

            # Convert the image back to BGR for OpenCV processing
            image.flags.writeable = True
            image = cv.cvtColor(image, cv.COLOR_RGB2BGR)

            if results.multi_face_landmarks:
                for face_landmarks in results.multi_face_landmarks:
                    # Draw face mesh landmarks on the image
                    mp_drawing.draw_landmarks(
                        image=image, 
                        landmark_list=face_landmarks, 
                        connections=mp_face_mesh.FACEMESH_TESSELATION, 
                        landmark_drawing_spec=None, 
                        connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_tesselation_style()
                    )
                    mp_drawing.draw_landmarks(
                        image=image, 
                        landmark_list=face_landmarks, 
                        connections=mp_face_mesh.FACEMESH_CONTOURS, 
                        landmark_drawing_spec=None, 
                        connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_contours_style()
                    )
                    mp_drawing.draw_landmarks(
                        image=image, 
                        landmark_list=face_landmarks, 
                        connections=mp_face_mesh.FACEMESH_IRISES, 
                        landmark_drawing_spec=None, 
                        connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_iris_connections_style()
                    )

                    # Calculate the distance between the upper and lower lips
                    upper_lip = face_landmarks.landmark[13]
                    lower_lip = face_landmarks.landmark[14]
                    lip_distance = ((upper_lip.x - lower_lip.x) ** 2 + (upper_lip.y - lower_lip.y) ** 2) ** 0.5
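                    # MediaPipe landmark coordinates are normalized to [0, 1], so the 0.02
                    # threshold below is roughly 2% of the frame size; tune it for your camera.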
                    if lip_distance > 0.02:
                        cvzone.putTextRect(image, "Mouth Open", (10, 40), scale=1.5, thickness=2)
                        client.publish(mqtt_topic, "Mouth Open")  # Publish to MQTT
                    else:
                        cvzone.putTextRect(image, "Mouth Closed", (10, 40), scale=1.5, thickness=2)
                        client.publish(mqtt_topic, "Mouth Closed")  # Publish to MQTT

                    # Detect gaze direction based on cheek positions
                    left_cheek = face_landmarks.landmark[234]
                    right_cheek = face_landmarks.landmark[454]
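                    # Cheek x-values are also normalized to [0, 1]; the 0.3 / 0.7 thresholds flag
                    # when the face has shifted toward the left or right side of the frame.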
                    if left_cheek.x < 0.3:
                        cvzone.putTextRect(image, "Looking Left", (10, 80), scale=1.5, thickness=2)
                        client.publish(mqtt_topic, "Looking Left")  # Publish to MQTT
                    elif right_cheek.x > 0.7:
                        cvzone.putTextRect(image, "Looking Right", (10, 80), scale=1.5, thickness=2)
                        client.publish(mqtt_topic, "Looking Right")  # Publish to MQTT
                    else:
                        cvzone.putTextRect(image, "Looking Center", (10, 80), scale=1.5, thickness=2)
                        client.publish(mqtt_topic, "Looking Center")  # Publish to MQTT

            # Rotate the image to display it correctly in portrait mode
            image = cv.rotate(image, cv.ROTATE_90_CLOCKWISE)
            cv.imshow('MediaPipe Face Mesh', image)

            # Break the loop if the 'ESC' key is pressed
            if cv.waitKey(5) & 0xFF == 27:
                break

    # Reset the running flag (in case the loop was exited with the ESC key),
    # then release the camera and close all OpenCV windows
    face_recognition_running = False
    cap.release()
    cv.destroyAllWindows()

def voiceCommands():
    """
    This function listens for voice commands using the microphone and recognizes specific
    commands such as "move forward", "move left", "move right", and "stop". Recognized 
    commands are published to the MQTT broker.
    """
    global voice_command_running

    recognizer = sr.Recognizer()
    mic = sr.Microphone()

    # Adjust the recognizer for ambient noise
    with mic as source:
        recognizer.adjust_for_ambient_noise(source)

    while voice_command_running:
        with mic as source:
            print("Listening...")
            audio = recognizer.listen(source)
        try:
            command = recognizer.recognize_google(audio).lower()
            print(f"You said: {command}")

            # Publish recognized commands to MQTT
            if "move forward" in command:
                print("Moving Forward")
                client.publish(mqtt_topic, "Move Forward")
            elif "move left" in command:
                print("Moving Left")
                client.publish(mqtt_topic, "Move Left")
            elif "move right" in command:
                print("Moving Right")
                client.publish(mqtt_topic, "Move Right")
            elif "stop" in command:
                print("Stopping")
                client.publish(mqtt_topic, "Stop")
            else:
                print("Command not recognized")

        except sr.UnknownValueError:
            print("Sorry, I did not understand that.")
        except sr.RequestError as e:
            print(f"Sorry, there was an error with the speech recognition service: {e}")

def startFaceRecognition():
    """Starts the face recognition process in a separate thread."""
    global face_recognition_running
    face_recognition_running = True
    threading.Thread(target=faceRecognition).start()

def stopFaceRecognition():
    """Stops the face recognition process."""
    global face_recognition_running
    face_recognition_running = False

def startVoiceCommands():
    """Starts the voice command recognition process in a separate thread."""
    global voice_command_running
    voice_command_running = True
    threading.Thread(target=voiceCommands).start()

def stopVoiceCommands():
    """Stops the voice command recognition process."""
    global voice_command_running
    voice_command_running = False

# Add buttons to the GUI for starting voice and face recognition processes
gui.add_button(x=120, y=100, w=180, h=30, text="START VOICE COM", origin='center', onclick=startVoiceCommands)
gui.add_button(x=120, y=180, w=180, h=30, text="START FACE RECOG", origin='center', onclick=startFaceRecognition)

# Display a static text label on the GUI
info_text = gui.draw_text(x=120, y=50, text='MOMO', color="red", origin='bottom')

# Main loop to keep the script running and allow the GUI and threads to function
while True:
    time.sleep(1)
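
The ESP32 firmware that consumes these MQTT messages is not included in this listing. As a quick sanity check while wiring things up, a minimal subscriber like the sketch below (an illustrative example that only assumes the same public broker and topic used by the script above) can be run on any machine with paho-mqtt installed to confirm that the face and voice commands are actually arriving on esp32/face_control.

import paho.mqtt.client as mqtt

MQTT_BROKER = "broker.hivemq.com"   # same public broker used by the UNIHIKER script
MQTT_PORT = 1883
MQTT_TOPIC = "esp32/face_control"   # same topic the UNIHIKER publishes to

def on_connect(client, userdata, flags, rc):
    # Subscribe once the connection is established
    print(f"Connected with result code {rc}")
    client.subscribe(MQTT_TOPIC)

def on_message(client, userdata, msg):
    # Print every command published by the wheelchair controller
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()  # paho-mqtt 1.x style, matching the main script
client.on_connect = on_connect
client.on_message = on_message
client.connect(MQTT_BROKER, MQTT_PORT, 60)
client.loop_forever()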

Credits

Kennedy Saine Banda

4 projects • 5 followers
