By Reek3r, Astha Shrestha, DON044 _, and Issa Musharbush
Published © Apache-2.0

Life-size AI Robot that carries your tools around the lab

An autonomous robot concierge for human environments, with computer vision running on a single Raspberry Pi. Built for Hackathon 2025 in 24 hours!

Intermediate · Work in progress · 24 hours · 9,311 views

Things used in this project

Hardware components

Raspberry Pi 3 Model B ×1
Raspberry Pi Camera Module V2 ×1
Arduino UNO WiFi Rev.2 ×1
DC Motor, 12 V ×1
AA Batteries ×1
Raspberry Pi AI Camera ×1

Software apps and online services

Arduino IDE
Nordic Semiconductor nRF Connect SDK

Hand tools and fabrication machines

3D Printer (generic)

Story


Schematics

Similar Schematic

What our initial idea was; after the display broke, we instead resorted to using a front-end web application on an iPad.
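
The front-end application itself isn't attached to this write-up. As a rough illustration of the approach, here is a minimal sketch of a Pi-hosted status page the iPad could open in a browser; Flask, the route, and the status fields are assumptions for the sake of the sketch, not the app that actually ran on the robot:

# Hypothetical sketch only: a tiny Flask app on the Pi that the iPad opens in a browser.
# Assumes Flask is installed (pip install flask); the status fields are placeholders.
from flask import Flask

app = Flask(__name__)

robot_status = {"state": "idle", "target": "none"}

@app.route("/")
def index():
    # Serve a simple HTML status page instead of driving a physical display.
    return (f"<h1>Lab Robot</h1>"
            f"<p>State: {robot_status['state']}</p>"
            f"<p>Following: {robot_status['target']}</p>")

if __name__ == "__main__":
    # Reachable from the iPad at http://<pi-address>:8080/
    app.run(host="0.0.0.0", port=8080)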

Code

Follow.py

Python
Just the initial Raspberry Pi code I could find. There's more, but this is what I have on me right now.
import argparse
import multiprocessing
import queue
import sys
import threading
from functools import lru_cache

import cv2
import numpy as np

from picamera2 import MappedArray, Picamera2
from picamera2.devices import IMX500
from picamera2.devices.imx500 import (NetworkIntrinsics,
                                      postprocess_nanodet_detection)


class Detection:
    def __init__(self, coords, category, conf, metadata):
        """Create a Detection object, recording the bounding box, category and confidence."""
        self.category = category  # COCO class index (0 = "person")
        self.conf = conf
        self.box = imx500.convert_inference_coords(coords, metadata, picam2)

def follow_target(detection):
    """Handle following after a person is detected (steering logic omitted in this listing; see the sketch after the code)."""
    pass

def parse_detections(metadata: dict):
    """Parse the output tensor into a number of detected objects, scaled to the ISP output."""
    bbox_normalization = intrinsics.bbox_normalization
    threshold = args.threshold
    iou = args.iou
    max_detections = args.max_detections

    np_outputs = imx500.get_outputs(metadata, add_batch=True)
    input_w, input_h = imx500.get_input_size()
    if np_outputs is None:
        return None
    if intrinsics.postprocess == "nanodet":
        boxes, scores, classes = \
            postprocess_nanodet_detection(outputs=np_outputs[0], conf=threshold, iou_thres=iou,
                                          max_out_dets=max_detections)[0]
        from picamera2.devices.imx500.postprocess import scale_boxes
        boxes = scale_boxes(boxes, 1, 1, input_h, input_w, False, False)
    else:
        boxes, scores, classes = np_outputs[0][0], np_outputs[1][0], np_outputs[2][0]
        if bbox_normalization:
            boxes = boxes / input_h

        boxes = np.array_split(boxes, 4, axis=1)
        boxes = zip(*boxes)

    detections = [
        Detection(box, category, score, metadata)
        for box, score, category in zip(boxes, scores, classes)
        if score > threshold
    ]

    for detection in detections:
        if detection.category == 0:  # Check if the category is "person"
            follow_target(detection)  # Call the trigger function
 
    return detections


@lru_cache
def get_labels():
    labels = intrinsics.labels

    if intrinsics.ignore_dash_labels:
        labels = [label for label in labels if label and label != "-"]
    return labels


def draw_detections(jobs):
    """Draw the detections for this request onto the ISP output."""
    labels = get_labels()
    # Wait for result from child processes in the order submitted.
    last_detections = []
    while (job := jobs.get()) is not None:
        request, async_result = job
        detections = async_result.get()
        if detections is None:
            detections = last_detections
        last_detections = detections
        with MappedArray(request, 'main') as m:
            for detection in detections:
                x, y, w, h = detection.box
                label = f"{labels[int(detection.category)]} ({detection.conf:.2f})"

                # Calculate text size and position
                (text_width, text_height), baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
                text_x = x + 5
                text_y = y + 15

                # Create a copy of the array to draw the background with opacity
                overlay = m.array.copy()

                # Draw the background rectangle on the overlay
                cv2.rectangle(overlay,
                              (text_x, text_y - text_height),
                              (text_x + text_width, text_y + baseline),
                              (255, 255, 255),  # Background color (white)
                              cv2.FILLED)

                alpha = 0.3
                cv2.addWeighted(overlay, alpha, m.array, 1 - alpha, 0, m.array)

                # Draw text on top of the background
                cv2.putText(m.array, label, (text_x, text_y),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

                # Draw detection box
                cv2.rectangle(m.array, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)

            if intrinsics.preserve_aspect_ratio:
                b_x, b_y, b_w, b_h = imx500.get_roi_scaled(request)
                color = (255, 0, 0)  # red
                cv2.putText(m.array, "ROI", (b_x + 5, b_y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
                cv2.rectangle(m.array, (b_x, b_y), (b_x + b_w, b_y + b_h), (255, 0, 0, 0))

            cv2.imshow('IMX500 Object Detection', m.array)
            cv2.waitKey(1)
        request.release()


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, help="Path of the model",
                        default="/usr/share/imx500-models/imx500_network_ssd_mobilenetv2_fpnlite_320x320_pp.rpk")
    parser.add_argument("--fps", type=int, help="Frames per second")
    parser.add_argument("--bbox-normalization", action=argparse.BooleanOptionalAction, help="Normalize bbox")
    parser.add_argument("--threshold", type=float, default=0.55, help="Detection threshold")
    parser.add_argument("--iou", type=float, default=0.65, help="Set iou threshold")
    parser.add_argument("--max-detections", type=int, default=10, help="Set max detections")
    parser.add_argument("--ignore-dash-labels", action=argparse.BooleanOptionalAction, help="Remove '-' labels ")
    parser.add_argument("--postprocess", choices=["", "nanodet"],
                        default=None, help="Run post process of type")
    parser.add_argument("-r", "--preserve-aspect-ratio", action=argparse.BooleanOptionalAction,
                        help="preserve the pixel aspect ratio of the input tensor")
    parser.add_argument("--labels", type=str,
                        help="Path to the labels file")
    parser.add_argument("--print-intrinsics", action="store_true",
                        help="Print JSON network_intrinsics then exit")
    return parser.parse_args()


if __name__ == "__main__":
    args = get_args()

    # This must be called before instantiation of Picamera2
    imx500 = IMX500(args.model)
    intrinsics = imx500.network_intrinsics
    if not intrinsics:
        intrinsics = NetworkIntrinsics()
        intrinsics.task = "object detection"
    elif intrinsics.task != "object detection":
        print("Network is not an object detection task", file=sys.stderr)
        exit()

    # Override intrinsics from args
    for key, value in vars(args).items():
        if key == 'labels' and value is not None:
            with open(value, 'r') as f:
                intrinsics.labels = f.read().splitlines()
        elif hasattr(intrinsics, key) and value is not None:
            setattr(intrinsics, key, value)

    # Defaults
    if intrinsics.labels is None:
        with open("assets/coco_labels.txt", "r") as f:
            intrinsics.labels = f.read().splitlines()
    intrinsics.update_with_defaults()

    if args.print_intrinsics:
        print(intrinsics)
        exit()

    picam2 = Picamera2(imx500.camera_num)
    main = {'format': 'RGB888'}
    config = picam2.create_preview_configuration(main, controls={"FrameRate": intrinsics.inference_rate}, buffer_count=12)

    imx500.show_network_fw_progress_bar()
    picam2.start(config, show_preview=False)
    if intrinsics.preserve_aspect_ratio:
        imx500.set_auto_aspect_ratio()

    pool = multiprocessing.Pool(processes=4)
    jobs = queue.Queue()

    thread = threading.Thread(target=draw_detections, args=(jobs,))
    thread.start()

    while True:
        # The request gets released by draw_detections
        request = picam2.capture_request()
        metadata = request.get_metadata()
        if metadata:
            async_result = pool.apply_async(parse_detections, (metadata,))
            jobs.put((request, async_result))
        else:
            request.release()
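
The follow_target() function above is left as a stub in this listing. Purely as an illustration, here is one way it could steer toward the detected person, assuming the drive motors are commanded by the Arduino UNO WiFi Rev.2 over a USB serial link; the port name, baud rate, frame width, and one-byte command protocol are all assumptions, not the code that ran on the robot:

# Hypothetical sketch of the steering logic; not the code from the hackathon.
# Assumes pyserial is installed, the Arduino enumerates as /dev/ttyACM0, and its
# sketch understands one-byte commands: b'L' = turn left, b'R' = turn right, b'F' = forward.
import serial

FRAME_WIDTH = 640        # assumed width of the ISP output in pixels
CENTER_DEADBAND = 60     # tolerance (pixels) around the frame centre

arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)

def follow_target(detection):
    """Steer toward the person by comparing the box centre with the frame centre."""
    x, y, w, h = detection.box
    error = (x + w / 2) - FRAME_WIDTH / 2
    if error < -CENTER_DEADBAND:
        arduino.write(b"L")   # person is left of centre: turn left
    elif error > CENTER_DEADBAND:
        arduino.write(b"R")   # person is right of centre: turn right
    else:
        arduino.write(b"F")   # roughly centred: drive forward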

Credits

Reek3r
Astha Shrestha
DON044 _
Issa Musharbush
