Tired of lugging around your tools or constantly trekking to your coworker's desk to send documents? Imagine if the equipment you need could effortlessly follow you wherever you go. Say hello to Betsy, your very own personal assistant robot. Betsy is here to revolutionize the way you work and navigate your lab environment, making your daily tasks a breeze.
Our theme choice for the VIAM challenge revolves around crafting a personal assistant, drawing inspiration from our previous project Bella, a front desk assistant chatbot, and Tipsy, the drink-serving robot from VIAM. Our creation, Betsy, is a personal assistant robot built on the SCUTTLE platform.
Initially, the aim was to develop a robot that would follow alongside me, carrying tools and necessary items for tasks within our lab. However, we soon realized the potential for more. Why limit Betsy's capabilities when we could fashion a fully-fledged smart robot assistant capable of a multitude of tasks?
Beyond tool-bearing, Betsy is designed to efficiently ferry messages and documents between locations. Moreover, she doubles as an entertainment hub, readily providing cheerful conversation and companionship within the office environment.
Problem Statement
Traditional programming methods that rely on if-else logic to interpret commands proved inefficient. Accents and nuances in verbal instructions often led to misinterpretation, hindering Betsy's ability to understand context and accurately fulfill tasks.
To address the challenge of creating a versatile personal assistant robot, we implemented a Large Language Model (LLM) to enhance Betsy's decision-making capabilities.
Here is an example: upon receiving a command, the system processes the natural language input and infers the intended action. Even when the speech recognition step introduces discrepancies or inaccuracies, the LLM grasps the context and meaning of the command. In this example, although the speech recognizer did not transcribe the command accurately, the LLM interprets the request correctly and initiates the action to navigate to Yogen's desk. This demonstrates the robustness and contextual understanding of the LLM, allowing user commands to be interpreted and executed effectively even when speech recognition introduces errors.
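As a rough illustration of this idea, here is a minimal sketch using the same OpenAI chat API we rely on later in STEP 6. The misheard transcription and the function labels are hypothetical stand-ins for our real prompt, which appears further below:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Hypothetical, garbled transcription produced by the speech recognizer.
noisy_transcript = "go to yo gens desk please"

# A cut-down version of the context prompt we give Betsy in STEP 6.
prompt = (
    "You are Betsy, a personal assistant robot. Your functionalities are: "
    "Function 1 = follow me, Function 2 = go from the base to Yogen's desk, "
    "Function 3 = go from Yogen's desk to Faddulah's desk. "
    "Reply with only the function that best matches the user's request.\n"
    "User said: " + noisy_transcript
)

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])  # e.g. "Function 2"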
Functionalities
Betsy operates by processing audio input, enabling natural interaction with users.
Betsy boasts four primary functionalities tailored to enhance efficiency within the lab environment:
- Person following
- Moving to set locations in the lab
- Sending messages / memos from one person to another on request
- Speech Interaction
To build your own tool-carrying assistant robot, you need the following hardware:
- Raspberry Pi, with microSD card, set up following the Raspberry Pi Setup Guide.
- Assembled SCUTTLE rover with the motors and motor driver that comes with it.
- T-slotted framing: 4 single 4 slot rails, 30 mm square, hollow, 3’ long. These are for the height of the robot.
- T-slotted framing: 2 single 4 slot rails, 30 mm square, hollow, 12 inches long. These are to create a base inside the robot to securely hold the tool box.
- T-slotted framing structural brackets: 30mm rail height.
- Three ultrasonic sensors
- A 12V battery with charger
- DC-DC converter, 12V in, 5V out
- USB camera with microphone
- USB speakers
- A box to hold tools
- Optional: waste bin
- Optional: Emergency switch
- Optional: HDMI Capacitive Touch Monitor
- Optional: 3D printer
- Optional: EasySMX ESM-9101 Wireless Controller or a similar gamepad and dongle. This is the controller that comes with the SCUTTLE rover.
To assemble your own Betsy, follow these steps:
i. Refer to VIAM's tutorial for Tipsy: https://www.viam.com/post/autonomous-drink-carrying-robot
ii. Optionally, you may customize the design with your own upgrades. For example, we incorporated a touch screen monitor to enable browsing and access to other applications.
iii. Utilize an EasySMX ESM-9101 Wireless Controller or a similar gamepad and dongle from the Scuttle kit. This allows manual control of the robot whenever necessary, even if your device isn't connected to the internet. Alternatively, the VIAM app can be used for remote control.
iv. We ensured safety with an emergency stop switch. This switch instantly disconnects the motors when pressed, providing an immediate halt to operations while keeping the Raspberry Pi online. This prevents the need to reboot the VIAM server each time.
v. For enhanced functionality, consider adding encoders to your motors. Encoders enable precise monitoring of motor position and speed, facilitating feedback control mechanisms for improved machine control. VIAM provides a comprehensive guide on implementing feedback control with their components: https://www.viam.com/post/using-feedback-control-with-viam-components
i. Within the Viam app, add a new machine and assign it a distinctive name like "Betsy". Then, proceed with the setup instructions to install viam-server on your Raspberry Pi and establish a connection to your machine.
Note: Ensure that your Raspberry Pi is running the Bullseye version of the OS or a newer iteration. My attempts with the Buster version proved incompatible with VIAM.
ii. Configuring Components
a. Board
Click the + icon next to your machine part in the left-hand menu and select Component. Select the board type, then select the pi model. Enter local as the name and click Create.
b. Motors
Click the + icon next to your machine part in the left-hand menu and select Component. Select the motor type, then select the gpio model. Enter right as the name and click Create.
After clicking Create, a panel will pop up with empty sections for Attributes, Component Pin Assignment, and other information.
In the board dropdown within Attributes, choose the name of the board (local) that the motor is wired to. This ensures that the board initializes before the motor when the robot boots up.
Then set Max RPM to 50.
In the Component Pin Assignment section, type in 16 for a and 15 for b corresponding to the right motor wiring.
Now let's add the left motor, which is configured just like the right motor. Add your left motor with the name "left" and type in 12 for a and 11 for b.
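Before moving on, it can help to spin each motor briefly from the Python SDK to confirm the pin assignments. This is a minimal sketch, assuming a recent viam-sdk with the with_api_key helper; the API key, key ID, and address placeholders come from your machine's page in the Viam app:
import asyncio
from viam.robot.client import RobotClient
from viam.components.motor import Motor

async def main():
    # Credentials from the Viam app (CONNECT tab of your machine).
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    # The names match the motors configured above.
    right = Motor.from_robot(robot, "right")
    left = Motor.from_robot(robot, "left")

    # Run each motor at 30% power for one second, then stop it.
    for motor in (right, left):
        await motor.set_power(0.3)
        await asyncio.sleep(1)
        await motor.stop()

    await robot.close()

asyncio.run(main())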
c. Base
Next, add a base component, which describes the geometry of your chassis and wheels so the software can calculate how to steer the rover in a coordinated way:
Click the + icon next to your machine part in the left-hand menu and select Component. Select the base type, then select the wheeled model. Enter scuttle_base as the name or use the suggested name for your base and click Create.
In the right dropdown, select right and in the left dropdown select left. Enter 250 for wheel_circumference_mm and 400 for width_mm. The width describes the distance between the midpoints of the wheels. Add local, right, and left to the Depends on field.
d. Configure the camera
Add the camera component:
Click the + (Create) button next to your main part in the left-hand menu and select Component. Start typing “webcam” and select camera / webcam. Give your camera a name. This tutorial uses the name cam in all example code. Click Create.
In the configuration panel, click the video path dropdown and select the webcam you’d like to use for this project from the list of suggestions. You can also type in video0 if you are unsure which camera to select.
Go to the Control tab to confirm you can see the expected video stream.
Note: It's worth noting that occasionally, even cameras from the same batch may exhibit compatibility issues. While a camera might function correctly within the VIAM control tab, it may not work as expected within the Python-SDK. If you encounter this issue, consider trying a different camera to troubleshoot the problem.
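To confirm the camera is also reachable from code (useful given the compatibility note above), the following sketch pulls a single frame through the Python SDK. The connection placeholders are the same as before, and cam is the name configured above:
import asyncio
from viam.robot.client import RobotClient
from viam.components.camera import Camera

async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    cam = Camera.from_robot(robot, "cam")
    frame = await cam.get_image()  # grab one frame from the webcam
    print("Received a frame of type:", type(frame))

    await robot.close()

asyncio.run(main())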
e. Gamepad
Click the + icon next to your machine part in the left-hand menu and select Component. Select the input_controller type, then select the gamepad model. Enter a name or use the suggested name for your input controller and click Create.
The controller config adds the gamepad controller to your machine. However, it is not functional yet. To link the controller input to the base functionality, you need to add the base remote control service.
f. Ultrasonic sensors
Add a sensor component:
Click the + icon next to your machine part in the left-hand menu and select Component. Select the sensor type, then select the ultrasonic model. Enter ultrasonic1 as the name and click Create.
Then fill in the attributes: enter 38 for echo_interrupt_pin and 36 for trigger_pin. Enter local for board.
You have to configure the other ultrasonic sensors. For each of the additional ultrasonic sensors, create a new component with a unique name like ultrasonic2 (where “2” indicates it’s the second sensor), type sensor, and model ultrasonic. In the attributes textbox, fill in the trigger_pin and echo_interrupt_pin corresponding to the pins your ultrasonic sensors are connected to.
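Each ultrasonic sensor can then be read individually; the ultrasonic model reports the measured distance (in metres) in its readings. A minimal sketch, assuming the first sensor is named ultrasonic1 as above:
import asyncio
from viam.robot.client import RobotClient
from viam.components.sensor import Sensor

async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    ultrasonic = Sensor.from_robot(robot, "ultrasonic1")
    readings = await ultrasonic.get_readings()
    # Expect something like {'distance': 0.52}, i.e. 0.52 m to the nearest obstacle.
    print(readings)

    await robot.close()

asyncio.run(main())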
g. Configure the detection camera
To be able to test that the vision service is working, add a transform camera which will add bounding boxes and labels around the objects the service detects.
Click the + (Create) button next to your main part in the left-hand menu and select Component. Start typing “transform” and select camera / transform. Give your transform camera the name detectionCam and click Create.
iii. Configuring services
a. Sensors service
To efficiently gather readings from all sensors simultaneously, you'll need to configure the sensors service in VIAM. Follow these steps:
- Click the + icon next to your machine part in the left-hand menu and select "Service."
- Choose "Sensors" from the list of available services.
- Name your sensors service; for consistency, you can name it "sensors".
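With the sensors service configured, all three ultrasonic readings can be fetched in a single call instead of polling each sensor component separately. This is only a sketch; the exact return types of the sensors service vary between SDK versions, so adapt the unpacking to what your version returns:
import asyncio
from viam.robot.client import RobotClient
from viam.services.sensors import SensorsClient

async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    sensors_svc = SensorsClient.from_robot(robot, "sensors")
    sensor_names = await sensors_svc.get_sensors()           # every registered sensor
    readings = await sensors_svc.get_readings(sensor_names)  # assumed: sensor -> readings dict

    # Collect the distance value reported by each ultrasonic sensor.
    distances = [r["distance"] for r in readings.values() if "distance" in r]
    print(distances)

    await robot.close()

asyncio.run(main())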
b. base remote control service
Click the + icon next to your machine part in the left-hand menu and select Service. Select the base remote control type. Enter a name or use the suggested name for your service and click Create.
c. Configure the ML model service
Navigate to your machine’s CONFIGURE tab.
Click the + (Create) button next to your main part in the left-hand menu and select Service. Start typing ML model and select ML model / TFLite CPU from the builtin options. Enter people as the name, then click Create.
In the new ML Model service panel, configure your service.
Select Deploy model on machine for the Deployment field. Then select the viam-labs:EfficientDet-COCO model from the Models dropdown.
d. Configure an ML model detector vision service
Click the + (Create) button next to your main part in the left-hand menu and select Service. Start typing ML model and select vision / ML model from the builtin options. Enter myPeopleDetector as the name, then click Create.
In the new vision service panel, configure your service.
Select people from the ML Model dropdown.
Note: If you opt to upload or train your custom machine learning model, VIAM provides guidance through their instructional video on YouTube here. I attempted to integrate a quantized version of MobilenetV2 to expedite inference; however, the performance improvement was not substantial. Consequently, I reverted to VIAM's default model for optimal functionality.
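Once the detector is configured, person detections can be pulled straight from the camera feed in code. A minimal sketch, using the myPeopleDetector and cam names configured above; the "Person" label comes from the COCO label set used by the deployed model:
import asyncio
from viam.robot.client import RobotClient
from viam.services.vision import VisionClient

async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    detector = VisionClient.from_robot(robot, "myPeopleDetector")
    detections = await detector.get_detections_from_camera("cam")

    for d in detections:
        if d.class_name.lower() == "person" and d.confidence > 0.7:
            center_x = (d.x_min + d.x_max) / 2  # horizontal centre, used later for following
            print(f"Person detected at x={center_x:.0f} with confidence {d.confidence:.2f}")

    await robot.close()

asyncio.run(main())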
e. Speechio
To enable natural spoken interaction with the LLM, we integrate a speech interface using a speech module from the Viam registry, allowing our machine to:
- Listen to spoken commands and convert them to text for processing by the LLM.
- Convert the LLM's response back into audio for us to hear.
To configure the speech module, we'll navigate to the Services configuration tab in the Viam app and select the speechio module. From there, we can customize its behavior, such as allowing the module to listen to the microphone and setting a trigger prefix for commands. For more detailed information on VIAM's speech service and its integration into your project, please refer to VIAM's GitHub page.
Note: For instructions on setting up your microphone for use with VIAM's speech module, please consult VIAM's guide available on their GitHub page.
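In code, the speech module is accessed through the speech_service_api package (the same SpeechService import that appears in our robot logic in STEP 3). A minimal sketch, assuming you named the service speechio when configuring it:
import asyncio
from viam.robot.client import RobotClient
from speech_service_api import SpeechService

async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="<API-KEY>", api_key_id="<API-KEY-ID>"
    )
    robot = await RobotClient.at_address("<MACHINE-ADDRESS>", opts)

    speech = SpeechService.from_robot(robot, "speechio")
    # Speak through the USB speakers; the second argument blocks until playback finishes.
    await speech.say("Hello, I am Betsy.", True)

    await robot.close()

asyncio.run(main())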
iv. Setting up VIAM Python SDK
If you're using Windows like me, note that VIAM does not support it directly; however, you can still install the VIAM SDK by using WSL (Windows Subsystem for Linux).
To install VIAM SDK via WSL:
- Install WSL on your Windows system. You can find a tutorial on YouTube for this. (https://www.youtube.com/watch?v=HD0ikz7nnMc)
- Once WSL is installed, use Visual Studio Code (VS Code) for ease of use.
- Select the Linux distribution you want to use. I used Ubuntu 22.04 (Jammy).
- Install the VIAM and WSL extensions in VS Code.
- Open a new terminal
Run the following commands in WSL:
sudo apt update
sudo apt upgrade
sudo apt install python3 (3.9 or above)
- Follow the instructions found on VIAM's website (https://python.viam.dev/#installing-from-source) to install all the requirements needed.
By following these steps, you'll be able to set up VIAM SDK on your Windows system via WSL and start deploying Python code to VIAM machines for remote access and control.
- Why use the Python SDK?
With the Python SDK, you can deploy Python code directly to a VIAM machine using its VIAM ID, API key, and address, without the need to SSH into the robot. This method eliminates the constraint of needing to be connected to the same Wi-Fi network as the robot, making it ideal for remote access. For example, we successfully sent a program from a computer in Johor Bahru to a machine in Kuala Lumpur and enabled it to run in real-time.
STEP 3: Getting started with robot logic
First, the code imports the required libraries:
import asyncio
import os
import re
import subprocess
from viam.robot.client import RobotClient
from viam.rpc.dial import DialOptions
from chat_service_api import Chat
from speech_service_api import SpeechService
import openai
from viam.components.base import Base
from viam.services.sensors import SensorsClient
from viam.components.sensor import Sensor
Then it connects to our robot using a machine part API key and address. Replace these values with your machine's own API key, API key ID, and address.
robot_api_key = os.getenv('ROBOT_API_KEY') or ''
robot_api_key_id = os.getenv('ROBOT_API_KEY_ID') or ''
robot_address = os.getenv('ROBOT_ADDRESS') or ''
base_name = os.getenv("ROBOT_BASE") or "scuttle_base"
detector_name = os.getenv("ROBOT_DETECTOR") or "myPeopleDetector"
sensor_service_name = os.getenv("ROBOT_SENSORS") or "sensors"
camera_name = os.getenv("ROBOT_CAMERA") or "cam"
pause_interval = os.getenv("PAUSE_INTERVAL") or 3
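The connection itself is only a few lines. Continuing from the imports and variables above, here is a sketch of a connect() helper; with recent viam-sdk versions the with_api_key option is the simplest route (the DialOptions import above supports older credential styles):
async def connect():
    # Build connection options from the machine part API key and key ID.
    opts = RobotClient.Options.with_api_key(
        api_key=robot_api_key, api_key_id=robot_api_key_id
    )
    return await RobotClient.at_address(robot_address, opts)

async def main():
    robot = await connect()
    print("Connected. Resources:", robot.resource_names)  # quick sanity check
    ...
    await robot.close()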
STEP 4: Person Tracking
The person-following algorithm leverages VIAM's Vision, Sensors, and Base Services to autonomously track and follow a person while navigating around obstacles. By analyzing the video feed from the robot's camera using the Vision Service, the algorithm detects the presence of a person. Simultaneously, it utilizes sensor data obtained from VIAM's Sensors Service to detect obstacles in the robot's path. Based on these inputs, commands are sent through the Base Service to control the robot's movement, ensuring it adjusts its trajectory to follow the person while avoiding collisions. Here's how it works:
- Detection of a Person: Using a pre-trained computer vision model, the algorithm identifies if a person is present in the camera's field of view. It analyzes each frame of the video feed to locate objects that resemble humans.
- Calculating Object's Position: Once a person is detected, the algorithm calculates the position of the person within the frame. It determines the object's center coordinates and calculates the distance from the object's center to the middle of the frame along both the x and y axes.
- Adjusting Robot's Movement: Based on the calculated distance from the person to the middle of the frame, the algorithm decides how the robot should move. If the person is not in the middle of the frame and there are no obstacles detected, the robot adjusts its movement to align the person with the center of the frame.
- Obstacle Avoidance: The algorithm also takes into account readings from sensors to detect obstacles in the robot's path. If obstacles are detected, the robot adjusts its movement to avoid collisions while continuing to track the person.
- Continual Tracking: The algorithm continuously runs in a loop, ensuring that the robot continuously adjusts its movement to follow the person as they move within the camera's field of view.
Here's a more abstracted version of the person_detect function:
async def person_detect(detector, base, camera_name, sensors, sensors_svc, frame_width=640, frame_height=480, tolerance=50):
    while True:
        # Obtain detections from the vision detector
        detections = await detector.detect_objects(camera_name)
        # Obtain obstacle readings from sensors
        obstacle_readings = await get_obstacle_readings(sensors, sensors_svc)
        # Process detections and obstacle readings to determine robot's actions
        for detection in detections:
            # Process each detected object (e.g., person)
            process_detection(detection, base, frame_width, obstacle_readings, tolerance)
        # If no person is found and obstacle readings are clear, spin to search for a person
        if no_person_found() and all(reading > 0.4 for reading in obstacle_readings):
            await base.spin_to_search()
            await asyncio.sleep(2)  # Adjust the delay time as needed
            await base.stop()
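The helpers above are deliberately abstracted. As one example of what they might contain, here is a hypothetical sketch of process_detection that centres the person in the frame and halts when an obstacle is close; the 0.4 m threshold mirrors the obstacle check in the loop above, and inside the loop the call would be awaited:
async def process_detection(detection, base, frame_width, obstacle_readings, tolerance):
    # Horizontal centre of the detected person within the camera frame.
    center_x = (detection.x_min + detection.x_max) / 2
    offset = center_x - frame_width / 2

    # Halt if any ultrasonic sensor reports an obstacle closer than 0.4 m.
    if any(reading < 0.4 for reading in obstacle_readings):
        await base.stop()
        return

    if offset < -tolerance:
        await base.spin(angle=15, velocity=20)    # person is to the left: turn left
    elif offset > tolerance:
        await base.spin(angle=-15, velocity=20)   # person is to the right: turn right
    else:
        await base.move_straight(distance=300, velocity=150)  # roughly centred: move toward them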
STEP 5: Mapping Locations around the lab
The code utilizes VIAM's discrete movement functions, such as move_straight and base.spin, to establish a predefined map for the robot's movement from its current location to a destination. Here's how the code works:
await base.move_straight(distance=2000, velocity=10)
await base.spin(angle=45, velocity=20)
1. The code maintains a global variable current_location to keep track of the robot's current location. This variable is updated whenever the robot moves to a new location.
2. When the user issues a command, the code checks if the command contains the keyword "tell". If it does, the code extracts the message from the command and asks the robot to repeat it after reaching the destination.
3. After successfully moving to a new location, the update_robot_location function is called to update the global variable current_location with the new location.
async def yogen_desk_to_fad_desk(robot, user_input, speech, base):
    global current_location
    print("Going to Faddulah's desk")
    await base.spin(angle=45, velocity=20)
    await base.stop()
    await base.move_straight(distance=2000, velocity=100)
    await base.stop()
    if "tell" in user_input.lower():
        message_to_repeat = re.search(r'(?<=tell\s)(.*)', user_input.lower()).group(1)
        await speech.say("You were told: " + message_to_repeat, True)
    current_location = "station b"
    await update_robot_location(current_location)
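update_robot_location itself is only a small bookkeeping helper. A hypothetical sketch of what it could look like; ours simply records and announces the new location, but you could also persist it so the state survives a restart of the script:
async def update_robot_location(new_location):
    # Keep the global state in sync so the next command knows where Betsy is.
    global current_location
    current_location = new_location
    print(f"Betsy is now at: {new_location}")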
Overall, the mapping of locations around the lab is achieved by maintaining state information about the robot's current location and implementing logic to move the robot between different locations based on user commands. This enables the robot to navigate the lab environment effectively while responding to user requests.
STEP 6: Simple RAG for LLM
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model. It’s an architectural approach that improves LLM applications by leveraging custom data. Specifically, RAG retrieves relevant information from external knowledge bases and provides it as context for the LLM [source].
Here, I've utilized OpenAI's GPT-3.5 as the large language model (LLM), since calling the API gives faster responses than running a model locally on my Raspberry Pi. However, for those with access to boards with higher computational power, configuring a local LLM using VIAM's local-LLM service is an option worth considering. This service allows for the setup and optimization of a local LLM, harnessing the increased processing capabilities of specialized hardware to further enhance performance and efficiency.
async def query(payload):
    completion = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        max_tokens=256,
        messages=[{"role": "user", "content": payload}]
    )
    msg = completion.choices[0].message
    out = msg['content']
    return out
i. Context and Custom Data: The script initiates an interactive loop, facilitating communication between users and Betsy. An initial prompt string establishes context by outlining Betsy's capabilities and expected responses. This contextual information serves as custom data, akin to the external knowledge incorporated in RAG systems.
prompt = "You are betsy As a personal assistant robot, you have 4 functionalities: 1. Following me, 2.Going from the base to Yogen's desk, 3.Going from Yogen's desk to Fad/Faddulah's desk and exit. When you recive input, you will respond with one of the following options: Function 1, Function 2, Function 3 or Exit. If the input is not related to these functionalities, you will answer appropiately to the question."
ii. User Input Mapping: User commands, such as "Following me" or "Base to Yogen's desk," are captured and processed within the script. These commands are mapped to predefined functionalities, mirroring the way RAG systems integrate retrieved facts with language generation to tailor responses.
iii. Response Generation: Through the query function, user input is transmitted to the language model (GPT-3.5 Turbo) for response generation. The robot processes user input and executes the corresponding functionality. If the input is not related to the predefined functionalities, it responds appropriately to the question.
user_input = await query(prompt + user_input)
print(user_input)
await handle_user_input(user_input, robot, speech, base, sensors_svc)
# Handle user input based on commands
async def handle_user_input(user_input, robot, speech, base, sensors_svc):
    global current_location
    global following_mode
    if user_input.lower() == "exit":
        print("Exiting conversation...")
        return
    if user_input.lower() == "function 1":
        print("Following you...")
        following_mode = True
        subprocess.run(["python3", "bella_yogen.py"])
        return
    if user_input.lower() == "function 2" and current_location != "station a":
        await base_to_yogen_desk(robot, user_input, speech, base, sensors_svc)
        return
    if user_input.lower() == "function 3" and current_location != "station a":
        await yogen_desk_to_fad_desk(robot, user_input, speech, base)
        return
    response = user_input
    clean_response = re.sub(r'\n', '', response)
    print("AI:", clean_response)
    await speech.say(clean_response, True)
Note: While the script doesn't directly retrieve external knowledge, the model's responses are influenced by the initial context provided, resembling the impact of retrieved facts in RAG systems.
Paper link: Large Language Models for Robotics: A Survey
Essentially, we've fused a powerful large language model (LLM), acting as the "brain," with hardware components serving as the "body," culminating in a versatile and capable system. Through this integration, our model exhibits a range of capabilities:
- Perception and Navigation: It perceives its surroundings and navigates through the environment, leveraging sensory input and decision-making algorithms to move efficiently.
- Reasoning and Decision-Making: Equipped with sophisticated reasoning abilities, our model can analyze information, evaluate options, and make informed decisions to achieve its objectives.
- Control and Execution: With precise control mechanisms, it executes commands and actions, translating high-level instructions into tangible outcomes in the physical world.
- Interaction with the Physical World: Capable of interacting with physical objects and environments, our model interfaces seamlessly with the world around it, enabling meaningful engagement and task completion.
Yet, this is merely the beginning. Through ongoing experimentation and innovation, we have the potential to unlock even greater capabilities and applications. I believe the potential inherent in large language models (LLMs) holds the key to unlocking the next generation of truly intelligent robots. By integrating LLMs into robotics, we empower these machines with the ability to comprehend, reason, and communicate in natural language, bridging the gap between humans and robots.
What did we accomplish?
We've achieved a significant milestone with our project, demonstrating the capabilities of VIAM in seamlessly integrating various components, functions, and services. While the project may not be flawless, it serves as a strong foundation, showcasing the potential of VIAM's platform. The intuitive UI for configuration streamlined our development process, making it easy to set up and manage the robot's functionalities, even for an amateur. Additionally, the active VIAM community on platforms like Discord provided invaluable support, allowing us to quickly address any challenges we encountered along the way. Overall, this project represents a promising start, highlighting the power and accessibility of VIAM for creating intelligent robotic systems.
We've successfully developed a comprehensive personal assistant solution at an incredibly affordable cost of under $300. This achievement not only highlights the accessibility of robotics technology but also serves as a testament to the innovation and creativity it inspires. By showcasing the capabilities of our personal assistant, we hope to ignite the imaginations of beginners and enthusiasts alike, providing them with a solid foundation to embark on their own robotic projects. With its affordability and versatility, our creation opens doors to countless possibilities and serves as a compelling example of what can be accomplished with determination and ingenuity in the field of robotics.
Future Improvements
The beauty of this project is its boundless potential for expansion and enhancement. The versatility of Betsy opens doors to countless future improvements limited only by our imagination. Some of our ideas for further development include:
- Security Patrol: Leveraging VIAM's motion detection and face identification services, Betsy could evolve into a robust security patrol system, autonomously monitoring designated areas and identifying authorized personnel.
- Self-Docking Capability: Inspired by Matt Vella's YouTube video, Betsy could acquire the ability to autonomously dock and recharge, ensuring uninterrupted operation without human intervention.
- Music Player Integration: Betsy could expand to include music playback functionality, seamlessly connecting to platforms like Spotify and fulfilling requests for tunes. Additionally, the creation of a dedicated speaker component through VIAM's module creation tools could further enhance the music listening experience.
These future enhancements not only elevate Betsy's utility but also exemplify the endless possibilities for innovation and advancement within the realm of robotics. With each upgrade, Betsy becomes not just a personal assistant, but a dynamic and indispensable asset in various environments and applications.