This project details the process of building a robot to find objects in its indoor environment, based on text or image queries supplied by you.
The motivation behind this project is to create an affordable robot that can catalog objects in any general, unstructured indoor environment. This can be useful for anything from small-scale applications, like finding lost objects in your living room, all the way to large-scale applications like inventory tracking in warehouses.
At a high level, the robot drives around its environment, collecting images through its webcam. Its 2D pose (x, y, and yaw) is also computed by fusing odometry from the differential steering model with detections of Apriltags set up at known locations around the room.
After collecting these images, the user can then ask the robot if it has seen an object, and where it has seen it, by supplying text or image queries of the object in question. The robot handles each query by displaying the top candidates matching the query to the user, along with the location where each image was taken, visualized on a map of the environment it's in.
All computation is done onboard the Jetson AGX Orin on the robot.
Features:
- Automatic image collection and blurry image filtering
- Teleoperation through keyboard (ssh) or bluetooth
- 2D localization from differential steering robot model
- 2D odometry drift correction from Apriltag detections
- Supports text and image queries for objects
- Location stamp of each picture
The frame for the drivebase is pretty straightforward to build. If you buy the same smart car kit as I did, it comes with an instruction manual for assembly. Alternatively, the instructions are available online as well, here: https://www.amazon.ca/YIKESHU-Smart-Chassis-Battery-Encoder/dp/B075LD4FPN (scroll down to "Product Description").
Next, we have to make some 3D prints to hold the Jetson Orin and webcam onto the chassis. The STL files for each print are attached to this project under "Custom Parts and Enclosures".
Print out Jetson Orin Holder (Back Bracket) and Jetson Orin Holder (Front Bracket), and mount them on the back and front of the chassis respectively using M3 screws and nuts. After they are mounted, you should be able to slide the Jetson Orin right onto them.
Next, print out Camera Mount STL. You will also need a 3/4" PVC pipe, cut to an appropriate length depending on how tall you want to mount your camera on your drivebase (mine's 24 inches). Slot the PVC pipe into the camera mount, drill holes through the screw holes on the camera mount into the PVC pipe, and secure the pipe inside the camera mount using M3 screws and nuts. Finally, secure the whole thing onto the chassis using M3 screws and nuts. After that, you should have something similar to below:
To mount the camera, you can simply use tape, Velcro, or Dual Lock to secure it to the top of the PVC pipe. For the IMU, use the screw hole right next to the front Jetson mount, such that one of the IMU's sides sits against the mount (preventing it from shifting around).
Finally, one more hardware detail. It turns out the motors for this chassis are quite underpowered for moving a Jetson Orin. In particular, the drivebase struggles with point turns: the motors stall because they can't generate enough torque to rotate the wheels in place. A hardware hack to get around this is to mount caster ball bearings on the bottom of the chassis to take some of the load off the wheels, which mitigates the stalling and allows the drivebase to turn. Personally, I saw the best results when I mounted ball casters at the front and back of the chassis.
And that's it for the hardware part! Proceed to the next section to wire up all the electronics for the drivebase. You will need to remove the top layer to access the bottom layer, where all the circuitry goes. You can then feed the wires from the sensors on top of the chassis through the holes in the top plate.
4. Wiring up the drivebase
Follow the wiring diagram above to connect everything to the ESP32. In terms of the actual physical placement of components, you can refer to the picture below:
The picture below shows how I recommend placing the two 4xAA battery packs on top of the chassis. I used one battery pack to drive each L298N motor driver to squeeze as much performance as I could out of the chassis, since it was already struggling to move its own weight with the Jetson Orin on it; a single 4xAA pack (or even a 6xAA pack) is not sufficient. However, you may have noticed that the wiring diagram wants the grounds of both packs tied together. A simple, hacky way to do this is to stuff jumper cables into the negative spring terminals of both packs, join them, and run one more cable to GND on the ESP32. This is also shown below.
To power on all the circuitry, power on both 4xAA battery packs as well as the ESP32. Note that at this stage you power the ESP32 by connecting it to your computer via a micro-USB cable. Once the Jetson is mounted, that cable will be connected to the Jetson instead to power the ESP32.
To power the Jetson, there are two methods. The first is simply connecting the supplied AC power adapter. The drawback of this method is that the robot is tethered to a wall outlet, though the range can still be decent if you use extension cords. The second method is battery power. In particular, I've successfully powered the Jetson off a 4S LiPo battery using an SBEC to regulate a 12 V output into the Jetson. For the latter method, you attach an XT60 connector to the SBEC input and a 5.5/2.5 mm male barrel jack to the 12 V output of the SBEC. Then, by plugging the barrel jack into the Jetson, it can be powered entirely off battery power.
Unfortunately, I couldn't use battery power in the end, since the added weight of the battery was enough to make the chassis stall during point turns, even with the hardware and electrical optimizations described above.
At this point, it would be prudent to test the chassis to ensure it's working. Upload drivebase.ino onto your ESP32 using the Arduino IDE. Then, download any Bluetooth car remote control app on your phone (I'm using "Arduino Car" from the Google Play Store). Verify that you can connect to ESP32_BT_Car and drive the car around. If you aren't using the same app as me, you will have to modify either the Arduino script or the app so that the character commands sent by the app match the ones the Arduino script listens for.
If you've reached this point, you have finished the entire electrical subsystem for this project! Proceed to the next section to set up your Jetson Orin for software development.
5. Set up Jetson Orin
Connect the USB webcam to any one of the Jetson's USB ports. Connect the ESP32 to any of those USB ports as well. Finally, don't forget to connect your chosen power source.
I flashed JetPack 5.1.2 onto my Jetson Orin (after bricking it by installing too many Docker containers without an SSD). I don't think this is strictly necessary as long as you're not running some ancient version of Ubuntu (older than 20.04), since my setup is mostly dockerized. However, in case you need to do this, just follow https://developer.nvidia.com/embedded/learn/jetson-agx-orin-devkit-user-guide/two_ways_to_set_up_software.html. Note that you will need an Ubuntu 20.04 machine to install the SDK Manager.
Since the Jetson Orin only has 64 GB of storage, you will need extra space to store the Docker containers we are using. The best way to do this is to buy an NVMe PCIe SSD and install it in your Jetson. This video (JetsonHacks) is a good reference for the physical installation of the SSD.
Then, to set up Docker so it does everything on your SSD, follow this guide in its entirety: https://www.jetson-ai-lab.com/tips_ssd-docker.html. Going forward, make sure to put all your work and code in /ssd, as the 64 GB on the Orin itself runs out very quickly.
Once you've finished all this, proceed to the next section for a detailed rundown of the software.
6. Software Explanation
There are several software components required for a fully functional robot that can find your stuff from text / image queries. In this section, I will introduce how each of these components works, step by step. The next section will detail how to set all of this up on your own Jetson Orin.
6.1 ESP32 Drivers
The ESP32 is directly wired to the motor drivers and the IMU. It serves as the middleman that ultimately allows the Jetson to read IMU data and command the motors through ROS2. To achieve this, I first used micro_ros_platformio to connect the ROS2 instance running on the Jetson to the micro-ROS client running on the ESP32, allowing all communication between the ESP32 and the Jetson to happen over ROS2.
Then, to control the motors through ROS2, I set up /drivebase_subscriber as an integer topic, with 1 mapped to forwards motion, 2 to backwards, and so on.
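As a rough illustration of this protocol, here is a minimal sketch of publishing a drive command from the Jetson side (assuming the topic carries a std_msgs/Int32; only the 1 = forwards, 2 = backwards mapping above is from my actual setup):

# Minimal sketch: publish integer drive commands to the ESP32 over ROS2.
# Assumes /drivebase_subscriber is a std_msgs/Int32 topic (1 = forwards, 2 = backwards, ...).
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int32

class DrivebaseCommander(Node):
    def __init__(self):
        super().__init__('drivebase_commander')
        self.pub = self.create_publisher(Int32, '/drivebase_subscriber', 10)

    def send(self, command: int):
        msg = Int32()
        msg.data = command
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = DrivebaseCommander()
    node.send(1)  # drive forwards
    rclpy.shutdown()

if __name__ == '__main__':
    main()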
To read IMU data in a stable and computationally efficient way, I leveraged the built-in Digital Motion Processor (DMP) on the MPU6050 IMU, which averts the need to perform external filtering on noisy raw sensor data. This gives us quaternion orientation, linear acceleration, and angular velocity data from the IMU. I implemented this in PlatformIO using the MPU6050 library by ElectronicCats. Once IMU data is obtained, I publish it as an IMU message so it can be used by the rest of our ROS2 stack.
6.2 Teleoperation
We need a way to drive the robot around wirelessly. Our ROS2 setup on the ESP32 provides a convenient medium for a keyboard-based controller run from your laptop over ssh. With the ESP32 integrated with ROS2, we can supply motion commands to the drivebase by publishing the appropriate message over /drivebase_subscriber. This is handled by our teleop_keyboard node, which simply translates wasd presses on your keyboard into the corresponding commands on /drivebase_subscriber. If you have a wifi connection, you can ssh into the Jetson from your laptop and control the robot over wifi through your laptop keyboard!
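A minimal sketch of what such a teleop node boils down to is below (the key-to-command mapping here is hypothetical; check teleop_keyboard for the real one):

# Minimal sketch of a wasd teleop loop over ssh.
# The key-to-command mapping is a placeholder; 1 = forwards, 2 = backwards per the section above.
import sys, termios, tty
import rclpy
from rclpy.node import Node
from std_msgs.msg import Int32

KEY_TO_CMD = {'w': 1, 's': 2, 'a': 3, 'd': 4, ' ': 0}  # hypothetical mapping

def get_key() -> str:
    # Read a single keypress from the terminal (works over an ssh session).
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        return sys.stdin.read(1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)

def main():
    rclpy.init()
    node = Node('teleop_keyboard_sketch')
    pub = node.create_publisher(Int32, '/drivebase_subscriber', 10)
    try:
        while True:
            key = get_key()
            if key == 'q':  # quit
                break
            if key in KEY_TO_CMD:
                msg = Int32()
                msg.data = KEY_TO_CMD[key]
                pub.publish(msg)
    finally:
        node.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()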
6.3 2D Localization
To provide location stamps for collected images, the robot must know its pose (x, y, and yaw). The robot derives its 2D pose from two sources (the differential steering model and Apriltags) and fuses them together via a weighted average. The final odometry reading is published on the /odom topic and used to location-stamp each image.
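Conceptually, the fusion step amounts to the little sketch below (the weight here is a placeholder, not the value my localizer node actually uses):

# Conceptual sketch of the weighted-average fusion of the two (x, y) estimates.
# Yaw is taken from the IMU, as described in section 6.3.1.
def fuse_xy(model_xy, tag_xy, w_tag=0.5):
    """Blend the model-based and Apriltag-based (x, y) estimates with weight w_tag on the tags."""
    x = (1.0 - w_tag) * model_xy[0] + w_tag * tag_xy[0]
    y = (1.0 - w_tag) * model_xy[1] + w_tag * tag_xy[1]
    return x, y

print(fuse_xy((1.00, 0.50), (1.08, 0.44), w_tag=0.7))  # -> (1.056, 0.458)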
6.3.1 Differential Steering Model
The drivebase can be modelled as a differentially steered robot, since it has parallel-facing wheels on each side. Using this model, we can derive the discrete state update equations for the x and y position (see below). Note that we always use the live IMU reading for yaw, since it is quite stable thanks to the DMP.
Here, a denotes the wheel radius, v_l and v_r denote the left and right wheel velocities, θ denotes yaw, and Δt denotes the timestep. If you are interested in my derivation, you can refer to page 1 of this doc. If you are using the same chassis as me, you don't need to change any of the parameters (such as the wheel speed). However, if you want to measure the wheel speed yourself, a quick and dirty way is to slap a piece of tape on the wheel, film a video of it spinning, and divide the number of revolutions by the time elapsed.
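In code, the update boils down to standard differential-drive kinematics, sketched below with the variable names from this section (the wheel radius and speeds are example values, not my calibrated parameters):

# Sketch of the discrete state update described above.
# a        : wheel radius [m]
# v_l, v_r : left / right wheel angular velocities [rad/s]
# yaw      : live IMU (DMP) yaw reading [rad]
# dt       : timestep [s]
import math

def update_xy(x, y, yaw, v_l, v_r, a, dt):
    v = a * (v_l + v_r) / 2.0          # forward speed of the chassis center
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    return x, y

# Tape-on-the-wheel example: 10 revolutions in 4 s = 2.5 rev/s, i.e. about 15.7 rad/s per wheel.
print(update_xy(0.0, 0.0, 0.0, 15.7, 15.7, a=0.0325, dt=0.05))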
6.3.2 Isaac ROS Apriltag Detections
Since just computing 2D odometry from the differential steering model has no feedback component, it is prone to drift over time as errors accumulate, a common issue in robot localization.
In an effort to correct drift, the robot pose can also be computed whenever the robot detects and identifies an Apriltag through its camera, since the pose of each Apriltag in the world is known.
To implement this, I used Isaac ROS Apriltag: an implementation of the original Apriltag detection library tailored for NVIDIA hardware. This is perfect for our Jetson Orin, as it allows hardware-accelerated Apriltag detection, returning the pose and ID of any Apriltags detected by the camera. Running their Apriltag detection gave the following results for me:
This code gives us the pose of detected Apriltags relative to the camera. Since we set up the Apriltags ourselves, we know the location of every Apriltag in the world, so we can use the detected tag poses to solve for the robot pose. To do this, we also need the camera extrinsics relative to the IMU; the robot pose then follows from a series of transformations between frames, as implemented in the tag_odom node. The extrinsics are obtained by measuring the x, y, and z offsets of the webcam from the IMU, as detailed in pages 2 and 3 of this doc. Your numbers will vary depending on how you mount the camera relative to the IMU.
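As a rough sketch of that chain of transformations (not the exact code in tag_odom), writing T_X_Y for the 4x4 homogeneous transform that expresses frame Y in frame X:

# Sketch of recovering the robot (body) pose from one Apriltag detection.
import numpy as np

def invert(T):
    # Inverse of a rigid-body transform without a general matrix inverse.
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def robot_pose_from_tag(T_W_T, T_C_T, T_B_C):
    """T_W_T: tag in world (from the tag config), T_C_T: tag in camera (from the detector),
    T_B_C: camera in body (measured extrinsics). Returns the body pose in the world frame."""
    return T_W_T @ invert(T_C_T) @ invert(T_B_C)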
Combining the model-based odometry and the Apriltag-based odometry, we arrive at our final 2D position estimate. In this video, red arrows denote the model-based odometry, pink arrows denote the Apriltag-based odometry, and the fused result is shown by the small coordinate axes, representing our final estimate of the robot's 2D pose.
The tag detections are a little more unstable than I'd like, but I believe this is largely due to the limitations of using an old, low-resolution USB webcam. Overall, though, they do the job of correcting the accumulated odometry drift from the model-based approach, which is itself quite stable.
6.4 Image Collection
The robot periodically saves an image from the live webcam feed, storing the images as raw data so that we can construct our NanoDB over them to support text / image queries for any object.
For an image to be saved, it must pass a Laplacian blur check that filters out images that are too blurry. This is necessary because motion blur commonly occurs when the robot makes swift movements. Then, for each saved image, the most recent odometry pose of the robot from /odom is recorded, allowing every collected image to be location-stamped.
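The blur check itself is essentially a variance-of-Laplacian test, sketched below (the threshold shown is a placeholder; the real value lives in the image_collector config as blur_thres):

# Sketch of the blur filter: variance of the Laplacian as a sharpness score.
import cv2

def is_sharp(frame_bgr, blur_thres=100.0) -> bool:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    score = cv2.Laplacian(gray, cv2.CV_64F).var()
    # Low variance of the Laplacian = few strong edges = likely motion blur, so the frame is dropped.
    return score >= blur_thres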
6.5 NanoDB Queries
Once the embeddings are computed for the image database, we can use NanoDB to perform text and image queries, which return the closest matches to the query from all images in the database. Since we also stored a location stamp for each image, we can visualize where it was taken on a 2D map of the robot's environment. Although this is less compelling for a few rooms, where it is visually clear where the image came from, it could be quite useful for large-scale operations like warehouses!
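Under the hood, a query is just a nearest-neighbour search in embedding space. As a conceptual illustration (this is not NanoDB's actual API; NanoDB does the CUDA-accelerated version of this for us):

# Conceptual illustration: rank stored image embeddings by cosine similarity to the query embedding.
import numpy as np

def top_k(query_emb, image_embs, k=8):
    """query_emb: (d,), image_embs: (N, d). Returns indices of the k closest images."""
    q = query_emb / np.linalg.norm(query_emb)
    db = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = db @ q                  # cosine similarity of every image to the query
    return np.argsort(-sims)[:k]   # indices of the best matches, most similar first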
Now that we've covered what each software component does, proceed to the next section to set all this up!
7. Software Setup
Detailed steps to set up, test, and run the whole software pipeline are documented in the project Wiki, but I've listed them here as well for completeness.
To begin, get all the code for this project by cloning the repo:
cd /ssd
git clone --recursive https://github.com/allenapplehead/eyeforanitem
7.1. Project Setup (from Build your Setup Environment wiki page)
7.1.1 Build Docker Containers
First, we build our ROS2 Humble docker container, in which we will run all the scripts related to our project (excluding programs like NanoDB or Isaac ROS Apriltags which we will run in their respective containers).
docker pull dustynv/ros:humble-desktop-l4t-r35.2.1
cd docker
docker build -t ${USER}/ros:humble-desktop-l4t-r35.2.1 .
Next, follow instructions on: https://www.jetson-ai-lab.com/tutorial_nanodb.html to build the NanoDB docker container. You don't need the COCO dataset, so you can skip downloading and indexing it.
7.1.2 Setup ESP32
The Jetson Orin is connected to an ESP32 via micro-USB to interface with our IMU and DC motors.
This guide is really good for setting up PlatformIO to build your own ESP32 scripts and for setting up the ESP32 for ROS2 integration. Just follow this YouTube guide exactly, but use the files in /drivebase instead of his starter code.
After successful setup, you should have /microros_ws on the root level of this repository, and main.cpp built and uploaded to your ESP32. You should be able to see the /imu and /drivebase_subscriber topics via ros2 topic list.
7.1.3 Setup Isaac ROS (with CUDA accelerated Apriltags)
Complete these 4 guides in this order:
- https://nvidia-isaac-ros.github.io/getting_started/dev_env_setup.html
- https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_apriltag/isaac_ros_apriltag/index.html#quickstart (Assuming you are using a USB, monocular camera)
- https://nvidia-isaac-ros.github.io/getting_started/hardware_setup/sensors/camera_calibration.html
- https://nvidia-isaac-ros.github.io/concepts/fiducials/apriltag/tutorial_usb_cam.html (If you are using a different camera, follow the corresponding Isaac ROS guide for your hardware)
After successful setup, you should have /isaac_ros-dev on the root level of this repository. You should also add export ISAAC_ROS_WS=/ssd/eyeforanitem/isaac_ros-dev/ to your ~/.bashrc file.
7.2.1 Data Collection
You will have to spin up several terminals (I recommend terminator on Ubuntu for a clean way to manage several terminals at once). Each individual chunk of commands needs to be run in its own terminal.
7.2.1.1 Run ESP32 serial communication, IMU, and teleop drivers
./run_ros.sh # enter docker container
./run_drivers.sh
7.2.1.2 Run usb-camera driver and Isaac ROS Apriltag Detections
cd ${ISAAC_ROS_WS}/src/isaac_ros_common && \
./scripts/run_dev.sh /ssd/eyeforanitem/isaac_ros-dev
sudo apt-get install -y ros-humble-isaac-ros-apriltag
cd /workspaces/isaac_ros-dev && \
colcon build --symlink-install && \
source install/setup.bash
ros2 launch isaac_ros_apriltag isaac_ros_apriltag_usb_cam.launch.py
If you want to visualize detections, run:
cd ${ISAAC_ROS_WS}/src/isaac_ros_common && \
./scripts/run_dev.sh /ssd/eyeforanitem/isaac_ros-dev
rviz2 -d /workspaces/isaac_ros-dev/src/isaac_ros_apriltag/isaac_ros_apriltag/rviz/usb_cam.rviz
7.2.1.3 Run 2D localization
./run_ros.sh
source robot_ws/install/setup.bash
ros2 launch localizer localizer_launch.py
If you wish to view fused 2D odometry results (between model based and apriltag based localization):
./run_ros.sh
rviz2
Then, open loc.rviz from the localizer package in RViz.
7.2.1.4 Teleoperation
./run_ros.sh
cd robot_ws
source install/setup.bash
ros2 run localizer teleop_keyboard
7.2.1.5 Start Collecting Images
./run_ros.sh
source /opt/ros/humble/install/setup.bash
cd robot_ws
source install/local_setup.bash
ros2 launch image_collector image_collector_launch.py
This saves the images to .../eyeforanitem/jetson-containers/data/datasets/image_collector/...
7.2.2 Post-processing and finding your item through NanoDB
7.2.2.1 NanoDB commands
If you've just collected your dataset, you need to build your embeddings first. This only needs to be done once per data collection:
./run.sh -v ${PWD}/data/datasets/image_collector/train:/my_dataset $(./autotag nanodb) \
python3 -m nanodb \
--scan /my_dataset \
--path /my_dataset/nanodb \
--autosave --validate
To spin up the gradio webserver (which allows text or image queries) and the command-line query environment where you enter your missing object, run this:
cd /ssd/eyeforanitem/jetson-containers && ./run.sh -v ${PWD}/data/datasets/image_collector/train:/my_dataset $(./autotag nanodb) \
python3 -m nanodb \
--path /my_dataset/nanodb \
--server --port=7860 --k 8 | tee /ssd/eyeforanitem/scripts/out.txt
To process your most recent command-line query (saved into out.txt by the previous command) and get location stamps for where the image was taken, run this:
./run_ros.sh
cd scripts
python3 find_obj.py
You will then see the top 8 image matches corresponding to your query on the screen. If you see an image that matches your object, click on it, and another window will pop up displaying the location and orientation of the robot at which the image was taken.
7.3. Calibration and Apriltag Configuration (from Calibration and Apriltag Configuration wiki page)
There are a few configuration files you need to modify. I go through each of them in the sections below.
7.3.1 Camera Extrinsics and Apriltag Params
robot_ws/src/localizer/config/tags.yaml:
To set up the camera extrinsics, you will have to obtain t_B_CB (the translation from body to camera, expressed in the body frame) and R_B_C (the rotation matrix from the camera frame to the body frame). These don't have to be super accurate, just good enough. The frames for my setup are defined in the picture below. Note that the frame conventions for the camera and IMU are predetermined, so the only difference would be how you mount one relative to the other. For detailed descriptions of the frames, refer to pages 2 and 3 of https://drive.google.com/file/d/1re9Iy_McZaQV1AxbQdOVJHL6NPVnaKzJ/view?usp=sharing.
To set up Apriltags, first print them. I used the tag36h11 family from https://github.com/AprilRobotics/apriltag-imgs. Make sure each tag's ID is unique when you print them out. Personally, I printed tags out with these settings:
To then enter each tag into the configuration file, you will need to specify its id, position, and rotation. The id is simply the ID of the tag. For position, you have to measure the x, y, and z displacement between the center of the Apriltag and the origin of the world (assumed to be the robot's starting point).
For rotation, you can leave the roll at 90 degrees and the pitch at 0, and adjust the yaw accordingly. For example, if the tag is facing the robot, the yaw is 180 degrees.
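For intuition, a tag's rotation entry can be turned into a rotation matrix roughly as follows (the Euler order here is an assumption; check the localizer / visualize_tags.py code for the exact convention it expects):

# Sketch: build a tag's world rotation from the (roll, pitch, yaw) entries in the tag config.
# The Euler order ('xyz', extrinsic) is an assumption, not confirmed against the actual code.
from scipy.spatial.transform import Rotation as R

def tag_rotation(roll_deg=90.0, pitch_deg=0.0, yaw_deg=180.0):
    return R.from_euler('xyz', [roll_deg, pitch_deg, yaw_deg], degrees=True).as_matrix()

print(tag_rotation())  # example: a tag facing the robot's starting direction, per the text above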
Once you've entered all your tags into the yaml file, it is useful to visualize them. To do this, run the following visualization script to plot the tag locations and orientations in the room:
./run_ros.sh # enter our docker container
cd scripts && python3 visualize_tags.py
Note that you can run the whole pipeline without Apriltags as well; just set num_tags to 0. Your odometry will then accumulate error as the robot runs around for longer, but overall it's pretty stable for at least a dozen meters or so.
7.3.2 Image Collector Params
robot_ws/src/image_collector/config/image_collector_config.yaml:
You can leave the defaults for pretty much everything here. However, useful parameters to tweak include capture_period and blur_thres, which affect how the images are collected.
7.3.3 Visualization
visualize_tags.py:
On L70-72, you may have to play around with the room size depending on the environment you are deploying your robot into.
...
room_size = [5, 3] # [m]
ax.set_xlim([-1, room_size[1]]) # [lower limit, upper limit]
ax.set_ylim([-1, room_size[0]])
...
Once you complete these steps, you should have a fully working robot that can help you find lost objects!
8. Conclusion
In general, the robot was quite effective at finding queried objects, thanks to NanoDB. When the object is in the scene (slippers, bags, water bottles, soda, backpacks, etc.), several images of said object consistently show up in the top 8 results displayed to the user.
The feedforward differential steering model for odometry also worked surprisingly well, largely because of how stable the MPU6050 IMU is when using the DMP integrated into the sensor itself. Combined with Apriltag localization to correct drift, the robot's 2D odometry is quite robust, and could scale to much larger inventory-tracking problems beyond finding personal items, such as warehouse operations.
However, the robot still struggles with very small objects (e.g. keys), mostly due to the poor resolution and FOV of the off-the-shelf USB webcam I found at home from several years ago. For the "keys" query specifically, it actually returned the keyboard I had placed on the ground to start the drivers on the robot. Ambiguous object names like "keys" can therefore confuse the user with the images NanoDB returns.
Next steps:
- Using a stereo camera (allows the robot to map out and autonomously navigate unknown environments (SLAM)). Right now I have to hardcode my map, but having a sensor appropriate for SLAM, like a LiDAR or stereo camera, will allow these maps to be created automatically through the ROS2 navigation package (for example)
- A more powerful chassis that can better sustain the weight of the Jetson Orin. Maybe just custom build a chassis with stronger motors, resembling a turtlebot3 or similar
- Allow extra DOF on the camera (pan and tilt), letting the robot cover a greater field of view
- Handle more queries (beyond just finding objects). For example, "drive into the kitchen: did I leave my fridge door open?" The image taken by the robot at that location could then be passed to an LLM (LLaVA, GPT-4, etc.) to answer the query for the user
9. References
- MPU6050 library by ElectronicCats
10. Bonus
Some miscellaneous tips to help streamline your workflow :)
Record a rosbag (for offline processing)
This is helpful as you can just run 7.2.1.1 and 7.2.1.2 (the necessary drivers) as well as 7.2.1.5 to drive the robot around and save everything the robot sees, and then perform the rest of the steps in 7.2 by replaying this data through ros2 bag play <name of your bag>:
./run_ros.sh
source robot_ws/install/setup.bash
cd /rosbags # or where-ever you put your rosbags
ros2 bag record -o tag_test /drivebase_subscriber /image_raw /imu /tag_detections
Graphics passthrough with ssh
For some reason, remote-desktop applications like TeamViewer or VNC won't work on the Jetson unless an external monitor is attached to it. I think you can solve this by buying a dummy plug or with some fancy Linux tricks, but it didn't impact me enough to warrant those methods.
Instead, to get GUI applications over an ssh connection on Windows, I used the ssh extension in VS Code to do my development, and VcXsrv to display graphical applications. This is a good guide I followed to set up VcXsrv on my Windows machine: https://yunusmuhammad007.medium.com/jetson-nano-vs-code-x11-forwarding-over-ssh-d97fd2290973.
Debugging
Some common bugs I ran into, and how to fix them: https://astonishing-cent-401.notion.site/Errata-da65f60bc02746cf85c1e03068670a59?pvs=4