The Hobbyist’s Guide to Building Bots That Think
Creativity and a Jetson Orin Nano Super are all you need to make an accessible robot that can reason and interact with the world — and you.
Taking the time to learn about robotics can be very rewarding. After all, once you master the basics, you can make anything from a robotic servant to an autonomous race car or a companion bot. But getting that knowledge is easier said than done. Robotics is a highly multidisciplinary field, where expertise in mechanical, electrical, and software engineering is necessary to build anything of much consequence. That is a very high hurdle to clear, and it leaves many armchair engineers and other hobbyists feeling like there is no room for them in the world of robotics.
While it is true that robotics can be an exceptionally challenging field to work in, that does not have to be the case. YouTuber Nikodem Bartnik has made a hobby out of smashing the assumption that robotics is just plain too difficult to get involved in. Along the way, he has built robots using parts scavenged from toys, old RC cars, and even bits of cardboard. These robots have typically been controlled by low-cost and hobbyist-friendly hardware platforms like Arduino and Raspberry Pi, making them very accessible.
There are, of course, limits to what can be accomplished using these types of tools. So in a recent project, Bartnik made it his goal to build a much more capable robot while still using only highly accessible hardware, albeit hardware a bit more on the pricey side. The plan was to run a multimodal large language model (LLM) onboard the robot to give it the ability to interact with its surroundings, and maybe even some personality.
Wait, a multimodal LLM? Surely that must be too computationally intensive for any relatively inexpensive development board that can fit on a hobbyist's robot, and too hard to get running, you say? Did you hear that ringing? 2023 just called, and it wants its LLM back. Yeah, it’s easy now.
The initial build consisted of a simple wheeled robot powered by a camera-equipped Raspberry Pi 3 Model B, which handles basic functions like motor control. With a few added sensors, this is a fairly capable platform for learning about robotics. And with wireless access to a nearby server running an LLM on a GeForce RTX 4060 GPU, the robot could analyze its surroundings and decide on the best course of action when it was given a task to complete.
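To make the "decide on a course of action" step a little more concrete, here is a minimal sketch of one way a free-text reply from a model could be turned into wheel speeds. Everything in it is an assumption made for illustration: the command words, the speed values, and the function names are hypothetical and are not taken from Bartnik's project.

```python
import re

# Hypothetical command set: each word maps to (left, right) wheel speeds
# in the range -1.0 to 1.0. These values are illustrative only.
COMMANDS = {
    "forward": (0.5, 0.5),
    "backward": (-0.5, -0.5),
    "left": (-0.3, 0.3),
    "right": (0.3, -0.3),
    "stop": (0.0, 0.0),
}


def parse_command(reply):
    """Return the first known command word found in the reply, else 'stop'."""
    for word in re.findall(r"[a-z]+", reply.lower()):
        if word in COMMANDS:
            return word
    # Failing safe: if the model says nothing recognizable, do not move.
    return "stop"


def wheel_speeds(reply):
    """Translate a model reply into a (left, right) speed pair."""
    return COMMANDS[parse_command(reply)]
```

Defaulting to "stop" when no known word appears is a deliberate choice in this sketch, since language models do not always answer in the format you ask for.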
But the server-based setup is somewhat complicated, and because not all of the hardware is located on the robot, it adds latency during operation. So, Bartnik replaced the external server with the brand spanking new Jetson Orin Nano Super Developer Kit, which can run the LLM locally while coming along for the ride. Getting the LLM up and running was a snap with the help of Ollama. Specifically, a LLaVA model was selected for use, as these models combine language understanding with a vision encoder.
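For readers who want to try something similar, here is a minimal sketch of how a camera frame might be sent to a LLaVA model through Ollama's local REST API. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the name `llava`. The function names and the timeout are made up for illustration and are not taken from Bartnik's code.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="llava"):
    """Build the JSON body for a single, non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_model(image_bytes, prompt, url=OLLAMA_URL, timeout=120):
    """Send one image and a prompt to the model and return its text reply."""
    body = json.dumps(build_payload(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With a JPEG frame from the camera loaded into `frame_bytes`, a call such as `ask_model(frame_bytes, "Describe what is in front of the robot.")` would return the model's description as a string, provided the Ollama server is up and the model has been downloaded.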
This setup worked flawlessly and appeared to be quite responsive. As for the robot's performance, it ranged from impressive to funny, depending on the task thrown at it. But most importantly, this very capable robot was simple to build, and you might want to consider giving it a try yourself if you want to break into the field of robotics.