Our group wanted to integrate artificial intelligence into the robot platform we developed throughout the semester, giving it the ability to solve a maze without relying on a classical path-finding algorithm. In the demonstration video, you can see the robot solve the maze represented by the grid on the carpet floor. To do this, our group fine-tuned OpenAI's GPT-4o mini model and deployed it through OpenAI's API. This model was responsible for solving the maze.
Fine-tuning is a capability offered by many LLMs, from open-source models such as Meta's Llama family to hosted models like OpenAI's, that lets you continue training a model on your own dataset. Fine-tuning one of these models essentially trains it to generate outputs, or answers, in a specific format for a specific application; you are making the model really good at one thing.
Fine-tuning a model does require compute resources such as GPUs, which creates a significant barrier to entry because capable GPUs can be quite expensive. Fortunately, OpenAI offers an easy way to fine-tune its models at relatively low cost.
Making the Dataset
The first step toward fine-tuning an LLM is to create the training dataset. This dataset must be formatted in a specific way to be compatible with the existing LLM. For fine-tuning with OpenAI, the dataset should be a JSONL file containing a list of messages, where each message has a role, content, and an optional name. Here is an example of what a message might look like. I've removed the prompt to keep this example short; you can view it in the source code. Essentially, the prompt asks the system to solve a specific maze and the assistant provides the solution in a formatted output. This particular example has no solution, so the assistant answers "no path".
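Roughly, one entry in the JSONL file looks like the following (pretty-printed here for readability; in the actual file each entry occupies a single line, and the prompt text is omitted as noted above):

{
  "messages": [
    {"role": "user", "content": "<maze-solving prompt and 5x5 maze omitted for brevity>"},
    {"role": "assistant", "content": "no path"}
  ]
}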
To create this dataset, an automation script was used to randomly generate any number of mazes along with their respective solutions. The script can create square mazes of any size and any split between mazes with valid solutions and mazes where no path exists from start to end. The initial goal was to create a generalized training set with mazes ranging in size from 2x2 up to 10x10, with the start and end points placed anywhere within the boundaries. Making such a dataset is no problem, and in an ideal world we would make it as large as possible. However, we do not live in an ideal world with unlimited money and plentiful compute resources; we quickly realized that training a ChatGPT model can be expensive, and the cost of training grows with the size of the dataset. We wanted to find a balance between the size of the training set and the cost of training. To keep costs to a minimum ($20.00), we restricted the number of entries, or mazes, in our training set to 5000. This forced us to limit certain features of the dataset in order to maximize the effect those 5000 examples would have on the model, so we decided to train only on 5x5 mazes with fixed start and end points.
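The actual generation script lives in the project source; as a rough sketch of the idea (the function name and the wall probability below are assumptions, not the project's real values):

import random

def generate_maze(size=5, wall_prob=0.3):
    # 's' = start (top-left), 'e' = end (bottom-right);
    # 'x' and 'o' are assumed here to mean wall and open cell.
    maze = [['x' if random.random() < wall_prob else 'o' for _ in range(size)]
            for _ in range(size)]
    maze[0][0] = 's'
    maze[-1][-1] = 'e'
    return maze

# Generate the batch of training mazes; in the real script each maze is then
# solved (or found unsolvable) with A*, sketched further below, so the split
# between solvable and unsolvable examples can be controlled.
mazes = [generate_maze() for _ in range(5000)]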
Therefore, our training set contained 5000 messages, each containing a 5x5 maze and its solution. Each maze is formatted the same way and contains only four characters: 's', 'e', 'x', and 'o'. The 's' character represents the start and is always located in the top-left corner, while the 'e' character represents the end and is always located in the bottom-right corner. Here is a visual representation of what a maze might look like.
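An illustrative 5x5 maze in this format (assuming 'x' marks a wall and 'o' an open cell) might look like:

s o x o o
o x o o x
o o o x o
x o x o o
o o o o e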
All 5000 mazes were solved with an implementation of the A* algorithm. This algorithm gives a deterministic solution for each maze, giving us confidence that the training dataset is accurate.
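The project's solver is not reproduced here; a compact grid A* with a Manhattan-distance heuristic, assuming the character encoding above, could look like this:

import heapq

def solve_maze(maze):
    """Return a list of (row, col) cells from 's' to 'e', or None if no path exists."""
    n = len(maze)
    start, goal = (0, 0), (n - 1, n - 1)

    def h(cell):  # Manhattan-distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start, [start])]  # (f, g, cell, path so far)
    visited = set()
    while open_set:
        _, g, cell, path = heapq.heappop(open_set)
        if cell == goal:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and maze[nr][nc] != 'x' and (nr, nc) not in visited:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # no route from start to end; labelled "no path" in the training set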
Fine-tuning
After making the dataset, it is time to train the model. OpenAI provides a fine-tuning feature for its API users. You can write a script that starts the training through API calls, or you can simply create a fine-tuning job on OpenAI's web platform and upload your formatted training set there. It will take some time to train the model; for our training set, it took roughly one hour, and training time increases dramatically with the size of the dataset. Once training is complete, OpenAI provides an output model ID. This string can then be used just like any other LLM chatbot, with a simple API call specifying the model ID.
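As a rough sketch of the API route (the file name, base-model snapshot, and fine-tuned model ID below are placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training set and start a fine-tuning job.
training_file = client.files.create(file=open("mazes_5x5.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # base-model snapshot (placeholder)
)

# Once the job finishes, OpenAI reports a fine-tuned model ID that is used
# exactly like any other chat model.
maze_prompt = "<full maze-solving prompt kept in a string variable>"
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:personal::example123",  # placeholder fine-tuned model ID
    messages=[{"role": "user", "content": maze_prompt}],
)
print(response.choices[0].message.content)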
Code along these lines is implemented in a Python notebook to make it easy to run and view outputs. Again, the prompt is kept in a string variable for readability and good coding practice.
Transferring output to the robot
The output of the maze-solving LLM is exported as a file in JSON format. This JSON file is imported into the LabVIEW program, which transfers the output commands to the robot car and executes them accordingly. The output is read by the LabVIEW program as soon as the "READ" button is pressed in the UI of the running program, and the text in the JSON file is shown in a window inside the UI.
An example of the output is shown below:
{
"instructions": [
"forward",
"left",
"forward",
"forward",
"forward",
"forward",
"right",
"forward",
"forward",
"forward"
]
}
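On the Python side, the notebook simply writes this structure out to a file where LabVIEW can pick it up. A minimal sketch (the function and file names here are placeholders, not the project's actual ones):

import json

ALLOWED = {"forward", "backward", "left", "right"}

def export_instructions(model_output: str, path: str = "maze_solution.json") -> None:
    # Parse the model's JSON text, check it only contains known commands,
    # and write it to disk for the LabVIEW program to read.
    data = json.loads(model_output)
    assert all(cmd in ALLOWED for cmd in data["instructions"]), "unexpected command"
    with open(path, "w") as f:
        json.dump(data, f, indent=2)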
Using function blocks from LabVIEW's string palette, the text in the JSON file is manipulated into a string array so that only the individual commands remain. The string array is then handed to a for-loop that executes each command sequentially. Each command is examined by a case structure, which sends the corresponding velocity and turning parameters to the robot at each step. There are four types of moving commands in the JSON file (the dispatch logic is sketched in Python after the list below):
- forward: the robot drives forward by the length of one grid cell
- backward: the robot drives backward by the length of one grid cell
- right: the robot turns right by 90 degrees, staying on its current grid cell
- left: the robot turns left by 90 degrees, staying on its current grid cell
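Purely as an illustration of this parse-and-dispatch pattern (the actual implementation is in LabVIEW, and the velocity and turn values below are made-up placeholders):

import json

# Per-command motion parameters; the real values live in the LabVIEW case structure.
MOTIONS = {
    "forward":  {"velocity": 0.5,  "turn_deg": 0},
    "backward": {"velocity": -0.5, "turn_deg": 0},
    "right":    {"velocity": 0.0,  "turn_deg": -90},
    "left":     {"velocity": 0.0,  "turn_deg": 90},
}

with open("maze_solution.json") as f:
    commands = json.load(f)["instructions"]

for cmd in commands:        # LabVIEW's for-loop over the string array
    params = MOTIONS[cmd]   # LabVIEW's case structure selecting parameters
    print(cmd, params)      # stand-in for sending velocity/turn values to the robot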
After all of the moving commands have been executed, indicating that the robot car has arrived at the destination, the “drop” command is executed to release the load the robot car has been carrying.
Controlling the bucket (load-carrying structure)
The "drop" feature of the robot is implemented by connecting a servo motor to the F28379D LaunchPad. The servo motor is initially set at an angle of -90 degrees. When the drop command is executed, the servo rotates 90 degrees clockwise to release the bottom latch of the bucket. The LaunchPad drives the servo motor using the EPWM peripheral (EPWM8).
The drop command is initiated when the robot reads a value of 1 from LabVIEW: after all of the move commands have been executed, LabVIEW writes 1 to the variable "drop". The robot then reads this value and runs the servo rotation code, which rotates the servo 90 degrees, waits 2 seconds, and returns it to its original position. Once the drop command has been executed, the drop variable is reset to 0 to prepare for the next set of commands.
Building the bucket (load-carrying structure)
The load pickup/dropoff mechanism works by rotating an HS-311 servo motor 90 degrees to open and close the bucket base. The mechanism was designed by first measuring the dimensions of one side of the robot car to create a mounting plate in CAD (Autodesk Fusion 360). A housing/mold for the HS-311 servo motor was modeled and combined with the mounting plate into a single part. This part was then 3D printed in PLA and attached to the robot with adhesive Velcro patches. The bucket base was attached to a shaft connecting to the servo flange and was designed to fit around the flange on the HS-311 servo motor. This part, along with the bucket/chute, was also 3D printed in PLA.
It is important to understand the broader implications and opportunities our project suggests for the future of robotics and AI integration. Although classical algorithms like A* and BFS remain the go-to solutions for well-defined path-finding problems, large language models offer an added layer of flexibility and adaptability. As demonstrated in this project, they have the potential to learn from more complex, less structured data sources and could be adapted to handle tasks beyond standard maze solving, such as reacting to their surroundings.
As the cost of fine-tuning goes down and open-source LLMs become more accessible, it will likely become easier to experiment with larger datasets and new problems. In the future, this project could expand to using computer vision techniques to provide the robot with real-time feedback about its surroundings, which it could then use to determine a path forward.