We have an issue. When lunchtime rolls around, we usually have two choices:
- Buy something (either from the cafeteria or a fast food place)
- Pack your own.
The former is obviously much more convenient, but is often not nutritionally ideal and can get expensive.
The latter is much more economical and nutritionally ideal, but having lived by myself for three months at an internship, I know how difficult and stressful it is to get up at 5 am every morning, scramble to prepare food and yourself, and be out the door by 7:30am to beat rush hour or risk being late. And let's not forget when you get home at 7pm after a whole day of work, you still need to make dinner for yourself!
Let's face it, most people do not have the luxury of time to get up early in the morning and prepare a nutritionally ideal meal before heading out to class/work. Thus, the majority of people tend to choose the former option and, over time, may develop nutrient deficiencies or other health problems.
I want to build an automated food preparation machine, aka Robot Chef, that will solve this problem of time and make the latter option much more accessible for everyone.
Methodology: But how?
Building a robot that can cook anything for you is a daunting task, and it is impossible to build all at once, so I've decided to take it one step at a time.
They say knowing 100 ways to cook an egg is an indicator of a good chef, so why don't we start there?
Our goal:
Let's build a robot that is physically capable of handling the general case, but initially write software that will allow it to cook an egg!
We will first need some way to manipulate objects in space, including but not limited to:
- Finding stuff that we want
- Picking stuff up (without breaking it)
- Putting stuff down (again, without breaking it!)
It would be really easy to build a giant 3-axis CNC-type gantry to manipulate objects with millimeter precision, but that's overkill, and frankly that style of construction would limit what the robot could do. Since the ultimate goal is to make a robot chef for the general case, we can sacrifice a bit of precision for more flexibility.
And what better way to do that than a robot arm?
Aside: Hold up, didn't you originally say you wanted to make dumplings?
I knew that making dumplings was already going to be a huge challenge, especially while simultaneously juggling university classes, but it was feasible. Since the contest was going to end during my summer break, I had also planned on spending my summer finishing up whatever I couldn't finish at school.
However, after I received the hardware, I also received an internship offer that had me working across the country, in a completely different time zone, for the whole summer.
Rather than giving up and being doomed to cook for myself for the rest of my life, I decided to build my robot remotely: my extremely motivated brother, who didn't really know much about AI or electrical engineering, physically built and debugged the robot according to the instructions written here, while I remoted into my setup at home to write the software. This also forced me to make sure that everything written here is clear enough for any beginner to follow.
This unexpected event added another layer of complexity to an already complex project, so I made the difficult choice of modifying the task I wanted the robot to do in order to keep everything feasible for both my brother and me, while still implementing the majority of the features that the dumpling bot would have had, just on a smaller scale.
Also, this post is already getting super super long, and I'm not sure if covering the entire dumpling robot in a single post would be a good idea anyways.
Though do expect a Part 2 documenting it making dumplings in the near future!
Ok understood, continue...
This project is complex and definitely not for the faint of heart; only those who are prepared to seriously advance their skills in mechanical engineering, electrical engineering, coding, and debugging should attempt it.
My journey will be divided into 3 sections: the mechanical portion, the electrical portion, and the software portion.
Prelim
I lied, there are actually 4 sections. I've decided to create an extra section that goes through installing all the third-party software we need and properly downloading all of our project files to make sure we are all on the same page.
Hold onto your hats, folks, there's a whole lot to download, starting with software tools. Note that these all get downloaded on the computer you normally use for development.
Step 1: Download Git Bash. We will use Git Bash as our terminal to ensure consistency and to make downloading from GitHub easier. Also, knowing how to use git is a great skill to have in your toolkit.
Step 2: Once Git Bash is installed, open it and navigate to the location where you want to store all your project files. I like to create a folder in my Documents folder called Projects and store my code and project files there. Since I have a Windows machine, I run:
cd /c/Users/<your username>/Documents
mkdir Projects
cd Projects
For Macs or Linux machines, run this instead:
cd ~/Documents
mkdir Projects
cd Projects
Clone the project repo. This will download all the files necessary for this project. Use this command in Git Bash:
git clone https://github.com/dakche/RobotChef/
Once the download finishes, run this command to go inside the folder you just downloaded:
cd RobotChef
Step 3: Download VS Code (NOT VISUAL STUDIO) if you don't have it already. This is undeniably one of the best and cleanest code editors ever made. To install, just follow the steps given by the installation wizard.
Step 4: We need a Linux computer to run Vitis-AI, and if you don't want to dual boot, I highly recommend using a virtual machine. Some guides say WSL works (if you are using Windows), but in my experience a lot of things do not, so to save yourself some headache, let's create a virtual machine. VMware is a very good free option and is what I will be using (this download is for Windows only).
Step 5: Get Ubuntu 20.04. We will run this inside VMWare, which I will get to in Part 3: Software.
Step 6: We also need Balena Etcher, which we will use to flash the KR260's OS onto the SD card. Note that this is a fairly old version of Etcher; I've been having some issues with the newer versions, but I have verified that this version works, so I'm sticking with it.
Step 7: Download and install the Arduino IDE 2.0 found here.
Ok, we are done downloading software for the Software portion; now let's download some software for the Mechanical portion.
Step 8: Download Cura. Cura is a piece of software that prepares our STL files (essentially our 3D models) for 3D printing. It is very popular and easy to use, and I highly recommend it.
Ok now let's begin the actual build!
Part 1: Mechanical
I first needed a robot arm, and not one of those small ones you find on Amazon and the like. I needed one that could lift a couple of pounds without my wallet losing a lot of pounds, so I opted to spend hundreds of hours designing and building one completely from scratch.
3D Printing Steps:
Step 1: Open up the Part_1_Mechanical folder in the project folder.
Step 2: Inside you will see that there are three folders:
- 3D Print Parts: Contains the STL files that are ready for 3D printing.
- Laser Cut Parts: Contains the DXF files that are ready for laser cutting.
- Non_printable_mockups: Contains the parts that are not 3D printed (i.e. motors, servo, PVC pipes, etc.). This folder is mainly used as a list to keep track of the parts that need to be bought and not made.
Step 3: Open up Cura, follow the instructions to set up your 3D printer (for me, I have a Creality CR-10), and then once you see the main screen, click on the folder icon on the top left hand corner.
Step 4: Import the files that need to be 3D printed (they cannot all be printed in one go, so you will need to split it up into multiple batches). The quantity of each part will be in the file name (ex: baseplate_x4 --> make 4 of these).
Step 5: IMPORTANT! Because my CAD software is set to inches and Cura operates in mm by default, ALL MODELS NEED TO BE SCALED UP BY 2540% (a factor of 25.4, the inch-to-millimeter conversion) IN ALL DIMENSIONS IN ORDER FOR EVERYTHING TO BE THE RIGHT SIZE.
Step 6: In order to decrease the weight, save PLA, and increase print speed while still maintaining the strength of all the 3D printed parts, in Cura, go to the Print Settings > Custom located on the top right and make these changes:
- Profile > Standard Quality
- Walls > Wall thickness > Wall line count = 4
- Material > Printing Temperature = 220
- Material > Build Plate Temperature = 60
- Infill > Infill Density = 10%
- Infill > Infill Pattern = Cubic Subdivision
- Speed > Print Speed = 60
- Speed > Infill Speed = 80
- Speed > Wall Speed = 60
- Support > Generate Support
- Support > Support Structure = Tree
- Support > Support Overhang Angle = 50
- Build Plate Adhesion > Build Plate Adhesion Type = None
Step 7: Click the big blue Slice button and plug in your 3D printer's microSD card. When it is finished slicing, the button will change into "Save on Removable Device." Click that button, and then you can remove your microSD card and put it in your 3D printer for printing.
Laser Cutting Steps:
Step 1: Laser cutting the parts in the Laser Cut Parts folder is very simple, as there are no special scaling or modifications that need to be made to the files. Just make sure you import them at 1:1 scale with the units set to inches.
Recap: The whole robot arm is designed in CAD, built from 3D printed parts, laser cut parts, and PVC pipes and then assembled using 4-40 machine screws.
Structural assembly is pretty straightforward, with some minor quirks that I will get to later, but here are some exploded views of the structural portion of the robot arm:
Here is the real arm completed up to this step; note that I had not cut the upper arm PVC pipe yet.
The real thing is way cooler:
Motor Selection
- The turntable is powered by a 130VDC Bodine worm gear motor, which is absolutely overkill for this project but had the right mass and form factor for the job (and I had one lying around waiting to be used).
- The shoulder and elbow 12V motors are salvaged from an old massage pad that was doomed for the trash, but had a high quality nylon worm gear drive that I took full advantage of.
- The wrists are driven remotely through cables by two drill motors, which will be explained in more detail in the next subsection.
- Finally, the claw/gripper is controlled with a standard metal gear servo.
Wrist Control
I am extremely proud of this aspect of the robot: how the wrist movement is achieved. I was limited to drill motors as my set of "powerful motors" due to my budget, but obviously those motors are way too bulky and heavy to be directly attached to the wrist, so I had to get really creative with how I was going to get wrist movement. But then I came across this video and got inspired:
Instead of stepper motors and fishing line, I used my drill motors and bike shifter cable. I designed pulleys that could be attached to the chucks of the drill motors, meticulously wrapped and tensioned the stiff bike cables around the pulleys and the robot, and mounted the two drill motors to an acrylic base.
The way it works is that the pulley is broken up into two sections: one where the cable is wound clockwise and the other where it is wound counterclockwise. When the motor spins in one direction, it applies tension to one bike cable and releases tension on the other, allowing the connected pulley to spin in the direction of the motor.
The image above shows the proper winding of the pulley system. The image on the right gives a front view of the pulley. The circles indicate the cable is coming towards you (out of the page) and the X's indicate the cable is going away from you (into the page).
As you can see in the above photo, if the red cable gets pulled, the wrist will bend up, and if the blue cable gets pulled, the wrist will bend down.
The tension in the cables comes from the cable housing. Similar to bikes, you compress the cable housing to apply tension to the cables by sliding these back and tightening them:
Similarly, there are also tensioners near the drill motors that can be adjusted if you need more tension.
I was extremely proud of the end result and was surprised at how little backlash there was in the system!
Gripper
Initial tests revealed that the fingers on the gripper weren't grippy enough to grab an egg without dropping it. And since the gripper was going to operate near boiling water, it also needed to be reasonably heat resistant.
We had an old hot water bottle that was starting to leak, so a piece of it was cut up and zip-tied over the fingers of the claw, which turned out to be the perfect solution.
Mechanical: What I would do if I did it again
- Use something other than bike cables for the cable-driven wrist control. Bike cables are a little too stiff for this application (and too dangerous for my fingers; those little cables are sharp!)
- Don't use PVC pipe. The upper arm portion (elbow to wrist) is extremely stiff; the lower arm portion (elbow to shoulder), not so much. Though I used a PVC pipe for both sections, the PVC pipe in the upper arm portion basically served as a connector while the 3D printed parts took all the bending forces (and they took them very well, thanks to the 4 walls). However, the lower arm subjected its PVC pipe to some bending force, simply because more of it is exposed. This meant that, overall, the robot arm was a little bit floppy due to the compliance of the lower arm, which is not ideal.
- Use real bearings. Yes, large inner diameter bearings are expensive, but your robot will work a whole lot better.
Time spent in this section: 167 hours
Part 2: Electrical
Now, the electrical aspect of the robot arm can be broken down into three main parts:
- Control - High level overview of how everything is controlled, the nitty-gritty aspects will be covered in the programming section
- Power - The circuits doing the heavy lifting and driving the motors
- Feedback - The sensors closing the control loop and keeping the power circuits in check, as well as the main (top-level) circuit layout
Control Architecture
Deciding how to approach the motor control was one of the hardest decisions I had to make. The original plan was to use the KR260 to control all the H-bridges and interface with the position sensors because of the huge advantages the KR260 provides:
- The KR260's onboard FPGA means I can create specialized motor controller accelerators that can take care of the PWM-ing and other motor control logic without sacrificing processor bandwidth.
- I can have many instances of hardware PID loops that can run in real time, giving me ultimate flexibility and tight motor control integration.
Unfortunately, like I mentioned before, at the time of writing and building this project I am nowhere near the KR260 and do not have the full freedom to get creative, experiment, and push the KR260 to its limits on the hardware side, so I decided to play it safe and work with what I am familiar with for now. However, stick around for the dumpling robot, where I will definitely take advantage of the KR260's FPGA for motor control.
Ultimately, I went with what I like to call a "brains and brawns" setup:
- Brains: Have the KR260 do the video processing, inverse kinematics calculations, AI processing, and sending commands to move the arm.
- Brawns: Have a Teensy 4.0 as an external real-time co-processor that closely interfaces with the motor controllers, feedback potentiometers, PID loops, etc., and controls the arm based on the KR260's commands.
Communication
The communication between the AMD Kria KR260 and the Teensy is done via UART over USB (i.e., a plain serial port). This was chosen over faster but more complex communication protocols like I2C or SPI for reliability reasons. With a project this complex, it is best to keep things simple and stick with something that will work out of the box without too much tinkering and debugging.
The communication protocol is dead simple. The KR260 sends data in this format:
90;90;90;90;3000;1500
- First number: Shoulder Motor desired angle in degrees
- Second number: Elbow Motor desired angle in degrees
- Third number: Wrist Rotation Motor desired angle in degrees
- Fourth number: Wrist Bend Motor desired angle in degrees
- Fifth number: Turntable motor power
- Sixth number: Claw servo power
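To make this concrete, here is a minimal sketch of what the KR260-side sender could look like. It is purely illustrative (the real sending logic lives in chef.py); the port name, baud rate, and newline terminator below are assumptions, and it requires the pyserial package.
# Illustrative sketch of the KR260 -> Teensy command format described above.
# Assumptions: pyserial is installed, the Teensy shows up as /dev/ttyACM0,
# and commands are newline-terminated. The real sender is in chef.py.
import serial

def send_arm_command(port, shoulder, elbow, wrist_rot, wrist_bend, turntable, claw):
    # Pack the six values into the semicolon-separated format, e.g. "90;90;90;90;3000;1500"
    msg = f"{shoulder};{elbow};{wrist_rot};{wrist_bend};{turntable};{claw}\n"
    port.write(msg.encode("ascii"))

with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as teensy:
    send_arm_command(teensy, 90, 90, 90, 90, 3000, 1500)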
Power
Here we go, my favorite part!
I elected to go with DC brushed motors for a few reasons:
- I already have some
- A powerful DC brushed motor is much cheaper than a powerful stepper motor
- Some already come with a gearbox!
- Controlling them is really really easy
DC brushed motors are pretty easy to understand.
1. You connect a DC source (say a battery) to the two terminals and it starts spinning.
2. You flip the polarity and it spins the other way.
3. You short the two contacts of the DC motor and it applies a strong braking force.
Manually connecting, flipping, and shorting the connections on a DC motor by hand is impractical to say the least, so we can design a circuit that a microcontroller can interface with so it can do all that for us. This is achieved with an H-bridge.
This little circuit is pretty clever. Here's how it works:
If SW1 and SW4 are turned on and SW2 and SW3 are turned off, the motor sees + on its left and - on its right and spins in one direction.
If SW2 and SW3 are turned on and SW1 and SW4 are turned off, the motor now sees - on its left and + on its right and spins in the other direction.
If SW3 and SW4 are on and SW1 and SW2 are off (or vice-versa), the terminals of the motor are shorted and the motor is braked.
But that's too much to worry about, so I chose to use a powerful commercial H-bridge module.
Interfacing this module is fairly simple:
- Vcc: Gets connected to the Teensy's 3.3V, supplies power to the logic circuitry on the H-bridge
- GND: Gets connected to the Teensy's GND, common ground
- L_IS and H_IS: Not used
- L_PWM: Gets connected to a PWM pin on the Teensy, if you look at the high level schematic, this pin controls SW1 and SW3 (If L_PWM is high, SW1 is on, SW3 is off and vice versa)
- R_PWM: Also gets connected to a PWM pin on the Teensy, controls SW2 and SW4
- L_EN and R_EN: Both get connected to Teensy's 3.3V, just enables all the H-bridge internal switches.
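The real motor-driving code is the Arduino C++ in arm_ctrl.ino, but conceptually the mapping from one signed motor command to the module's two PWM inputs boils down to something like the sketch below. The 0-255 duty range is an illustrative assumption, not necessarily what the firmware uses.
# Conceptual mapping from a signed motor command to the module's two PWM inputs.
# With both PWM inputs low, both low-side switches end up on and the motor
# brakes (see the H-bridge explanation above). Duty range 0-255 is illustrative.
def hbridge_duties(command):
    """command in [-255, 255]; returns (l_pwm_duty, r_pwm_duty)."""
    command = max(-255, min(255, int(command)))
    if command > 0:
        return (command, 0)    # drive L_PWM, hold R_PWM low -> spin one way
    if command < 0:
        return (0, -command)   # drive R_PWM, hold L_PWM low -> spin the other way
    return (0, 0)              # both low -> brake

print(hbridge_duties(128))     # (128, 0), roughly half power in one direction
print(hbridge_duties(-255))    # (0, 255), full power the other way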
The H-bridges are supplied with 12V from a desktop ATX power supply, but the turntable motor, which normally requires 130VDC, runs far too slowly on 12V, so I added an extra boost converter that steps 12V up to 25V before feeding that power into its H-bridge, as you will see in the schematic at the bottom of this Power section.
Power Supply
Each motor on the robot arm, except for the claw and turntable motors, pulls more than 10A when stalled, so it was crucial to have an extremely powerful 12V source powering the entire contraption. Thus, I am using a desktop ATX power supply that is capable of outputting over 40A on the 12V rail and, as an added bonus, also outputs 3.3V and 5V for auxiliary devices.
ATX power supplies are a great way to get regulated 12V, 5V, and 3.3V outputs, just short the green wire to ground and the power supply will fire right up and you have access to all the power rails!
Here's what the schematic of the robot looks like right now. Note that the little flags on the left indicate what pin on the Teensy 4.0 to connect the pin to.
Note: If you are new to reading schematics, it's very simple. Just make the connections shown in green! If two wires cross but THERE IS NO DOT, then they DO NOT CONNECT (they simply cross over each other). However, if there IS A DOT, then those two wires DO CONNECT. The 3.3V flag at the top simply means it gets connected to the Teensy's 3.3V pin and the two GND flags mean that they get connected to the Teensy's GND pin.
I highly recommend setting up some sort of labeling and wire management early on; otherwise things will get super confusing very quickly.
We need a way for the robot arm to know where it is, and there are two ways to approach this:
- Absolute position measurement: "The forearm is at exactly 69 degrees with respect to the upper arm." --> typically used for measuring constrained motion
- Relative position measurement: "The forearm is exactly 69 degrees from its starting position." --> typically used for measuring unlimited movement (i.e. continuous rotation)
Which type of measurement you choose is application-dependent. In this project, since the movement of each axis of the robot arm is constrained between 0 and 180 degrees, and we need to be able to accurately position the robot arm, absolute position measurement is the obvious choice.
To achieve this, I simply used potentiometers on each axis of the robot arm and wired them to the analog input pins of the Teensy according to the following schematic:
On the arm, the potentiometers are press-fit onto the shaft of each joint in the robot arm:
Note that the turntable does not have a feedback potentiometer. I will close the turntable feedback loop using video and visual processing on the KR260.
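One small detail worth spelling out: turning a raw analog reading into a joint angle is just a linear map between two calibration points. The sketch below is only conceptual; the endpoint values are placeholders you would measure on your own arm, not numbers taken from arm_ctrl.ino.
# Placeholder ADC-count -> joint-angle conversion. raw_at_0 and raw_at_180 are
# the potentiometer readings measured at the joint's two end stops; the numbers
# below are made up for illustration.
def counts_to_degrees(raw, raw_at_0=120, raw_at_180=900):
    angle = (raw - raw_at_0) * 180.0 / (raw_at_180 - raw_at_0)
    return max(0.0, min(180.0, angle))   # clamp to the joint's physical range

print(counts_to_degrees(510))   # roughly mid-travel -> 90.0 degrees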
Electrical: What I would do if I did it again
- Get a more powerful power supply. You can't go wrong with a few more amps in reserve. The downside is cost.
- Use brushless motors for actuation. They are way more powerful, precise, and quiet, with the only downsides being increased control complexity and cost.
- Use rotary encoders instead of potentiometers for feedback. They are more immune to noise and aging. The output of an encoder is digital, which means the data can be seamlessly processed by the KR260's FPGA fabric, allowing me to perform all motor control on the KR260 in hardware and ditch the Teensy 4.0. The only downside is that most rotary encoders measure relative position, which means the robot arm would have to perform a homing routine every time it boots up to ensure everything resets.
- Ditch the Teensy. USB/Serial communication has a lot of associated latency, but it is simple and can provide a physical separation between low-level and high-level code, allowing for modularity. However, in high performance robotic systems, tighter integration is always preferred.
Time spent in this section: 63 hours
Part 3: Software
This software portion will be divided into three parts:
- Overview: Control Flow
- Low-level: Robot arm control software that runs on the Teensy 4.0
- High-level: AI code that runs on the KR260.
The control flow kind of goes something like this:
On the most fundamental level:
- KR260: The camera records some video
- KR260: Scans each frame for the desired object using Vitis-accelerated YOLOv5
- KR260: Calculates desired position based off location data
- KR260: Maps a path from current arm position to desired position and discretizes the path
- KR260: Sends the series of arm positions to the Teensy via USB in the form of individual joint angles
- Teensy: Takes those individual joint angles and moves each arm joint to that angle using a PID algorithm
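The "maps a path and discretizes" step above is, at its core, just linear interpolation between the current joint angles and the target joint angles, with each intermediate waypoint sent to the Teensy as one command. Here's a rough sketch; the step count and four-joint ordering are illustrative, not taken from chef.py.
# Rough sketch of path discretization: linearly interpolate between the current
# and target joint angles. Step count and joint ordering are illustrative.
def interpolate_joints(current, target, steps=20):
    waypoints = []
    for i in range(1, steps + 1):
        t = i / steps
        waypoints.append([c + (g - c) * t for c, g in zip(current, target)])
    return waypoints

for wp in interpolate_joints([90, 90, 90, 90], [45, 120, 90, 60], steps=4):
    print(wp)   # each waypoint would become one command sent to the Teensy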
Let's first tackle the last part of the block diagram: Teensy+Robot Arm
As mentioned previously, the Teensy constantly runs a loop that tries to make the physical orientation of the robot arm equal to the commanded orientation. At the heart of all this is something called a proportional-integral-derivative (PID) loop.
If you want to learn more about the theory and tuning of PID loops, check out my super detailed Instructables post where I explained possibly everything you would ever want to know about drones, built one, and coded a basic PID loop from scratch: https://www.instructables.com/The-Ultimate-Guide-to-Building-a-Quadcopter-From-S/
Unlike my drone project where I used a relatively slow Arduino, we are using the super powerful Teensy 4.0 this time-- that means we don't need to write super-optimized, unreadable code!
To make the code super human-readable, reliable, and clean, I opted to use the PID library written by Brett Beauregard instead of writing it from scratch.
A very short crash course on PID ----------------------------------------------
PID control is very simple to understand. In short, if you have some sort of contraption with a motor and a sensor, you can use PID control to carefully drive the motor so that it accurately reaches some position.
There are three variables and three parameters. The three variables are:
- Current value (where the joint actually is)
- Desired value, a.k.a. the setpoint (where you want it to be)
- Control (what you drive the motor with)
What PID does is take the difference between the current and desired values, more commonly known as the error, and find a suitable control signal to drive the motor with in order to bring the error to zero. How it gets there depends on the three parameters:
- Proportional (kp) --> Your driving force, applies a control proportional to your error. Kp controls how aggressive you want the motor to respond to errors (bigger = more aggressive). But don't make it too big otherwise you will overshoot and start oscillating.
- Derivative (kd) --> Keeps the P term under control by applying damping: it pushes back in proportion to how fast the error is changing, so the faster you are approaching the setpoint, the harder it resists. A bigger D means more damping, but don't make it too big, otherwise your system will become too slow.
- Integral (ki) --> There comes a point where P and D get you 99% to your setpoint, but you aren't there yet: P is practically zero, since the error is almost zero. D doesn't help, as it just resists P. The I term gives the system that final kick and takes care of that 1%. Generally, just a small ki is needed. Too big, it kicks too hard and can actually make a pretty good system go unstable.
Finding these values is a game of trial and error, but you generally tune P first (with the others set to 0), then D, then finally I.
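To make those three terms concrete, a single PID update step looks roughly like the sketch below. This is not the code running on the Teensy (that uses Brett Beauregard's Arduino PID library), and the gains here are made up purely for illustration.
# Bare-bones PID update, just to show how the three terms combine.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt                   # I: accumulates leftover error
        derivative = (error - self.prev_error) / dt   # D: reacts to how fast error changes
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=2.0, ki=0.1, kd=0.05)                    # made-up gains for illustration
control = pid.update(setpoint=90, measurement=75, dt=0.02)
print(control)                                        # the signal you would feed to the motor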
You don't need to do this for this project, as I have graciously gone through and painstakingly tuned all the values for you already 😊.
--------------------------------------------------------------------------------------------------
Enough theory, let's get the Teensy set up and coded.
Software: Low Level: Setting up the Arduino IDE and Teensy
Step 1: Open the Arduino IDE that you downloaded in the Prelim section.
Step 2: Install the Teensy add-on by going to File > Preferences and then, next to "Additional Boards Manager URLs," enter:
https://www.pjrc.com/teensy/package_teensy_index.json
Like this:
Step 3: Hit OK and then go to Tools > Board > Boards Manager and look for "Teensy" by Paul Stoffregen. Hit Install.
Step 4: Plug in your Teensy and go to Tools > Port and select the port that says Teensy. Or you can use the drop down menu and select your board and port.
Step 5: Upload the Blink sketch found in File > Examples > 01.Basics > Blink by clicking on the arrow.
Step 6: Verify that the orange LED on the Teensy is blinking, and you are all set up!
Step 7: Now navigate to the folder RobotChef > Part_3_Software > Low_Level and open up arm_ctrl.ino. Upload the code to the Teensy 4.0.
Software: Low Level: Algorithm
Now, let's take a closer look at the code used to drive the robot arm. The code is very simple to understand, so let me first give a very quick overview of how it works and then discuss some cool features and areas of future improvement in the code.
Algorithm Overview
The Teensy's code takes the form of a standard Arduino program: there is a setup function which runs once on startup, and then there is a loop function that runs continuously forever.
The setup just contains code that configures all our parameters and assigns the Teensy's general purpose input output (GPIO) pins as either input or output.
The algorithm is the interesting part and runs in the loop function:
- Check to see if there are new arm movement commands from the KR260 (these are our setpoint values for each axis)
- Read* all the potentiometers and get current position data^ (these are our current values of each axis)
- Send the current position data to the KR260
- Crunch the numbers using the PID library (using the data above and the method described in the PID crash course)
- Apply the throttle values returned by the PID function to each motor
- Repeat
Pretty simple
* ^: There are a couple of interesting little features (quirks) in the low-level code that are worth pointing out.
Software: Low Level: Quirks
Sensor Low Pass Filter
In a perfect world, if you want to get a sense of your environment, you would read from a sensor and get a perfect value, every time.
Well, in case you haven't noticed, we do not live in a perfect world, so sensors do not give perfectly consistent values. Instead, they typically return values that fluctuate around the true value; this fluctuation is called noise. If you were to take these numbers at face value, your robot would be very jittery, because it would think the arm is constantly vibrating or shifting around when in reality it is completely still.
A common way to reduce this noise is to use a low pass filter. Plainly speaking, in this case that's just a fancy way of saying we take a running average. We assume the noise has an average of zero, meaning that if we average a lot of samples, we get a value that is very close to the true value. Obviously you can't average too many samples, otherwise your robot will react too slowly to real changes.
The way the sensors get low-passed in software is through the use of a ring buffer. This is how it works:
- An array of X elements is created, where X is a value that you pick (in this case, X=20)
- The array gets pre-filled with sensor readings on startup
- Every time the main loop repeats and sensor values get read, the new value replaces the oldest value in the array (for you software folks, what I'm making is a FIFO [first in, first out])
- When the current position data is polled, it actually takes the average of all X elements in the array, not the value it just read.
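In Python-flavored terms, the filter looks like the sketch below. The real version lives in the Teensy's Arduino code (arm_ctrl.ino); the buffer length of 20 matches the description above, and everything else is illustrative.
# Ring-buffer moving average mirroring the low-pass filter described above.
from collections import deque

class SensorFilter:
    def __init__(self, size=20, initial_reading=0):
        self.buffer = deque([initial_reading] * size, maxlen=size)   # pre-filled on startup

    def add_reading(self, value):
        self.buffer.append(value)                    # newest sample evicts the oldest (FIFO)

    def value(self):
        return sum(self.buffer) / len(self.buffer)   # average of all 20 samples

f = SensorFilter(size=20, initial_reading=512)
f.add_reading(530)            # one noisy sample barely moves the filtered value
print(f.value())              # 512.9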
Now let's go back to the beginning of the block diagram: object --> camera.
The high-level software revolves around object detection and uses Yolov5 at its core. Yolov5 allows us to quickly train a CNN (convolutional neural network) and then feed a video source into that CNN to find objects. However, deploying CNNs on a low-thread-count device like a CPU is not ideal; they run very, very slowly. How do we increase the performance of running this CNN? This is where the AMD Kria KR260 comes in.
Essentially a CNN is a software structure that looks something like this:
The "NN" or neural network part of "CNN" is simply a bunch of little computational nodes that get connected to each other.
What the "C" or convolutional means is that a small frame gets slid over the whole image and its pixel values are fed into the network, and hopefully when the frame captures something interesting, the right output nodes will activate.
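If you have never seen the sliding-frame idea spelled out, here is a toy version using numpy. It is only meant to illustrate the concept; it is nothing like how YOLOv5 actually implements its convolutions internally.
# Toy illustration of the "C" in CNN: slide a small kernel over an image and
# record how strongly each patch matches it. Purely conceptual.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.random.rand(8, 8)                     # stand-in for a tiny grayscale image
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])    # crude vertical-edge detector
print(convolve2d(image, kernel).shape)           # (7, 7) feature map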
The AMD Kria KR260 consists of a standard ARM CPU, a dedicated real-time processor, and the star of the show: a field programmable gate array (FPGA). In simple terms, an FPGA is a matrix of lookup tables that are chained together and can be configured via software to implement circuits in hardware! In other words, you can use code to design hardware. Here's what an FPGA looks like:
The squares are lookup tables that can be reconfigured to perform any logic operation you want and the circles are switch matrices that control how the squares are connected to each other.
It doesn't take a genius to see the similarities between the physical structure of an FPGA and a CNN.
The idea is to take a CNN and modify it in some sort of way to make it fit on an FPGA, and ultimately use the speed and low latency of hardware to increase AI performance. So, let's get to it!
Software: High Level: Kria KR260 Setup
Step 1: Download the latest AMD Kria KR260 image.
Step 2: Plug your microSD card into your computer and open Balena Etcher. Select target (the second step) should automatically populate with your SD card.
Step 3: Click Flash from file and select the Kria-specific Ubuntu OS you downloaded.
Step 4: Hit Flash! and wait.
Step 5: If flashing fails on validation, don't worry, you can just carry on. This sometimes happens on Windows and is not a cause for concern.
Step 6: Plug your microSD card into the KR260; USB WiFi adapter, keyboard, and mouse into the right-hand set of USB ports; monitor to DisplayPort; and power plug into the power jack but DO NOT PLUG THE POWER BRICK INTO THE WALL YET.
Why the right-hand set of USBs? At the time of writing, the AMD Kria KR260 has a firmware bug where the left-hand ports do not work, and so to resolve this the firmware needs to be upgraded. In other words, the KR260 needs a BIOS update, which is what we are going to do next.
Step 7: Steps 7-15, unless noted, will be performed ON THE KR260. Now plug the power brick into the wall and boot up Ubuntu. At first startup, the username and password for Ubuntu are:
- Username: ubuntu
- Password: ubuntu
It will then proceed to ask you to change the password to something else.
Step 8: Once you see the desktop, go ahead and open up Terminal.
Step 9: Run this command to see what the boot firmware configuration is right now.
sudo xmutil bootfw_status
Something like this will appear:
Image A: Bootable
Image B: Bootable
Requested Boot Image: Image A
Last Booted Image: Image A
.... <-- (Whatever your current image name is)
The KR260 stores its boot firmware onboard in two separate locations. This is so in case one gets corrupted (uploaded wrong file, power goes out during firmware update, etc.), the Kria does not become a very expensive paperweight because it will simply switch to the other one.
When you receive a new Kria KR260, it comes pre-shipped with the same bootloader in both A and B locations. What we want to do is:
- Overwrite Image B with the updated firmware
- Tell the KR260 to boot off of Image B
- Verify it works
- Make it permanent
Step 10: To get the updated firmware, clone the RobotChef repo similar to how you did it in Prelim:
cd ~
git clone "https://github.com/dakche/RobotChef"
Go to the RobotChef folder and navigate to Part_3_Software > High_Level. There should be a file called:
BOOT-k26-starter-kit-20230516185703.bin
This is the new firmware we want to load onto the KR260. Keep note of the file path location (chances are it will be ~/RobotChef/Part_3_Software/High_Level, but double check, this is very important!)
Step 11: Overwriting Image B. To do this, run this in Terminal:
sudo xmutil bootfw_update -i <YOUR_PATH_to_firmware_file>
So if you ran all the previous commands as instructed, you should be executing this command:
sudo xmutil bootfw_update -i ~/RobotChef/Part_3_Software/High_Level/BOOT-k26-starter-kit-20230516185703.bin
Step 12: Tell the KR260 to boot off of Image B. This should be done automatically after completing Step 11. Double check the status again by running:
sudo xmutil bootfw_status
This time, it should display something like this:
Image A: Bootable
Image B: Non Bootable
Requested Boot Image: Image B
Last Booted Image: Image A
BOOT-k26-starter-kit-20230516185703
The most important thing to note is that the "Requested Boot Image" has been changed to Image B, so the next time we restart, the KR260 will boot using Image B.
Step 13: Verify that it works. Reboot the KR260 either using the GUI or by typing the following in Terminal:
sudo reboot
After Ubuntu has again loaded, open up Terminal and run the following again:
sudo xmutil bootfw_status
This time you should see:
Image A: Bootable
Image B: Bootable
Requested Boot Image: Image A
Last Booted Image: Image B
BOOT-k26-starter-kit-20230516185703
Note how this time the requested boot image is Image A. This means that if you were to reboot again, the KR260 would revert to Image A and use that to boot up. This is a safety net: if you had made a mistake when changing Image B, the KR260 would automatically fall back to Image A (which we know already works), and from there you could fix Image B and try again.
Step 14: Make it permanent. Now that we know Image B is safe to use and works, we have to make it the default boot image! To do this, we run this command in Terminal:
sudo xmutil bootfw_update -v
Make sure to run this in the session right after the reboot from Step 13, otherwise the KR260 will revert back to Image A and you will have to restart from Step 11.
Step 15: Verify and test it out! Run the bootfw_status command again and this time we should see that the Requested Boot Image is Image B (not A). Now that we are on the updated image, you are free to use any USB port you want.
Whooh! That was a lot. You still here? Great! Let's get this thing to detect eggs!
Software: High Level: Cameras, Custom Datasets, and Yolov5
YOLOv5 (You Only Look Once) uses a high-performance convolutional neural network that can quickly recognize and classify objects. However, to use YOLOv5 to its fullest potential, we first have to train it.
Training any model requires two things:
- Lots of labeled data
- Lots of computing power
Let's take care of the first bullet point. Labeled data is essentially a bunch of random pictures of what you want YOLOv5 to recognize, with a bounding box drawn over the object in the picture and identified as the object of interest. We have two options:
- Go online and find a pre-labeled dataset
- Make our own custom dataset
The first option is the easiest...if the data exists. Unfortunately, for some reason the internet does not have a large, high-quality, labeled dataset of raw eggs, so we were forced to use the second option.
Creating a custom dataset
This process is the same even if you are training Yolov5 to detect and classify many objects, but for now we are going to train Yolov5 to detect just one thing: an egg. I asked my brother to help do this for me, so I'll let him explain what he did:
Step 1: We are going to use the online service called RoboFlow to make our dataset, so to start training your custom model, create an account with RoboFlow.
Step 2: Select “Continue with Google” so that RoboFlow would be linked to your Google Account
Agree to the Terms and Services and Privacy Policy
Step 3: Once you log in, select "Create a New Workspace."
Create your own Workspace Name and select “Public Plan”. Click “Create Workspace”.
Step 4: Add Additional Members. If none, select “Skip”
Step 5: Start creating your project by making a Project Name and Annotation Group (In my case it would be Egg Detection and Egg). Select “Object Detection” and then “Create Public Project.” Because we are on the free version, all our datasets are made public.
Step 6: Now upload your images to train your model. You may also use a premade dataset by searching on RoboFlow Universe (but like mentioned before, in our case there are no high quality premade datasets)
Step 7: I took 51 images of eggs, which, for training purposes, is a very small dataset, so I didn't really expect our model to perform all that well. The only way to improve performance is by taking and labeling a lot more pictures. So after uploading your images to RoboFlow, select "Save and Continue."
Step 8: Select "Start Manual Labeling"
Now is your chance to invite other members to help...sorry, I mean do all the work yourself, because I didn't have a team to help label the images. Select “Assign to Myself.” It's gonna be a long day.
Step 9: Let the fun begin by selecting “Start Annotating”!
Step 10: Using your cursor, click and drag a box over the object you are training the robot to recognize, then hit Save. You may select multiple objects within the same picture. Use the arrows at the top to navigate through all your images.
Good Luck, Have Fun! :)
Step 11: Once you complete annotating all the images, select the Back button on the top left side of your screen. Navigate to the left side of your screen and select Generate.
Step 12: Create a New Version and select the images you want to include in your dataset. (If you want them all, just click “Continue”)
Step 13: Now you are able to set the Train/Test Split. This means selecting the number of pictures you want the robot to train on and the number of pictures you want the robot to be tested on. Kind of like saying "here are your study materials" and "here is the test."
To do that, select “Rebalance”. Please note that to get a reasonably accurate model, you should have more than 40 images for the robot to train on.
In my case, I used 45 images to train and 6 to Validate and Test.
Step 14: Continue to the next step, Preprocessing. This standardizes all the images in the dataset to make the model faster. I used Auto-Orient and Resize.
Step 15: Next is Augmentation. This is where RoboFlow changes the orientation and visual properties of your images to 1) make them "imperfect" and 2) create more training images for your model. If done well, this can allow the model to start generalizing and become more accurate in the general case. Hit "Continue."
Step 16: Select “Create” and choose the maximum number of images allowed without requiring an upgrade (unless you want to pay).
Step 17: Then “Export Dataset”. Ensure that you have selected “show download code” and deselected “Also train a model for Label Assist with Roboflow Train”.
Click Continue
Step 18: You will see a download code snippet provided to you. Save this code somewhere, as it will be important later. DO NOT share this code with anyone, as it contains your private API key.
Ok we are now done with labeling and creating the dataset our robot will train on.
Training the model
To train Yolov5 on our custom dataset, we will use Google Colab. Google Colab is based on the very popular Jupyter Notebook, which allows you to run your code one block at a time instead of all at once.
This is advantageous because, say, one part of your code takes a while to run and only needs to run once, but it comes before another piece of code you want to test. Instead of re-running everything every single time you make a change, you can separate the two into blocks, run the slow block once, and then re-run just the buggy block over and over until you fix the error.
However, the other advantage that Colab brings, and the reason why we are using it here, is that we can use their powerful servers to do the training. Unlike some people, I do not have a powerful GPU handy to do Yolov5 training locally, so training Yolov5 on my 13-year-old CPU would take 2 hours (I accidentally did that).
But on Colab, it takes less than 5 minutes, even with the weakest GPU they have.
Training and Testing
Step 1: Go to Google Colab and sign in. Select "New Notebook."
Step 2: Name the notebook anything you want, and then navigate to Runtime > Change runtime type. This step is very important, because we need to tell Colab to use a GPU to run our program, not the CPU; otherwise it will take over 1 hour to train Yolov5, whereas the GPU takes less than 15 minutes.
Select T4 GPU and then Save.
Step 3: Now let's load all our files. On the left, select the folder button
There are currently no files except for Sample Data
Go ahead and type the following code into Colab and run it by clicking the play button.
!git clone "https://github.com/dakche/RobotChef"
This is what you should see:
After it completes, you should see a folder named “RobotChef” on the left-hand side of your screen. If you don’t see it, you may have to refresh.
Step 4: After you see the RobotChef folder, open a new code block by clicking on "+ Code" on the top left or by hovering over where the red arrow is.
Step 5: Change directory into the Yolov5 folder by typing the following into a new code block:
%cd RobotChef/Part_3_Software/High_Level/yolov5
Step 6: In another new block, instruct Colab to download all dependencies using this code:
# install dependencies as necessary
!pip install -qr requirements.txt # install dependencies (ignore errors)
import torch
from IPython.display import Image, clear_output # to display images
from utils.downloads import attempt_download # to download models/datasets
Step 7: To grab the training images from Roboflow, we need to install the Roboflow package using this command in Colab:
!pip install -q roboflow
Step 8: Now paste the download code you saved from RoboFlow into a new code block and run it.
Then run these commands to change the .yaml file to the right number of classes (the function definition goes in one code block, and the %%writetemplate cell goes in a separate block, since a %% cell magic must be the first line of its cell):
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
%%writetemplate /content/RobotChef/Part_3_Software/High_Level/ref/models/yolov5s.yaml
# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license

# Parameters
nc: {num_classes} # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]], # cat backbone P4
   [-1, 3, C3, [512, False]], # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]], # cat backbone P3
   [-1, 3, C3, [256, False]], # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]], # cat head P4
   [-1, 3, C3, [512, False]], # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]], # cat head P5
   [-1, 3, C3, [1024, False]], # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
Step 9: Great, now that we have downloaded our dataset and changed the configuration files, it's time to train it! Run the following command in a new code block:
%cd /content/RobotChef/Part_3_Software/High_Level/yolov5/
!python train.py --img 416 --batch 16 --epochs 50 --data {dataset.location}/data.yaml --cfg /content/RobotChef/Part_3_Software/High_Level/yolov5/models/yolov5s.yaml --weights 'yolov5s.pt' --name yolov5s_results --cache
Training will take about 5-10 minutes on the T4 GPU
Step 10: Verify that your model works using this code in a new code block:
%cd /content/RobotChef/Part_3_Software/High_Level/yolov5/
!python detect.py --weights /content/RobotChef/Part_3_Software/High_Level/yolov5/runs/train/yolov5s_results/weights/best.pt --img 416 --conf 0.4 --source /content/RobotChef/Part_3_Software/High_Level/yolov5/Egg-Detector-4/test/images
To see the resulting test images, follow the path it gave you via the folders on the left side of the screen.
As you can see, with just 45 training images, my robot can detect brown eggs in all my testing images (since I only trained it on detecting brown eggs):
Step 11: Download and save the weights file that was created during training. The weights are saved in the following directory. It is important to save this file, as it will allow us to deploy the model on the KR260 without re-training everything.
/content/RobotChef/Part_3_Software/High_Level/yolov5/runs/train/yolov5s_results/weights/best.pt
Deploying non-Vitis-accelerated Yolov5 Egg Detector on the KR260
Let's run some real-time egg detection using a webcam on the KR260 to do a quick little test and grab a baseline performance measurement.
Step 1: Mount the webcam on the robot arm by the claw so the camera sees what the claw is grabbing. I just used a small piece of double-sided tape.
Step 2: Clone the RobotChef repo on the Kria.
Step 3: If you trained your own custom model, you will need to upload the best.pt file that you downloaded in Step 11 of the Training and Testing tutorial. However, for this Egg Detector, my best.pt file is already saved in the RobotChef repo, so just completing Step 2 is enough.
Step 4: Run the following command to start the detection:
cd ~/RobotChef/Part_3_Software/High_Level/yolov5
python3 detect.py --weights ./best.pt --img 416 --conf 0.4 --source 0
Note that we have to use python3 instead of python!
0.56 FPS is unacceptable performance for the Robot Chef, so we are going to need to speed things up a bit.
This is where Vitis-AI comes in.
In short, Vitis-AI allows us to take advantage of the Kria KR260's FPGA fabric and deep learning processing unit (DPU) by taking PyTorch models and compiling them into something that can run on the KR260's hardware.
In summary, Vitis AI performs two steps:
- Quantization - Turns a normally floating-point CNN into an integer net. By reducing the precision of the CNN, this allows for faster computation and also allows the model to be deployed on the KR260's DPU.
- Compilation - Turns the quantized CNN into something that can be deployed on the KR260's DPU.
To start, it's important to know that Vitis-AI only runs on Linux (on Ubuntu 20.04), so if you are running Windows like me, we will need to jump through a couple of hoops.
Step 1: Open up VMware, which you should have downloaded in Prelim. VMware is virtual machine software that lets us run a full desktop of another operating system inside our own. Some other sources indicate that Vitis-AI works on WSL as well, but in my experience that is not the case, so a virtual machine is absolutely necessary.
The biggest disadvantage of using a virtual machine is that the free ones do not support GPU passthrough (not that it mattered for me...), hence you can only use the CPU version of Vitis-AI, which unfortunately will be slow. If you are lucky enough to have a desktop running Ubuntu and a dedicated GPU, you can follow AMD's tutorial to install Vitis-AI and run the GPU version.
Step 2: You should have a copy of Ubuntu 20.04LTS downloaded already from Prelim, so now what we need to do in VMware is to create a new OS. Click on "Create a New Virtual Machine"
Step 3: In the installer wizard, click on "Installer disc image file (iso)" and use the "Browse..." button to navigate to your downloaded OS. Hit next.
Step 4: Follow the rest of the wizard to set up Ubuntu, everything else should be self-explanatory. Once you are done, you should see the Ubuntu home screen.
Now, let's set up the Vitis-AI docker environment and compile the .pt files!
Step 5: Open up Terminal by pressing Ctrl+Alt+T or by clicking the grid of dots in the bottom left-hand corner and finding Terminal.
Step 6: Clone and cd into AMD's Vitis-AI repo using:
git clone "https://github.com/Xilinx/Vitis-AI"
cd Vitis-AI
Step 7: Next, we have to install Docker using these commands (pulled from the official Docker page):
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Step 8: After Docker is installed, run these extra commands (they let you run Docker without sudo) to prevent a permissions error when trying to download and run the Vitis-AI docker environment:
sudo groupadd docker
sudo usermod -aG docker <your ubuntu username>
newgrp docker
Step 9: Verify that docker works:
docker run hello-world
You should see this:
Step 10: Download and run the pre-compiled CPU Vitis-AI docker environment:
cd ~/Vitis-AI/
docker pull xilinx/vitis-ai-pytorch-cpu:latest
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
The Vitis-AI environment should start right up and you should see this:
Step 11: Now, Vitis-AI currently doesn't support the SiLU activation function (among other things) used in Yolov5, so LeakyReLU has to be used instead, and this modification, along with some other changes, had to be made to the Yolov5 scripts. If you want to learn more about this, you can read this article, but I've already gone ahead and done all of that for you, as well as re-trained the model and obtained the .pt file (ULTIMATE.pt). All of that can be found in the RobotChef repo under yolov5_quantized.
So go ahead and clone the RobotChef repo inside the docker environment so we have access to all the files and navigate to RobotChef > Part_3_Software > High_Level > yolov5_quantized!
Step 12: (Optional) Here are the details for retraining, since using LeakyReLU requires some modifications to the training commands.
In Step 8 of the Colab training, use this code instead:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
%%writetemplate /content/RobotChef/Part_3_Software/High_Level/yolov5_quantized/models/hub/yolov5s.yaml
# Ultralytics YOLOv5 🚀, AGPL-3.0 license

# Parameters
nc: {num_classes} # number of classes
# activation is set to a weird number, because Vitis-AI only supports this number
activation: nn.LeakyReLU(0.1015625) # <----- Conv() activation used throughout entire YOLOv5 model
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13
    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)
    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 20 (P4/16-medium)
    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 23 (P5/32-large)
    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
And in Step 9, run this instead:
%cd /content/RobotChef/Part_3_Software/High_Level/yolov5_quantized/
!python train.py --img 416 --batch 16 --epochs 50 --data {dataset.location}/data.yaml --cfg /content/RobotChef/Part_3_Software/High_Level/yolov5_quantized/models/hub/yolov5s-LeakyReLU.yaml --weights 'yolov5s.pt' --name yolov5sLeakyReLU_results --cache
Step 13: Next, it's time to start the quantization process. The quantizer.py script was created by analyzing AMD's Vitis-AI demo code.
There are two steps: calibration and testing. Calibration runs first and essentially profiles the Yolov5 model to determine the best way to optimize it. Testing applies the actual modifications and exports the quantized model. Let's start with calibration. To run it, inside the virtual machine's Vitis-AI docker, execute:
cd /workspace/RobotChef/Part_3_Software/High_Level/yolov5_quantized
python3 quantizer.py --quant_mode calib --weights ULTIMATE.pt --dataset Egg-Detector-4/train/
Warning: because we are in a virtual machine and using the CPU, this will take a while.
Step 14: Run this piece of code to export the xmodel file:
python3 quantizer.py --quant_mode test --weights ULTIMATE.pt --dataset Egg-Detector-4/train/
By default, the finished model will be found in build/quant_model. Now we have to compile everything.
Step 15: Compilation. We will use Vitis-AI's prebuilt function called vai_c_xir to compile the code:
vai_c_xir --xmodel build/quant_model/DetectMultiBackend_int.xmodel --arch /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json --net_name yolov5_kr260 --output_dir ./KR260
Note that after the --arch flag, I selected KV260 instead of KR260. That's because, at the time of writing, there is no KR260 .json file available. However, this still works because the KV260 and KR260 use the same SoC, and thus the same architecture, so the json file works for both. Even better, the json files of all the other boards listed in /opt/vitis_ai/compiler/arch/DPUCZDX8G contain the same information, essentially guaranteeing that this will work.
Step 16: Grab and deploy! Time to grab the xmodel file and run it. However, the xmodel is inside a docker environment which is supposedly isolated from the rest of our system. So we need to do some trickery. First, open a new terminal and let's find out the name of the docker container that's running:
docker ps --format "{{.Names}}"
Next, to transfer the file out of the docker environment and to your local Ubuntu Documents folder, we have to use docker's special cp command:
docker cp <INSERT YOUR DOCKER CONTAINER NAME HERE>:/workspace/RobotChef/Part_3_Software/High_Level/yolov5_quantized/build/quant_model/KR260/yolov5_kr260.xmodel ~/Documents
Now when you check the Documents folder of your local virtual machine, the xmodel file should be there.
Now, to get it out of VMware, you can use Firefox to manually email it to yourself, as the free version of VMware does not support transferring files between the virtual machine and your local computer.
Step 17: To run the xmodel, we have to install AMD's PYNQ libraries, which allow you to program the DPU/FPGA from Python. Install PYNQ using the instructions found here (just Step 3) and then use this command to run the new accelerated vision program:
python3 vitis-egg-detect.py
Step 18: Evaluation
So the egg detection code now runs at a significantly higher FPS, but the detection accuracy has gone down a bit, which is to be expected after quantization. However, it is good enough for this application.
Software: High Level: Inverse Kinematics and ROS?
Ok, I must confess something here.
I tried using ROS.
I really really tried.
But remoting into a laptop 3000 miles away from where I was and using X11 forwarding to view rqt while simultaneously SSH-ing into the KR260 was not going to cut it.
The GUI was so slow that every click took nearly 30 seconds to register, which had me so frustrated and stressed out that I nearly gave up the project entirely.
So for the sake of my sanity, I decided to write my own bare-bones inverse kinematics (IK) solver from scratch, and implement ROS when I have physical access to the KR260 again.
Writing my own IK solver wasn't actually that bad. Let me explain how it works in the simplest way possible.
If you were to look at my robot from the side, it is a simple 3-degree-of-freedom (DOF) machine, meaning there are essentially 3 moving joints on the robot.
And lucky for us, 3-DOF robot arm IK is a solved problem found in the first chapter of every robotics textbook!
Let's simplify the problem:
The essence of solving this problem is the humble triangle. We know the lengths of the robot arm's links and we know where we want to go. So if we find the distance from the robot's pivot to the object, we have the three sides of a triangle:
By the side-side-side triangle congruence theorem, we know that if two triangles have the same side lengths, they must be congruent; in other words, for any given set of 3 side lengths, there is exactly one triangle that matches the description!
This means we can solve for all angles analytically....using the law of cosines!
I'm not going to bore you with the math from here (the proof is left to the reader), but for those who are interested, you can open up the chef.py code and take a look at the formulas (though it won't be very interesting!).
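That said, for the curious, the core of a planar two-link, law-of-cosines solver fits in a few lines. This is the generic textbook version with made-up link lengths, not a copy of what chef.py does.
# Textbook planar 2-link inverse kinematics via the law of cosines.
# L1 and L2 are made-up link lengths; chef.py uses the real arm dimensions
# and its own angle conventions.
import math

def solve_ik(x, y, L1=30.0, L2=30.0):
    d2 = x * x + y * y                                    # squared distance to the target
    cos_elbow = (d2 - L1 * L1 - L2 * L2) / (2 * L1 * L2)  # law of cosines, rearranged
    if abs(cos_elbow) > 1:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)                          # elbow bend (0 = arm fully straight)
    shoulder = math.atan2(y, x) - math.atan2(L2 * math.sin(elbow),
                                             L1 + L2 * math.cos(elbow))
    return math.degrees(shoulder), math.degrees(elbow)

print(solve_ik(40.0, 20.0))   # one valid (shoulder, elbow) solution in degrees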
Software: High Level: Algorithm
We are finally in the home stretch.
Double check that the KR260 is connected to the Teensy via USB and that the robot arm is powered.
Place the pot and induction cooker in front of the robot and a carton of eggs to the left of the robot arm.
On the Kria, navigate to RobotChef > Part_3_Software > High_Level and run chef.py using:
python3 chef.py
Like we did for the low-level code, let's briefly go over what chef.py does:
- Find and locate some eggs
- Find the center of the bounding box of the egg to approximate the X-Y location of the egg
- Rotate the robot arm to center the egg in the camera's view
- Read current robot arm state from the Teensy
- Perform inverse kinematics calculations and interpolate between current position and the desired position
- Send the interpolated positions
- Rotate until the robot arm sees the pot
- Put the egg in the pot
- Wait for it to cook
- Take the egg out of the pot by finding the strainer handle and interpolating a path.
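Two of those steps, finding the bounding-box center and nudging the turntable until the egg sits in the middle of the frame, reduce to something like the sketch below. The frame width, deadband, and gain are placeholder values for illustration, not the numbers chef.py actually uses.
# Illustrative "center the egg in view" step. The detector gives a bounding box
# as (x1, y1, x2, y2) in pixels; gain and deadband are placeholders.
def turntable_command(bbox, frame_width=640, deadband_px=20, gain=5):
    x_center = (bbox[0] + bbox[2]) / 2
    error_px = x_center - frame_width / 2    # positive means the egg is right of center
    if abs(error_px) < deadband_px:
        return 0                             # close enough, stop rotating
    return int(gain * error_px)              # signed turntable motor power

print(turntable_command((400, 200, 500, 300)))   # egg is right of center -> 650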
I re-trained the Yolov5 model with images of the pot and the strainer I was using, the same way as described above. I decided not to include that .pt file here, as it is specific to the tools I am using and will likely not generalize. The only way for a Yolo model to generalize is to train it on a lot of pictures of different types of pots and strainers, which I have neither the resources nor the time to do. However, chef.py can be easily modified to accept a new .pt/.xmodel file.
Software: What I would do if I did it again
- Don't code remotely.
- Make some money so I can afford a dedicated Linux desktop with a good GPU.
Total time spent in this section: 215 hours
Conclusion
I have to admit, this project was a lot, and considering the length of this post, I am proud to see what I have accomplished in the past 3 months without being able to touch my robot. I want to thank my brother for helping me, because I most definitely would not have been able to do this without his help.
When my finances replenish, I hope to get myself a ZED 2 stereo camera and play around with the depth perception model in the Vitis-AI model zoo. Getting a GPU might also be worth it if I plan to continue doing a lot of training.
But at this point in time, I think I have the framework all laid out to make RobotChef make me some dumplings.
I'm tired, we're tired, but when I get back home to my robot, I'm going to ditch the Teensy 4.0, take a crack at controlling all 5 motors from the KR260 itself, fire up Vivado, and teach the community how to really program the FPGA, and you will hear all about it!
Signing off for now, but stay tuned for RobotChef Part 2: DumplingBot!