Considering that my master's thesis is currently being assessed by my supervisor and I have nothing to do right now, I guess it's a good idea to roughly write about its content here. By roughly I mean that I'll describe what I did without going too deep into it, as it took me around 120 A4 pages to describe it neatly (and I'm kinda tired after writing it :D). If you want to hear more then ask in the comments, or look into the attached GitHub repo with all design files and source code (it's kinda messy).
All of the italicized text contains my thoughts about the project that came to me while I was writing this post.
General idea
So, the general idea was to create an autonomous platform able to navigate in an outdoor environment in a manner that is safe for the robot itself, but also for the people in its vicinity. What was the result? Welp, I think it was so-so, as the mechanical part requires some (big) design changes, but the navigation stack created for this project works as expected, and I think that further development of the project (which will hopefully happen) will be evolution rather than revolution.
The project consists of three main parts:
- Direct (real time) control
- Short term navigation
- Long term navigation
And their relationship can be seen here:
The difference between short term and long term navigation can be easily described by this picture:
The physical platform
I think there is nothing much to say about the frame itself. It was designed as a bunch of blocks in Autodesk Inventor and mostly 3D printed. All of the blocks are connected together using four TH35 rails, which also provide a lot of places to mount the electronics:
The chassis rides on four wheelbarrow wheels connected in a differential configuration by bicycle chains and gears, as you can see in the cover photo. Below you can see how the platform moves:
Looks good enough, right? Well, nope. I'll start describing this project with the bad news first. Apparently I was a bit stupid and didn't consider the fact that the platform would be so heavy, and the shear forces generated by the motors SO big, that the plastic adapters connecting the bicycle gears to the motors would break spectacularly during fast accelerations or when the frame was placed on a surface with a lot of friction (like rough concrete). In the end I've come to the conclusion that the frame needs to be redesigned, but due to the stupid sexy time constraints of my master's thesis it will be done outside of it (like, in my free time). I have a rough concept for it that you can see in the conclusions of this post.
The hardware
As described in "General idea", the project is split into three parts. So is the hardware.
For direct control I've used an STM32 microcontroller along with AM4096 magnetic encoders. The controller supports rosserial, so it's really easy to integrate it into any ROS based project. The encoders are digital sensors that communicate over an I2C bus. They allow reading the angular position and velocity of the shaft, along with other cool diagnostic info such as whether the shaft's magnet is too close to or too far from the IC. Also, these ICs are cheap for what they offer (4096 positions per revolution, so a resolution of around 0.09 deg). The velocity measurement allowed me to create a closed-loop PID motor controller that keeps the velocities of both motors roughly equal, so when we command forward movement the vehicle actually moves forward instead of drifting off the desired trajectory in an arc, which happens in open loop control due to differences between the motors used in the project. The position measurement is used by the odometry algorithms to compute the distance travelled over short time intervals.
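The actual implementation lives in the C++ firmware, but the closed-loop velocity control idea boils down to something like this Python sketch (gains and helper names are placeholders, not the thesis code):
import math

class VelocityPID:
    def __init__(self, kp, ki, kd, dt, out_limit):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        # Classic PID on the velocity error, measured by the AM4096 encoder.
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to what the H bridge can actually do.
        return max(-self.out_limit, min(self.out_limit, out))

# One controller per motor keeps both wheels at the commanded speed,
# so "go straight" really means straight despite motor differences.
left_pid = VelocityPID(kp=1.0, ki=0.5, kd=0.0, dt=0.01, out_limit=1.0)
right_pid = VelocityPID(kp=1.0, ki=0.5, kd=0.0, dt=0.01, out_limit=1.0)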
For the short term navigation I've decided to use a Ydlidar X4 for the obstacle avoidance algorithms. This sensor allows 2D obstacle detection up to a radius of 12 meters from the center of the platform.
For long term navigation I've used the popular NEO-6M GPS module for global localization, along with the equally popular MPU9250 9 DOF IMU for global orientation (used as a compass).
Both the short and long term navigation stacks run on a Jetson Nano B01 SBC. Additionally, there are two RPi Camera v2 modules mounted at the front that I planned to use as a depth camera, but due to the rather big noise from the (even calibrated) cameras, it didn't work too well. The cameras still provide a nice preview in the RViz visualization application, though.
The project is powered by two 12V gel batteries connected in series, with two DC-DC buck converters lowering the voltage to 16V for the motors and 5V for the digital stuff.
Real time core
For the real time part of the project I've used an STM32F103RBT6 microcontroller in the form of a Nucleo-64 board. The real time part is responsible for closed loop motor control (the direct control block in the diagram in "General idea"), IMU reading for orientation estimation, and incremental odometry for a continuous position estimate, which is used by the short term and long term navigation parts of this project.
I'll start the description of this part with motor control. The controller for a single motor is just a simple C++ object that allows controlling the motor in different modes:
- Max velocity test - Called as a ROS service. Ignores the setpoint from the ROS stack and writes the maximum control value to the motor's H bridge for a user defined time, then computes the cumulative average velocity and returns it as the result of said service. Useful to test the physical limits of the vehicle.
- Step response identification - Called as a ROS service. Ignores the setpoint from the ROS stack and writes the maximum control value to the motor's H bridge for a user defined time, then takes 100 samples of the motor's velocity at constant intervals and returns them as the service's response. Useful for PID autotuning, which will be described later :).
- Direct mode - Writes the setpoint from the ROS stack directly to the motor (open loop control). Kind of useless; I've only used it for testing purposes.
- PID mode - Classic closed loop with a PID controller (see the block diagram below).
Apart from the standard PID components such as the controller itself, the actuator (DC motor) and the sensor (AM4096 magnetic encoder), you can see two strange blocks. The first one is the "watchdog-like algorithm". Well, this block does what it says: more specifically, it watches over communication with the high level system (Jetson Nano). The general rule in mixed projects like this is to never trust the communication means (such as a USB cable) or the high level system. Well written code for a microcontroller can work for years, but well written code on a high level system only works as long as the system allows it to. On high level systems a lot of bad situations can happen, such as the system running out of memory, a general system hang caused by another running program, etc. Putting software problems aside, I'm not sure how it works on the Jetson Nano, but the first thing that comes to my mind when I think about problems with microSD powered SBCs is card corruption. When the card is corrupted, a lot of bad stuff can happen. The best case is that the system won't start, as then it won't do any harm to people or things. The worse case is the system misbehaving and throwing funny phrases such as "kernel panic" into the terminal, as happened to me once while working with an RPi. We need to be prepared for such situations and stop the vehicle when something bad happens to the high level system, so it won't crash into people or roll onto a busy street.
In the project I've assumed that the setpoint will be transmitted over rosserial at least once per 500 ms. When a setpoint arrives at the real time core, the STOP_MOTORS flag is reset. Every 500 ms the flag is checked by the watchdog block; if it's reset, that means a setpoint arrived and the flag can be set again. If the flag is still set, the watchdog assumes that communication with the high level system is broken and the setpoint is overwritten to 0. This state is not permanent, and communication can be resumed by the high level system simply by sending a new setpoint value.
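Again, the real code is C++ firmware, but the watchdog logic described above boils down to roughly this sketch:
STOP_MOTORS = True
setpoint = 0.0

def on_setpoint_received(new_setpoint):
    # Called from the rosserial callback whenever a setpoint arrives.
    global STOP_MOTORS, setpoint
    setpoint = new_setpoint
    STOP_MOTORS = False

def watchdog_tick():
    # Called every 500 ms by a timer.
    global STOP_MOTORS, setpoint
    if STOP_MOTORS:
        # No setpoint arrived during the last period: assume the high level
        # system is gone and stop the motors. Not latched - the next setpoint
        # received by the callback above resumes normal operation.
        setpoint = 0.0
    else:
        STOP_MOTORS = True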
The next block is the "fake inertia" block. It's just a first order low pass filter placed to slow down the controller's control value, so the motors start up more slowly even when given the maximum control value (think of the soft start in three phase AC motors). It was added to the project due to my lack of imagination, as those motors generate enormous shear forces (as I said in "The physical platform" part of this post) at the maximum allowed voltage (16V in the project due to the motor driver's limitations, and those motors are rated for 24V!) that pretty much destroy the adapter connecting the motor with the wheels through chains. Well, a bit unfortunate, but this block mitigates the problem a bit, so we can play with the real platform in some environments.
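As a sketch, the filter is nothing more than this (alpha is an assumed parameter):
class FakeInertia:
    """First order low pass filter on the control value ("fake inertia")."""
    def __init__(self, alpha):
        self.alpha = alpha      # 0 < alpha < 1, smaller = slower ramp-up
        self.state = 0.0

    def filter(self, control_val):
        # Even a step to full throttle ramps up gradually.
        self.state += self.alpha * (control_val - self.state)
        return self.state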
The next thing that should be described is how to tune the PID controllers, and that's the neat part, ̶y̶o̶u̶ ̶d̶o̶n̶t̶ We'll use the magic of ROS, rosserial and optimization algorithms to fine tune our controllers. Remember the step response identification motor control mode? That's where it becomes useful, as we can process those samples through a recursive least squares algorithm to identify the motor's discrete dynamic model (a second order model in the project). Having the model, we can attach it to a PID controller and create a simple simulation in which we can evaluate the control quality using some chosen criteria. In the project I've used the average of IAE (sum of absolute errors) and ISE (sum of squared errors) with added penalties for overshoot (measurement bigger than the setpoint) and saturation (control value bigger than physically possible due to electrical constraints). The simulation function (written in Python) can be seen below:
import numpy as np
import scipy.optimize as opt

# PI and get_model_state are defined elsewhere in the tuning script.
def simulate_object(pi_params, *args):
steps = 100
order = args[0]
model = args[1]
sampling_time = args[2]
reg = PI(pi_params[0], pi_params[1], sampling_time)
object_outputs = [0]
iae = 0
ise = 0
penalty = 0
for i in range(steps):
error = 1 - object_outputs[i]
ise += error**2
iae += np.abs(error)
result = reg.read(error)
# Punish PI saturation
penalty += np.abs(result["saturation"])
control_val = result["control_val"]
object_output = np.asscalar(np.dot(np.transpose(get_model_state(order, object_outputs, i, control_val)), model))
# Punish for overshoot
if object_output > 1:
penalty += (object_output - 1)
object_outputs.append(object_output)
res = np.sum(0.5 * iae + 0.5 * ise)
return res + penalty
With this function we can run SciPy's least squares optimization algorithm to fine tune our PID (PI in the project) controller:
def tune_pid(order, model, sampling_time):
print("Starting PI optimizer...")
solution = opt.least_squares(simulate_object, [0, 0], bounds=([0,0],[10,10]), max_nfev=1000, ftol=None, xtol=None, args=(order, model, sampling_time))
print("Optimal solution found: {solution}, Performance criteria: {ise}".format(solution=solution.x, ise=solution.cost))
    return np.append(solution.x, 0)  # Only a PI is tuned, so append 0 for the derivative gain.
The whole procedure looks like this:
In the video below you can see that the tuning is pretty fast:
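The recursive least squares identification step mentioned above isn't shown in the snippets, so here's a rough sketch of how such an identification could look (a sketch assuming a second order ARX-style model; the variable names and parameter layout are mine, not the thesis code):
import numpy as np

def identify_rls(outputs, inputs, order=2, lam=0.99):
    """Recursive least squares fit of a discrete model
    y[k] = a1*y[k-1] + ... + an*y[k-n] + b1*u[k-1] + ... + bn*u[k-n]."""
    outputs = np.asarray(outputs, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    n_params = 2 * order
    theta = np.zeros(n_params)            # model parameters
    P = np.eye(n_params) * 1000.0         # large initial covariance
    for k in range(order, len(outputs)):
        # Regressor: most recent past outputs and inputs first.
        phi = np.concatenate((outputs[k - order:k][::-1],
                              inputs[k - order:k][::-1]))
        y = outputs[k]
        K = P @ phi / (lam + phi @ P @ phi)
        theta += K * (y - phi @ theta)
        P = (P - np.outer(K, phi @ P)) / lam
    return theta

# Usage idea: feed the 100 velocity samples from the step response service,
# with a constant "maximum control value" input vector of the same length.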
The next part of the real time core is the IMU. In this aspect I wasn't too ambitious and just used Hideaki Tai's MPU9250 library (https://github.com/hideakitai/MPU9250), modified a bit so it works on STM32. Data from the IMU, along with the orientation estimated by a Madgwick filter, is sent to the rest of the ROS stack as an IMU message.
The last part of the real time core is the odometry. For odometry I've created a simple C++ object that continuously reads the position of the encoders and converts it into a positional offset since the start of the system. To compute the odometry I've used the odometry lectures by Chris Clark of the University of Malta that can be found on the internet. The odometry is sent to the ROS stack using an Odometry message.
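The incremental update itself boils down to the classic differential drive formulas; a Python sketch of the idea (the real code is C++ in the firmware):
import math

def update_odometry(x, y, theta, d_left, d_right, wheel_base):
    """Incremental differential drive odometry. d_left/d_right are the wheel
    travel distances since the last update, computed from encoder deltas."""
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / wheel_base
    # Integrate along the arc using the mid-point heading.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta += d_theta
    return x, y, theta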
Short term navigator
Obstacle mapper
The first element of the short term navigator is the obstacle mapper. For this purpose I've decided to give up on SLAM algorithms, because maps generated by them could become outdated after a short period of time due to the dynamics of the environment (looking back, I think some sort of SLAM would be useful). Instead, I've created a map that combines static detections from the lidar sensor with dynamic detections of human legs coming from a fully convolutional neural network, which takes the lidar's detections (as an image) as its input.
There is nothing fancy about the static map. It's a simple map with detections from the lidar, as you can see below:
The white tiles define the free space that is raytraced from the center to the edges.
The dynamic map is slightly more complicated. The first thing required to create dynamic detections is some sort of measurement. For this purpose I again decided to use the lidar's measurements, but this time I've mapped them onto an image of size 256x256 pixels. During this mapping the lidar's signal is conditioned a bit. A 2D lidar looks for detections by spinning a single laser range sensor (similar to the popular ultrasonic sensor, but faster) around. The sensor starts at an angle of 0 deg, makes a measurement, then increases the angle a bit and measures again, and so on until it hits 360 degrees, then repeats the procedure. The change of angle is constant, which means that the closer an obstacle is, the denser and more reliable its detections will be. Obstacles that are too far away just appear as a bunch of unconnected points:
So, to somehow fix this, I've decided to connect "close" pixels together using Bresenham's line drawing algorithm. For "close" pixels I've defined a simple linear equation that increases the allowed distance between pixels to be considered "close" depending on their distance from the center of the image (i.e. from the lidar). This means that pixels close to the lidar will be connected only if they are, for example, at most 3 pixels apart, while pixels around the edges of the generated image (farther from the lidar) will be connected even when they are 7 pixels apart. The conditioned image looks like this:
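A rough sketch of that mapping and conditioning step (the image size and the 3-to-7 pixel gap rule follow the description above; everything else is assumed, and cv2.line does the Bresenham-style rasterization):
import numpy as np
import cv2

def scan_to_image(ranges, angles, size=256, max_range=12.0):
    """Map a 2D lidar scan to a size x size binary image centred on the lidar."""
    img = np.zeros((size, size), dtype=np.uint8)
    scale = (size / 2) / max_range
    pts = []
    for r, a in zip(ranges, angles):
        if not np.isfinite(r) or r > max_range:
            continue
        px = min(size - 1, max(0, int(size / 2 + r * np.cos(a) * scale)))
        py = min(size - 1, max(0, int(size / 2 + r * np.sin(a) * scale)))
        pts.append((px, py))
        img[py, px] = 255
    # Connect "close" consecutive detections; the allowed gap grows linearly
    # with the distance from the image centre (3 px near the lidar, 7 px at the edge).
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        dist_from_center = np.hypot(x0 - size / 2, y0 - size / 2)
        allowed_gap = 3 + 4 * dist_from_center / (size / 2)
        if np.hypot(x1 - x0, y1 - y0) <= allowed_gap:
            cv2.line(img, (x0, y0), (x1, y1), 255, 1)
    return img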
This concludes generation of the input to the neural network leg detector.
For the detector itself I've decided to use the PeTraNet model from "Tracking People in Mobile Robot From 2D LIDAR Scans Using Full Convolutional Neural Networks for Security in Cluttered Environments", which is a slightly modified U-Net. I've modified this network a bit further, by removing one convolution from each down/up sampling layer and cutting the number of filters in each layer by half:
For the cost function I've used a modification of the cost from U-Net's original article, but in the weight generation I replaced the distance from the two closest cells' borders with the distance from the closest labeled leg:
Also, I've replaced softmax with sigmoid in the cost function, as my model appeared unstable for some reason. For the training I've used the dataset provided by the authors of PeTraNet in the linked article. The training was fast enough (a matter of hours), as I've replaced my HDD with an NVMe drive since the AI Based Touch Screen article :D. The network works well enough to detect legs (although with some false positives that will be dealt with in the next step):
(I know I should've created a new dataset for outdoor use with the conditioned lidar readings instead of the provided dataset, but my time constraints for handing in the thesis were tight enough D:)
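To make the cost modification above more concrete, here's a rough PyTorch-flavoured sketch of the idea; this is my reconstruction, not the thesis code, and w0/sigma are assumed values:
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def leg_weight_map(label_img, w0=10.0, sigma=5.0):
    """Per-pixel weights that grow the closer a pixel is to a labeled leg."""
    # Distance (in pixels) from each pixel to the nearest labeled leg pixel.
    dist = distance_transform_edt(label_img == 0)
    return 1.0 + w0 * np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def weighted_leg_loss(logits, target, label_img):
    # Sigmoid + binary cross entropy instead of softmax, weighted per pixel.
    weights = torch.as_tensor(leg_weight_map(label_img), dtype=torch.float32)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)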
Now we have our legs detected in the form of a simple binary image. This image is processed through OpenCV's contour detection algorithm, and the center of each contour is returned, giving us an array of 2D coordinates with leg candidates.
Each of these candidates is processed through the existing Kalman filters. Each Kalman filter looks over the array of candidates to find the best one (the closest to its estimated position). If there is none close enough (within around 1 m), the filter's update won't occur. If there is a close candidate, the update occurs and the candidate is removed from the array. If no update occurs for too long (around 2 s), the filter is removed from the list of filters. If there are some candidates left after this whole procedure, new Kalman filters are created for them. The Kalman filter is responsible for filtering the leg's velocity and position and uses a simple linear movement model with the estimated velocities as inputs. The whole procedure for the leg detector looks like this:
# Body of the leg detector's main loop (msg_lock, net, rate, legs etc. are set up earlier in the node).
# Preprocess data.
with msg_lock:
frame = msg_to_img(msg)
msg_arrived = False
frame = preprocessor(frame)
# Make prediction.
prediction = net(frame)
prediction = prediction_to_cv(prediction)
# Extract legs.
leg_candidates = get_legs(prediction)
if len(leg_candidates) == 0:
    rate.sleep()
    continue  # No detections this frame; wait for the next scan.
leg_candidates = to_meters_arr(leg_candidates, height_pixels, width_pixels, height_meters, width_meters)
# Update legs.
cntr = 0
rviz_markers = MarkerArray()
leg_markers = DynamicDetectionArray()
leg_markers.header.frame_id = "statek/laser/laser_link"
for leg in legs:
leg_candidates = leg.update(leg_candidates)
if math.hypot(leg.velocity()[0], leg.velocity()[1]) > leg_hysteresis:
leg_markers.detections.append(get_leg_marker(leg))
rviz_markers.markers.append(get_rviz_marker(leg.position(), cntr))
cntr+=1
rviz_publisher.publish(rviz_markers)
marker_publisher.publish(leg_markers)
# Remove dead legs.
legs = [leg for leg in legs if leg.alive()]
# Generate new legs from candidates that were not consumed by already
# existing legs.
for candidate in leg_candidates:
legs.append(Leg(candidate, forget_time, max_distance, process_variance, measurement_variance, filter_alpha))
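The get_legs() helper used above isn't shown; a plausible version based on the contour description could look like this (assuming OpenCV 4.x and a single channel uint8 prediction image):
import cv2

def get_legs(prediction, threshold=127):
    """Extract (x, y) pixel centers of detected legs from the network's output image."""
    _, binary = cv2.threshold(prediction, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for contour in contours:
        m = cv2.moments(contour)
        if m["m00"] > 0:
            # Centroid of the contour = leg candidate position.
            centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centers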
The position and velocity of each leg is then sent to other parts of the system through ROS messages. Then, in the obstacle mapper, this info is used to draw a line from the detected leg all the way to the edge of the obstacle map in the direction the leg is moving, to make sure the vehicle won't cross a moving human's path. This line is generated only if the velocity of the given leg is above some threshold, which lets us get rid of the false positive detections mentioned above. You can see the dynamic mapper in action here:
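A minimal sketch of that line drawing step (leg positions are assumed to already be in map pixel coordinates; the threshold and line thickness are placeholders):
import math
import cv2

def add_leg_to_map(obstacle_map, leg_pos_px, leg_vel, velocity_threshold=0.2,
                   line_length_px=1000):
    """Mark the predicted path of a moving leg as occupied on the dynamic map."""
    speed = math.hypot(leg_vel[0], leg_vel[1])
    if speed < velocity_threshold:
        return  # Ignore slow/static (possibly false positive) detections.
    end = (int(leg_pos_px[0] + leg_vel[0] / speed * line_length_px),
           int(leg_pos_px[1] + leg_vel[1] / speed * line_length_px))
    # cv2.line clips to the image, so the line effectively runs to the map edge.
    cv2.line(obstacle_map, (int(leg_pos_px[0]), int(leg_pos_px[1])), end, 255, 3)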
The final map, which can be considered a logical sum (OR) of the static and dynamic maps, can be seen here:
Voronoi mapper
The Voronoi mapper is based on the dynamic obstacle map described above. There is nothing fancy in this part of the project, to be honest. The Voronoi graph is generated using OpenCV's Subdiv2D object. Next, the location of the robot and its defined target are added to this graph, and then simple anchor points around the edges of the obstacle map (only on white tiles) are added, so the goal can be reached by the path planning algorithm even if it lies outside of the relatively small obstacle map:
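A bare-bones sketch of that graph generation with Subdiv2D (the node filtering and actual graph construction are more involved in practice):
import cv2

def build_voronoi(obstacle_points, robot_xy, goal_xy, anchor_points, size=256):
    """Voronoi diagram over the obstacle map using OpenCV's Subdiv2D."""
    subdiv = cv2.Subdiv2D((0, 0, size, size))
    for p in obstacle_points:
        subdiv.insert((float(p[0]), float(p[1])))
    # The robot, the goal and the anchor points around the map edges are added
    # too, so the planner can reach goals outside the (small) local map.
    for p in [robot_xy, goal_xy] + list(anchor_points):
        subdiv.insert((float(p[0]), float(p[1])))
    facets, centers = subdiv.getVoronoiFacetList([])
    return facets, centers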
Path planner
For the path planning algorithm I've used an available A* library for Python. Again, nothing fancy:
(Looking back at this project, I think I should transform the generated path from a bunch of points into a spline, for performance reasons in the "Navigator" part of the short term navigation stack.)
I have tried to play with the idea of A* with memory (like rewarding nodes that are close to the previously computed path) to prevent massive path jumps like in the picture below, but it hasn't yielded any fruit so far.
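For illustration, a sketch of A* over the Voronoi graph with an optional "memory" term could look like this; it uses NetworkX's astar_path as a stand-in for the A* library from the thesis, and assumes graph nodes are (x, y) tuples and a recent NetworkX that accepts a callable edge weight:
import math
import networkx as nx

def plan_path(graph, start, goal, previous_path=None, memory_weight=0.5):
    """A* over the Voronoi graph, optionally rewarding nodes near the previous path."""
    def heuristic(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def edge_weight(u, v, data):
        w = data.get("weight", heuristic(u, v))
        if previous_path:
            # Discount edges whose endpoint lies close to the previously planned path.
            d_prev = min(heuristic(v, p) for p in previous_path)
            w *= 1.0 - memory_weight * math.exp(-d_prev)
        return w

    return nx.astar_path(graph, start, goal, heuristic=heuristic, weight=edge_weight)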
Navigator
The navigator is the fancy thing! For navigating along the generated short term path I've used an NMPC, which stands for Nonlinear Model Predictive Controller. Simply put, if we have the right movement model of the robot (a simple differential drive one in this case), this controller can find the "perfect" set of motor controls over a given prediction horizon (which is defined in samples). By perfect I mean the one that minimizes this scary function:
Looks kinda scary, but it's easy to understand if you look at it ̶f̶o̶r̶ ̶t̶h̶e̶ ̶r̶e̶s̶t̶ ̶o̶f̶ ̶y̶o̶u̶r̶ ̶l̶i̶v̶e̶ for some time. Simply put, this function is just a weighted sum, over the prediction horizon, of the squares of: the control error (euclidean distance from the current position to the desired position), the cross track error (distance of the vehicle from the planned path), the orientation error (angular difference between the vehicle and the planned path), the use of the actuators (the motors, so they will be used as little as possible to save power) and the changes in actuator values (so the motors change speed smoothly instead of in rough, fast steps). The weights allow us to define what we want most from the controller, and the squares remove the sign from the errors (so minimization means getting them as close to 0 as possible). The differential model of the robot used in the controller looks like this:
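In my own notation (so the exact symbols in the thesis may differ), a cost of the shape described above can be written as:
J = \sum_{k=1}^{N} \left( w_{p}\,\lVert p_k - p_k^{ref} \rVert^2
    + w_{cte}\, e_{cte,k}^2
    + w_{\theta}\, e_{\theta,k}^2
    + w_{u}\,\lVert u_k \rVert^2 \right)
    + \sum_{k=1}^{N-1} w_{\Delta u}\,\lVert u_{k+1} - u_k \rVert^2
and the differential drive model used for prediction boils down to:
x_{k+1} = x_k + v_k \cos(\theta_k)\, T_s, \quad
y_{k+1} = y_k + v_k \sin(\theta_k)\, T_s, \quad
\theta_{k+1} = \theta_k + \omega_k\, T_s,
\quad \text{with} \quad v_k = \frac{v_{R,k} + v_{L,k}}{2}, \quad
\omega_k = \frac{v_{R,k} - v_{L,k}}{L}.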
The NMPC requires an optimization algorithm to do the dirty work, and it needs to be fast, so simply using SciPy's algorithms sadly won't cut it here. It was necessary to install an external fast optimizer designed for large problems. I've decided to use Ipopt, which allowed the controller to run at intervals of about 300 ms, depending on the complexity of the given path and the robot's state.
The prediction horizon in the project is defined as a prediction time (in seconds) divided by some sampling time (the same one used in the differential model above). During tests I've observed that if the robot needs to rotate to minimize the orientation error, too big a sampling time causes it to rotate chaotically. After reducing the horizon I've seen that after the rotation the robot tends to move slightly off the planned track, as if the orientation error wasn't reduced to 0 in the NMPC's prediction. To fix this, again, just like in the lidar conditioning, I've created a linear relationship between the orientation error of the robot and the prediction horizon. The horizon (sampling time and prediction time) increases when the orientation error decreases, allowing for a smooth switch between orientation and cross track error correction (red arrows):
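A tiny sketch of that linear relationship (the time constants below are placeholders, not the tuned values from the thesis):
import math

def adaptive_horizon(orientation_error, min_time=1.0, max_time=3.0,
                     min_dt=0.1, max_dt=0.3, max_error=math.pi):
    """Shrink the NMPC prediction/sampling time when the orientation error is large."""
    ratio = min(abs(orientation_error) / max_error, 1.0)
    prediction_time = max_time - ratio * (max_time - min_time)
    sampling_time = max_dt - ratio * (max_dt - min_dt)
    horizon = int(prediction_time / sampling_time)   # horizon in samples
    return horizon, sampling_time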
Long term navigator
The long term navigation is a much smaller topic than the overly big short term navigation stack. It consists of only two scripts: the global mapper and the navigator itself.
Global mapper
ROS requires that its kinematic connections are defined in cartesian coordinates. Because of this, we need to cast our GPS coordinates onto some defined plane. I've used a simple geodetic to ENU conversion. Since the GPS measurement isn't noiseless, a Kalman filter was added that filters the casted coordinates using a simple linear movement model with the IMU's accelerations (transformed to north/east accelerations) as inputs. The filtered coordinates are used in ROS's kinematic chain. Additionally, these filtered coordinates are transformed back into the GPS system (using Zhu's algorithm). The filtered GPS coordinates are used in a simple React application to show the robot's location without the GPS dot jumping all around the place:
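As a sketch of the conversion step, this is roughly what casting GPS coordinates onto a local ENU plane looks like (using pymap3d as an example library, not necessarily the one used in the thesis; the reference point below is a placeholder):
import pymap3d

# The reference point (e.g. the first GPS fix) defines the local ENU plane.
lat0, lon0, h0 = 52.2297, 21.0122, 100.0   # placeholder reference coordinates

def gps_to_enu(lat, lon, h=0.0):
    """Cast geodetic (GPS) coordinates onto the local east/north plane."""
    east, north, up = pymap3d.geodetic2enu(lat, lon, h, lat0, lon0, h0)
    return east, north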
I had better plans for this app, like control and path planning, or managing many vehicles (as you can see in the left menu of this image), but I pretty much can't stand Javascript :| I'll work on it someday, ⁱ ᵖʳᵒᵐⁱˢᵉ.
The navigator
Well, there is not a lot to be said here. It's just a small Python script that waits for an array of GPS coordinates, then takes the first coordinate pair and checks at regular intervals whether the platform is close enough to it (like 1 m); if so, the script sets the next coordinates as the goal. That's why it was necessary to use a Kalman filter in the global mapper. Without it, the platform was going crazy close to the desired point, as the measured position jumped around (in something like a 5-10 m radius from the real point, depending on the visibility of satellites); for example, the platform was trying to correct its position to the left, then the measurement jumped to the right, so it had to stop and rotate to the right, and so on.
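A minimal sketch of such a waypoint loop (helper names like get_position() and send_goal() are placeholders):
import math
import time

def navigate(waypoints, get_position, send_goal, reach_radius=1.0, period=1.0):
    """Feed waypoints (in local plane coordinates) one by one, advancing when
    the filtered position is within reach_radius of the current waypoint."""
    for waypoint in waypoints:
        send_goal(waypoint)
        while True:
            x, y = get_position()
            if math.hypot(waypoint[0] - x, waypoint[1] - y) < reach_radius:
                break   # Close enough, move on to the next waypoint.
            time.sleep(period)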
Testing
Due to the problems described in "The physical platform" part of this post, I've decided to recreate a real testing environment, in the form of a park that is close to my place (and a cool pub), in Gazebo. Then I ran all of the software described above and hoped for the best. Well, I guess the algorithms are working okay:
The only problem I've encountered was with the leg detector. It looks like the animated humans in Gazebo have strange collision boxes, and during movement they happen to pass through other collision objects, so I've decided to disable the moving humans and the detector in this environment. Isaac Sim would probably be the better option, as the devs at Nvidia designed their boards around machine learning, but my GPU is just one generation too old for it :D, and I think most people here know what is currently happening in the GPU market, so I had no choice.
The conclusions
To conclude, I would say that I'm kinda proud of this project. I can't really describe how much I've learned about robotics, ROS and optimization algorithms while working on the navigation stack, but I can say for sure that all of this effort won't be in vain. The algorithms could use some code cleanup, optimization and further development, but as I said earlier, this would be evolution rather than revolution. As I write this post I already have some ideas for how to develop the project. Some of them were stated in the italicized text throughout the article, like a spline based path instead of a bunch of points, or a new dataset for the leg detector. But the first thing to fix is the frame itself, and there's some good news, as I have already designed a rough concept of a new frame which fixes all of the stuff I didn't like about the first one, such as weight balance, motor placement (now it's all metal, connected to the shaft of one wheel in each pair), waterproofing, etc.:
I think I'll work on this project some time after I graduate, as my brain could use some rest.
Welp, that's all from me. I hope I didn't bore you too much with this post :D
Below you can find the source code along with some instructions on how to run the project (those are in a kinda draft state too), but if I were you, I wouldn't build this project due to the problems described above.