Our Software

This page details some aspects of Kókánybot's software stack, and goes into greater detail regarding some changes made since the previous year.

Kókánybot's software stack is written in a mix of Python (the object detection bits) and C (everything else).

The backend

There are a few daemons running on Kókánybot's main computer, a Raspberry Pi 4B: a handful of FFmpeg-based scripts that stream video (from the front and rear cameras) and audio, and a C program that controls all the actuators through various kernel APIs, which are elaborated upon below.

Communication

We interact with our robot using two custom programs, named KókányControl (kokanyctl) and KókányRecognize (kokanyrecognize).

KókányControl has a graphical interface for displaying the video and the sensor data it receives from Kókánybot. It takes keyboard input and sends commands to the Raspberry Pi. It also recognizes QR codes that appear in the feeds from Kókánybot's cameras.
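Below is a minimal sketch of how keyboard input can be forwarded as commands. It is illustrative only: the real kokanyctl uses SDL2_net for networking, and the one-byte command encoding and port number used here are assumptions rather than our actual protocol.

    /* Sketch: forward keypresses as single-byte UDP commands.
     * The command bytes and port 9000 are assumptions, not
     * kokanyctl's actual protocol. */
    #include <SDL.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("kokanyctl sketch", 0, 0, 320, 240, 0);

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in robot = {0};
        robot.sin_family = AF_INET;
        robot.sin_port = htons(9000);               /* assumed port */
        inet_pton(AF_INET, "192.168.69.1", &robot.sin_addr);

        for (int running = 1; running; ) {
            SDL_Event ev;
            while (SDL_PollEvent(&ev)) {
                if (ev.type == SDL_QUIT)
                    running = 0;
                if (ev.type == SDL_KEYDOWN && !ev.key.repeat) {
                    char cmd = 0;
                    switch (ev.key.keysym.sym) {
                    case SDLK_w: cmd = 'f'; break;  /* forward    */
                    case SDLK_s: cmd = 'b'; break;  /* backward   */
                    case SDLK_a: cmd = 'l'; break;  /* turn left  */
                    case SDLK_d: cmd = 'r'; break;  /* turn right */
                    }
                    if (cmd)
                        sendto(sock, &cmd, 1, 0,
                               (struct sockaddr *)&robot, sizeof(robot));
                }
            }
            SDL_Delay(10);
        }
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }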

KókányRecognize was written to reduce the complexity of KókányControl: image recognition is only needed in a few runs, so we can simply launch KókányRecognize whenever we need it. This also let us keep KókányControl in pure C, since the image recognition parts (which use OpenCV) would have required C++.

The robot can be controlled over Wi-Fi (2.4 GHz) or Ethernet by changing the name of the network interface used in kokanybot's code.
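As a sketch of what selecting an interface by name can look like, the snippet below resolves a named interface to its IPv4 address using getifaddrs(). This is an illustrative approach, not necessarily how kokanybot actually handles it.

    /* Sketch: look up the IPv4 address of a named network interface.
     * Illustrative only; kokanybot's actual interface handling may differ. */
    #include <arpa/inet.h>
    #include <ifaddrs.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>

    static int addr_of(const char *ifname, struct in_addr *out)
    {
        struct ifaddrs *list, *ifa;
        int found = -1;

        if (getifaddrs(&list) < 0)
            return -1;
        for (ifa = list; ifa; ifa = ifa->ifa_next) {
            if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET &&
                strcmp(ifa->ifa_name, ifname) == 0) {
                *out = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
                found = 0;
                break;
            }
        }
        freeifaddrs(list);
        return found;
    }

    int main(void)
    {
        struct in_addr a;
        char buf[INET_ADDRSTRLEN];

        if (addr_of("eth0", &a) == 0)   /* e.g. "wlan0" when on Wi-Fi */
            puts(inet_ntop(AF_INET, &a, buf, sizeof(buf)));
        return 0;
    }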

Training an object detection model

Last time, one of the most challenging aspects of the competition was the object detection capability robots were required to have. Back in Bordeaux, we failed to detect anything and scored 0 points for object detection, so we definitely needed to improve on that.

Firstly, we found a large enough dataset on the internet (1k+ photos), which was a godsend because it spared us from having to take and tag pictures manually.

Secondly, we switched from the outdated YoloV5 to the up-to-date YoloV8, which is supposedly better in every single way. We chose the small variant because the larger the model, the slower inference is. Speed doesn't matter much during training, since one can simply leave a computer running while away from home, but it is vital during inference, because our laptops are not on par with our workstations at home.

To use our model, we wrote a script named kokanyrecognize at the last minute. It was neither performant nor particularly clean, so we have spent a significant amount of time working on it. So far it has been rewritten to use the new model, but it still has a long way to go.

Here is the result so far:

[Image showcasing the accuracy of our model]

Simplifying our software with device trees

Last year we used a library called libi2c to control some of our stepper motors using an external MCP23017 GPIO expander. Since then we have switched to using servos for our robotic arm. We also wanted to make our software more flexible, and the best way to do this was ripping out the useless code and using the right tools for the job.

While programs are allowed to just use libi2c, it adds (often redundant) extra code to a project. The kernel actually has drivers for a lot of common ICs, such as GPIO expanders (like the MCP23017 we used) and PWM controllers (like the PCA9685).

One improvement we've made was deleting the I2C handling from our code and using the kernel driver instead. This is great because we can simply interact with the userspace PWM API via something like libhwpwm (more on that later).

To achieve this, we wrote a device tree overlay. Device tree overlays are somewhat like patch files for device trees: ours tells Linux which chips are present at which I2C addresses.
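As an illustration, a minimal overlay describing a PCA9685 on the Pi's i2c1 bus might look like the sketch below; the bus, node name and 0x40 address are assumptions for the example, not copied from our actual overlay.

    /dts-v1/;
    /plugin/;

    /* Sketch: tell the kernel that a PCA9685 PWM controller sits at
     * address 0x40 on the Pi's i2c1 bus. Bus, node name and address
     * are illustrative. */
    &i2c1 {
            #address-cells = <1>;
            #size-cells = <0>;
            status = "okay";

            pca9685: pca9685@40 {
                    compatible = "nxp,pca9685-pwm";
                    reg = <0x40>;
                    #pwm-cells = <2>;
                    status = "okay";
            };
    };

Compiled with dtc and loaded at boot, an overlay like this lets the kernel's PCA9685 driver bind automatically and expose a standard PWM chip to userspace.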

Switching from TCP to UDP

Robotics taught us just how fragile computers are when subjected to harsher environments. Short circuits may occur, components might get knocked against a robot's frame, boards can overheat, all of which can result in a robot rebooting or shutting down entirely.

In our previous competitions, faults like these always occurred. And if things were going too well, then our UTP cable would slip out of our control station's Ethernet port instead. In these cases, we almost always had to reboot our robot, redo the software side of the setup and reconnect, wasting precious time.

Mitigating these issues while sticking with TCP sockets would mean handling potential connection failures every time we send or receive data. Instead of doing that, we've been making steady progress on switching over to UDP. As of now, the code which receives keyboard input uses no TCP sockets at all; it's completely stateless.

The main advantage of UDP compared to TCP is its statelessness: networked programs do not need to connect(), listen() or accept(), which also means there is no need to track clients; you create a socket and send/receive data over it using sendto() and recvfrom(). This eliminates the need to handle reconnections.
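A minimal sketch of what the receiving side can look like on the robot is shown below; the port number and one-byte command format are assumptions carried over from the earlier kokanyctl sketch, not our actual protocol.

    /* Sketch of a stateless UDP command receiver. Port 9000 and
     * one-byte commands are assumptions used for illustration. */
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = {0};

        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        for (;;) {
            char cmd;
            /* No accept() and no client bookkeeping: just wait for the
             * next datagram. If the control station reboots, its next
             * packet is handled exactly like any other. */
            ssize_t n = recvfrom(sock, &cmd, 1, 0, NULL, NULL);
            if (n == 1)
                printf("got command '%c'\n", cmd);
        }
    }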

The main downsides of UDP are that it guarantees neither that packets arrive in the correct order nor that they arrive at all. This is not really an issue for us, since the entire network is point-to-point with no more than two hosts, and we have yet to experience any problems.

Removing gas sensing code

The rulebook's latest draft does not mention CO2 sensing, therefore we've removed all code related to it. If a later draft brings it back, we can just reuse a previous commit, as we've often done during development.

Video and audio streaming

During tests, operators are only allowed to see the arenas from their robot's point of view. This meant we needed a way to display the video from the robot's cameras at the operator station.

We've learnt from our mistakes last year and have opted to use multiple cameras, giving us better peripheral vision while controlling Kókánybot.

Multimedia-related tasks are surprisingly computation-heavy when one is working with embedded systems. The CPU in the Raspberry Pi 4B is fairly capable, but we also had to consider power draw and thermal problems. We considered several video formats (H.265, AV1 and H.264), but in the end we settled on sending raw frames from our cameras to minimize latency as much as possible, since at the 2023 RoboCup our camera's high latency caused a lot of trouble.

One of our cameras is currently a Raspberry Pi Official Camera Module. This uses the new and improved libcamera stack, which does not play well with regular V4L2 programs such as FFmpeg, so we wrote a modified script to work with the Pi Camera.

Linux assigns a /dev/videoN node to every camera. Since our cameras have to be handled by separate programs and have different output formats, we must be able to tell them apart in a consistent way. Linux provides a way to do this using udev rules. We gathered the attributes of the cameras using udevadm (a standard udev utility) and wrote rules that assign the /dev/front-camera and /dev/rear-camera names to our cameras.
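A sketch of what such rules can look like is shown below. The file name and the matching attribute values are placeholders; the real values come from inspecting each camera with udevadm.

    # Sketch of /etc/udev/rules.d/99-kokanybot-cameras.rules.
    # The idVendor/idProduct/index values are placeholders; the real ones
    # come from `udevadm info --attribute-walk /dev/videoN`.
    SUBSYSTEM=="video4linux", ATTRS{idVendor}=="1234", ATTRS{idProduct}=="abcd", ATTR{index}=="0", SYMLINK+="front-camera"
    SUBSYSTEM=="video4linux", ATTRS{idVendor}=="5678", ATTRS{idProduct}=="ef01", ATTR{index}=="0", SYMLINK+="rear-camera"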

On the client side, we implemented the decoding of the video data using FFmpeg's libavformat and libavcodec libraries. Rendering the video frames was tricky to figure out, because pretty much all video codecs store pixels in the YCbCr colour space, which SDL isn't the best at handling.

The APIs of the libav* libraries are huge. Thankfully, we only really needed the high-level decoding API. The Learn FFmpeg libav the Hard Way tutorial, combined with the examples in the project's documentation, made things much easier.
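For illustration, the core of such a decode-and-display loop can look roughly like the sketch below. The stream URL is a placeholder, error handling is omitted, and this is not kokanyctl's actual code.

    /* Sketch: high-level libavformat/libavcodec decoding feeding an SDL2
     * YUV texture. Placeholder URL, no error handling. */
    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>
    #include <SDL.h>

    int main(void)
    {
        AVFormatContext *fmt = NULL;
        avformat_open_input(&fmt, "udp://192.168.69.1:5000", NULL, NULL);
        avformat_find_stream_info(fmt, NULL);
        int vid = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

        const AVCodec *codec = avcodec_find_decoder(fmt->streams[vid]->codecpar->codec_id);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        avcodec_parameters_to_context(ctx, fmt->streams[vid]->codecpar);
        avcodec_open2(ctx, codec, NULL);

        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("camera", 0, 0, ctx->width, ctx->height, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, 0);
        /* IYUV is planar YCbCr, matching what most decoders output. */
        SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_IYUV,
                                             SDL_TEXTUREACCESS_STREAMING,
                                             ctx->width, ctx->height);

        AVPacket *pkt = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        while (av_read_frame(fmt, pkt) >= 0) {
            if (pkt->stream_index == vid) {
                avcodec_send_packet(ctx, pkt);
                while (avcodec_receive_frame(ctx, frame) == 0) {
                    SDL_UpdateYUVTexture(tex, NULL,
                                         frame->data[0], frame->linesize[0],
                                         frame->data[1], frame->linesize[1],
                                         frame->data[2], frame->linesize[2]);
                    SDL_RenderClear(ren);
                    SDL_RenderCopy(ren, tex, NULL, NULL);
                    SDL_RenderPresent(ren);
                }
            }
            av_packet_unref(pkt);
        }
        return 0;
    }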

libhwpwm

As we've stated above, one of our main goals was to reduce the number of barely working DIY solutions, and rely on what's tried and tested: The kernel drivers. To that end, we did not want to bring back our old PCA9685 driver.

There was an issue, however: there was no thin, low-level C library for poking the sysfs PWM files safely (or at least as safely as C code can). As written in the libhwpwm announcement, the existing solutions were too complex for our taste, so, contrary to what we set out to do, we rolled our own solution; the result is still less hacky than the aforementioned old PWM code.

The implementation is really simple, as the library only needs to poke around eight files in total. The library was rewritten entirely once we discovered what trouble holding onto critical file descriptors (sysfs files) can cause.
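Since libhwpwm is our own library, the sketch below shows the underlying sysfs ABI it wraps rather than its own API; the chip number, channel and timing values are arbitrary examples.

    /* Sketch of the raw sysfs PWM interface that libhwpwm wraps.
     * Chip, channel and timings are arbitrary; error handling omitted. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static void write_str(const char *path, const char *val)
    {
        int fd = open(path, O_WRONLY);

        if (fd >= 0) {
            write(fd, val, strlen(val));
            close(fd);  /* do not hold onto sysfs file descriptors */
        }
    }

    int main(void)
    {
        /* Expose channel 0 of pwmchip0, then set a 20 ms period with a
         * 1.5 ms pulse (a typical servo centre) and enable the output.
         * All timing values are in nanoseconds. */
        write_str("/sys/class/pwm/pwmchip0/export", "0");
        write_str("/sys/class/pwm/pwmchip0/pwm0/period", "20000000");
        write_str("/sys/class/pwm/pwmchip0/pwm0/duty_cycle", "1500000");
        write_str("/sys/class/pwm/pwmchip0/pwm0/enable", "1");
        return 0;
    }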

We've also encountered a rather exotic bug involving the Raspberry Pi and the camera modules, so we will reconsider using those modules.

Setup and packup of the robot and operator station

To set up the robot, we slide a battery into our battery holder, and flip a switch on our robot.

For the operator station, we use a laptop. We connect the laptop to the robot using an Ethernet cable and run kokanyctl 192.168.69.1.

Mission strategy

We use standard, easily acquirable parts, which makes our robot truly rapidly manufacturable.

Experiments and Testing

We have conducted image recognition tests and tested the movement of our robot in our school. Our image recognition model successfully recognized hazmat labels with 80% accuracy. We have also tested our robotic arm controls; they worked well enough to gain us a decent amount of points at the 2024 Hungarian competition (see the main page).

Software packages used

libgpiod: DC motor control, sensor control, stepper control
OpenCV: Image recognition
FFmpeg: Video streaming, image recognition backend (used by OpenCV)
SDL2: Used by KókányControl to process keyboard input and display video
SDL2_ttf: Used by KókányControl to draw text for displaying sensor data
SDL2_net: Used by KókányControl to handle networking
libzbar: Used for QR code detection
YoloV8 (small): The model we use for object detection
RMRC Dataset: Dataset used for training our image recognition model
undefined medium: The font used in kokanyctl
libhwpwm: A C library to interface with the Linux PWM userspace API
rpicam-apps: The programs used to capture the Pi Camera's video data

What the team has learned so far