Secure Hash Algorithm 256 (SHA-256) is a cryptographic hash function used in security applications such as protecting passwords or securing web servers. A famous use of it is Bitcoin's proof-of-work, which relies on SHA-256 to secure blocks of transactions.
Many of you will have heard that GPUs are used for mining because they are well suited to the number crunching that hash algorithms require. However, did you know that an FPGA allows even more customized acceleration? Because the FPGA is configured to implement the algorithm directly in hardware, it can outperform a GPU, particularly in performance per watt. In fact, Bitcoin miners started using FPGAs as early as 2011.
Nowadays, mining Bitcoin keeps getting more difficult, and FPGA mining may no longer be profitable. Nevertheless, it remains a useful case study for learning how acceleration can be done easily on Xilinx FPGAs. And perhaps next time we will know what to do when we see another potential use case for algorithmic acceleration.
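To make the proof-of-work idea mentioned above a bit more concrete, here is a minimal Python sketch of the kind of loop a miner runs. It is a toy under stated assumptions: the names `double_sha256` and `toy_mine`, the header bytes, and the "leading zero bits" difficulty are all illustrative and do not follow Bitcoin's real block format. The only part taken from Bitcoin is that the header is hashed with SHA-256 twice.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does when hashing a block header."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_mine(header: bytes, zero_bits: int, max_nonce: int = 1_000_000):
    """Search for a nonce whose hash starts with at least `zero_bits` zero bits.

    Simplified stand-in for real mining: the header layout and the way the
    difficulty is expressed are assumptions made for illustration only.
    Returns (nonce, hex digest), or None if no nonce was found.
    """
    target = 1 << (256 - zero_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        # Treat the digest as a big-endian integer and compare with the target.
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None
```

Calling `toy_mine(b"example block header", zero_bits=12)` needs a few thousand hashes on average. Every extra zero bit doubles the expected work, which is why the hashing loop is the part worth moving into hardware.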
We will be using the hardware kit of the Xilinx Kria KV260 together with the newly released Ubuntu 20.04 LTS OS. We will install PYNQ so that we can have the benefits of hardware acceleration from a Python notebook.
On our PC, we will use Vitis HLS to convert an open-source C implementation into a hardware IP module. After that, we use Vivado to create the connections to the processor and generate the bitstream.
Finally, we transfer the bitstream to the Xilinx Kria and benchmark the performance gains in the PYNQ environment.
First we need to prepare the SD card for the Kria KV260 Vision AI Starter Kit.
In the box, a 16GB SD card is provided, but I recommend using at least 32GB instead, since the setup may exceed 16GB of space.
We will be using Ubuntu 20.04.3 LTS. Download the image from the website and save it on your computer.
On your PC, download balenaEtcher and use it to write the image to your SD card.
Once done, your SD card is ready and you can insert it into your Kria to set up Xilinx Ubuntu! Connect a USB Keyboard, USB Mouse, HDMI/DisplayPort and Ethernet to the Kria.
Connect the power supply to turn on the Kria and you will see the Ubuntu login screen.
The default login credentials are:
username: ubuntu
password: ubuntu
After booting, the interface can be very slow, so I ran these commands to disable animations and speed things up.
gsettings set org.gnome.desktop.interface enable-animations false
gsettings set org.gnome.shell.extensions.dash-to-dock animate-show-apps false
Next, bring the system up to date by installing the pending system updates and then running this command:
sudo apt upgrade
Install the xlnx-config snap for system management and configure it (More information on the Xilinx wiki):
sudo snap install xlnx-config --classic
xlnx-config.sysinit
Now check that the device configuration is working fine.
sudo xlnx-config --xmutil boardid -b som
Install the latest Kria-PYNQ package. This can take up to 30 minutes.
git clone https://github.com/Xilinx/Kria-PYNQ.git
cd Kria-PYNQ/
sudo bash install.sh
After installation, you can go to "kria:9090" in the web browser to see the Jupyter notebooks. The default password is xilinx.
The Kria system is ready. Now let's go back to our PC to create the PYNQ overlay bitstream.
Accelerator IP in Vitis HLS
Using Vitis HLS, we can convert much existing C/C++ code into a hardware IP module. I will be using this SHA256 C implementation without any modifications.
Launch Vitis HLS and create a new project.
On the next page, select the target device. The Kria KV260 Vision AI Starter Kit uses the part xck26-sfvc784-2lv-c.
Once you are in the workspace, create a source file. Then, in the project synthesis settings, choose the top function from that file.
In the code, I created a top-level function called hash(). It essentially only calls the SHA256 hashing functions.
Note that this function is implemented in the Programmable Logic (PL). This means that the parameters of hash() are actually inputs and outputs that need to be transferred to and from the Processing System (PS). Hence the need to choose an appropriate communication interface for each one.
For small variables like text_length and result, I chose s_axilite, a lightweight memory-mapped register interface (AXI4-Lite) suitable for small variables. It is also relatively easy to access from PYNQ later on.
For large buffers like text_input[1024], I chose m_axi, a full AXI4 memory-mapped master interface that lets the IP read the data from memory directly, including in bursts. It takes up a lot more logic and interconnections, but it is necessary so that transferring data is fast enough.
For more information, the code is provided below at the end of this project.
Start the C synthesis under Flow Navigator.
After synthesis, you can check the arguments that we will later access from PYNQ.
Lastly, choose Export RTL and choose the location to save it to. It will output a zip file which contains the IP module to be imported in Vivado.
Open Vivado and create a new project.
Choose the Kria KV260 Vision AI Starter Kit. Continue with all the defaults until you reach the project workspace.
Before we do anything else, we need to add the IP that we have created earlier.
Go to Project Manager > Settings > IP > Repository, and add the folder containing the zip file.
Under IP Integrator, choose Create Block Diagram. Add the following blocks:
- Zynq UltraScale+ MPSoC (This is the PS)
- Hash (The IP that we have generated from Vitis HLS)
- AXI Interconnect (To interconnect to the m_axi bus from our IP)
After that, run Connection Automation. Choose all the possible automations and accept the default settings.
Notice that the interconnect bus for m_axi (Master) is still unconnected. This is because I forgot to enable the Slave interface on the PS.
Double-click the Zynq UltraScale+ MPSoC block and enable AXI HP0 FPD (high performance). Check that the data width is 32 bits, to match what was synthesized in HLS.
Run connection automation again. This is the final block diagram.
Under Sources, right click the design and choose Create HDL Wrapper. It will automatically wrap the design for you to be ready for synthesis.
Generate the bitstream. It may take up to an hour for synthesis and implementation to complete.
Finally, to get the PYNQ overlay, we have to retrieve 2 files: the .bit file and the .hwh file.
The bitstream file can be found under: *.runs/impl_1/design_1_wrapper.bit
The hardware handoff file can be found under: *.gen/sources_1/bd/design_1/hw_handoff/design_1.hwh
Copy these files to the Kria.
Interfacing in PYNQ
I copied the 2 files and renamed them to sha256accelerator.bit and sha256accelerator.hwh. Note that both files must have the same base name to be used properly as a PYNQ overlay.
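Since a mismatched file name is an easy mistake to make, a small check can be run before loading the overlay. This is only a sketch: the helper name `find_overlay_pair` is hypothetical and not part of PYNQ. It simply confirms that a .hwh file with the same base name sits next to the .bit file.

```python
from pathlib import Path
from typing import Tuple


def find_overlay_pair(bitfile: str) -> Tuple[Path, Path]:
    """Return the (.bit, .hwh) paths, raising if either file is missing.

    Hypothetical helper for illustration; PYNQ itself looks for a .hwh
    file with the same base name next to the bitstream.
    """
    bit = Path(bitfile)
    if bit.suffix != ".bit":
        raise ValueError(f"expected a .bit file, got {bit.name}")
    hwh = bit.with_suffix(".hwh")
    missing = [p.name for p in (bit, hwh) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing overlay file(s): " + ", ".join(missing))
    return bit, hwh
```

For example, `find_overlay_pair("sha256accelerator.bit")` returns both paths when the pair is complete and raises `FileNotFoundError` naming whichever file is absent.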
I will explain some snippets of my code. You can find the full code attached below.
Create a new Jupyter notebook, where we can load the bitstream onto the FPGA.
Here I define the hardware function, which writes the input data into the buffer and starts the process. The function returns once the hardware has finished. I also define a software function, which calls a built-in library.
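The two functions described above might look roughly like the sketch below. It is not the author's code. The register offsets, the little-endian word packing of the result, the names `hw_sha256` and `sw_sha256`, and the IP instance name `hash_0` are all assumptions. The real offsets should be taken from the HLS synthesis report or from `ip.register_map`. Using `hashlib` as the software reference is also an assumption.

```python
import hashlib

# Assumed register offsets. The real values are design-specific and must be
# read from the HLS report or from ip.register_map in PYNQ.
CTRL = 0x00             # ap_ctrl: bit 0 = ap_start, bit 1 = ap_done
TEXT_LENGTH = 0x10      # hypothetical offset of text_length
TEXT_INPUT_ADDR = 0x18  # hypothetical offset of the m_axi base address
RESULT = 0x40           # hypothetical offset of the 32-byte result


def hw_sha256(ip, buf, data: bytes) -> bytes:
    """Hash `data` on the accelerator through its AXI4-Lite registers.

    `ip` needs read(offset) and write(offset, value); `buf` is a buffer
    from pynq.allocate that the IP reads over m_axi.
    """
    n = len(data)
    if n > len(buf):
        raise ValueError("input does not fit in the input buffer")
    buf[:n] = list(data)
    buf.flush()  # make sure the data is in DRAM before the PL reads it
    ip.write(TEXT_INPUT_ADDR, buf.physical_address & 0xFFFFFFFF)
    ip.write(TEXT_LENGTH, n)
    ip.write(CTRL, 0x01)             # set ap_start
    while not ip.read(CTRL) & 0x02:  # busy-wait for ap_done
        pass
    # Assumption: the digest is exposed as eight little-endian 32-bit words.
    words = [ip.read(RESULT + 4 * i) for i in range(8)]
    return b"".join(w.to_bytes(4, "little") for w in words)


def sw_sha256(data: bytes) -> bytes:
    """Software reference using Python's built-in hashlib."""
    return hashlib.sha256(data).digest()


# On the Kria (not runnable on a PC without PYNQ):
#   from pynq import Overlay, allocate
#   import numpy as np
#   overlay = Overlay("sha256accelerator.bit")
#   ip = overlay.hash_0   # instance name is an assumption
#   buf = allocate(shape=(1024,), dtype=np.uint8)
#   digest = hw_sha256(ip, buf, b"hello")
```

Polling the control register keeps the sketch simple. For longer-running kernels, an interrupt would avoid tying up the CPU while the hardware works.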
Running some tests, we can compare the outputs of both functions to verify that everything is implemented correctly in hardware.
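A comparison like this can be automated with a small harness. The sketch below is an assumption about how such a test could be written, not the notebook's actual code. The name `check_equivalence` is hypothetical, and it accepts any two callables that map bytes to a digest. The fixed cases target the lengths where SHA-256 padding changes behaviour (55, 56 and 64 bytes), plus the empty and maximum-size inputs.

```python
import hashlib
import os


def check_equivalence(hw_func, sw_func, max_len=1024, trials=50):
    """Return the list of inputs on which the two functions disagree."""
    # Edge cases: empty input, a known vector, padding boundaries, full buffer.
    cases = [b"", b"abc", b"a" * 55, b"a" * 56, b"a" * 64, b"a" * max_len]
    # Random messages with lengths spread between 1 and max_len bytes.
    cases += [os.urandom(1 + i * (max_len - 1) // trials) for i in range(trials)]
    return [m for m in cases if hw_func(m) != sw_func(m)]


def reference(message: bytes) -> bytes:
    """Software reference; hashlib is assumed here as the built-in library."""
    return hashlib.sha256(message).digest()
```

An empty list from `check_equivalence(my_hw_function, reference)` means every case matched. Otherwise the returned messages are the failing inputs, which makes it easier to spot whether the problem depends on length.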
Running some benchmarks, we see that the hardware implementation is about 14 times faster than the software implementation.
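For reference, a speedup figure of this kind can be measured with a simple timing loop. The sketch below shows one way to do it, under the assumption that both implementations are plain Python callables. The names `time_per_call` and `speedup` are hypothetical, and the numbers it produces depend on the message size and on how much of the time is spent in Python overhead rather than in the hash itself.

```python
import hashlib
import time


def time_per_call(func, message: bytes, repeats: int = 1000) -> float:
    """Average wall-clock seconds per call of func(message)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(message)
    return (time.perf_counter() - start) / repeats


def speedup(baseline_func, accelerated_func, message: bytes, repeats: int = 1000) -> float:
    """How many times faster accelerated_func is than baseline_func."""
    baseline = time_per_call(baseline_func, message, repeats)
    accelerated = time_per_call(accelerated_func, message, repeats)
    return baseline / accelerated
```

It is worth timing several message sizes, since the fixed cost of writing registers and flushing the buffer weighs more heavily on short inputs.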
To conclude, it was very easy to accelerate a C function without any modifications using Vitis HLS. Although it requires some familiarity with the hardware protocols and their variants, the software process was relatively straightforward.
From here, we understand how FPGAs were used in the past for accelerating these repetitive computations such as blockchain mining. By directly targeting the algorithm, the hardware is more specific and thus more performant.