Employ reconfigurable FPGAs to execute the Fully Homomorphic Encryption over the Torus (TFHE). In particular, the Fast Furrier Transform and Lagrange Multiplication.
DisclaimerTo fully comprehend the terminology behind Homomorphic Encryption, requires a deep understanding of cryptography, signal processing, and mathematics. Nevertheless, thanks to the high-level libraries provided by Vitis, it is possible to abstract the most complicated parts of it.
The overall project might seem like a daunting task to grasp at first; however, it is possible to run the examples below without a full understanding of the inner details, even with a surprise a the end of this article.
Remember, it is not just about the destination, but about enjoying the journey to discovery.
BackgroundOn an everyday basis, everyone employs cryptography seamlessly: surfing the web using HTTPS, buying merchandise at a store using a credit card, or by sending messages in Signal. At the core, these examples take advantage of the Public Key Cryptography scheme (PKC). PKC is also known as "Asymmetric Cryptography" because it employs a set of two (or more) keys to encrypt/decrypt data using public/private keys respectively. The image below shows an example of how PKC is used to transfer a message across an "insecure" channel, which in this case is the internet.
Blockchain implementations like Bitcoin, Ethereum, or Cardano employ the PCK principle extensively in their wallet functionality, where a user creates a public address (where funds are deposited) and the private key serves to sign transactions. Something most people do not realize is all transactions in those blockchains are public. In other words, the movement of funds between public addresses is visible to all participants in the network, which makes every move traceable and possibly identifiable to individuals. That applies as well to "smart contracts" on Ethereum, Cardano, or Solana, where all the transactions performed require "plain data" to operate.
A similar case occurs with public cloud services. Whenever a company uses its services (like AWS Lambda) the information transformation happens with clear data. For example, a user access a corporate site hosted on AWS, foreign communication happens over encrypted channels. However, the user's information needs to be decrypted to have it visible to the company as well as AWS, unless everything within the AWS servers is encrypted at every operation by the business, which in most cases is not true.
Therefore, in order to preserve the privacy of the transactions, data transformation and verifiable results with high confidence of no leakage of important information, Fully Homomorphic Encryption is the next step.
¿What is Fully Homomorphic Encryption?Fully Homomorphic Encryption (FHE) is a peculiar cryptographic technique that carries out computations on encrypted data. This entitles a complete paradigm shift on data manipulation, user privacy, and blockchain services on the public internet.
As alluded to earlier, a generic application (ex AWS Lambda or a smart contract) receives encrypted data, first, it is necessary to decrypt the payload, perform the desired computations on the clear, and then re-encrypt the data. On the other hand, FHE removes the need to perform decryption-encryption at once. An example of this can be seen in the image below:
From the image, in the first case, all data received by the smart contract is received ciphered, then a decryption function obtains the original data, processes it in the smart contract and the results are encrypted to be sent to the recipient of that information. In the second case, thanks to FHE, the smart contract only needs to receive ciphered data, then apply all the necessary operations and output a still ciphered payload that can only be decrypted by the individual with the corresponding private key.
In the real world, when an app needs to execute a computation F on encrypted data, the FHE scheme proportionates an "homomorphic" computation F' that, once performed over the encrypted data, will be equivalent to the operation "decryption-encryption" required by current applications over clear data. In mathematical terms: F ( clear_data ) = Decrypt ( F' ( encrypted_data )).
The achievement of FHE could have a broad impact on society. It could transform the perspective on everyday computations with primordial end-to-end privacy. For example, FHE users would offload numeric-intensive calculations to cloud services with the certainty that the results would not be accessible by anyone but the private key owner. The same could be stated about public blockchains, where smart contracts execute all their operations over encrypted data without revealing the inputs-transaction-outputs.
The major drawback in the FHE adoption is its slow performance. Even though there is active research, computations over FHE encrypted data perform orders of magnitude slower than operations on plain data. In addition to that, app transformation from unencrypted data to FHE-enabled on encrypted payloads is not a trivial translation. When that translation is not properly engineered, it significantly increases the performance difference between computing on clear data and FHE encrypted data, diminishing the benefits of FHE adoption.
This video summarizes the impact FHE will have on multiple industries, like healthcare, finance, or airlines. Another example video shows the possibility of running a Linear Machine Learning model to calculate a prediction over encrypted data using FHE. This other video reviews the FHE landscape.
Fully Homomorphic Encryption over the TorusMultiple implementations employ advanced mathematical techniques to comply with the FHE protocol. Research around this topic preceded four generations plus a Prelude stage that established the foundational technologies. This project focused on a particular implementation named Fully Homomorphic Encryption over the Torus (TFHE)". TFHE is a C/C++ framework that implements a very fast gate-by-gate bootstrapping with an open-source library that evaluates arbitrary boolean circuits composed of binary gates, over encrypted data, without revealing any information on the data.
Among its capabilities, the TFHE framework supports homomorphic evaluation of basic binary gates: AND, OR, XOR, NAND, NOR, NOT, MUX, XNOR. The foundational operation is named "Gate-Bootstrapping". This process is necessary to reduce the amount of "noise" after applying multiple TFHE gates to ciphered data. The image below provides an example of this:
TFHE has no restriction on the number of gates or composition. Therefore, it can perform any computation over encrypted data, even if the applied function is unknown when data is encrypted. The library can operate with manually created circuits or automated circuit generation tools. This video explains in detail the THFE algorithm.
Project implementationThe focus of this project explored the execution of certain functions used by the TFHE library on the Varium C1100 FPGA board. As a background, TFHE is utilized by the NuCypher Blockchain project in two implementations here & here. These give this project a perspective projection into its future integration on a public blockchain.
Thanks to TFHE source code in C/C++ and Vitis HLS developer tools, this project offloaded processing to the Varium C1100. In particular, operations around the Fast Fourier Transform (FFT) and polynomial multiplications. Given the hackathon time constraints and the FHE algorithm understanding, only these operations deploy to the HLS kernel to reach the deadline with a product that takes advantage of the FPGA. Future work will deploy additional gates and employ a larger hardware footprint.
With an overview of the FPGA-TFHE github structure:
As shown earlier, TFHE needs to calculate a "bootstrap key" (BSK) to perform operations over encrypted data while keeping acceptable noise. Even though the BSK creation happens once, it is a time-consuming operation. This project focused its efforts on this execution into the FPGA provided by the need for FFT and fixed data length at the time of execution.
FPGA Poly KernelTFHE is a complex protocol with multiple layers of cryptographic algorithms and mathematical functions. The FPGA Poly Kernel developed for this project focused on the deployment-specific functions required in the BSK creation, as listed below:
- iFFT over integer polynomials
- iFFT over torus polynomials
- Lagrange polynomial multiplications
- FFT over torus polynomials
- Torus polynomial summation
Despite their fancy names, they involve a lot of sums/multiplications and fixed-sized loops over complex and real numbers, which are excellent candidates to deploy on the Varium C1100 FPGA. Note: if you need a refresher on FFT, here is a video that quickly explains what it means and its importance in areas of communication and data analysis.
Vitis HLS is a high-level synthesis tool that enables C/C++ to hardwire onto the device logic fabric and RAM/DSP blocks. This project exploits Vitis HLS to implement hardware kernels in the Vitis IDE flow based on the TFHE C++ code.
The structure of representative kernel files within the xclbin bitstream is in the image below:
There are two important parts related to this project: the PolyKernel.xclbin and the VitisPolynomial (h & cpp). The first is the compiled bitstream composed of three cpp files PolyKernel, PolyProc & FFTTables. The second part takes responsibility to create the XRT-OpenCL objects needed to load the bitstream, communicate with the FPGA, and bridge existing code to the TFHE framework. The image below provides a high-level representation of these two parts:
Inside the PolyKernel.cpp file, there are abstract functions that perform FFT, iFFT, or polynomial operations to help the calculation of the BSK. The following representation provides an overview of their pipeline structure:
The source code, as well as compiled binaries, are in the Github repository here. Here is an outline of the repository:
- Bitstream: Binary files compiled for FPGA, bitstream info and log output
- Competition entry: This project entry as an MD format as well as all the images
- fpga-tfhe: The CPU portion with the rest of the TFHE framework code
- fpga-tfhe_kernels: Source code for the HLS kernel used to compile the bitstream
- fpga-tfhe_system & fpga-tfhe_system_hw_link: Vitis auto-generated files to link the compilation of HLS kernels and CPU XRT code.
- Vitis_Libraries: A submodule reference to the Vitis Libraries, initially considered for the FFT level 2.
Using the Vitis Analyzer over the FPGA kernel, it is possible to see the overall structure of the implementation:
Here it shows how the two inputs and one output connect directly to an HBM module inside the Varium C1100. Also, it shows the hardware utilization with very low percentage numbers, which leaves room for future improvements over multiple cores or deployment of the whole BSK algorithm.
ResultsThe log output of the execution between FPGA-CPU is in the repository inside Bitstream/ResultLogs. It displays three runs: KernelTest, FPGAExeFull, CPUExeFull in a txt format. Below are two graphs that summarize the results obtained from these executions:
There is certainly an overhead to execute on the FPGA, especially when moving data between the Varium C1100 and the CPU. There are still areas of improvement that future work would consider to minimize such overhead while keeping the CPU free to perform other tasks. Among those changes is the full deployment of the BSK into the FPGA; then the whole TFHE algorithm into it. With the experience gained with this project and access to the hardware (thanks Xilinx), that objective is reachable.
Docker DeploymentWith the documentation offered by Xilinx here and the repository with code samples.
With that in context, this project has a docker file that allows users to create their image using the bitstream and compiled binary to run the executable example. It is a prelude to FPGA containerization apps that deploy easily on the Xilinx App Store.
## Create the image from the Dockerfile located in the FPGA-TFHE repository
docker build -t VariumC1100-TFHE .
## Run container with a link to the device
DEVICE="--device=/dev/xclmgmt256:/dev/xclmgmt256 --device=/dev/dri/renderD128:/dev/dri/renderD128"
docker run --name xillTest --rm -t -i $DEVICE VariumC1100-TFHE bash
ChallengesInitially, the main trouble was to define the scope of "what to work on" for the hackathon. The first approach was the development of zero-knowledge proofs on the FPGA, however open source code (ex. 1, 2) was hard to understand and difficult to decode towards a working prototype.
Afterward, another project related to blockchain arose on the developer's radar: nucypher TFHE. Even though the code did not compile and was not testable, it provided the insights to deploy an FHE system onto the FPGA.
Fortunately, the TFHE repository bases its code in C/C++. Thanks to Vitis HLS, as a developer tool to translate code to hardware, it was possible to transfer FFT functions to a kernel that executes on the Varium C1100. This achievement was possible within three months. The main challenge was to test differences between software-hardware emulation, which behaved differently when using static variables, a challenge of two weeks debugging.
Thanks to Hackster and Xilinx's support, this project deployed its bitstream in physical hardware and even a docker container. This step is critical in any hardware development. For example, software and hardware emulations compiled the kernels. However, when creating the xclbin to the Varium C1100, it could not compile because of library restrictions. Nevertheless, after many other challenges, the project reached a deliverable stage.
ConclusionThis project is a precursor to the full deployment of TFHE circuits on reconfigurable hardware. The execution of FFT/polynomial operations only scratches the surface of the potential this line of research has for private computations in the cloud and specially public blockchains.
Even though the initial results require further improvements, the fact that Vitis HLS simplified code translation from a dynamic memory to a static-length circuit environment within the time constraints of this hackathon is an achievement in itself.
Despite Vitis support for software/hardware emulation, tests on real hardware are fundamental to align the project's objectives and its execution.
Future workThis project shows how to deploy C++ code to an FPGA. The steps necessary to take this project to the next level could involve the following:
- Seek to increase the number of functions that execute concurrently on hardware.
- Deploy TFHE fixed circuits in hardware
- Exploit partial reconfiguration to replace TFHE circuits on demand
There are multiple practical applications of FHE, once FHE becomes more relevant in the blockchain and cloud applications, Xilinx FPGA would be well-positioned to be the reference hardware in the deployment of these payloads.
Acronyms- FHE: Fully Homomorphic Encryption
- TFHE: FHE over the Torus
- FPGA: Field Programmable Gate Array
- CPU: Central Processing Unit
- GPU: Graphics Processing Unit
- GPGPU: General Purpose GPU
- ALU: Arithmetic Logic Unit
- DSP: Digital Signal Processor
- HBM: High Bandwidth Memory
- LUT: Look-Up Table
- OCL: Open Compute Language (OpenCL)
- CU: Compute Unit
- SW: Software
- HW: Hardware
- PCI Express (PCI-E): Peripheral Component Interconnect Express
- MUX: Multiplexor
- FFT: Fast Fourier Transform
- iFFT: Inverse FFT
- IDE: Integrated Development Environment
Ph.D. rval735 is a programmer based in Auckland, New Zealand. He focuses his research on Binary Neural Networks. This project serves as the first brick to construct a bridge between blockchain technologies and Artificial Neural Networks.
He is very passionate about topics around BNN, Blockchain, and FPGA acceleration and looks forward to taking better advantage of reconfigurable hardware to make public ledger networks more efficient.
If you would like to contribute to his projects in any possible way, visit the GitHub repository, check his crypto address if you would like help with anything that you can 😉.
Comments