Our team has two sub-projects: computer-vision fall detection for elderly care, and a cryptocurrency miner. Because the elderly-care project is still under development, this write-up covers only the cryptocurrency miner.
Cryptocurrencies have been around for more than 10 years, and proof-of-work (PoW) mining is computationally heavy, so we target Litecoin, which was released in 2011 and uses the Scrypt algorithm for its proof of work.
Next, we verify that the miner actually works. Since we do not intend to open-source the core algorithm, we describe the system clearly here, relying mainly on pictures, so that readers can understand our ideas and implementation.
System flow
We have designed a working pipelined core for Scrypt. It can be briefly described as follows:
The result hash is compared with the target hash. If the target hash is higher than the result hash, the result meets the Litecoin network’s difficulty, and the current nonce is the Golden Nonce. Otherwise, the increment logic increases the nonce by one and Scrypt processes it again, until the comparison condition is satisfied.
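The compare-and-increment flow described above can be modeled in software. The following is a minimal C sketch under stated assumptions, not our hardware core: `hash_fn`, `find_golden_nonce` and `toy_hash` are hypothetical names, hashes are held as 32 bytes with the most significant byte first, the nonce is written little-endian into the last four header bytes, and a result equal to the target is also accepted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Scrypt core: 80-byte header in,
   32-byte hash out (most significant byte first). */
typedef void (*hash_fn)(const uint8_t header[80], uint8_t out[32]);

/* Returns 1 if hash <= target, reading both as 256-bit big-endian numbers. */
int meets_target(const uint8_t hash[32], const uint8_t target[32])
{
    return memcmp(hash, target, 32) <= 0;
}

/* Writes the nonce into header bytes 76..79, little-endian (assumption). */
static void put_nonce(uint8_t header[80], uint32_t nonce)
{
    for (int i = 0; i < 4; i++)
        header[76 + i] = (uint8_t)(nonce >> (8 * i));
}

/* Tries nonces first..max_nonce in order. Returns 1 and stores the
   Golden Nonce on success, 0 if the range is exhausted. */
int find_golden_nonce(uint8_t header[80], const uint8_t target[32],
                      uint32_t first, uint32_t max_nonce,
                      hash_fn hash, uint32_t *golden)
{
    uint8_t result[32];
    uint32_t nonce = first;

    assert(hash != NULL && golden != NULL);
    if (first > max_nonce)
        return 0;
    for (;;) {
        put_nonce(header, nonce);
        hash(header, result);
        if (meets_target(result, target)) {
            *golden = nonce;
            return 1;
        }
        if (nonce == max_nonce)
            return 0;
        nonce++;                      /* the "increment logic" */
    }
}

/* Toy placeholder (NOT Scrypt): the first hash byte is zero only when the
   low nonce byte is 42, so the demo below has a known answer. */
static void toy_hash(const uint8_t header[80], uint8_t out[32])
{
    memset(out, 0xFF, 32);
    out[0] = (uint8_t)(header[76] ^ 42u);
}

/* Usage example: searches nonces 0..1000 with the toy hash. */
uint32_t demo_golden_nonce(void)
{
    uint8_t header[80] = {0};
    uint8_t target[32];
    uint32_t golden = 0xFFFFFFFFu;

    memset(target, 0xFF, sizeof target);
    target[0] = 0x00;                 /* require a leading zero byte */
    find_golden_nonce(header, target, 0, 1000, toy_hash, &golden);
    return golden;
}
```

In the real system the hash callback would be replaced by the Scrypt core, and the loop bound corresponds to the maximum nonce supplied with the work.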
Block data
The miner receives data from the mining pool, which includes the block header (80 bytes), the target hash (32 bytes) and max_nonce (4 bytes).
The miner’s goal is to find a solution to send back to the mining pool.
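To make the 80 + 32 + 4 byte layout concrete, here is a small C sketch of how such a work packet could be represented and serialized. The names `work_packet`, `pack_work` and `unpack_work`, the field order, and the little-endian encoding of the maximum nonce are assumptions for illustration; the actual wire format between host and FPGA is not specified here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_BYTES = 80, TARGET_BYTES = 32, WORK_BYTES = 80 + 32 + 4 };

typedef struct {
    uint8_t  header[HEADER_BYTES];  /* block header, nonce in the last 4 bytes */
    uint8_t  target[TARGET_BYTES];  /* target hash */
    uint32_t max_nonce;             /* last nonce the miner should try */
} work_packet;

/* Serializes a packet into 116 bytes: header, target, then max_nonce
   little-endian (order and byte order are assumptions of this sketch). */
void pack_work(const work_packet *w, uint8_t out[WORK_BYTES])
{
    assert(w != NULL && out != NULL);
    memcpy(out, w->header, HEADER_BYTES);
    memcpy(out + HEADER_BYTES, w->target, TARGET_BYTES);
    for (int i = 0; i < 4; i++)
        out[HEADER_BYTES + TARGET_BYTES + i] = (uint8_t)(w->max_nonce >> (8 * i));
}

/* Inverse of pack_work. */
void unpack_work(const uint8_t in[WORK_BYTES], work_packet *w)
{
    assert(in != NULL && w != NULL);
    memcpy(w->header, in, HEADER_BYTES);
    memcpy(w->target, in + HEADER_BYTES, TARGET_BYTES);
    w->max_nonce = 0;
    for (int i = 0; i < 4; i++)
        w->max_nonce |= (uint32_t)in[HEADER_BYTES + TARGET_BYTES + i] << (8 * i);
}
```

Serializing field by field, rather than copying the struct directly, keeps the byte layout independent of compiler padding and host endianness.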
HW Implementation
Our top module is described as follows:
The input logic and output logic connect to the XDMA over PCI-E.
A snippet of the top module is shown here:
// Clock generation: sys_clk drives the hash core, sys_clk_i the XDMA side
MMCM u_MMCM
(
    .clk_in1 (clk),
    .clk_out1 (sys_clk),
    .clk_out2 (sys_clk_i),
    .locked (LOCKED)
);
// Memory used by the Scrypt core
mem u_mem
(
    .a (bram_addr),
    .d (bram_in),
    .clk (sys_clk),
    .we (bram_write),
    .spo (bram_out)
);
// Control logic between the XDMA interface and the Scrypt core
controller u_controller
(
    .clk(sys_clk),
    .n_rst(n_rst),
    .xdma_clk(sys_clk_i),
    .busy(busy),
    .rx_done(rx_done),
    .hash_in(scrypt_hash),
    .golden_hash(golden_hash),
    .nonce_ready(nonce_ready),
    .hash_done(hash_done),
    .start_hash(start_hash)
);
// Scrypt core
scrypt_top u_scrypt_top
(
    .clk(sys_clk),
    .n_rst(n_rst),
    .enable(start_hash),
    .data(data),
    .hash(scrypt_hash),
    .hash_done(hash_done),
    .bram_read(bram_read),
    .bram_write(bram_write),
    .bram_addr(bram_addr),
    .bram_in(bram_in),
    .bram_out(bram_out)
);
// XDMA data transfer
xdma_transmit u_xdma
(
    .clk(sys_clk_i),
    .busy(busy),
    .golden_hash(golden_hash),
    .data(data)
);
We use the following IPs:
Mixed-Mode Clock Manager.
Block Memory Generator.
DMA for PCI Express (PCIe) Subsystem.
Connection of the algorithm (ALG) modules:
pbkdf2_128 pbf (
    .clk(clk),
    .n_rst(n_rst),
    .pass(data),
    .salt(data),
    .enable(enable),
    .hash(pbf_out),
    .hash_done(pbf_done)
);
scrypt_smix scrypt_s (
    .clk(clk),
    .n_rst(n_rst),
    .data(main_in),
    .enable(pbf_done),
    .hash(main_out),
    .hash_done(main_done),
    .bram_read(bram_read),
    .bram_write(bram_write),
    .bram_addr(bram_addr),
    .bram_in(bram_in),
    .bram_out(bram_out)
);
pbkdf2_32 pbs (
    .clk(clk),
    .n_rst(n_rst),
    .pass(data_copy),
    .salt(pbs_in),
    .enable(main_done),
    .hash(hash_temp),
    .hash_done(pbs_done)
);
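The three stages above follow the general structure of scrypt as specified in RFC 7914: a first PBKDF2 pass, the memory-hard SMix step, and a final PBKDF2 pass. The buffer sizes follow from the scrypt parameters, as the C sketch below shows. It assumes the commonly documented Litecoin parameters N = 1024, r = 1, p = 1, which are not stated in this write-up; `scrypt_params` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t n;  /* CPU/memory cost parameter */
    size_t r;  /* block size parameter */
    size_t p;  /* parallelization parameter */
} scrypt_params;

/* Assumed Litecoin parameters (not taken from this write-up). */
static const scrypt_params LITECOIN_PARAMS = {1024, 1, 1};

/* Bytes produced by the first PBKDF2 pass: p blocks of 128 * r bytes. */
size_t scrypt_block_bytes(scrypt_params sp)
{
    assert(sp.r > 0 && sp.p > 0);
    return 128 * sp.r * sp.p;
}

/* Bytes of scratchpad one SMix instance needs: N blocks of 128 * r bytes. */
size_t scrypt_scratchpad_bytes(scrypt_params sp)
{
    assert(sp.n > 0 && sp.r > 0);
    return 128 * sp.r * sp.n;
}

/* Bytes produced by the final PBKDF2 pass (the 256-bit result hash). */
size_t scrypt_output_bytes(void)
{
    return 32;
}
```

Under these assumptions the first pass yields 128 bytes and the last pass 32 bytes, which is consistent with the module names `pbkdf2_128` and `pbkdf2_32`, and each SMix instance needs 131072 bytes (128 KiB) of scratchpad.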
Test bench
Because Vivado simulation is pretty slow, we use Questa to verify our design.
Verification plays an important role in hardware design: it reveals erroneous logic or timing, which helps us debug the design and improve performance.
//input
640'h0000000c0000000d0000000e0000000f000000100000001100000012000000130000001400000015000000160000001700000018000000190000001a0000001b0000001c0000001d0000001e0000001f
//output
256'hd3ed8f939ba5151a8d78ce3f47fe054396d945e1742b5b406e376106a09af0fe
Let’s try other data from a real mining pool:
//input
640'h000000209a22c5345212cd2360c49405ced9f81ff17acaf261a9c11975e666dbcd3f9cfb41a1a0bbe52ba0040509b3e32851f62d0dd1332516128a8b5c9e745798c8645d7b920262e442011ad560e5ca
//output
256'h0000003e86b4dfb5f62d2a064965058b194f5e3eaacff30617669fdc7660b87d
And this one:
//input
640'h00000020c6f7a1b5b5533e22eac6adcb36bac9313249e612c88d84a448fe9252e6ce8b46f90c16ce2b2bb18f892df7e39cbcf03007bdd9643c4b8ee77facab9961ecedf58bf00362e442011ad559ebd2
//output
256'h000000011af727e96fb2a8087958aabfd22c0a900ff66891801706ddd24b894f
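To reuse test vectors like these on the software side, a 640-bit hex literal can be split into twenty 32-bit words. The following C sketch shows one way to do that. `parse_header_hex` is a hypothetical helper; it takes the words in the order they are written, and whether that order matches the `data[0..19]` layout used by cpu-miner depends on how the test bench packs the header, which is an open assumption here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converts one hex character to its value, or -1 if it is not hex. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses exactly 160 hex characters (640 bits, no "640'h" prefix) into
   20 words, first word first. Returns 0 on success, -1 on bad input. */
int parse_header_hex(const char *hex, uint32_t words[20])
{
    assert(hex != NULL && words != NULL);
    if (strlen(hex) != 160)
        return -1;
    for (int w = 0; w < 20; w++) {
        uint32_t v = 0;
        for (int k = 0; k < 8; k++) {
            int d = hex_digit(hex[w * 8 + k]);
            if (d < 0)
                return -1;
            v = (v << 4) | (uint32_t)d;
        }
        words[w] = v;
    }
    return 0;
}
```

For the first vector above this yields the words 0x0000000c through 0x0000001f; for the second vector the last word, taken from its final eight hex digits, is 0xd560e5ca.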
Finally, our design satisfies the requirements and passes the tests.
SW Implementation
The software side hooks up the data stream between the pool and the local Scrypt hardware core via PCI-E. We use cpu-miner, which embeds the stratum V2 protocol. Once the XDMA code is merged into cpu-miner, the integration is complete.
The following snippet shows the idea of hooking up the HW and cpu-miner:
do {
    // Advance the nonce (word 19 of each 20-word header)
    for (i = 0; i < throughput; i++) {
        data[i * 20 + 19] = ++n;
    }
    uint32_t num[20];
    uint32_t read_buf[8];
    // Only the first header is sent, so this assumes throughput == 1
    for (int j = 0; j < 20; j++) {
        num[j] = data[j];
    }
    // Do XDMA streaming: send the header, read back the 8-word hash
    fpga_do(num, read_buf);
    for (int j = 0; j < 8; j++) {
        hash[j] = read_buf[j];
    }
    // Check the returned hash against the target
    for (i = 0; i < throughput; i++) {
        if (hash[i * 8 + 7] <= Htarg && fulltest(hash + i * 8, ptarget)) {
            *hashes_done = n - pdata[19] + 1;
            pdata[19] = data[i * 20 + 19];
            return 1;
        }
    }
} while (n < max_nonce && !work_restart[thr_id].restart);
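The body of `fpga_do` is not shown above. As a rough illustration only, the sketch below shows one possible shape for it, assuming the XDMA Linux driver is used in streaming mode with one host-to-card and one card-to-host character device (for example `/dev/xdma0_h2c_0` and `/dev/xdma0_c2h_0`). The function name `fpga_do_fd`, its return value, and the choice to pass already opened file descriptors are assumptions of this sketch, not our actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Writes len bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads len bytes, retrying on short reads. Returns 0 or -1. */
static int read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)            /* stream closed before the hash arrived */
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical fpga_do variant: sends the 20-word header to the card and
   reads back the 8-word hash. Returns 0 on success, -1 on I/O error. */
int fpga_do_fd(int h2c_fd, int c2h_fd, const uint32_t work[20], uint32_t hash[8])
{
    assert(work != NULL && hash != NULL);
    if (write_all(h2c_fd, work, 20 * sizeof(uint32_t)) != 0)
        return -1;
    return read_all(c2h_fd, hash, 8 * sizeof(uint32_t));
}
```

Taking file descriptors as parameters lets the device files be opened once at start-up rather than on every nonce, and makes the transfer logic testable with ordinary pipes in place of the card.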
The last step is to try mining with a real pool.
Performance of one core
We clock the FPGA HW at about 300 MHz and test only one Scrypt core. After downloading the bitstream to the card, we can test our design with an online mining pool.
The resulting hash rate is about 1.5 kH/s for one core.
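These two figures can be combined into a rough cycle budget. The C sketch below does the arithmetic; the function names are hypothetical, the hash rate is read as hashes per second, and since both inputs are approximate ("about 300 MHz", "about 1.5k") the results are only order-of-magnitude estimates.

```c
#include <assert.h>

/* Clock cycles one core spends per hash, given its clock and hash rate. */
double cycles_per_hash(double clock_hz, double hashes_per_s)
{
    assert(clock_hz > 0.0 && hashes_per_s > 0.0);
    return clock_hz / hashes_per_s;
}

/* Seconds needed to try `nonces` candidates at the given hash rate. */
double seconds_to_scan(double nonces, double hashes_per_s)
{
    assert(nonces >= 0.0 && hashes_per_s > 0.0);
    return nonces / hashes_per_s;
}
```

With a 300 MHz clock and 1,500 hashes per second, `cycles_per_hash(300e6, 1500.0)` gives about 200,000 cycles per hash, and `seconds_to_scan(4294967296.0, 1500.0)` gives roughly 2.86 million seconds, or about 33 days, to cover the full 32-bit nonce space with a single core.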
Future work
We plan to move to a multi-core design in which the Scrypt cores share memory, which should increase performance noticeably.
Meanwhile, we are continuing to train our computer vision model for elderly care.
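How work would be divided among several cores is not decided here. One common approach, shown in the C sketch below purely as an assumption, is to give each core its own contiguous slice of the nonce range so that no two cores test the same nonce. `nonce_range` and `core_range` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t first;  /* first nonce assigned to this core (unused if count is 0) */
    uint64_t count;  /* number of nonces assigned to this core */
} nonce_range;

/* Splits the inclusive range [first, max_nonce] into `cores` contiguous
   slices that differ in size by at most one, and returns slice `idx`. */
nonce_range core_range(uint32_t first, uint32_t max_nonce,
                       uint32_t cores, uint32_t idx)
{
    nonce_range r;
    uint64_t total, base, extra, skipped;

    assert(cores > 0 && idx < cores && first <= max_nonce);
    total = (uint64_t)max_nonce - first + 1;
    base  = total / cores;
    extra = total % cores;            /* the first `extra` cores get one more */
    skipped = (uint64_t)idx * base + (idx < extra ? idx : extra);
    r.first = (uint32_t)(first + skipped);
    r.count = base + (idx < extra ? 1u : 0u);
    return r;
}
```

For example, splitting the full 32-bit range across four cores gives each core 2^30 nonces, with the last core starting at 0xC0000000.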