I designed Space Invaders in full hardware using high-level synthesis with BSV (Bluespec SystemVerilog) for study purposes. There are two types of FSMs (Finite State Machines): the GameFSM, which controls the game scenario, and the SoundFSM, which receives sound codes from the GameFSM and plays various sounds. Between the two FSMs, there is a single-stage queue OneStage that buffers the sound codes.
Figure 1. GameFSM and SoundFSM
We happened to use the following design approach for each FSM.
- SoundFSM ---- State-based design
- GameFSM ---- Sequence-based design
Here, state-based design means writing each state one by one in BSV, while sequence-based design means designing using high-level synthesis without state decomposition and leaving the implementation of the state machine to the high-level synthesis compiler.
The reason for this difference in design approach is that when we designed the Verilog version, we used state-based design for everything, so SoundFSM faithfully designed its relatively simple state transitions in BSV. GameFSM, on the other hand, has a very large number of states, and we wanted to leave the state transitions to the compiler.
We also changed SoundFSM to a sequence-based design this time. The reason, of course, was that state-based design does not offer the advantages of high-level languages because state decomposition is done manually, and we wanted to see the difference in results between state-based design and sequence-based design for the same functionality.
System ConfigurationThe SoundFSM soft block hierarchy is shown in Figure 2. The soft block is a hierarchy whose contents are visible and consists of mkSoundFSM, an instance of SoundFSM, and an 8-bit x 32K word single-port ROM. The upper sound hierarchy uses 4 channels of this hierarchy as before, synthesizing them with a mixer and then converting them into data for the serial DAC by making a parallel-serial conversion.
Figure 2. Hierarchy including SoundFSM and dedicated ROM
SoundFSMBasically, as in the previous article, removing the rule parts that made up the state machine, and the declaration
import StmtFSM::*;
calls a library that automatically designs a state machine. Processes are written in an infinite loop
while(True) seq
and those processes are converted into a state machine by a high-level synthesis compiler.
HandshakingFigure 1 shows the handshaking between GameFSM and SoundFSM, It is a so-called 2-wire handshake and works by the following algorithm.
- The GameFSM waits for
empty
and outputs a code andwr_en
ifempty
. - The OneStage will then see
wr_en
to set the empty flag to!emtpy
. - The GameFSM moves on to the next process after outputting the code.
- The SoundFSM waits for a valid code and
!empty
, and returnsrd_en
upon!empty
input. - The OneStage returns to
empty
state byrd_en
. - The SoundFSM waits for
empty
and output!rd_en
. - The SoundFSM turns off the
UFO flag
if it is a UFO off command and exits. - The SoundFSM decodes the format if it is not a UFO off command.
- The SoundFSM plays the sound for the count value.
However, once the UFO flight sound is turned on, it is designed to continue sounding until it is turned off, so it must be performed independently of the above handshake. Therefore, when the UFO flight sound comes, the UFO flag is set internally and the FSM is activated by inputting a command using the above handshake or by turning the UFO flag ON at startup.
Special Algorithm for UFO Flight Sounds
- The GameFSM does nothing.
- SoundFSM executes if
UFO flag == True
, but does not watchempty
and does not returnrd_en
. - The SoundFSM internally treats the
UFO flag
as a UFO flight sound code. - The SoundFSM decodes the format.
- The SoundFSM plays the sound for the count value.
Basically the same as in the previous article, but we have readjusted the volume balance, extending it by about 40% and fading it out further, as shown in Figure 5, because the self-destruct sound was too short.
Figure 3. Extension of self-destructing sound
The ROM configuration is identical to the previous version, we use four 8bit x 32Kword ROMs for the number of FSM channels.
Test caseTest cases are important when developing code. When we find a bug and fix the algorithm, we do not only need to check the case where we found the bug, but we need to make sure that we have not degraded in all cases. Therefore, once we fix one part, we basically have to flush all the test cases. Since this cannot be done manually, we have no choice but to automate the process to run the test cases.
We found the bugs with these test cases, fixed them, and then re-ran the whole flowing loop to develop the source code. In reality, the test case may have some omissions, and even if it works on the test case, we may find a bug after baking it into FPGA and executing it. In such cases, we first created a test case that reproduced the bug, fixed the bug, and confirmed it in the test case and FPGA.
Source CodeWe attach the source code of the completed BSV. We have tried to absorb the differences between FSM0, 1, 2, and 3 with defined pseudo-instructions to make the source common.
import StmtFSM::*;
`define SOUND1_ON 1 // 自弾発射音_ON
`define SOUND2_ON 2 // 自機爆発音_ON
`define SOUND3_ON 3 // インベーダ爆発音_ON
`define SOUND4_ON 4 // インベーダ歩行音1_ON
`define SOUND5_ON 5 // インベーダ歩行音2_ON
`define SOUND6_ON 6 // インベーダ歩行音3_ON
`define SOUND7_ON 7 // インベーダ歩行音4_ON
`define SOUND8_ON 8 // UFO爆発音_ON
`define SOUND9_ON 9 // 自機増加音_ON
`define SOUND10_ON 10 // UFO飛行音_ON
`define SOUND10_OFF 11 // UFO飛行音_OFF
`define NULL 'h80
`define COND_FSM0 !emptyf && (code == `SOUND1_ON || code == `SOUND2_ON || code == `SOUND9_ON)
`define COND_FSM1 !emptyf && (code == `SOUND3_ON)
`define COND_FSM2 !emptyf && (code == `SOUND4_ON || code == `SOUND5_ON || code == `SOUND6_ON || code == `SOUND7_ON)
`define COND_FSM3 !emptyf && (code == `SOUND8_ON || code == `SOUND10_ON || code == `SOUND10_OFF)
typedef UInt#(15) Addr_t;
typedef UInt#(8) Data_t;
typedef Bit#(4) Code_t;
interface FSM_ifc; method Action sound(Code_t code); method Action rom_data(Data_t indata);
method Action sync(Bool lrclk);
method Action empty(Bool flag);
method Addr_t rom_address();
method Data_t sdout();
method Bool soundon();
method Bool fifo_ren();
endinterface
(* synthesize,always_ready,always_enabled *)
`ifdef FSM0
module mkSoundFSM0(FSM_ifc);
`elsif FSM1
module mkSoundFSM1(FSM_ifc);
`elsif FSM2
module mkSoundFSM2(FSM_ifc);
`elsif FSM3
module mkSoundFSM3(FSM_ifc);
`endif
Wire#(Code_t) code <- mkWire, current <- mkRegU;
Wire#(Bool) lrclk <- mkWire;
Reg#(Data_t) romdata <- mkRegU, data <- mkRegU, dout <- mkReg(`NULL);
Reg#(UInt#(32)) workd <- mkRegU;
Reg#(UInt#(15)) dcount <- mkRegU;
Reg#(Addr_t) worka <- mkRegU, romaddr <- mkRegU, addr <- mkRegU;
Reg#(UInt#(8)) ii <- mkReg(0);
Reg#(Bool) son <- mkReg(False), sonEarly <- mkReg(False), ren <- mkReg(False), emptyf <- mkReg(True);
`ifdef FSM3
Reg#(Bool) fUFO <- mkReg(False);
`endif
// subfunctions
// READ MEM
// input: worka
// output: romdata;
function Stmt readmem;
return (seq
addr <= worka;
noAction;
data <= romdata;
endseq);
endfunction
// READ COUNT
// input: romaddr
// output: (romaddr,...,romaddr+3) => dcount;
// romaddr + 4 => romaddr;
function Stmt readcount;
return (seq
workd <= 0;
for (ii <= 0; ii <= 3; ii <= ii + 1) seq
worka <= romaddr + extend(3-ii);
readmem;
if (ii == 3) dcount <= truncate(workd<<8) | extend(romdata);
else workd <= workd<<8 | extend(romdata);
endseq
romaddr <= romaddr + 4;
endseq);
endfunction
Stmt main = seq
while(True) seq
action
dout <= `NULL;
sonEarly <= False;
son <= False;
ren <= False;
endaction
`ifdef FSM0
await(`COND_FSM0);
action
ren <= True;
current <= code;
endaction
`elsif FSM1
await(`COND_FSM1);
action
ren <= True;
current <= code;
endaction
`elsif FSM2
await(`COND_FSM2);
action
ren <= True;
current <= code;
endaction
`elsif FSM3
await(`COND_FSM3 || fUFO);
if (`COND_FSM3) action
fUFO <= (code == `SOUND10_ON);
ren <= True;
current <= code;
endaction else if (fUFO) action
current <= `SOUND10_ON;
endaction
`endif
await(emptyf);
ren <= False;
`ifdef FSM3
if (code == `SOUND10_OFF) continue;
`endif
await(lrclk);
await(!lrclk);
delay(4);
action
case (current)
`ifdef FSM0
`SOUND1_ON: romaddr <= 0 + 16;
`SOUND2_ON: romaddr <= 3422 + 16;
`SOUND9_ON: romaddr <= 16150 + 16;
`elsif FSM1
`SOUND3_ON: romaddr <= 0 + 16;
`elsif FSM2
`SOUND4_ON: romaddr <= 0 + 16;
`SOUND5_ON: romaddr <= 1266 + 16;
`SOUND6_ON: romaddr <= 2836 + 16;
`SOUND7_ON: romaddr <= 4406 + 16;
`elsif FSM3
`SOUND8_ON: romaddr <= 0 + 16;
`SOUND10_ON: romaddr <= 25968 + 16;
`endif
endcase
endaction
readcount;
romaddr <= romaddr + extend(dcount) + 4;
readcount;
romaddr <= romaddr - 1;
while (!((dcount == 0) ||
`ifdef FSM0
(`COND_FSM0 && current !=`SOUND9_ON))) seq
`elsif FSM1
(`COND_FSM1))) seq
`elsif FSM2
(`COND_FSM2))) seq
`elsif FSM3
(`COND_FSM3))) seq
`endif
if (sonEarly == False) seq
readmem;
action
sonEarly <= True;
son <= False;
dout <= `NULL;
endaction
endseq else seq
readmem;
action
son <= True;
dout <= romdata;
endaction
endseq
delay(11);
action
romaddr <= romaddr + 1;
worka <= romaddr + 1;
dcount <= dcount - 1;
endaction
endseq
`ifdef FSM3
if ((code == `SOUND8_ON || code == `SOUND10_OFF) && !emptyf) fUFO <= False;
`endif
endseq
endseq;
mkAutoFSM(main);
method Action sound(Code_t incode);
code <= incode;
endmethod
method Action rom_data(Data_t indata);
romdata <= indata;
endmethod
method Addr_t rom_address();
return addr;
endmethod
method Data_t sdout();
return dout;
endmethod
method Bool soundon();
return son;
endmethod
method Action sync(Bool inlrclk);
lrclk <= inlrclk;
endmethod method Bool fifo_ren();
return ren;
endmethod
method Action empty(Bool flag);
emptyf <= flag;
endmethod
`ifdef FSM0
endmodule: mkSoundFSM0
`elsif FSM1
endmodule: mkSoundFSM1
`elsif FSM2
endmodule: mkSoundFSM2
`elsif FSM3
endmodule: mkSoundFSM3
`endif
Result ComparisonTables 2 and 3 show the results for one channel of SoundFSM with the same specifications designed using two different design methods: state-based design and sequence-based design. The reason for the wide range in the number of rows and FPGA LUTs is that the specifications are slightly different for channels 0 to 3.
- The number of BSV lines decreased by 66%, from 437 to 288.
- The number of generated Verilog lines increased significantly, from an average of 913 to 1, 892, almost 200%. However, looking inside, the number of error indications unrelated to logic synthesis was about 1, 000 lines, and the number of logic synthesis targets was about 800 lines, not much different from the manual design.
- Compared to FPGA LUTs, the average number of LUTs increased slightly by 18%, from 191 to 226, but the amount of material did not increase as much as the number of Verilog lines increased. The reason for this is as noted above.
- Development man-hours do not increase linearly as the number of lines increases but are expected to increase exponentially due to the combination of the two. Therefore, the decrease in the number of BSV lines is considered to have resulted in a 1/3 reduction in development man-hours.
We designed the same sound FSM using two different design methods: state (rule)-based design in the previous case and sequence-based design in this case. Looking at the ratio of the number of BSV to Verilog rows in Tables 2 and 3, the rule-based design (Table 2) increases by a factor of 2.09, while the sequence-based design (Table 3) increases by a factor of 6.57, an increase of about 3.14. However, the cause of the increase appears to be due to the error indication of the automatic state machine generation. Since the final FPGA LUT does not change much, we can evaluate that the automatic state machine design (sequence-based design) is superior enough with respect to the tradeoff with man-hour reduction.
Comments