In this tutorial, we continue to make refinements and improvements to my 16-bit CPU and associated peripherals. This is a follow-up to the original tutorial: Design and Build a 16 Bit CPU.
The changes and new hardware are introduced through a series of new FPGA projects for the Alchitry Au FPGA board. All of the HDL code for every project is contained in the attached ZIP file. All of the CPU code is also there, stored in ROM within the hardware.
I have ended this tutorial with a Tic Tac Toe game. I found myself playing against a CPU that I designed myself! Interesting!!
Changes to the Hardware
I am making several changes to the hardware in this next-generation version of my 16-bit CPU. A new version of the text ROM is discussed below, as is a new timed interrupt. But first, we have made a few slight changes to the instruction set.
I have replaced PSHI (push immediate onto stack) with JSR (jump to subroutine). PSHI was only there to support subroutines, and calling a subroutine with it took two instructions. It turns out the whole call, saving the return address and jumping, can be done with a single instruction in a single clock cycle.
I have also added SMAI (set memory address immediate), which allows direct entry of a fixed memory address, and BEQ (branch if equal), which is self-explanatory.
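To make the saving concrete, here is a minimal Python model of call and return. It is a sketch, not the CPU16 HDL. It assumes that the return address is the instruction after the call (pc + 1), that the stack is a simple list, and that RTN pops the top entry into the program counter. The class name and the generic `jmp` used for the old two-instruction call are hypothetical.

```python
class CallModel:
    """Toy model of subroutine call and return, not the CPU16 HDL."""

    def __init__(self, pc: int = 0) -> None:
        self.pc = pc
        self.stack: list[int] = []
        self.cycles = 0

    # Old style: PSHI pushes the return address, then a separate jump is needed.
    def pshi(self, value: int) -> None:
        self.stack.append(value)
        self.pc += 1
        self.cycles += 1

    def jmp(self, target: int) -> None:
        self.pc = target
        self.cycles += 1

    # New style: JSR pushes pc + 1 and jumps in the same cycle.
    def jsr(self, target: int) -> None:
        self.stack.append(self.pc + 1)
        self.pc = target
        self.cycles += 1

    # RTN pops the saved address back into the program counter.
    def rtn(self) -> None:
        self.pc = self.stack.pop()
        self.cycles += 1


old = CallModel(pc=10)
old.pshi(12)      # return to the instruction after the jump at pc 11
old.jmp(100)
old.rtn()

new = CallModel(pc=10)
new.jsr(100)
new.rtn()
```

Both versions return to the right place. The JSR version simply needs one fewer instruction and one fewer cycle per call, which adds up in a program with many subroutines.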
I changed BRZ to BZR (branch if zero), just because I kept trying to write it that way! I have added a CMPI (compare immediate) which allows a register to be compared directly with a 16-bit number. I have deleted the SEF and CEF (set and clear the equal flag), as I never used them. We now have the full complement of 32 instructions.
These instruction set changes are not implemented until our final project, the Tic Tac Toe game. The code for Tic Tac Toe is quite long with lots of references to RAM and lots of subroutines. These new changes to the instruction set significantly reduced the total number of instructions in the program.
CPU16_pcNames
This project is about improving the instruction ROM. As I said in my previous article, our global constants table provides easy-to-remember names for instructions, registers, ports, and calls to peripherals, so the coding is pretty much like assembly language. The exception is that program counter addressing is all manual. We can, however, build a small table of named program counter addresses inside the instruction ROM, and thereby significantly improve the readability of our program code.
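In the article this table is built by hand inside the instruction ROM. Here is a rough Python illustration of the same idea, where branch targets are looked up by name instead of being typed as raw program counter values. It covers the two steps an assembler would automate. The program, its labels, and the JMP mnemonic are hypothetical placeholders.

```python
# Hypothetical program: (label, mnemonic, operand). A string operand is a label.
PROGRAM = [
    ("start", "SMA", 0),
    ("loop", "LDR", 0),
    (None, "BZR", "done"),
    (None, "JMP", "loop"),
    ("done", "RTN", 0),
]


def build_label_table(program):
    """Record the pc address of every labelled line."""
    return {label: pc for pc, (label, _op, _arg) in enumerate(program) if label}


def resolve(program):
    """Replace each label operand with its numeric pc address."""
    labels = build_label_table(program)
    resolved = []
    for _label, op, arg in program:
        target = labels[arg] if isinstance(arg, str) else arg
        resolved.append((op, target))
    return resolved
```

If a line is inserted, only the table changes. Every branch that refers to `done` or `loop` by name still points at the right place.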
The example I chose to try this on is my previous “serial greeting” program. The new version has all referenced addresses within the program named and labelled. Here's a sample, first of the named addresses and then how they appear in the code itself.
I have also started always keeping the numbers that are part of the instruction on the left, and the numbers that just fill out the 24-bit instruction width on the right. Again, it improves readability!
Although assembly with this system is still manual, and more time-consuming and error-prone than using an assembler, the resulting code listing is very readable. For anyone familiar with assembly language, it is every bit as readable as an assembler listing. I plan to use this approach on all future projects/programs.
Note: For previous serial interface programs, I had the hardware set to 9600 baud. The Alchitry Labs built-in Serial Monitor, however, defaults to 1,000,000 baud. So, for this project and all future projects using the serial interface, I am going with 1,000,000 baud!
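A side benefit of 1,000,000 baud is that it divides evenly into the clock. Here is a small worked calculation, assuming the Alchitry Au's 100 MHz on-board clock and a simple UART that counts a whole number of clock cycles per bit. The function names are illustrative only.

```python
CLOCK_HZ = 100_000_000  # assumption: Alchitry Au 100 MHz on-board clock


def clocks_per_bit(baud: int, clock_hz: int = CLOCK_HZ) -> int:
    """Whole number of clock cycles a divide-by-N UART would count per bit."""
    return round(clock_hz / baud)


def baud_error_percent(baud: int, clock_hz: int = CLOCK_HZ) -> float:
    """Percent difference between the achieved and the requested baud rate."""
    actual = clock_hz / clocks_per_bit(baud, clock_hz)
    return 100.0 * (actual - baud) / baud


for rate in (9600, 1_000_000):
    print(rate, clocks_per_bit(rate), f"{baud_error_percent(rate):+.4f}%")
```

At 1,000,000 baud each bit is exactly 100 clock cycles. At 9600 baud the divisor has to be rounded to 10417, which leaves a small but harmless rate error.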
CPU16_newTextROM
This is a change to the hardware! Up until now, the built-in ROM designed for text output was one-dimensional, storing a single 8-bit ASCII character in each memory location. And I was entering the ASCII codes themselves, even though Lucid / Verilog handles text directly, allowing us to put the actual text in quotes. It would also be a big improvement to have each ROM address refer to a whole line of characters.
So, I have re-written the ROM and its interface to the CPU. The inputs to the ROM are “address” and “select”. Select specifies a character within the text at that address – it’s basically a cursor! The output is still one character.
A line of text at a given address is fixed at 64 characters, so select is 6 bits wide. I also changed the ROM address to 6 bits, giving 64 lines of 64 characters: room for 4096 characters, which should be enough for anything I am going to do!
Above is the old text ROM and below is the new one:
I didn’t want to tie up three registers outputting text, so the LDR (load from ROM) instruction only specifies the destination register for the character and one other for the select input. The address itself is preloaded using the SMA (set memory address). SMA was originally designed for RAM only, but has now been modified to preset a ROM address as well. So, a single register can first set the address, and then be used to index through the characters at that address.
We are also creating a couple of standard subroutines that output a line of text. The first stops automatically when it detects a line feed (\n), and the second outputs a specified number of characters. To use the second routine, you push the return address onto the stack and then push the number of characters you want output.
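Here is a minimal Python model of this scheme. It is not the HDL. Each ROM address holds one 64-character line, SMA loads a memory-address register shared by RAM and ROM, and LDR reads one character chosen by the select value. The two output subroutines loop over select. Several details are assumptions made for the sketch: short lines are padded with spaces, the line feed is sent before the routine stops, and the class and method names are hypothetical.

```python
LINE_WIDTH = 64   # characters per address, so select is 6 bits
NUM_LINES = 64    # 6-bit address: 64 x 64 = 4096 characters in total


def make_text_rom(lines: list[str]) -> list[str]:
    """Pad every line to exactly LINE_WIDTH characters (spaces assumed)."""
    if len(lines) > NUM_LINES:
        raise ValueError("too many lines for a 6-bit address")
    rom = []
    for text in lines:
        if len(text) > LINE_WIDTH:
            raise ValueError(f"line longer than {LINE_WIDTH} characters: {text!r}")
        rom.append(text.ljust(LINE_WIDTH))
    return rom


class TextOut:
    """Toy model of SMA + LDR text output feeding a serial transmitter."""

    def __init__(self, rom: list[str]) -> None:
        self.rom = rom
        self.mem_addr = 0            # set by SMA; shared by RAM and ROM
        self.sent: list[str] = []    # stands in for the serial output

    def sma(self, address: int) -> None:
        self.mem_addr = address

    def ldr(self, select: int) -> str:
        """Load one character: address from SMA, position from select."""
        return self.rom[self.mem_addr][select]

    def print_line(self) -> None:
        """Send characters until a line feed has been sent."""
        for select in range(LINE_WIDTH):
            ch = self.ldr(select)
            self.sent.append(ch)
            if ch == "\n":
                break

    def print_count(self, stack: list[int]) -> None:
        """Caller pushed the return address, then the count; pop the count."""
        count = stack.pop()
        for select in range(count):
            self.sent.append(self.ldr(select))
```

Only one register is needed to drive the loop. It first supplies the address through SMA and is then reused as the select index.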
One feature of the ROM itself worth noting is that each line of text is wrapped in the $reverse function. This is necessary because the HDL packs a quoted string with its first character at the highest index (the most significant end), and arrays are indexed from right to left. Without $reverse, index 0 would refer to the last character of the text instead of the first!
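The effect is easy to see with a Python sketch. It assumes Verilog-style packing, where the first character of a string literal lands in the most significant byte. Reversing the string before packing stands in for what $reverse achieves. The function names are illustrative.

```python
def pack_like_hdl(text: str) -> int:
    """Pack text as one wide value, first character in the top byte,
    the way a Verilog string literal is packed."""
    value = 0
    for ch in text:
        value = (value << 8) | ord(ch)
    return value


def element(packed: int, index: int) -> str:
    """Read 8-bit element `index`, where index 0 is the least significant byte."""
    return chr((packed >> (8 * index)) & 0xFF)


plain = pack_like_hdl("Hello")
reversed_first = pack_like_hdl("Hello"[::-1])   # the effect of $reverse
```

With `plain`, element 0 is the last character ("o"). With the reversed packing, element 0 is "H", so a select value of 0 reads the first character as intended.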
CPU16_timed_interrupt
This is another change to the hardware! Here I am adding a timed interrupt to the CPU. We need one because sometimes a process has to run in the background regardless of what else we are doing, such as waiting for a serial input. This interrupt also makes time delays much easier to implement!
Adding the timed interrupt was surprisingly easy – 8 new lines of code. A new counter interrupts normal processing every 100 microseconds, stores the program counter on the stack, and jumps to a subroutine at pc10000. This is the hard-wired location of our interrupt. At minimum, we need one instruction at this location to generate an RTN (return from subroutine).
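As a behavioral sketch only, here is the interrupt modeled in Python. It assumes the Alchitry Au's 100 MHz clock, so 100 microseconds is 10,000 cycles. It also assumes that ordinary execution advances the program counter by one per cycle, and it takes the "pc10000" vector from the text as the decimal address 10000. The class and method names are hypothetical.

```python
CLOCK_HZ = 100_000_000                      # assumption: Alchitry Au 100 MHz clock
CYCLES_PER_INTERRUPT = CLOCK_HZ // 10_000   # 100 us = 1/10,000 s -> 10,000 cycles
INTERRUPT_VECTOR = 10000                    # the hard-wired "pc10000" from the text


class TimedInterrupt:
    """Free-running counter that forces a call to the interrupt routine."""

    def __init__(self) -> None:
        self.counter = 0
        self.pc = 0
        self.stack: list[int] = []

    def clock(self) -> bool:
        """Advance one clock cycle; return True if the interrupt fired."""
        self.counter += 1
        if self.counter == CYCLES_PER_INTERRUPT:
            self.counter = 0
            self.stack.append(self.pc)    # save where the program was
            self.pc = INTERRUPT_VECTOR    # jump to the interrupt routine
            return True
        self.pc += 1                      # otherwise one instruction per cycle
        return False

    def rtn(self) -> None:
        """Return from the interrupt routine to the saved address."""
        self.pc = self.stack.pop()
```

In this sketch the interrupt routine never executes its RTN on its own. In the real CPU the routine must return well within 100 microseconds, or the saved addresses pile up on the stack.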
Our interrupts are generated every 100 microseconds. We can implement any slower rate by keeping a counter in the interrupt routine and taking action only when the counter reaches its target.
For an initial test of our interrupt system, we are going to use our previous “switch counter” program and add the interrupt to make an LED blink once per second while the switch counter runs. This system seems to work perfectly, so our next project will be to try it out with our 4-digit, 7-segment display!
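Here is a minimal Python sketch of such a software counter inside the interrupt routine, used to blink an LED once per second. With one interrupt every 100 microseconds, toggling the LED every 5,000 interrupts gives 0.5 s on and 0.5 s off. The class and attribute names are hypothetical.

```python
INTERRUPTS_PER_SECOND = 10_000   # one interrupt every 100 microseconds


class BlinkTask:
    """Runs inside the interrupt routine; toggles an LED every half second."""

    def __init__(self, half_period: int = INTERRUPTS_PER_SECOND // 2) -> None:
        self.half_period = half_period
        self.count = 0
        self.led = 0

    def on_interrupt(self) -> None:
        self.count += 1
        if self.count >= self.half_period:   # target reached: act and reset
            self.count = 0
            self.led ^= 1
```

Any other slower rate is just a different target value. Several such counters can share the same interrupt.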
CPU16_Serial_to 7Segment
This project takes numbers typed on a keyboard, received through the serial input, and puts them on the Alchitry IO board’s 7-segment display. It doesn’t really serve a purpose other than to demonstrate how our timed interrupt lets our processor do multiple things at once. The timed interrupt operates the 4-digit, 7-segment display on the Alchitry IO board. Its 4 digits are multiplexed, and all 4 need to be refreshed every 20 msec. to make them all appear to be on at the same time. So, I store the numbers to be displayed in RAM, and every 5 msec. the interrupt updates one digit of the display. This is done in the background, independent of what the main program is doing.
The main program gets characters from the serial interface. If a character is a digit from 0 to 9, it is converted from the character to the number itself. This can be done by simply subtracting hex 30, as the ASCII codes for the digits are hex 30 through hex 39. Any other character is simply ignored, with one exception: the backspace. If the backspace code is entered, the display is cleared and all digits are set back to 0.
When a new number is entered from the keyboard, the numbers in memory are shifted one digit to the left and the new number is stored in RAM at the right-most or least significant digit.
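The main program's keystroke handling can be sketched in Python as a pure function over the four stored digits, listed most significant first. The sketch assumes the terminal sends the backspace key as ASCII BS (0x08); some terminals send DEL (0x7F) instead. The function name is hypothetical.

```python
BACKSPACE = 0x08  # assumption: backspace arrives as ASCII BS


def handle_key(digits: list[int], key: int) -> list[int]:
    """Update the 4 stored digits (most significant first) for one keystroke."""
    if ord("0") <= key <= ord("9"):
        value = key - 0x30                 # '0'..'9' are 0x30..0x39
        return digits[1:] + [value]        # shift left, new digit on the right
    if key == BACKSPACE:
        return [0] * len(digits)           # clear the display
    return digits                          # anything else is ignored
```

Typing "123" into a cleared display gives 0, 1, 2, 3. A fifth digit pushes the oldest one off the left end.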
We need a small table to encode numbers in their 7-segment format. Fortunately, our text ROM is a perfect place to put it. To make this as simple as possible, I disabled the character select feature and placed the 7-segment encoding in the ROM as single characters selected by an address.
You might notice in the code that the RAM addresses used to store the 4 digits are 1, 2, 4 and 8. The register that selects the digit on the Alchitry IO board uses these 4 values (a “one-hot” encoding, with exactly one bit set). Since I already had those numbers for the digit select, I decided to use them “as is” for the RAM addresses.
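Here is a Python sketch of the background refresh. Every 50 interrupts (5 msec.), it moves the one-hot select to the next digit and uses that same value as the RAM address of the digit to show. The segment table uses a common active-high gfedcba bit order, which is an assumption. The IO board's actual segment order and polarity may differ, for example with active-low drivers. The class name is hypothetical.

```python
# Common gfedcba encoding for 0-9, active-high (assumption; the real board may differ).
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]

INTERRUPTS_PER_DIGIT = 50   # 50 x 100 us = 5 ms per digit, 20 ms for all four


class DisplayRefresh:
    """Runs in the timed interrupt; one-hot digit select doubles as RAM address."""

    def __init__(self, ram: dict[int, int]) -> None:
        self.ram = ram                    # digit values stored at addresses 1, 2, 4, 8
        self.select = 1                   # one-hot digit select
        self.count = 0
        self.segments = SEGMENTS[ram[self.select]]

    def on_interrupt(self) -> None:
        self.count += 1
        if self.count < INTERRUPTS_PER_DIGIT:
            return
        self.count = 0
        self.select = self.select << 1 if self.select < 8 else 1   # 1, 2, 4, 8, 1, ...
        self.segments = SEGMENTS[self.ram[self.select]]
```

Because the select value and the RAM address are the same number, no translation step is needed between "which digit is lit" and "which digit to read".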
CPU16_LCD_display
For this project, we use the Br Prototype board to interface our CPU with an LCD display. Details on the schematic and connections are shown in a previous tutorial, LCD Controlled by a FPGA. The connections here are identical, but the program to control the display is written in our 16-bit CPU assembly language. Also note that I have removed the IO board and its constraint file for this project, because some of the pins I am using for the LCD display conflict with IO board pins.
We need several different time delays to program the HD44780 LCD controller, so I put our timed interrupt to work for them. Every 100 microseconds, it decrements register X5. Then we have Delay1, which counts 10 decrements for a 1 msec. delay, and Delay10, which counts 100 decrements for a 10 msec. delay. We then use these two delays to create the timing required by the LCD controller.
We are using the 4-line LCD display in 4-bit mode, meaning only data lines D4, D5, D6 and D7 are connected. We start by sending a startup sequence which puts the controller in 4-bit mode. Instructions and characters are all 8 bits, so from here on we send the upper 4 bits followed by the lower 4 bits. Next, we send a number of instructions to configure the display and turn it on. Finally, we send our text message, which is stored in the text ROM.
Setting up the LCD display takes quite a bit of code (about 120 lines of assembly), but it was much easier to implement in assembly language than in our previous LCD project, where we did the whole thing in hardware!
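Here is a Python sketch of how the two delays can be built on X5. The delay loads X5 and then polls until the interrupt has counted it down to zero. Two details are assumptions: the interrupt stops decrementing at zero, and the interrupt is simulated inline rather than arriving asynchronously. The class and method names are hypothetical.

```python
class DelayTimer:
    """X5 is decremented by the 100 us interrupt; delays poll it for zero."""

    def __init__(self) -> None:
        self.x5 = 0
        self.elapsed_us = 0

    def interrupt(self) -> None:
        self.elapsed_us += 100
        if self.x5 > 0:          # assumption: the routine stops at zero
            self.x5 -= 1

    def wait(self, ticks: int) -> None:
        """Load X5, then spin until it has been counted down to zero."""
        self.x5 = ticks
        while self.x5 != 0:
            self.interrupt()     # in hardware, time simply passes here

    def delay1(self) -> None:    # 10 x 100 us = 1 ms
        self.wait(10)

    def delay10(self) -> None:   # 100 x 100 us = 10 ms
        self.wait(100)
```

One subtlety: on real hardware the interrupt is free-running, so the first decrement can arrive anywhere up to 100 microseconds after X5 is loaded. Delay1 therefore lasts between 0.9 and 1 msec. HD44780 timings are minimums, so it is safer to load one extra count where a delay is tight.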
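For reference, here is a Python sketch that lists the (RS, nibble) pairs such a program puts on D7 to D4. It follows the common HD44780 4-bit "initialization by instruction" sequence from the datasheet: three 0x3 wake-up nibbles, then 0x2 to select 4-bit mode. It is followed by a typical configuration: function set for 4-bit and 2-line mode (which 4-line displays use internally), display off, clear, entry mode, and display on. These command bytes are not necessarily the exact ones the article's program sends, and the required delays appear only as comments.

```python
def nibbles(byte: int) -> tuple[int, int]:
    """Split an 8-bit instruction or character: upper 4 bits first."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def lcd_sequence(message: str) -> list[tuple[int, int]]:
    """Return (RS, nibble) pairs in send order; RS = 0 command, 1 data."""
    seq = [
        (0, 0x3),   # after power-up wait (>15 ms); then wait >4.1 ms
        (0, 0x3),   # then wait >100 us
        (0, 0x3),
        (0, 0x2),   # switch the interface to 4-bit mode
    ]
    for cmd in (
        0x28,   # function set: 4-bit, 2-line mode, 5x8 font
        0x08,   # display off
        0x01,   # clear display (a slow instruction, allow extra time)
        0x06,   # entry mode: increment, no shift
        0x0C,   # display on, cursor off
    ):
        seq += [(0, n) for n in nibbles(cmd)]
    for ch in message:
        seq += [(1, n) for n in nibbles(ord(ch))]
    return seq
```

After the first four single nibbles, every instruction and character goes out as two nibbles, high half first.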
CPU16_TicTacToe
Tic Tac Toe is pretty easy to program in a high-level language, but is a little more difficult in assembly language. This took about 600 lines of code. There is lots of repetition, though, so it is not nearly as bad as it sounds. And with this many lines of code, I am finally doing this manual assembly with far fewer errors! Even so, this program took many hours, and debugging was particularly time-consuming. I had forgotten how tedious assembly language programming can be!
The logic by which the CPU plays is pretty simple. On its turn, it works through the following steps:
1. Did the other player just win?
2. Is there a way the CPU can win with the next move?
3. Is there a move the CPU can make to block the player from winning on next move?
4. Is the middle square available?
5. Is a corner square available?
6. Take any square left.
The first three questions above are answered by summing the contents along each of the 8 ways the game can be won. For example, if we add the contents of the top three squares across and get a total of 264, that is the sum of three ASCII “X”s (3 × 88), which tells the CPU that the player just won. If the same total is 208, there are two “X”s and a space (2 × 88 + 32), so the CPU must put an “O” in the space or it will lose on the next play. We are reading these sums directly out of the RAM that holds our Tic Tac Toe grid display.
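Here is a Python sketch of the six steps using these sums. It assumes the CPU plays "O", the player plays "X", and the board is a simple list of 9 character codes stored row by row. The article reads the sums from display RAM, which this sketch simplifies. Every combination of three characters from "X" (88), "O" (79) and space (32) gives a different total, so a sum identifies a line's contents unambiguously. The function names are hypothetical.

```python
X, O, EMPTY = ord("X"), ord("O"), ord(" ")   # 88, 79, 32

# The 8 winning lines as indexes into a 3x3 board stored row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),     # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),     # columns
    (0, 4, 8), (2, 4, 6),                # diagonals
]


def find_line(board, total):
    """Return the first line whose character codes add up to `total`."""
    for line in LINES:
        if sum(board[i] for i in line) == total:
            return line
    return None


def empty_in(board, line):
    """Index of the empty square in a line known to contain one."""
    return next(i for i in line if board[i] == EMPTY)


def cpu_move(board):
    """Pick a square for O; return None if X has won or the board is full."""
    if find_line(board, 3 * X) is not None:       # 1. player just won (264)
        return None
    line = find_line(board, 2 * O + EMPTY)        # 2. CPU can win (190)
    if line is None:
        line = find_line(board, 2 * X + EMPTY)    # 3. block the player (208)
    if line is not None:
        return empty_in(board, line)
    for square in (4, 0, 2, 6, 8):                # 4. middle, 5. corners
        if board[square] == EMPTY:
            return square
    return next((i for i in range(9) if board[i] == EMPTY), None)  # 6. any
```

Checking for a CPU win before checking for a block matters. When both are possible, winning immediately is always the better move.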
This is not the computer version of Tic Tac Toe that is impossible to beat. The steps above make our CPU a good player, but not a perfect one. There are a couple of ways to beat it.
One final thing I did to make my Tic Tac Toe game work a little better is add random play to the CPU’s move. Since I don’t actually have a random number generator, I put pseudo random numbers into the text ROM and used them to determine which corner or which middle square the CPU moves to. This makes its behavior much less predictable and the game more interesting. Unfortunately, although this sounds like a simple addition, it added a lot of extra code for a small improvement in function!
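Here is a Python sketch of the table-driven approach applied to choosing a corner. It steps through a fixed table of pre-chosen numbers, wrapping at the end, and uses each value to decide which free corner to try first. The table contents and all names are hypothetical. The article's actual values live in the text ROM.

```python
RANDOM_TABLE = [3, 0, 2, 1, 1, 3, 0, 2]   # hypothetical pre-computed values
CORNERS = (0, 2, 6, 8)


class PseudoRandom:
    """Step through a fixed table, wrapping around at the end."""

    def __init__(self, table: list[int]) -> None:
        self.table = table
        self.index = 0

    def next_value(self) -> int:
        value = self.table[self.index]
        self.index = (self.index + 1) % len(self.table)
        return value


def pick_corner(board, rng, empty=ord(" ")):
    """Try the corners starting at a table-chosen one; None if all are taken."""
    start = rng.next_value() % len(CORNERS)
    for k in range(len(CORNERS)):
        square = CORNERS[(start + k) % len(CORNERS)]
        if board[square] == empty:
            return square
    return None
```

Trying the remaining corners in rotation means a taken corner never stalls the move. It also helps explain why the "simple" addition grew into extra code: every random choice needs its own fallback path.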
One thing was very noticeable with Tic Tac Toe and its 600 lines of ROM-based code: Vivado took about 3 times as long to build the hardware as it had for any of my other, simpler programs! It has to resolve “nodes with overlaps”, its jargon for pieces of circuitry competing for the same routing resources.
Conclusion
We have a pretty good CPU now, and almost all the pieces for a complete microcontroller. We are still missing analog inputs and outputs, both of which take some external hardware that is not available on the FPGA.
This was a very educational set of projects! I am debating whether or not to stop here. There are many potential improvements, but they would require major changes.
For example, I would like to change the RAM to byte organization and start storing program code in RAM instead of ROM.
I would like to migrate this whole project to Alchitry's newest platform: Labs 2 and Lucid 2.
I would also like to develop an assembler for my CPU, and a way to download executable machine code files from a PC. Each of these potential improvements would be substantial projects!