This article focuses on how DRAM (Dynamic Random Access Memory), or more specifically SDR SDRAM (Single Data Rate Synchronous DRAM), works and how it can be used in FPGA projects.
My first article, about SRAMs [1], gave a brief explanation of how DRAMs are built at the transistor level and compared SRAM and DRAM memories.
In a regular FPGA we can have two types of embedded memory: distributed RAM and block RAM. A distributed RAM is made from the logic cell's look-up tables (LUTs). A block RAM is a special memory module embedded in an FPGA device and is separated from the regular logic cells.
There is a third type of embedded RAM called UltraRAM. UltraRAM is a memory block in Xilinx UltraScale+ families that enables up to 500Mb of total on-chip storage.
Although some internal memory options exist, the amount of memory available is too small for many designs, which is why FPGA designers are often forced to use dedicated memory chips. These chips are usually some sort of SDRAM, because SDRAM is fast and much cheaper per bit than SRAM. It is, however, neither as fast nor as easy to use as SRAM.
Background
In a dynamic memory, the memory is not seen as a long linear array of words (as in SRAMs); instead, it is organized as a matrix (rows and columns) of words. More specifically, the memory of an SDRAM is split into equal chunks called "banks", which are composed of rows and columns. Therefore, to access a specific piece of data you must specify all three pieces of information: the bank, the row and the column.
As I mentioned in [1], a DRAM cell is composed of a transistor and a capacitor. The capacitor stores a charge, and the transistor allows charge to be either put into the capacitor or taken out. If the capacitor is charged, the cell contains a value of 1. If the capacitor is not charged, the cell contains a value of 0.
DRAMs have two main problems:
1- The capacitor's charge leaks over time, which is why it is called dynamic RAM. Therefore, the rows need to be refreshed periodically so that the cells do not lose their bit values.
2- The read operation removes the charge from the cell, destroying its bit value. So, after the data is read out, it must be restored immediately. For this task, DRAMs have what are called “sense amplifiers”: when a row is read, the sense amplifiers hold its data and write it back to the cells by applying a voltage, recharging the capacitors. Once the accesses to the row are finished, the row must be closed before another row in the same bank can be used. This is called closing the row, or precharge.
In order to perform a read or write operation, it is necessary to choose a bank and activate one of its rows. This is called opening the row, or activation. The row remains active until a precharge is performed. Once a row is open, you can issue read or write commands on the retrieved data.
Why a Memory Controller?
A memory controller is needed to handle the memory in an easy way. Dynamic memories have rows, columns, banks and refresh cycles to take care of; all this makes SDRAMs more difficult to handle than SRAMs.
Therefore, the goal here is to handle SDRAMs with the ease of use of SRAMs.
The SDRAM Controller
In this article, an SDRAM controller for the Micron MT48LC4M16A2 SDRAM chip [2] will be developed. This SDRAM can be found on the Papilio Pro FPGA development board [3].
A close look at the datasheet lets us summarize its main characteristics:
- 64Mb: 1 Meg x 16 x 4 Banks.
- 64ms, 4096-cycle refresh.
- 7.5ns @ CL = 2 (PC133).
CL stands for CAS (Column Address Strobe) Latency. This is the number of clock cycles that pass between the moment a read command is issued for a particular column and the moment the data is available.
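The "1 Meg x 16 x 4 Banks" organization can be read as address widths: 1 Meg words per bank is 2^20 words, arranged as 4096 rows (12 bits) by 256 columns (8 bits), and the 4 banks add 2 more bits, for 22 address bits in total. The following is a minimal sketch of how a flat address could be split into those three fields. The module name, the port names and the choice of placing the bank in the upper bits are assumptions made for illustration; they are not taken from the project sources.

```verilog
// Sketch only: module name, port names and bit ordering are assumptions.
module sdram_addr_split_sketch (
    input  wire [21:0] addr,   // flat 22-bit word address
    output wire [1:0]  bank,   // drives BA1, BA0
    output wire [11:0] row,    // sent on A[11:0] with the ACTIVE command
    output wire [7:0]  col     // sent on A[7:0] with READ or WRITE
);
    // Assumed layout: {bank[1:0], row[11:0], col[7:0]}
    assign {bank, row, col} = addr;
endmodule
```

With the column in the lowest bits, consecutive addresses fall in the same row, so sequential accesses can reuse a row that is already open.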
First of all, it is important to understand what a typical FPGA system using an SDRAM controller looks like. See Fig. 1.
Then, it is important to understand all the signals of the SDRAM chip.
Fig. 2 shows all the control signals and buffers of the SDRAM.
The SDRAM has 10 control signals and 2 buffers:
- CKE (Clock Enable): CKE activates (HIGH) and deactivates (LOW) the CLK signal.
- CLK (Clock): CLK is driven by the system clock.
- CS# (Chip Select): CS# enables (registered LOW) and disables (registered HIGH) the command decoder.
- WE# (Write Enable).
- CAS# (Column Address Strobe).
- RAS# (Row Address Strobe).
- A[11:0] (Address inputs).
- BA0, BA1 (Bank Address inputs): BA0 and BA1 define to which bank the ACTIVE, READ, WRITE, or PRECHARGE command is being applied.
- DQML, DQMH (Data input/output mask, low / high): mask to select either the high or the low byte.
- DQ[15:0] (Data input/output): DQ is an input/output buffer used to read and write data from and to the SDRAM.
For a more detailed description of the signals, please refer to table 4 of the datasheet.
RAS#, CAS#, and WE# (along with CS#) define the command being entered.
SDRAM memories are command based, which means it is necessary to issue commands to the memory in order to perform any operation such as read, write, or precharge. Fig. 3 shows the complete list of commands for the MT48LC4M16A2.
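As a rough illustration of what "command based" means in hardware, each command is just a combination of levels on CS#, RAS#, CAS# and WE#. The sketch below lists the usual SDR SDRAM encodings as 4-bit constants and drives the four pins from one command word. The module name, the constant names and the bit order {CS#, RAS#, CAS#, WE#} are assumptions for this example; the values should be checked against the command table in the datasheet.

```verilog
// Sketch only: names and the {CS#, RAS#, CAS#, WE#} bit order are assumptions.
module sdram_cmd_sketch (
    input  wire [3:0] cmd,     // one of the CMD_* values below
    output wire       cs_n,
    output wire       ras_n,
    output wire       cas_n,
    output wire       we_n
);
    localparam [3:0] CMD_INHIBIT   = 4'b1111; // CS# high, other pins are don't care
    localparam [3:0] CMD_NOP       = 4'b0111;
    localparam [3:0] CMD_ACTIVE    = 4'b0011; // open a row
    localparam [3:0] CMD_READ      = 4'b0101;
    localparam [3:0] CMD_WRITE     = 4'b0100;
    localparam [3:0] CMD_TERMINATE = 4'b0110; // BURST TERMINATE
    localparam [3:0] CMD_PRECHARGE = 4'b0010; // close a row
    localparam [3:0] CMD_REFRESH   = 4'b0001; // AUTO REFRESH
    localparam [3:0] CMD_LMR       = 4'b0000; // LOAD MODE REGISTER

    // The command word maps directly onto the four control pins.
    assign {cs_n, ras_n, cas_n, we_n} = cmd;
endmodule
```

Keeping the command as a single 4-bit word makes it easy to register all four pins together, so they change on the same clock edge.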
An SDRAM controller can be modeled using a finite state machine (FSM).
The basic operating scheme is as follows:
After power-up, an SDRAM must be initialized in a predefined manner. The initialization of an SDRAM does not vary much from one manufacturer to another, but it is very important to always check the datasheet for the exact process:
1- Wait at least 100μs prior to issuing any command other than a COMMAND INHIBIT or NOP.
2- Starting at some point during this 100μs period, bring CKE HIGH. Continuing at least through the end of this period, 1 or more COMMAND INHIBIT or NOP commands must be applied.
3- Perform a PRECHARGE ALL command.
4- Wait at least tRP time; during this time NOPs or DESELECT commands must be given. All banks will complete their precharge, thereby placing the device in the all banks idle state.
5- Issue an AUTO REFRESH command.
6- Wait at least tRFC time, during which only NOPs or COMMAND INHIBIT commands are allowed.
7- Issue an AUTO REFRESH command.
8- Wait at least tRFC time, during which only NOPs or COMMAND INHIBIT commands are allowed.
9- The SDRAM is now ready for mode register programming. Using the LMR command, program the mode register.
10- Wait at least tMRD time, during which only NOP or DESELECT commands are allowed.
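The ten steps above can be sketched as a small sequencer that issues one command, waits a number of NOP cycles, and moves on to the next step. This is only an illustration under stated assumptions, not the controller from the project: the module and signal names are hypothetical, the command bit order is assumed to be {CS#, RAS#, CAS#, WE#}, the wait values assume a 100 MHz clock and are placeholders that must be checked against the datasheet, and CKE handling and the mode register value on the address bus are left out.

```verilog
// Sketch only: names and default cycle counts are assumptions (100 MHz clock).
module sdram_init_sketch #(
    parameter integer POWERUP_CYCLES = 10000, // 100 us at 10 ns per cycle
    parameter integer TRP_CYCLES     = 2,     // placeholder, check datasheet
    parameter integer TRFC_CYCLES    = 7,     // placeholder, check datasheet
    parameter integer TMRD_CYCLES    = 2      // placeholder, check datasheet
) (
    input  wire       clk,
    input  wire       rst,
    output reg  [3:0] cmd,        // {CS#, RAS#, CAS#, WE#}
    output reg        a10,        // A10 high makes PRECHARGE apply to all banks
    output reg        init_done
);
    localparam [3:0] CMD_NOP       = 4'b0111;
    localparam [3:0] CMD_PRECHARGE = 4'b0010;
    localparam [3:0] CMD_REFRESH   = 4'b0001;
    localparam [3:0] CMD_LMR       = 4'b0000;

    reg [2:0]  step;
    reg [15:0] wait_cnt;

    always @(posedge clk) begin
        if (rst) begin
            step      <= 3'd0;
            wait_cnt  <= POWERUP_CYCLES;   // steps 1-2: power-up wait
            cmd       <= CMD_NOP;
            a10       <= 1'b0;
            init_done <= 1'b0;
        end else begin
            cmd <= CMD_NOP;                // NOP whenever nothing else is issued
            a10 <= 1'b0;
            if (wait_cnt != 16'd0) begin
                wait_cnt <= wait_cnt - 16'd1;
            end else begin
                case (step)
                    3'd0: begin            // step 3: PRECHARGE ALL, step 4: wait tRP
                        cmd <= CMD_PRECHARGE; a10 <= 1'b1;
                        wait_cnt <= TRP_CYCLES;  step <= 3'd1;
                    end
                    3'd1, 3'd2: begin      // steps 5-8: two AUTO REFRESH + tRFC waits
                        cmd <= CMD_REFRESH;
                        wait_cnt <= TRFC_CYCLES; step <= step + 3'd1;
                    end
                    3'd3: begin            // step 9: LMR, step 10: wait tMRD
                        cmd <= CMD_LMR;
                        wait_cnt <= TMRD_CYCLES; step <= 3'd4;
                    end
                    default: init_done <= 1'b1;
                endcase
            end
        end
    end
endmodule
```

Each wait in this sketch lasts one cycle longer than the programmed count, which errs on the safe side of the "wait at least" requirements.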
The mode register sets different options:
- Burst Length
- Burst Type
- CAS Latency
- Operating Mode
- Write Burst Mode
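As an illustration, the mode register value is placed on A[11:0] while the LMR command is issued, with each option occupying its own bit field. The sketch below assembles such a value. The field positions follow the usual SDR SDRAM layout and should be verified in the datasheet; the module name and the particular settings chosen here (burst length 1, sequential, CL = 2, single-location writes) are assumptions for the example, not necessarily the settings used in this project.

```verilog
// Sketch only: module name and chosen settings are assumptions.
module sdram_mode_reg_sketch (
    output wire [11:0] mode_reg   // value to drive on A[11:0] during LMR
);
    localparam [2:0] BURST_LENGTH = 3'b000; // A[2:0]: burst length 1
    localparam [0:0] BURST_TYPE   = 1'b0;   // A[3]:   sequential
    localparam [2:0] CAS_LATENCY  = 3'b010; // A[6:4]: CL = 2
    localparam [1:0] OP_MODE      = 2'b00;  // A[8:7]: standard operation
    localparam [0:0] WRITE_BURST  = 1'b1;   // A[9]:   single location access

    // A[11:10] are reserved and written as zero.
    assign mode_reg = {2'b00, WRITE_BURST, OP_MODE,
                       CAS_LATENCY, BURST_TYPE, BURST_LENGTH};
endmodule
```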
At this point the DRAM is ready for any valid command.
The DRAM now moves to the IDLE state and waits either to perform a refresh or to perform a read/write operation.
The final FSM looks like this (the WAIT state has been excluded for clarity):
Each operation requested from the SDRAM requires a number of clock cycles to complete. During those waiting cycles, the NOP command has to be sent to the SDRAM. This task is performed by the WAIT state.
Refresh operation:
This SDRAM requires 4096 AUTO REFRESH cycles every 64ms. Providing a distributed AUTO REFRESH command every 15.625μs will meet the refresh requirement and ensure that each row is refreshed. Alternatively, 4096 AUTO REFRESH commands can be issued in a burst at the minimum cycle rate (tRFC), once every 64ms.
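According to the datasheet (page 22), the distributed refresh interval is 64 ms / 4096 = 15.625 μs = 15625 ns. This interval should not be confused with tRFC, the much shorter minimum time between two AUTO REFRESH commands. The whole system runs at 100 MHz (period 10 ns). Therefore 15625 / 10 = 1562.5 clock cycles. So, a refresh must be issued at least every 1562 clock cycles. Otherwise, data will be lost.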
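The arithmetic above (15625 ns divided by 10 ns per cycle, rounded down to 1562 whole cycles) can be turned into a simple timer that tells the FSM when a refresh is due. This is a minimal sketch; the module name, the port names and the request/acknowledge handshake are assumptions for illustration.

```verilog
// Sketch only: names and the req/ack handshake are assumptions.
module sdram_refresh_timer_sketch #(
    parameter integer CLK_PERIOD_NS = 10,     // 100 MHz system clock
    parameter integer REFRESH_NS    = 15625   // 64 ms / 4096 rows
) (
    input  wire clk,
    input  wire rst,
    input  wire refresh_ack,   // pulsed by the FSM when AUTO REFRESH is issued
    output reg  refresh_req    // stays high until acknowledged
);
    // Integer division rounds down, so the timer never runs late: 1562 cycles.
    localparam integer REFRESH_CYCLES = REFRESH_NS / CLK_PERIOD_NS;

    reg [10:0] count;          // 11 bits count up to 2047, enough for 1562

    always @(posedge clk) begin
        if (rst) begin
            count       <= 11'd0;
            refresh_req <= 1'b0;
        end else if (count == REFRESH_CYCLES - 1) begin
            count       <= 11'd0;
            refresh_req <= 1'b1;   // a new refresh is due
        end else begin
            count <= count + 11'd1;
            if (refresh_ack)
                refresh_req <= 1'b0;
        end
    end
endmodule
```

In practice a refresh may have to wait for an ongoing read or write to finish, so a real design would probably use a somewhat smaller count to leave some margin.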
Read/Write operation:
There are three possible scenarios when a read or write operation is requested.
1- The requested row is already open. In that case, the operation is performed.
2- No row is open. In that case, the controller first opens the row and then performs the operation.
3- Another row is already open. In that case, the other row must be precharged before the controller can open the new row and perform the operation.
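The three scenarios above boil down to a small piece of combinational logic that compares the requested bank and row with the one currently open. The following is a minimal sketch of that decision. The module, port and constant names are hypothetical, and it assumes the controller keeps at most one row open in the whole chip, even though the SDRAM itself can hold one open row per bank.

```verilog
// Sketch only: names are hypothetical; assumes a single open row at a time.
module sdram_row_policy_sketch (
    input  wire        row_open,    // 1 when some row is currently active
    input  wire [1:0]  open_bank,   // bank of the active row
    input  wire [11:0] open_row,    // address of the active row
    input  wire [1:0]  req_bank,    // bank of the requested access
    input  wire [11:0] req_row,     // row of the requested access
    output reg  [1:0]  next_step
);
    localparam [1:0] DO_ACCESS    = 2'd0; // scenario 1: issue READ/WRITE directly
    localparam [1:0] DO_ACTIVATE  = 2'd1; // scenario 2: ACTIVE, then READ/WRITE
    localparam [1:0] DO_PRECHARGE = 2'd2; // scenario 3: PRECHARGE, ACTIVE, READ/WRITE

    always @* begin
        if (!row_open)
            next_step = DO_ACTIVATE;
        else if (open_bank == req_bank && open_row == req_row)
            next_step = DO_ACCESS;
        else
            next_step = DO_PRECHARGE;
    end
endmodule
```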
Clock Forwarding technique
Trying to synthesize this controller without using a clock forwarding technique will result in this error:
The Papilio Pro has a 32 MHz oscillator. To generate a 100 MHz clock, the Clocking Wizard IP core was used. If we want to forward that clock to an output pin, a clock forwarding technique is needed in order to avoid delay, skew and routing problems.
The dedicated pads that connect to clock resources are always inputs. It is not possible to connect a clock net to a non-clock resource.
Therefore, a very reasonable workaround is to instantiate the output DDR flip-flop in the manner suggested. It doesn't cost you anything because all of the output pins have output DDR flops available. Additionally, using the ODDR flop (reasonably) guarantees that the skew between the output clock and any outputs which are synchronous to that clock is minimal. You have to make sure that all of those data outputs use the flop in the IOB and not the fabric.
Therefore, an ODDR2 (Output Double Data Rate 2) primitive has been used, as recommended in Fig. 6.
ODDR2 #(
.DDR_ALIGNMENT("NONE"), // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0), // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) ODDR2_inst (
.Q(sdram_clk_ddr), // 1-bit DDR output data
.C0(sys_clk), // 1-bit clock input
.C1(~sys_clk), // 1-bit clock input
.CE(1'b1), // 1-bit clock enable input
.D0(1'b0), // 1-bit data input (associated with C0)
.D1(1'b1), // 1-bit data input (associated with C1)
.R(1'b0), // 1-bit reset input
.S(1'b0) // 1-bit set input
);
The DDR_ALIGNMENT parameter has been set to "NONE" to output data on both the rising and falling edges of the clock. The NONE mode uses the rising edges of both clocks, C0 and C1, to capture data D0 and D1, respectively. These two data bits are multiplexed by the DDR multiplexer and forwarded to the output pin [4].
In this case, the default values of D0 and D1 have been exchanged, so the forwarded clock is inverted with respect to sys_clk. This gives both devices (SDRAM and FPGA) half a clock cycle for their outputs to become stable before the other device samples them. This is all about satisfying the setup and hold times of both devices.
However, just using ODDR2, we cannot ensure timing is met. Therefore, it is important to shift the clock a little more.
To perform this task, another Xilinx primitive is used in this design: IODELAY2. Each IOB in the Spartan-6 FPGA contains a delay line that can be configured either for use as an input delay or output delay. IODELAY2 is used to delay the clock being output to the SDRAM.
IODELAY2 #(
.DATA_RATE("SDR"), // "SDR" or "DDR"
.DELAY_SRC("ODATAIN"), // "IO", "ODATAIN" or "IDATAIN"
.IDELAY_MODE("NORMAL"), // "NORMAL" or "PCI"
.IDELAY_TYPE("FIXED"), // "FIXED", "DEFAULT", "VARIABLE_FROM_ZERO", "VARIABLE_FROM_HALF_MAX"
// or "DIFF_PHASE_DETECTOR"
.IDELAY_VALUE(0), // Amount of taps for fixed input delay (0-255)
.ODELAY_VALUE(100) // Amount of taps fixed output delay (0-255)
)
IODELAY2_inst (
.BUSY(), // 1-bit output: Busy output after CAL
.DATAOUT(), // 1-bit output: Delayed data output to ISERDES/input register
.DATAOUT2(), // 1-bit output: Delayed data output to general FPGA fabric
.DOUT(sdram_clk), // 1-bit output: Delayed data output
.TOUT(), // 1-bit output: Delayed 3-state output
.CAL(1'b0), // 1-bit input: Initiate calibration input
.CE(1'b0), // 1-bit input: Enable INC input
.CLK(1'b0), // 1-bit input: Clock input
.IDATAIN(1'b0), // 1-bit input: Data input (connect to top-level port or I/O buffer)
.INC(1'b0), // 1-bit input: Increment / decrement input
.IOCLK0(1'b0), // 1-bit input: Input from the I/O clock network
.IOCLK1(1'b0), // 1-bit input: Input from the I/O clock network
.ODATAIN(sdram_clk_ddr), // 1-bit input: Output data input from output register or OSERDES2.
.RST(1'b0), // 1-bit input: Reset to zero or 1/2 of total delay period
.T(1'b0) // 1-bit input: 3-state input signal
);
The most important parameter in the IODELAY2 instantiation is ODELAY_VALUE. It defines the delay tap value (0-255) for output delay mode.
In this design, ODELAY_VALUE has been set to 100, ensuring that the setup and hold times are being met. This value has been found empirically.
Input/Output Blocks (IOBs)
It is important to ensure that there are no additional delays caused by signals having to propagate through the FPGA fabric. Because of this, the registers that drive the FPGA outputs must be packed into IOBs.
IOB is also the name of a basic mapping and synthesis constraint, which indicates which flip-flops and latches can be moved into the IOB/ILOGIC/OLOGIC. In other words, an IOB register is essentially a flip-flop that is embedded in the pin of the FPGA. These flip-flops aren't in the typical FPGA fabric, but rather right at the inputs and outputs.
It is very important that these signals are not read in any other part of the design nor connected to anything other than the top-level output/input. If these guidelines are not followed, the tools will be forced to pull the flip-flop out of the IOB, possibly messing up the timing.
The constraints have been used as described in UG625 [5].
(* IOB = "TRUE" *)
reg cke_q;
(* IOB = "TRUE" *)
reg [1:0] dqm_q;
(* IOB = "TRUE" *)
reg [3:0] cmd_q;
(* IOB = "TRUE" *)
reg [1:0] ba_q;
(* IOB = "TRUE" *)
reg [11:0] a_q;
(* IOB = "TRUE" *)
reg [15:0] dq_q;
(* IOB = "TRUE" *)
reg [15:0] dqi_q;
We can check the usage of IOBs, ODDR2 and IODELAY2 in the synthesis report (see Fig. 6):
At this step, a basic simulation was performed.
In order to make an acceptable simulation, the SDRAM model provided by the vendor will be used [6].
It is important to select the right SDRAM model by adding these three defines:
`define den64Mb
`define sg7e
`define x16
Initialization
Steps 1 and 2:
Rest of the steps:
Write Cycle and Read Cycle
According to the ISIM console (Fig. 8), a write and read operation have been successfully performed.
Here is a demonstration on how this design works.
First, we see a menu, and I write 16 bits to a random 22-bit address (22 bits = [ 12 row addr + 8 column addr + 2 BA ]). Then I read back that same address and check that the value written to and read from the SDRAM is exactly the same.
Conclusion
The controller presented in this article is sophisticated, but a much more complex, efficient and fast controller is possible.
It has been designed for educational purposes only. It is not intended to be a controller for any serious design.
You can find the files associated with this project here:
https://github.com/salcanmor/SDRAM-tester-for-Papilio-Pro
References
[1] A Practical Introduction to SRAM Memories Using an FPGA (I). https://www.hackster.io/salvador-canas/a-practical-introduction-to-sram-memories-using-an-fpga-i-3f3992
[2] Datasheet of MT48LC4M16A2-7E: https://www.micron.com/~/media/documents/products/data-sheet/dram/64mb_x4x8x16_sdram.pdf
[3] Papilio Pro official site: http://papilio.cc/index.php?n=Papilio.PapilioPro
[4] Spartan-6 FPGA SelectIO Resources User Guide UG381 (v1.7) October 21, 2015. https://www.xilinx.com/support/documentation/user_guides/ug381.pdf
[5] Constraints Guide UG625 (v. 13.3) October 19, 2011 https://japan.xilinx.com/support/documentation/sw_manuals_j/xilinx13_3/cgd.pdf
[6] SDRAM model by Micron: https://www.micron.com/-/media/client/global/documents/products/sim-model/dram/sdram/sdr_sdram.zip