In the first article of this series, a functional but coarse SRAM controller was presented. This second article presents a highly optimized, fast and efficient controller, together with all the steps followed to reach that goal.
The controller is developed through a succession of versions, showing how each one improves on the previous one until the optimal design is reached.
Controller v.1
The methodology followed to write an FSM-based controller was:
- List all the actions that the controller must take.
- Place each action in a different state of the FSM.
- Check if the design provides large timing margins and does not impose any stringent timing constraints.
After doing that, the following controller was written.
`timescale 1ns / 1ps
module sram_ctrl3(clk, start_operation, rw, address_input, data_f2s, data_s2f, address_to_sram_output, we_to_sram_output, oe_to_sram_output, ce_to_sram_output, data_from_to_sram_input_output, data_ready_signal_output, writing_finished_signal_output, busy_signal_output);
  input wire clk;                                  // Clock signal
  input wire start_operation;                      // Start operation signal
  input wire rw;                                   // Selects the operation: 1 = read, 0 = write
  input wire [18:0] address_input;                 // Address bus
  input wire [7:0] data_f2s;                       // Data to be written to the SRAM
  output wire [7:0] data_s2f;                      // 8-bit registered data retrieved from the SRAM (the -s2f suffix stands for SRAM to FPGA)
  output reg [18:0] address_to_sram_output;        // Address bus
  output reg we_to_sram_output;                    // Write enable (active-low)
  output reg oe_to_sram_output;                    // Output enable (active-low)
  output reg ce_to_sram_output;                    // Chip enable (active-low). Disables or enables the chip.
  inout wire [7:0] data_from_to_sram_input_output; // Data bus
  output reg data_ready_signal_output;             // Ready signal
  output reg writing_finished_signal_output;       // Writing finished signal
  output reg busy_signal_output;                   // Busy signal

  // FSM states declaration (4 bits encode the 9 states)
  localparam [3:0]
    rd0  = 4'b0000,
    rd1  = 4'b0001,
    rd2  = 4'b0010,
    rd3  = 4'b0011,
    wr0  = 4'b0100,
    wr1  = 4'b0101,
    wr2  = 4'b0110,
    wr3  = 4'b0111,
    idle = 4'b1000;

  // Signal declaration
  reg [3:0] state_reg;
  reg [7:0] register_for_reading_data;
  reg [7:0] register_for_writing_data;
  reg register_for_splitting; // 1: the FPGA drives the data bus; 0: bus released

  initial
  begin
    ce_to_sram_output<=1'b1;
    oe_to_sram_output<=1'b1;
    we_to_sram_output<=1'b1;
    state_reg <= idle;
    register_for_reading_data[7:0]<=8'b0000_0000;
    register_for_writing_data[7:0]<=8'b0000_0000;
    register_for_splitting<=1'b0;
    data_ready_signal_output<=1'b0;
    writing_finished_signal_output<=1'b0;
    busy_signal_output<=1'b0;
  end

  always@(posedge clk)
  begin
    case(state_reg)
      idle:
      begin
        if(~start_operation)
          state_reg <= idle;
        else begin
          busy_signal_output<=1'b1; // Flag busy as soon as the operation is accepted
          if(rw)
            state_reg <= rd0;
          else
            state_reg <= wr0;
        end
      end
      rd0: // Present the address
      begin
        address_to_sram_output[18:0]<=address_input[18:0];
        state_reg <= rd1;
      end
      rd1: // Enable the chip and its outputs
      begin
        ce_to_sram_output<=1'b0;
        oe_to_sram_output<=1'b0;
        we_to_sram_output<=1'b1;
        state_reg <= rd2;
      end
      rd2: // Sample the data bus
      begin
        register_for_reading_data[7:0]<=data_from_to_sram_input_output[7:0];
        data_ready_signal_output<=1'b1;
        state_reg <= rd3;
      end
      rd3: // Disable the chip and clear the flags
      begin
        ce_to_sram_output<=1'b1;
        oe_to_sram_output<=1'b1;
        we_to_sram_output<=1'b1;
        busy_signal_output<=1'b0;
        data_ready_signal_output<=1'b0;
        state_reg <= idle;
      end
      wr0: // Present the address and latch the data to be written
      begin
        address_to_sram_output[18:0]<=address_input[18:0];
        register_for_writing_data[7:0]<=data_f2s[7:0];
        state_reg <= wr1;
      end
      wr1: // Start the write pulse and drive the data bus
      begin
        ce_to_sram_output<=1'b0;
        oe_to_sram_output<=1'b1;
        we_to_sram_output<=1'b0;
        register_for_splitting<=1'b1;
        state_reg <= wr2;
      end
      wr2: // Keep driving the bus: the SRAM latches the data at the end of the write pulse
      begin
        writing_finished_signal_output<=1'b1;
        state_reg <= wr3;
      end
      wr3: // End the write pulse and release the data bus on the same edge
      begin
        busy_signal_output<=1'b0;
        ce_to_sram_output<=1'b1;
        oe_to_sram_output<=1'b1;
        we_to_sram_output<=1'b1;
        register_for_splitting<=1'b0;
        writing_finished_signal_output<=1'b0;
        state_reg <= idle;
      end
    endcase
  end

  assign data_s2f = register_for_reading_data;
  assign data_from_to_sram_input_output = (register_for_splitting) ? register_for_writing_data : 8'bz;
endmodule
As the code shows, the controller uses 9 states: 4 for the read operation, 4 for the write operation and 1 idle state.
Compared with the controller presented in the first article, this one adds three flags:
- data_ready_signal_output: this flag goes up for 1 clock cycle after a read operation is performed, indicating that new data is ready to be used.
- writing_finished_signal_output: this flag goes up for 1 clock cycle after a write operation is performed.
- busy_signal_output: this flag goes up during a read/write operation.
Having 9 states is not efficient, because every read or write spends four clock cycles outside the idle state. So a new controller was written.
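The state sequence and the one-cycle flag pulses of Controller v.1 can be checked with a small cycle-level model. The following is a simplified Python sketch written for illustration, not part of the original project. Each call to the hypothetical `step()` method stands for one rising edge of `clk`, and all registers update together, as with Verilog nonblocking assignments. Only the state register and the `data_ready`/`writing_finished` flags are modelled. The SRAM, the data paths and the busy flag are left out.

```python
# Simplified cycle-level model of Controller v.1: only the state register and
# the two pulse flags are modelled (no SRAM, no data paths, no busy flag).
# Assumption: one call to step() stands for one rising edge of clk.

# Fixed successor of every non-idle state; rd3 and wr3 both return to idle.
NEXT = {"rd0": "rd1", "rd1": "rd2", "rd2": "rd3", "rd3": "idle",
        "wr0": "wr1", "wr1": "wr2", "wr2": "wr3", "wr3": "idle"}


class V1Model:
    def __init__(self):
        self.state = "idle"
        self.data_ready = 0
        self.writing_finished = 0

    def step(self, start, rw):
        """Apply one clock edge; return (state, data_ready, writing_finished)."""
        s = self.state
        # Registers hold their value unless a branch assigns them.
        state, ready, finished = s, self.data_ready, self.writing_finished
        if s == "idle":
            if start:
                state = "rd0" if rw else "wr0"
        else:
            state = NEXT[s]
            if s == "rd2":
                ready = 1        # data_ready_signal_output <= 1
            elif s == "rd3":
                ready = 0        # data_ready_signal_output <= 0
            elif s == "wr2":
                finished = 1     # writing_finished_signal_output <= 1
            elif s == "wr3":
                finished = 0     # writing_finished_signal_output <= 0
        # All registers update together, like nonblocking assignments.
        self.state, self.data_ready, self.writing_finished = state, ready, finished
        return state, ready, finished


def run(read, cycles=7):
    """Issue one operation (start pulse on the first edge) and trace every edge."""
    m = V1Model()
    return [m.step(start=(c == 0), rw=read) for c in range(cycles)]


print(run(read=True))
# [('rd0', 0, 0), ('rd1', 0, 0), ('rd2', 0, 0), ('rd3', 1, 0), ('idle', 0, 0), ('idle', 0, 0), ('idle', 0, 0)]
```

In this model, `data_ready` is high for exactly one cycle, while the FSM is in rd3. A write behaves the same way with `writing_finished` during wr3. Counting `idle` plus the eight entries of `NEXT` gives the 9 states mentioned above.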
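Controller v.2
Controller v.2 presents some improvements over the previous one.
Some states were merged, resulting in an FSM with 5 states: 2 for the read operation, 2 for the write operation and 1 idle state. For example, a read now presents the address and enables the chip and its outputs in a single state (rd0). The control signals are also returned to their inactive levels in the idle state instead of in a dedicated final state.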
`timescale 1ns / 1ps
module sram_ctrl4(clk, start_operation, rw, address_input, data_f2s, data_s2f, address_to_sram_output, we_to_sram_output, oe_to_sram_output, ce_to_sram_output, data_from_to_sram_input_output, data_ready_signal_output, writing_finished_signal_output, busy_signal_output);
  input wire clk;                                  // Clock signal
  input wire start_operation;                      // Start operation signal
  input wire rw;                                   // Selects the operation: 1 = read, 0 = write
  input wire [18:0] address_input;                 // Address bus
  input wire [7:0] data_f2s;                       // Data to be written to the SRAM
  output wire [7:0] data_s2f;                      // 8-bit registered data retrieved from the SRAM (the -s2f suffix stands for SRAM to FPGA)
  output reg [18:0] address_to_sram_output;        // Address bus
  output reg we_to_sram_output;                    // Write enable (active-low)
  output reg oe_to_sram_output;                    // Output enable (active-low)
  output reg ce_to_sram_output;                    // Chip enable (active-low). Disables or enables the chip.
  inout wire [7:0] data_from_to_sram_input_output; // Data bus
  output reg data_ready_signal_output;             // Ready signal
  output reg writing_finished_signal_output;       // Writing finished signal
  output reg busy_signal_output;                   // Busy signal

  // FSM states declaration
  localparam [3:0]
    rd0  = 4'd1,
    rd1  = 4'd2,
    wr0  = 4'd3,
    wr1  = 4'd4,
    idle = 4'd5;

  // Signal declaration
  reg [3:0] state_reg;
  reg [7:0] register_for_reading_data;
  reg [7:0] register_for_writing_data;
  reg register_for_splitting; // 1: the FPGA drives the data bus; 0: bus released

  initial
  begin
    ce_to_sram_output<=1'b1;
    oe_to_sram_output<=1'b1;
    we_to_sram_output<=1'b1;
    state_reg <= idle;
    register_for_reading_data[7:0]<=8'b0000_0000;
    register_for_writing_data[7:0]<=8'b0000_0000;
    register_for_splitting<=1'b0;
    data_ready_signal_output<=1'b0;
    writing_finished_signal_output<=1'b0;
    busy_signal_output<=1'b0;
  end

  always@(posedge clk)
  begin
    case(state_reg)
      idle:
      begin
        register_for_splitting<=1'b0;         // Configure the data bus for reading
        writing_finished_signal_output<=1'b1; // No write operation in progress
        ce_to_sram_output<=1'b1;              // Chip disabled
        oe_to_sram_output<=1'b1;              // Output disabled
        we_to_sram_output<=1'b1;              // Write disabled (read mode)
        busy_signal_output<=1'b0;             // The controller is not busy
        data_ready_signal_output<=1'b0;       // No data ready for reading
        if(~start_operation)
          state_reg <= idle;
        else begin
          busy_signal_output<=1'b1;           // Overrides the assignment above: operation accepted
          if(rw)
            state_reg <= rd0;
          else
            state_reg <= wr0;
        end
      end
      //============================== READING PHASE ==============================
      rd0: // Present the address and enable the chip and its outputs in the same cycle
      begin
        ce_to_sram_output<=1'b0;
        oe_to_sram_output<=1'b0;
        we_to_sram_output<=1'b1;
        address_to_sram_output[18:0]<=address_input[18:0];
        state_reg <= rd1;
      end
      rd1: // Sample the data bus
      begin
        register_for_reading_data[7:0]<=data_from_to_sram_input_output[7:0];
        data_ready_signal_output<=1'b1;
        state_reg <= idle;
      end
      //============================== WRITING PHASE ==============================
      wr0: // Present the address and latch the data to be written
      begin
        writing_finished_signal_output<=1'b0;
        address_to_sram_output[18:0]<=address_input[18:0];
        register_for_writing_data[7:0]<=data_f2s[7:0];
        state_reg <= wr1;
      end
      wr1: // Write pulse; the idle state ends it and releases the bus on the next edge
      begin
        ce_to_sram_output<=1'b0;
        oe_to_sram_output<=1'b1;
        we_to_sram_output<=1'b0;
        register_for_splitting<=1'b1;
        state_reg <= idle;
      end
    endcase
  end

  assign data_s2f = register_for_reading_data;
  assign data_from_to_sram_input_output = (register_for_splitting) ? register_for_writing_data : 8'bz;
endmodule
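The following Python sketch puts numbers on the improvement from v.1 to v.2. It is an illustration, not part of the original project. For each version it computes the number of states, the minimum width of a binary-encoded state register, and the number of clock cycles between back-to-back operations at 12 MHz.

Two assumptions apply. First, `start_operation` is assumed to be already high when the FSM returns to idle, so a new operation is accepted on the very first idle cycle. Second, the widths are only minimums: both listings declare a 4-bit `state_reg`, and the synthesis tool may re-encode the FSM anyway. The helper name `summarize` is hypothetical.

```python
import math

CLK_MHZ = 12.0                  # clock frequency used in this series
PERIOD_NS = 1000.0 / CLK_MHZ    # about 83.33 ns spent in each state

# (name, states per read, states per write); each version also has one idle state.
VERSIONS = [("v.1", 4, 4), ("v.2", 2, 2)]


def summarize(name, rd_states, wr_states):
    """Return (name, total states, min. state bits, cycles per op, ns per op)."""
    total_states = rd_states + wr_states + 1
    bits = math.ceil(math.log2(total_states))  # minimum binary-encoded width
    # Back-to-back: one idle cycle to accept start_operation, then the operation states.
    cycles = 1 + max(rd_states, wr_states)
    return name, total_states, bits, cycles, cycles * PERIOD_NS


for v in VERSIONS:
    name, states, bits, cycles, ns = summarize(*v)
    print(f"{name}: {states} states, {bits}-bit state register, "
          f"{cycles} cycles = {ns:.2f} ns per back-to-back operation")
```

Under these assumptions it reports 5 cycles (416.67 ns) per operation for v.1 and 3 cycles (250.00 ns) for v.2.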
Controller v.3
A 3-state controller is also possible, and it is very efficient in terms of clock cycles.
The ASMD chart for this controller is shown in Figure 1.
Recall that the FSM is driven by a 12-MHz clock signal and thus stays in each state for 83.33 ns.
This controller uses one clock cycle (i.e., 83.33 ns) to complete a memory access and requires two clock cycles (i.e., 166.67 ns) for back-to-back operations.
The address access time (tAA) is 10 ns, so there is a 73.33 ns margin in each memory access.
The only way to reduce this margin is to increase the clock frequency.
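The margin arithmetic above is easy to reproduce for other clock frequencies. The Python sketch below is a back-of-the-envelope model, not a timing analysis. It assumes the whole clock period is available for tAA and ignores FPGA pad, board and clock-to-output delays, so real margins are smaller. The function names are hypothetical.

```python
T_AA_NS = 10.0  # address access time of the 10-ns SRAM used in this series


def read_margin_ns(f_mhz, t_aa_ns=T_AA_NS):
    """Slack left in a one-cycle read: clock period minus tAA.

    Simplification: FPGA pad, board and clock-to-output delays are ignored,
    so the real margin is smaller than this figure.
    """
    period_ns = 1000.0 / f_mhz
    return period_ns - t_aa_ns


def max_freq_mhz(t_aa_ns=T_AA_NS):
    """Highest clock frequency for which one period still covers tAA (zero margin)."""
    return 1000.0 / t_aa_ns


for f in (12.0, 50.0, 100.0):
    print(f"{f:6.1f} MHz: period {1000.0 / f:6.2f} ns, margin {read_margin_ns(f):6.2f} ns")
print(f"upper bound ignoring I/O delays: {max_freq_mhz():.0f} MHz")
```

At 12 MHz this gives the 73.33 ns margin quoted above. The margin shrinks to 10 ns at 50 MHz and to zero at 100 MHz, which is only an optimistic upper bound for a single-cycle access.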
The code for this controller is presented below.
`timescale 1ns / 1ps
module sram_ctrl5(clk, start_operation, rw, address_input, data_f2s, data_s2f, address_to_sram_output, we_to_sram_output, oe_to_sram_output, ce_to_sram_output, data_from_to_sram_input_output, data_ready_signal_output, writing_finished_signal_output, busy_signal_output);
  input wire clk;                                  // Clock signal
  input wire start_operation;                      // Start operation signal
  input wire rw;                                   // Selects the operation: 1 = read, 0 = write
  input wire [18:0] address_input;                 // Address bus
  input wire [7:0] data_f2s;                       // Data to be written to the SRAM
  output wire [7:0] data_s2f;                      // 8-bit registered data retrieved from the SRAM (the -s2f suffix stands for SRAM to FPGA)
  output reg [18:0] address_to_sram_output;        // Address bus
  output reg we_to_sram_output;                    // Write enable (active-low)
  output reg oe_to_sram_output;                    // Output enable (active-low)
  output reg ce_to_sram_output;                    // Chip enable (active-low). Disables or enables the chip.
  inout wire [7:0] data_from_to_sram_input_output; // Data bus
  output reg data_ready_signal_output;             // Ready signal
  output reg writing_finished_signal_output;       // Writing finished signal
  output reg busy_signal_output;                   // Busy signal

  // FSM states declaration
  localparam [1:0]
    idle = 2'b00,
    rd0  = 2'b01,
    wr0  = 2'b10;

  // Signal declaration
  reg [1:0] state_reg;
  reg [7:0] register_for_reading_data;
  reg [7:0] register_for_writing_data;
  reg register_for_splitting; // 1: the FPGA drives the data bus; 0: bus released

  initial
  begin
    ce_to_sram_output<=1'b1;
    oe_to_sram_output<=1'b1;
    we_to_sram_output<=1'b1;
    state_reg <= idle;
    register_for_reading_data[7:0]<=8'b0000_0000;
    register_for_writing_data[7:0]<=8'b0000_0000;
    register_for_splitting<=1'b0;
    data_ready_signal_output<=1'b0;
    writing_finished_signal_output<=1'b0;
    busy_signal_output<=1'b0;
  end

  always@(posedge clk)
  begin
    case(state_reg)
      idle:
      begin
        register_for_splitting<=1'b0;         // Configure the data bus for reading
        writing_finished_signal_output<=1'b1; // No write operation in progress
        ce_to_sram_output<=1'b0;              // Chip enabled...
        oe_to_sram_output<=1'b0;              // ...with outputs enabled, ready for a read
        we_to_sram_output<=1'b1;              // Write disabled
        busy_signal_output<=1'b0;             // The controller is not busy
        data_ready_signal_output<=1'b0;       // No data ready for reading
        if(~start_operation)
          state_reg <= idle;
        else begin
          busy_signal_output<=1'b1;           // Overrides the assignment above: operation accepted
          if(rw) begin
            address_to_sram_output[18:0]<=address_input[18:0];
            state_reg <= rd0;
          end
          else begin
            address_to_sram_output[18:0]<=address_input[18:0];
            register_for_writing_data[7:0]<=data_f2s[7:0];
            state_reg <= wr0;
          end
        end
      end
      //============================== READING PHASE ==============================
      rd0: // The address has been stable for one clock period: sample the data bus
      begin
        register_for_reading_data[7:0]<=data_from_to_sram_input_output[7:0];
        data_ready_signal_output<=1'b1;
        state_reg <= idle;
      end
      //============================== WRITING PHASE ==============================
      wr0: // The address has been stable for one cycle: start the write pulse and drive the bus
      begin
        ce_to_sram_output<=1'b0;
        oe_to_sram_output<=1'b1;
        we_to_sram_output<=1'b0;
        writing_finished_signal_output<=1'b0;
        register_for_splitting<=1'b1;
        state_reg <= idle;
      end
    endcase
  end

  assign data_s2f = register_for_reading_data;
  assign data_from_to_sram_input_output = (register_for_splitting) ? register_for_writing_data : 8'bz;
endmodule
Some redundancy can be noticed in these lines:
if(~start_operation)
state_reg <= idle;
else begin
and
if(rw) begin
address_to_sram_output[18:0]<=address_input[18:0];
...
end else begin
address_to_sram_output[18:0]<=address_input[18:0];
...
However, it is not really necessary to modify those lines. Removing the redundancy does not change the resource utilization, so keeping the explicit version costs nothing, and clearer code is always better.
The resource utilization for Controller v.3 is shown in Figure 2.
Controller v.3 is a 3-state FSM-based controller. It imposes much tighter timing constraints on both read and write operations than the previous versions.
However, at a 12-MHz clock there is still a lot of margin. Reducing it requires a much higher clock frequency. At that point it is also advisable to estimate the propagation delays for fine tuning: pad delay, slew rate and driver strength, among others, are important variables to analyze.
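The tighter timing of v.3 can be seen cycle by cycle. The Python sketch below is a simplified model written for illustration; the class `V3Model` and its `step()` method are hypothetical and not from the project. It mirrors the register updates of `sram_ctrl5`, leaves out the busy flag, and assumes an idealized SRAM that stores the driven data on the rising edge of WE, with zero hold time. It issues a write immediately followed by a read of the same address, so the two operations are accepted two clock cycles apart.

```python
# Simplified cycle-level model of Controller v.3 (sram_ctrl5) with an idealised SRAM.
# Assumptions: one step() call == one rising clk edge; all registers update
# together (nonblocking semantics); the SRAM stores the driven data when WE
# rises, with zero hold time. The busy flag is not modelled.


class V3Model:
    def __init__(self):
        self.state = "idle"
        self.addr = 0           # address_to_sram_output
        self.wdata = 0          # register_for_writing_data
        self.rdata = 0          # register_for_reading_data (data_s2f)
        self.ce, self.oe, self.we = 1, 1, 1
        self.drive = 0          # register_for_splitting: 1 = FPGA drives the bus
        self.data_ready = 0
        self.mem = {}           # idealised SRAM contents

    def step(self, start=0, rw=1, addr=0, data=0):
        """Apply one clock edge; return (state, addr, we, drive) after it."""
        # Next register values; registers hold their value by default.
        n = {k: v for k, v in vars(self).items() if k != "mem"}
        if self.state == "idle":
            n.update(drive=0, ce=0, oe=0, we=1, data_ready=0)
            if start:
                n["addr"] = addr
                if rw:
                    n["state"] = "rd0"
                else:
                    n["wdata"] = data
                    n["state"] = "wr0"
        elif self.state == "rd0":
            # The address has been stable for one full period: sample the bus.
            n.update(rdata=self.mem.get(self.addr, 0), data_ready=1, state="idle")
        elif self.state == "wr0":
            n.update(ce=0, oe=1, we=0, drive=1, state="idle")
        # Idealised SRAM: the write completes on the rising edge of WE,
        # using the address and data present just before this edge.
        if self.we == 0 and n["we"] == 1 and self.drive:
            self.mem[self.addr] = self.wdata
        for k, v in n.items():
            setattr(self, k, v)
        return self.state, self.addr, self.we, self.drive


m = V3Model()
trace = [m.step(start=1, rw=0, addr=0x12, data=0xA5),  # edge 0: idle accepts the write
         m.step(),                                      # edge 1: wr0 pulls WE low, drives the bus
         m.step(start=1, rw=1, addr=0x12),              # edge 2: WE rises, read accepted
         m.step()]                                      # edge 3: rd0 samples the bus
print(trace)
print(hex(m.rdata))  # 0xa5
```

In this model, the address is stable for a full cycle before WE falls. WE is then low for exactly one cycle while the FPGA drives the bus, and the next operation is accepted on the same edge at which the write pulse ends. That last edge is where the SRAM's hold-time and bus-turnaround specifications matter in real hardware, and the idealized model ignores them.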
You can find the files associated with this project here:
https://github.com/salcanmor/SRAM-tester-for-Cmod-A7-35T/tree/master/basic%20controller%20v2