Converting a fixed point Matlab code to Verilog

Converting a fixed point Matlab code to Verilog - arrays

I have a fixed point Matlab code and it needs to be converted to Verilog. Below is the Matlab code. yfftshift is 5000x0 and y2shape 100x50.
rows=100;
colms=50;
r=1;
for m=0:colms-1
for n=0:rows-1
y2shape(n+1,m+1)=yfftshift(r,1);
r=r+1;
end
end
How can I create memories in Verilog and call them inside the for loop?

The easiest way to handle fixed precision in Verilog is to introduce a scale factor and allocate sufficiently large registers to hold the maximum value. For example, if you know that the maximum value of your numbers will be 40, and three digits of precision to the right of the decimal place are OK, a scaling factor of 1000 can be used with 16-bit registers. Verilog treats unsigned numbers, so if values can be negative, it's necessary to add "signed" to the declarations. The Verilog could be:
`define NUMBER_ROWS 100
`define NUMBER_COLS 50
`define MAX_ROW (`NUMBER_ROWS-1)
`define MAX_COL (`NUMBER_COLS-1)
module moveMemory();
reg clk;
reg [15:0] y2shape [`MAX_ROW:0][`MAX_COL:0];
reg [15:0] yfftshift [`NUMBER_ROWS * `NUMBER_COLS:0];
integer rowNumber, colNumber;
always #(posedge clk)
begin
for (rowNumber = 0; rowNumber < `NUMBER_ROWS; rowNumber = rowNumber + 1)
for (colNumber = 0; colNumber < `NUMBER_COLS; colNumber = colNumber + 1)
y2shape[rowNumber][colNumber] <= yfftshift[rowNumber * `NUMBER_COLS + colNumber];
end
endmodule
This is OK for an FPGA or simulation project, but for full custom work, an SRAM macro would be used to avoid the die area associated with 16,000 registers. For an FPGA implementation, you've probably already paid for the 16K registers, or you may be able to do some extra work get the synthesizer to map the registers to an on-chip SRAM.
The test bench:
// Testing code
integer loadCount, rowShowNumber, colShowNumber;
initial
begin
// Initialize array with some data
for (loadCount=0; loadCount < (NUMBER_ROWS *NUMBER_COLS); loadCount = loadCount + 1)
yfftshift[loadCount] <= loadCount;
clk <= 0;
// Clock the block
#1
clk <= 1;
// Display the results
#1
$display("Y2SHAPE has these values at time ", $time);
for (rowShowNumber = 0; rowShowNumber < `NUMBER_ROWS; rowShowNumber = rowShowNumber + 1)
for (colShowNumber = 0; colShowNumber < `NUMBER_COLS; colShowNumber = colShowNumber + 1)
$display("y2shape[%0d][%0d] is %d ", rowShowNumber, colShowNumber, y2shape[rowShowNumber][colShowNumber]);
end
The simulation results for NUMBER_ROWS=10, NUMBER_COLS=5
Y2SHAPE has these values at time 2
y2shape[0][0] is 0
y2shape[0][1] is 1
y2shape[0][2] is 2
y2shape[0][3] is 3
y2shape[0][4] is 4
.
.
.
y2shape[9][2] is 47
y2shape[9][3] is 48
y2shape[9][4] is 49

Related

reading multiple block ram indexes in one write clock cycle

I have an application where I'm continuously writing to a block ram at a slow clock speed (clk_a) and within this slow clock cycle need to read three indexes from the block ram at a fast clock speed (clk_b) to use these values as operands in a math module, the result being written back to the block ram on the next slow clock. These three indexes are the current address written to at posedge of the slow clock, plus the two immediate neighbouring addresses (addr_a -1 and addr_a +1).
What is an efficient way to synthesize this? My best attempt to date uses a small counter (triplet) running at fast clock rate that increments the addresses but I end up running out of logic as it looks like Yosys does not infer the ram properly. What is a good strategy for this?
here is what I have:
module myRam2 (
input clk_a,
input clk_b,
input we_a,
input re_a,
input [10:0] addr_a,
input [10:0] addr_b,
input [11:0] din_a,
output [11:0] leftNeighbor,
output [11:0] currentX,
output [11:0] rightNeighbor
);
parameter MEM_INIT_FILE2 = "";
initial
if (MEM_INIT_FILE2 != "")
$readmemh(MEM_INIT_FILE2, ram2);
reg [11:0] ram2 [0:2047];
reg [1:0] triplet = 3;
reg [10:0] old_addr_a;
reg [11:0] temp;
always #(posedge clk_a) begin
ram2[addr_a] <= din_a;
end
always#(posedge clk_b)
if (old_addr_a != addr_a) begin
triplet <= 0;
old_addr_a <= addr_a;
end
else
if(triplet < 3) begin
triplet <= triplet +1;
end
always #(posedge clk_b) begin
temp <= ram2[addr_a + (triplet - 1)];
end
always #(posedge clk_b) begin
case(triplet)
0: leftN <= temp;
1: X <= temp;
2: rightN <= temp;
endcase
end
reg signed [11:0] leftN;
reg signed [11:0] X;
reg signed [11:0] rightN;
assign leftNeighbor = leftN;
assign currentX = X;
assign rightNeighbor = rightN;
endmodule

Regarding the efficiency the following approach should work and removes the need for a faster clock:
module myRam2 (
input wire clk,
input wire we,
input wire re,
input wire [10:0] addr_a,
input wire [10:0] addr_b,
input wire [11:0] din_a,
output reg [11:0] leftNeighbor,
output reg [11:0] currentX,
output reg [11:0] rightNeighbor
);
reg [11:0] ram2 [2047:0];/* synthesis syn_ramstyle = "no_rw_check" */;
always #(posedge clk) begin
if(we) ram2[addr_a] <= din_a;
if(re) {leftNeighbor,currentX,rightNeighbor} <= {ram2[addr_b-1],ram2[addr_b],ram2[addr_b+1]};
end
endmodule
The synthesis keyword helped me in the past to increase the likelyhood of correctly inferred ram.
EDIT: removed second example suggesting a 1D mapping. It turned out that at least Lattice LSE cannot deal with that approach. However the first code snipped should work according to Active-HDL and Lattice LSE.

Verilog vector inner product

I am trying to implement a synthesizable verilog module, which produces a vector product of 2 vector/arrays, each containing eight 16-bit unsigned integers. Design Compiler reported error that symbol i must be a constant or parameter. I don't know how to fix it. Here's my code.
module VecMul16bit (a, b, c, clk, rst);
// Two vector inner product, each has 8 elements
// Each element is 16 bits
// So the Output should be at least 2^32*2^3 = 2^35 in order to
// prevent overflow
// Output is 35 bits
input clk;
input rst;
input [127:0] a,b;
output [35:0] c;
reg [15:0] a_cp [0:7];
reg [15:0] b_cp [0:7];
reg [35:0] c_reg;
reg k,c_done;
integer i;
always # (a)
begin
for (i=0; i<=7; i=i+1) begin
a_cp[i] = a[i*15:i*15+15];
end
end
always # (b)
begin
for (i=0; i<=7; i=i+1) begin
b_cp[i] = b[i*15:i*15+15];
end
end
assign c = c_reg;
always #(posedge clk or posedge rst)
begin
if (rst) begin
c_reg <= 0;
k <= 0;
c_done <= 0;
end else begin
c_reg <= c_done ? c_reg : (c_reg + a_cp[k]*b_cp[k]);
k <= c_done ? k : k + 1;
c_done <= c_done ? 1 : (k == 7);
end
end
endmodule
As you can see, I'm trying to copy a to a_cp through a loop, is this the right way to do it?
If yes, how should I defined it i and can a constant be used as a stepper in for loop?

A part select in verilog must have constant bounds. So this is not allowed:
a_cp[i] = a[i*15:i*15+15];
Verilog-2001 introduced a new indexed part select syntax where you specify the starting position and the width of the selected group of bits. So, you need to replace the above line by:
a_cp[i] = a[i*15+:16];
This takes a 16-bit width slice of a starting at bit i*15 counting rightwards. You can use -: instead of +:, in which case you count leftwards.
Be careful: it is very easy to type :+ instead of +: and :+ is valid syntax and so might not be spotted by your compiler (but could still be a bug). In fact I did exactly that when typing this EDA Playground example, though my typo was caught by the compiler in this case.

Actually, what you need for your code to be synthesizable is using genvar as the type of i. Kind of like this (using macros, put it above your module):
`define PACK_ARRAY_2D2(PK_WIDTH,PK_LEN,PK_DIMS,PK_SRC,PK_DEST,PK_OFFS) ({\
genvar pk_idx; genvar pk_dims; \
generate \
for (pk_idx=0; pk_idx<(PK_LEN); pk_idx=pk_idx+1) \
begin \
for (pk_dims=0; pk_dims<(PK_DIMS); pk_dims=pk_dims+1) \
begin \
assign PK_DEST[(((PK_WIDTH)*(pk_idx+pk_dims+1))-1+((PK_WIDTH)*PK_OFFS*pk_idx)):(((PK_WIDTH)*(pk_idx+pk_dims))+((PK_WIDTH)*PK_OFFS*pk_idx))] = PK_SRC[pk_idx][pk_dims][((PK_WIDTH)-1):0];\
end\
end\
endgenerate\
})
`define UNPACK_ARRAY_2D2(PK_WIDTH,PK_LEN,PK_DIMS,PK_DEST,PK_SRC,PK_OFFS) ({\
genvar unpk_idx; genvar unpk_dims; \
generate \
for (unpk_idx=0; unpk_idx<(PK_LEN); unpk_idx=unpk_idx+1) \
begin \
for (unpk_dims=0; unpk_dims<(PK_DIMS); unpk_dims=unpk_dims+1)\
begin \
assign PK_DEST[unpk_idx][unpk_dims][((PK_WIDTH)-1):0] = PK_SRC[(((PK_WIDTH)*(unpk_idx+unpk_dims+1))-1+((PK_WIDTH)*PK_OFFS*unpk_idx)):(((PK_WIDTH)*(unpk_idx+unpk_dims))+((PK_WIDTH)*PK_OFFS*unpk_idx))];\
end end endgenerate\
})
and here is how to use it (just put in inside pack_unpack.v) as an example of function to transpose matrix :
// Macros for Matrix
`include "pack_unpack.v"
module matrix_weight_transpose(
input signed [9*5*32-1:0] weight, // 5 columns, 9 rows, 32 bit data length
output signed [9*5*32-1:0] weight_transposed // 9 columns, 5 rows, 32 bit data length
);
wire [31:0] weight_in [8:0][4:0];
`UNPACK_ARRAY_2D2(32,9,5,weight_in,weight,4)
wire [31:0] weight_out [4:0][8:0];
`PACK_ARRAY_2D2(32,5,9,weight_out,weight_transposed,8)
generate // Computing the transpose
genvar i;
for (i = 0; i < 9; i = i + 1)
begin : columns
genvar j;
for (j = 0; j < 5; j = j + 1)
begin : rows
assign weight_out[j][i] = weight_in[i][j];
end
end
endgenerate
endmodule

calculation in C for C2000 device

I'm having some trouble with my C code. I have an ADC which will be used to determine whether to shut down (trip zone) the PWM which I'm using. But my calculation seem to not work as intended, because the ADC shuts down the PWM at the wrong voltage levels. I initiate my variables as:
float32 current = 5;
Uint16 shutdown = 0;
and then I calculate as:
// Save the ADC input to variable
adc_info->adc_result0 = AdcRegs.ADCRESULT0>>4; //bit shift 4 steps because adcresult0 is effectively a 12-bit register not 16-bit, ADCRESULT0 defined as Uint16
current = -3.462*((adc_info->adc_result0/1365) - 2.8);
// Evaluate if too high or too low
if(current > 9 || current < 1)
{
shutdown = 1;
}
else
{
shutdown = 0;
}
after which I use this if statement:
if(shutdown == 1)
{
EALLOW; // EALLOW protected register
EPwm1Regs.TZFRC.bit.OST = 1; // Force a one-shot trip-zone interrupt
EDIS; // end write to EALLOW protected register
}
So I want to trip the PWM if current is above 9 or below 1 which should coincide with an adc result of <273 (0x111) and >3428 (0xD64) respectively. The ADC values correspond to the voltages 0.2V and 2.51V respectively. The ADC measure with a 12-bit accuracy between the voltages 0 and 3V.
However, this is not the case. Instead, the trip zone goes off at approximately 1V and 2.97V. So what am I doing wrong?

adc_info->adc_result0/1365
Did you do integer division here while assuming float?
Try this fix:
adc_info->adc_result0/1365.0
Also, the #pmg's suggestion is good. Why spending cycles on calculating the voltage, when you can compare the ADC value immediately against the known bounds?
if (adc_info->adc_result0 < 273 || adc_info->adc_result0 > 3428)
{
shutdown = 1;
}
else
{
shutdown = 0;
}
If you don't want to hardcode the calculated bounds (which is totally understandable), then define them as calculated from values which you'd want to hardcode literally:
#define VOLTAGE_MIN 0.2
#define VOLTAGE_MAX 2.51
#define AREF 3.0
#define ADC_PER_VOLT (4096 / AREF)
#define ADC_MIN (VOLTAGE_MIN * ADC_PER_VOLT) /* 273 */
#define ADC_MAX (VOLTAGE_MAX * ADC_PER_VOLT) /* 3427 */
/* ... */
shutdown = (adcresult < ADC_MIN || adcresult > ADC_MAX) ? 1 : 0;
/* ... */
When you've made sure to grasp the integer division rules in C, consider a little addition to your code style: to always write constant coefficients and divisors with a decimal (to make sure they get a floating point type), e.g. 10.0 instead of 10 — unless you specifically mean integer division with truncation. Sometimes it may also be a good idea to specify float literal precision with appropriate suffix.

Counter based memory array in Verilog

I got a problem.
I am a Verilog newbie, and I have to write a counter based memory array. Basically, my array is 16 x 8 bits (16 x 1 byte). I have 8 bit data coming into my memory, 16 times. So I made a memory block, and a counter which supplies addresses to this memory block by incrementing with the positive edge of the clock (actually, on each positive edge, 8 bit data is fed into the memory, so I increment my counter). Now I have done this process 16 times, and now 128 bits of data are stored and provided by my memory block. But now I want to reset my counter, and repeat the whole process again after a small delay. I am confused as to how I do this. Please have a look at my code and advise.
Thanking all of you in advance.
// creation of counter & a dummy variable
wire cnt;
wire cnt_next;
reg [3:0] counter;
always #(posedge clock)
assign cnt_next=cnt+1'b1;
counter <= cnt_next
wire [3:0] write_address = counter;
//creation of ram function
module single_port_ram
(
input [7:0] data,
input [3:0] addr,
input wr, clk, rd
output [127:0] q
);
reg [15:0] ram[0:7];
always # (posedge clk or posedge reset)
begin
// Code for writing the data
if (wr)
{
addr <= write_address
case {addr}
4'b0000: ram[0] <= data
4'b0001: ram[1] <= data
4'b0010: ram[2] <= data
4'b0011: ram[3] <= data
4'b0100: ram[4] <= data
4'b0101: ram[5] <= data
4'b0110: ram[6] <= data
4'b0111: ram[7] <= data
4'b1000: ram[8] <= data
4'b1001: ram[8] <= data
4'b1010: ram[10] <= data
4'b1011: ram[11] <= data
4'b1100: ram[12] <= data
4'b1101: ram[13] <= data
4'b1110: ram[14] <= data
4'b1111: ram[15] <= data
end
always # (posedge clk or posedge reset)
begin
//Code for reading the data
if (rd)
{
q <= {ram[15],ram[14],ram[13],ram[12],ram[11],ram[10],ram[9],ram[8],ram[7],ram[6],ram[5],ram[4],ram [3],ram[2],ram[1],ram[0]}
}

Make your counter a little bigger, say up to 31 instead of 15. For the first 16 clocks, write the memory address like you are doing, for the next 16 clocks just do nothing but increment (the small delay you wanted), and then when the counter reaches 31 just reset it back to 0. Then the process should restart.

to have 16 x 8 bits (16 x 1 byte)
first you declare the size of your RAM width then you declare the actual depth of your RAM for example if i want an array of 512 reg variables (depth) and each is 8 bits (width) then it will be written like:
reg [7:0] ram[511:0];
so your code should look like:
reg [7:0] ram[15:0];

VHDL: Add list of numbers using loop

To start off, I have a very limited knowledge of C, just basic functions. I have been set a task in VHDL of which i have no experience.
The task is to write a program in VHDL that will use a loop to add a list of 10 numbers (13,8,6,5,19,21,7,1,12,3).
I was thinking of a way of doing this even in C to see if i could somewhat mimic the method. so far i have only came up with
int start = 0;
int add = start;
int increment = 5;
for (int i=0; i<10; i++) {
add = add + increment;
}
now i know that is VERY basic but it's the best i can do. that loop will only increment it by 5 as apposed to the list that i have.
Any help is very appreciated and it's my first question so apologies i if i am breaking any 'unwritten laws'

You mention that this is a part of a study on parwan processors, So the way to think about it depends a lot on how you are studying them.
If you are building up an implementation of the processor than just learning the syntax for logical operations is the important part, and you should focus on the types
unsigned range 0 to 255 and signed range -128 to 127. By making use of the package ieee.numeric_std.all you get the addition operation defined for those types.
If however the processor is already defined for you take a good look at the processor interfaces. The code you will write for this will be much more of an explicit state machine.
Either way I find the best way to start is to write a test bench. This is the part that will feed in the list of inputs, because ultimately you wont want it to be a for (int i=0; i<10; i++), but rather a while(1) style of processing.
That's all theory stuff, so here's some pseudo code for a simple accumulator process:
signal acc : unsigned range 0 to 255 := 0; --accumulator register
signal b : unsigned range 0 to 255 := 5; --value to be added
--each cycle you would change b
accumulator :process (clk)
begin
if rising_edge(clk)
acc <= acc + b;
end if;
end process;
or maybe better yet take a look here: Accumulator

The solution below could help you get started with your problem in VHDL:
For the implementation in a FPGA, better solutions could be figured out. So, just consider it as a start...
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add is
port (
clk : in std_logic;
rst : in std_logic;
add : in std_logic;
sum : out std_logic_vector(31 downto 0));
end entity add;
architecture RTL of add is
constant rom_size : integer := 10;
type t_rom is array (0 to rom_size-1) of unsigned(31 downto 0);
constant rom : t_rom := (
to_unsigned(13, sum'length),
to_unsigned(8, sum'length),
to_unsigned(6, sum'length),
to_unsigned(5, sum'length),
to_unsigned(19, sum'length),
to_unsigned(21, sum'length),
to_unsigned(7, sum'length),
to_unsigned(1, sum'length),
to_unsigned(12, sum'length),
to_unsigned(3, sum'length));
signal add_d : std_logic;
signal index : integer range 0 to rom_size;
signal sum_i : unsigned(sum'range);
begin
p_add : process (clk) is
begin
if rising_edge(clk) then -- rising clock edge
if rst = '1' then -- synchronous reset (active high)
sum_i <= (others => '0');
add_d <= '0';
index <= 0;
else
add_d <= add; -- rising edge detection
if add_d = '0' and add = '1' then -- rising_edge -> add next item to sum
sum_i <= sum_i + rom(index);
index <= index + 1;
end if;
end if;
end if;
end process p_add;
-- output
sum <= std_logic_vector(sum_i);
end architecture RTL;

First, I'll point out there's no need to add the complexity of std_logic_vectors or vector arithmetic with signed and unsigned. This works fine with simple integers:
So, you have some numbers coming in and a sum going out:
entity summer
port (
inputs : integer_vector := (13,8,6,5,19,21,7,1,12,3);
sum_out : integer);
end entity summer;
Note, I've initialise the inputs port with your values - normally you'd write to that port in your testbench.
Now to add them up, you need a process:
process(inputs)
variable sum : integer;
begin
sum := 0;
for i in inputs'range loop
sum := sum + inputs(i);
end for;
sum_out <= sum;
end process;
That's a simplistic solution - to create a "best" solution you need a more detailed specification. For example: how often will the inputs change? How soon do you need the answer after the inputs change? Is there a clock?