VHDL average of Array through for loop - arrays

I have an Array of X Integer values in VHDL declared as a variable inside a process.
I would like to calculate the average of all Values in a for loop.
If I write it out for 3 Values manually everything works fine (tested on hardware):
entity MyEntity is
Port(
Enable : IN STD_LOGIC ;
CLK : IN STD_LOGIC;
SpeedOut : OUT INTEGER
);
end MyEntity;
Average : process
type SampleArray is Array (2 downto 0) of INTEGER;
variable SpeedSamples : SampleArray;
begin
wait until rising_edge(CLK);
if ENABLE = '1' then
SpeedOut <= ( SpeedSamples(0)+ SpeedSamples(1)+SpeedSamples(2) ) / 3;
end if;
end process Average;
If I use a for loop to do the same, SpeedOut is constantly 0:
entity MyEntity is
Port(
Enable : IN STD_LOGIC ;
CLK : IN STD_LOGIC;
SpeedOut : Out INTEGER
);
end MyEntity;
Average : process
type SampleArray is Array (2 downto 0) of INTEGER;
variable SpeedSamples : SampleArray;
variable tempVar : Integer;
begin
wait until rising_edge(CLK);
if ENABLE = '1' then
for i in 0 to 2 loop
tempVar := tempVar + SpeedSamples(i);
end loop;
SpeedOut <= tempVar / 3;
end if;
end process Average;
I am aware this will need a lot of resources if the array is bigger, but I think there is something fundamentally wrong with my code.
Is there a proven method of calculating a moving average in VHDL?

It's not that efficient to add up a large number of samples each clock period like that; an adder with n inputs will consume a lot of logic resource as n starts to increase.
My suggestion is to implement a memory buffer for the samples, which will have as many locations as you want samples in your rolling average. This will have one new sample written to it each clock cycle; you will also add this same sample to your total on the following clock edge.
Using dual-port memory, you can simultaneously read out the 'oldest' sample in the memory from the same location (provided you have the memory in read-before-write mode). Subtract this from your total, then perform the divide. I expect by far the most efficient divisor will be a power of two, so that your divide is just a bit shift and does not consume any logic resource. Dividers for other divisors use comparatively large amounts of logic.
So the design would boil down to a memory buffer, a 3-input adder, a counter for use as a pointer to the sample buffer, and a wire-shift divider. If performance was an issue, you could pipeline the add/subtract phases so that you only ever needed 2-input adders.
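The buffer-plus-running-total scheme described above could be sketched roughly as follows. This is only an illustration under assumptions: the entity name moving_average, the generics LOG2_DEPTH and WIDTH, and all signal names are made up here, the samples are assumed to be unsigned, and the add and subtract are done in a single clock rather than pipelined. Whether the buffer ends up in block RAM or registers depends on your synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, widths and depth are illustrative assumptions.
entity moving_average is
  generic (
    LOG2_DEPTH : natural := 3;   -- window holds 2**LOG2_DEPTH samples
    WIDTH      : natural := 16   -- sample width in bits
  );
  port (
    clk     : in  std_logic;
    enable  : in  std_logic;
    sample  : in  unsigned(WIDTH-1 downto 0);
    average : out unsigned(WIDTH-1 downto 0)
  );
end entity moving_average;

architecture rtl of moving_average is
  constant DEPTH : natural := 2**LOG2_DEPTH;
  type buffer_t is array (0 to DEPTH-1) of unsigned(WIDTH-1 downto 0);
  signal samples : buffer_t := (others => (others => '0'));
  signal ptr     : natural range 0 to DEPTH-1 := 0;
  -- LOG2_DEPTH extra bits so the running total cannot overflow
  signal total   : unsigned(WIDTH+LOG2_DEPTH-1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '1' then
        -- samples(ptr) still holds the oldest value here (read before write)
        total        <= total + sample - samples(ptr);
        samples(ptr) <= sample;
        if ptr = DEPTH-1 then
          ptr <= 0;
        else
          ptr <= ptr + 1;
        end if;
      end if;
    end if;
  end process;

  -- divide by 2**LOG2_DEPTH by dropping the low bits (wire shift, no logic)
  average <= total(WIDTH+LOG2_DEPTH-1 downto LOG2_DEPTH);
end architecture rtl;
```

Because the buffer starts out as all zeros, the output ramps up over the first 2**LOG2_DEPTH enabled samples before it becomes a true average.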
As for the actual coding question about creating a multi-input adder using a loop, on top of suggestions made in the comments, I would say it's really up to your synthesis tool as to whether it would be able to identify this as a multi-input adder. Have you looked in the synthesis report for any messages relating to this segment of code?
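One likely culprit in the loop version is that tempVar is never cleared. A process variable keeps its value between clock edges, and an uninitialised integer starts at the most negative value in simulation, so the sum keeps accumulating (or overflows) instead of restarting. A minimal sketch of the loop with the sum reset each clock is below; the architecture name rtl and the zero initialisation of SpeedSamples are assumptions, since the question does not show how the samples are written.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity MyEntity is
  port (
    Enable   : in  std_logic;
    CLK      : in  std_logic;
    SpeedOut : out integer
  );
end entity MyEntity;

architecture rtl of MyEntity is
begin
  Average : process
    type SampleArray is array (2 downto 0) of integer;
    -- Assumption: zero-initialised; the code that fills the samples is not shown.
    variable SpeedSamples : SampleArray := (others => 0);
    variable tempVar      : integer;
  begin
    wait until rising_edge(CLK);
    if Enable = '1' then
      tempVar := 0;  -- restart the sum on every clock edge
      for i in SpeedSamples'range loop
        tempVar := tempVar + SpeedSamples(i);
      end loop;
      SpeedOut <= tempVar / SpeedSamples'length;
    end if;
  end process Average;
end architecture rtl;
```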

Related

How to do a circular shift for an array in Verilog

In C, there is an array x[0], x[1], ..., x[127]. For a given number k in [0, 127), we define the left shift operation as y[n] = x[(n+k)%128], for n = 0, 1, 2, ..., 127.
Now I am trying to implement this in an FPGA. As there are so many operations of this type, I would like to get the result as fast as possible.
I did this as follows:
module LEFT_SHIFT(
input clk,
input rst,
input [31:0] data_in[0:127],
input [6:0] shift,
output reg [31:0] data_out[0:127]
);
integer i;
always @ (posedge clk)
begin
if (rst)
for (i=0;i<128;i++)
data_out[i] <= 32'b0;
else
for (i=0;i<128;i++)
data_out[(i+shift)%128] <= data_in[i];
end
endmodule
Is this code fine in terms of speed, resources and timing? It looks like a RAM, but a RAM doesn't output all of its memory at the same time.
Many thanks,
Jerry
If you replace the Mod operator (%) with a replication of the input data to make the circular shift you could make the task easier for the compiler. I tried this on the synthesis tool from a major ASIC tool vendor and the results were quite different.
if (rst)
for (integer i=0;i<128;i++)
data_out[i] <= 32'b0;
else begin
logic [31:0] tmp [0:255];
for (integer i=0;i<128;i++) begin
// replicate input data
tmp[i] = data_in[i];
tmp[i+128] = data_in[i];
end
for (integer i=0;i<128;i++)
data_out[i] <= tmp[128-shift+i];
end
That's a huge mux that will consume a lot of logic resources in the FPGA. I've seen things like that crash the tools before. You may want to consider adding more than just one register in there.
As far as speed, resource, and timing goes, it depends on how fast you want it to run and how many free resources you have. It could be fine at low speed on a big FPGA or impossible at higher speeds or small/full FPGA. But there's no need to speculate about resource and timing, just build it and see what happens.
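To illustrate the suggestion of adding more than one register, here is a rough sketch of a pipelined rotator built from seven conditional rotate stages (by 1, 2, 4, ..., 64), so each stage is only a 2:1 mux per word. It is written in VHDL rather than Verilog, the package rot_pkg, the entity rotate_pipe and all signal names are made up for this example, and it has not been synthesised. It rotates in the same direction as the code in the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: array type needed for the ports.
package rot_pkg is
  type word_array is array (0 to 127) of std_logic_vector(31 downto 0);
end package rot_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.rot_pkg.all;

entity rotate_pipe is
  port (
    clk      : in  std_logic;
    data_in  : in  word_array;
    shift    : in  unsigned(6 downto 0);
    data_out : out word_array
  );
end entity rotate_pipe;

architecture rtl of rotate_pipe is
  type stage_array is array (0 to 7) of word_array;
  type shift_array is array (0 to 7) of unsigned(6 downto 0);
  signal stage : stage_array;
  signal sh    : shift_array;  -- shift amount travels with the data
begin
  stage(0) <= data_in;
  sh(0)    <= shift;

  gen_stage : for s in 0 to 6 generate
    process (clk)
    begin
      if rising_edge(clk) then
        sh(s+1) <= sh(s);
        for i in 0 to 127 loop
          if sh(s)(s) = '1' then
            -- bit s of the shift set: rotate by 2**s in this stage
            stage(s+1)((i + 2**s) mod 128) <= stage(s)(i);
          else
            stage(s+1)(i) <= stage(s)(i);
          end if;
        end loop;
      end if;
    end process;
  end generate gen_stage;

  data_out <= stage(7);
end architecture rtl;
```

The trade-off is latency and registers: the result appears seven clock cycles after the inputs, and every stage stores all 128 words.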

VHDL Comparison Operation Not Defined with Looping Counter

I've been trying to make an SRAM chip in vhdl with arbitrary amount of registers and register size using generics and I've almost gotten it to work except for the addressing part.
To make an arbitrary sized SRAM chip I started by making a unit SRAM Cell (which I tested to confirm that it works) with the following port map.
component SRAM_Cell_vhdl
port (
IN : in std_ulogic;
Select_Chip : in std_ulogic;
Write_Enable : in std_ulogic;
Out1 : out std_ulogic
);
The generic SRAM chip has the following port map:
port (
Datain : in std_logic_vector(m-1 downto 0);
address: in std_logic_vector(n-1 downto 0);
Chip_Select: in std_logic;
Output_Enable: in std_logic;
Write_Enable: in std_logic;
Out2: out std_logic_vector(m-1 downto 0)
);
The way I'm trying to do the addressing is that when it generates the SRAM it checks if the loop counter is equal to the address. If it is it will write the bit to the SRAM cell, if not it will not.
loop1: for I in 0 to n-1 generate
loop2: for J in 0 to m-1 generate
SRAM_Cell_vhdl1 : SRAM_Cell_vhdl port map
(Datain(J), Chip_Select and (I = to_integer(unsigned(address))), Write_Enable and Chip_Select, intermediate_out(I, J));
end generate loop2;
end generate loop1;
However, I am getting an error at I = to_integer(unsigned(address))) telling me that it can't determine the definition of the operation "=". I thought that a loop counter is an integer and the way I'm converting the address to an integer it should be doing a comparison between two integers. The other way I thought of doing this is to use an if statement comparing I and the address, but then I fear that it will not generate all of the required SRAM cells.
Is there a way to solve this problem?
The = operator returns a boolean. So, the expression
Chip_Select and (I = to_integer(unsigned(address)))
when associated with an input port of type std_ulogic requires a version of the and operator with an input of type std_ulogic, an input of type boolean and a return value of type std_ulogic. (This list of types is called its signature). No such version of the and operator exists.
There is a version of the and operator with two inputs of type std_ulogic and a return value of type std_ulogic. So, in order to use that, your compiler is trying to find a version of the = operator that returns a std_ulogic. No such version exists. Hence your error.
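One way to bridge the two types is a small helper function that maps a boolean onto std_ulogic, so that the std_ulogic version of the and operator applies. The sketch below is only an illustration: the entity row_select, the generics ROW and N, the port row_cs and the function to_sl are all hypothetical names, not part of the question's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: decodes one row's chip select from the address.
entity row_select is
  generic (
    ROW : natural  := 0;  -- row index this instance responds to
    N   : positive := 4   -- address width in bits
  );
  port (
    address     : in  std_logic_vector(N-1 downto 0);
    Chip_Select : in  std_logic;
    row_cs      : out std_ulogic
  );
end entity row_select;

architecture rtl of row_select is
  -- Hypothetical helper: convert a boolean to std_ulogic.
  function to_sl (b : boolean) return std_ulogic is
  begin
    if b then
      return '1';
    else
      return '0';
    end if;
  end function to_sl;
begin
  -- "=" yields a boolean; to_sl turns it into a std_ulogic for "and"
  row_cs <= Chip_Select and to_sl(ROW = to_integer(unsigned(address)));
end architecture rtl;
```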
Solving this problem is not straightforward, because you'll need an array of chip select signals. So, you'll need something like this (as there's no MCVE, I haven't tested it):
loop1: for I in 0 to n-1 generate
loop2: for J in 0 to m-1 generate
-- concurrent conditional assignment; an if statement is sequential and not allowed directly inside a generate
CS(I)(J) <= '1' when Chip_Select = '1' and I = to_integer(unsigned(address)) else '0';
SRAM_Cell_vhdl1 : SRAM_Cell_vhdl port map (Datain(J), CS(I)(J), Write_Enable and Chip_Select, intermediate_out(I, J));
end generate loop2;
end generate loop1;
where CS is some kind of array of std_ulogic.
First, your code is not an MCVE; it would have been more helpful if it were. I guess the alternative below should work.
signal2 <= Write_Enable and Chip_Select;
loop1: for I in 0 to n-1 generate
signal signal1 : std_ulogic; -- row select, local to each iteration of loop1
begin
-- address is a signal, so it cannot be used in an if-generate condition (which must be static)
signal1 <= Chip_Select when I = to_integer(unsigned(address)) else '0';
loop2: for J in 0 to m-1 generate
SRAM_Cell_vhdl1 : SRAM_Cell_vhdl port map (Datain(J), signal1, signal2, intermediate_out(I, J));
end generate loop2;
end generate loop1;
Also, I would prefer using named association rather than positional association in the port map.

VHDL - loop failure/'empty' cycle issue

I'm not so great with VHDL and I can't really see why my code won't work. I needed an NCO, found a working program and re-worked it to fit my needs, but just noticed a bug: every full cycle there is one blank cycle.
The design takes step (the jump between consecutive samples) as an input and uses the clock as its trigger.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL; --try to use this library as much as possible.
entity sinwave_new_01 is
port (clk :in std_logic;
step :in integer range 0 to 1000;
dataout : out integer range 0 to 1024
);
end sinwave_new_01;
architecture Behavioral of sinwave_new_01 is
signal i : integer range 0 to 1999:=0;
type memory_type is array (0 to 999) of integer range 0 to 1024;
--ROM for storing the sine values generated by MATLAB.
signal sine : memory_type :=(long and boring array of 1000 samples here);
begin
process(clk)
begin
--to check the rising edge of the clock signal
if(rising_edge(clk)) then
dataout <= sine(i);
i <= i+ step;
if(i > 999) then
i <= i-1000;
end if;
end if;
end process;
end Behavioral;
What do I do to get rid of that zero? It appears every full cycle - every (1000/step) pulses. It's not supposed to be there and it messes up my PWM...
From what I understand the whole block (dataout changes, it is increased, and if i>999 then i<=i-1000) executes when there is a positive edge of clock applied on the entrance...
BUT it looks like it requires one additional edge to, I don't know, reload it? Does the code execute sequentially, or are all conditions tested when the clock arrives? Am I reaching outside the table, and that's why I'm getting zeroes in that particular pulse? Program /shouldn't/ do that, as far as I understand if statement, or is it VHDL being VHDL and doing its weird stuff again.
How do I fix this bug? Guess I could add one extra clock tick every 1k/step pulses, but that's a work around and not a real solution. Thanks in advance for help.
It looks like your problem is that your signal 'i' exceeds 999 before you reset it. Remember, you're in a clocked process: 'i' doesn't take the assigned value until after the clock edge on which you assign it, so the comparison i > 999 still sees the old value.
I think if you change this code
i <= i + step;
if (i > 999) then
i <= i-1000;
to
if ((i + step) > 999) then
i <= (i + step) - 1000;
else
i <= i + step;
you should get the behavior you're looking for.
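A variant of the same fix uses a process variable for the intermediate sum, since a variable takes its new value immediately while a signal only updates after the process suspends. This is just a sketch: the entity phase_index, its index port and the variable next_i are names made up here, and the sine lookup is left out so the wrap-around logic stands on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: only the wrapping index, without the sine ROM.
entity phase_index is
  port (
    clk   : in  std_logic;
    step  : in  integer range 0 to 1000;
    index : out integer range 0 to 999
  );
end entity phase_index;

architecture rtl of phase_index is
  signal i : integer range 0 to 999 := 0;
begin
  process (clk)
    variable next_i : integer range 0 to 1999;
  begin
    if rising_edge(clk) then
      next_i := i + step;          -- variable: new value visible immediately
      if next_i > 999 then
        next_i := next_i - 1000;   -- wrap before the value is stored
      end if;
      i <= next_i;                 -- signal: updated after the process suspends
    end if;
  end process;

  index <= i;
end architecture rtl;
```

With this arrangement i itself never leaves the range 0 to 999, so it can always be used directly as a table index.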
One more thing...
Does the declaration of sine (the sample array) actually create combinational logic (bad), or does it place those samples in ROM ('good')?

Weird signal behaviour (clock-dependant signal changing with no clock present)

I'm working on an NCO (still) and I have problems with the address select block. My teacher wants the samples in a ROM block (done that already), but the addressing doesn't seem to work. What I need is a modulo-200 accumulator with a variable step. I adapted this code from a sample where somebody used i as a counter to pick a value from an array of samples, BUT I need to simply copy i to the output port.
Something with the PWM wasn't working; it skipped not ten but ~80 samples, so I decided to check the addressing. I was mighty surprised when I noticed that the address changes INDEPENDENTLY of the clock signal. ( http://i.imgur.com/XL9l8mj.jpg )
Here's the code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL; --try to use this library as much as possible.
entity adress_select_200 is
port (clk :in std_logic;
step :in integer range 0 to 200;
adress : out integer range 0 to 199
);
end adress_select_200;
architecture Behavioral of adress_select_200 is
signal i : integer range 0 to 399:=0;
begin
process(clk)
begin
--to check the rising edge of the clock signal
if(rising_edge(clk)) then
adress <= i;
i <= i+step;
if ((i + step) > 199) then
i <= (i + step) - 200;
else
i <= i + step;
end if;
end if;
end process;
end Behavioral;
I'm not so great with VHDL, but I suppose the whole loop should ONLY execute on the clk rising edge, right? Meanwhile it's doing that weird sh... in the middle of the cycle, no idea why.
How do I stop that from happening?

VHDL: Add list of numbers using loop

To start off, I have very limited knowledge of C, just basic functions. I have been set a task in VHDL, of which I have no experience.
The task is to write a program in VHDL that will use a loop to add a list of 10 numbers (13,8,6,5,19,21,7,1,12,3).
I was thinking of a way of doing this in C first, to see if I could somewhat mimic the method. So far I have only come up with
int start = 0;
int add = start;
int increment = 5;
for (int i=0; i<10; i++) {
add = add + increment;
}
Now I know that is VERY basic, but it's the best I can do. That loop will only increment the sum by 5, as opposed to adding the numbers in my list.
Any help is much appreciated. It's my first question, so apologies if I am breaking any 'unwritten laws'.
You mention that this is part of a study on Parwan processors, so the way to think about it depends a lot on how you are studying them.
If you are building up an implementation of the processor, then just learning the syntax for logical operations is the important part, and you should focus on the 8-bit types
unsigned(7 downto 0), covering 0 to 255, and signed(7 downto 0), covering -128 to 127. By making use of the package ieee.numeric_std you get the addition operation defined for those types.
If, however, the processor is already defined for you, take a good look at the processor interfaces. The code you will write for this will be much more of an explicit state machine.
Either way, I find the best way to start is to write a test bench. This is the part that will feed in the list of inputs, because ultimately you won't want it to be a for (int i=0; i<10; i++), but rather a while(1) style of processing.
That's all theory, so here's a sketch of a simple accumulator process (it assumes ieee.numeric_std and a clock signal clk):
signal acc : unsigned(7 downto 0) := (others => '0'); --accumulator register
signal b : unsigned(7 downto 0) := to_unsigned(5, 8); --value to be added
--each cycle you would change b
accumulator : process (clk)
begin
if rising_edge(clk) then
acc <= acc + b;
end if;
end process;
or maybe better yet take a look here: Accumulator
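To make the test bench idea concrete, here is a small self-contained sketch that feeds the ten numbers from the question into an accumulator, one per clock. The entity name accumulator_tb, the constant NUMBERS and the 10 ns clock period are assumptions chosen for illustration; this is for simulation only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test bench: no ports, simulation only.
entity accumulator_tb is
end entity accumulator_tb;

architecture sim of accumulator_tb is
  type int_list is array (natural range <>) of natural;
  constant NUMBERS : int_list := (13, 8, 6, 5, 19, 21, 7, 1, 12, 3);

  signal clk : std_logic := '0';
  signal acc : unsigned(7 downto 0) := (others => '0');  -- accumulator register
  signal b   : unsigned(7 downto 0) := (others => '0');  -- value to be added
begin
  -- the accumulator under test
  accumulator : process (clk)
  begin
    if rising_edge(clk) then
      acc <= acc + b;
    end if;
  end process accumulator;

  -- stimulus: present one number per clock cycle, then report the total
  stimulus : process
  begin
    for k in NUMBERS'range loop
      b   <= to_unsigned(NUMBERS(k), b'length);
      clk <= '0';
      wait for 5 ns;
      clk <= '1';
      wait for 5 ns;
    end loop;
    report "sum = " & integer'image(to_integer(acc));
    wait;  -- stop the stimulus process
  end process stimulus;
end architecture sim;
```

The ten values add up to 95, which fits in the 8-bit accumulator, so that is the value the report statement should show at the end of the run.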
The solution below could help you get started with your problem in VHDL:
For the implementation in a FPGA, better solutions could be figured out. So, just consider it as a start...
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add is
port (
clk : in std_logic;
rst : in std_logic;
add : in std_logic;
sum : out std_logic_vector(31 downto 0));
end entity add;
architecture RTL of add is
constant rom_size : integer := 10;
type t_rom is array (0 to rom_size-1) of unsigned(31 downto 0);
constant rom : t_rom := (
to_unsigned(13, sum'length),
to_unsigned(8, sum'length),
to_unsigned(6, sum'length),
to_unsigned(5, sum'length),
to_unsigned(19, sum'length),
to_unsigned(21, sum'length),
to_unsigned(7, sum'length),
to_unsigned(1, sum'length),
to_unsigned(12, sum'length),
to_unsigned(3, sum'length));
signal add_d : std_logic;
signal index : integer range 0 to rom_size;
signal sum_i : unsigned(sum'range);
begin
p_add : process (clk) is
begin
if rising_edge(clk) then -- rising clock edge
if rst = '1' then -- synchronous reset (active high)
sum_i <= (others => '0');
add_d <= '0';
index <= 0;
else
add_d <= add; -- rising edge detection
if add_d = '0' and add = '1' and index < rom_size then -- rising edge and items left -> add next item to sum
sum_i <= sum_i + rom(index);
index <= index + 1;
end if;
end if;
end if;
end process p_add;
-- output
sum <= std_logic_vector(sum_i);
end architecture RTL;
First, I'll point out there's no need to add the complexity of std_logic_vectors or vector arithmetic with signed and unsigned. This works fine with simple integers:
So, you have some numbers coming in and a sum going out:
entity summer is
port (
inputs : in integer_vector := (13,8,6,5,19,21,7,1,12,3); -- integer_vector is predefined in VHDL-2008
sum_out : out integer);
end entity summer;
Note, I've initialised the inputs port with your values; normally you'd write to that port in your testbench.
Now to add them up, you need a process:
process(inputs)
variable sum : integer;
begin
sum := 0;
for i in inputs'range loop
sum := sum + inputs(i);
end loop;
sum_out <= sum;
end process;
That's a simplistic solution - to create a "best" solution you need a more detailed specification. For example: how often will the inputs change? How soon do you need the answer after the inputs change? Is there a clock?

Resources