VHDL Generate Array Of STD_LOGIC_VECTORS with Reducing Length - arrays

I am trying to create an array of std_logic_vectors with reducing lengths. I have tried making an array with a generic std_logic_vector and then using a generate statement to make the vectors.
architecture behavioral of dadda_mul_32bit is
type and_planes is array (0 to 31) of std_logic_vector;
begin
generate_and_plane:
for i in 0 to 31 generate
and_planes(i) <= and_vector(a, b(i), i);
end generate generate_and_plane;
end behavioral;
along with a function that returns a generic std_logic_vector:
function and_vector(vec: std_logic_vector; x: std_logic; length: natural) return std_logic_vector is
variable result: std_logic_vector(length - 1 downto 0);
begin
for i in 0 to length - 1 loop
result(i) := vec(i) and x;
end loop;
return result;
end function;
Am I using the generate statement incorrectly?

and_planes is a type, not a signal, so you can't assign to it!
Moreover, you are creating a partially constrained type, which needs to be constrained in an object (e.g. signal) declaration.
VHDL doesn't support ragged arrays (arrays in which each element has a different size). If you need this for simulation, you can use access types and emulate ragged arrays as in C. If you need it for synthesis, you can emulate a ragged array with a one-dimensional array and some functions that compute the bounds of each nested vector.
See these answers of mine:
VHDL Multidimensional arrays with different internal size
vhdl port declaration with different sizes
Btw. VHDL-2008 adds an overload: "and"(std_logic, std_logic_vector), so no function is needed to calculate the anding of a single bit with each bit in a vector.
-- slice 'a' and gate it by 'b(i)'
and_planes(i) <= a(i downto 0) and b(i);
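As a rough illustration of the "one-dimensional array plus bounds functions" idea, here is a minimal sketch. It assumes plane i holds i + 1 bits (matching the a(i downto 0) slice above), so the 32 planes pack into 32 * 33 / 2 = 528 bits. The entity name and_planes_flat, the port name planes and the helper plane_low are hypothetical, and the single-bit "and" needs VHDL-2008.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_planes_flat is
  port (
    a, b   : in  std_logic_vector(31 downto 0);
    -- all 32 planes packed back to back: 1 + 2 + ... + 32 = 528 bits
    planes : out std_logic_vector(527 downto 0)
  );
end entity and_planes_flat;

architecture rtl of and_planes_flat is
  -- Lowest flat index of plane i: 0, 1, 3, 6, ... (triangular numbers)
  function plane_low(i : natural) return natural is
  begin
    return (i * (i + 1)) / 2;
  end function;
begin
  gen_planes : for i in 0 to 31 generate
    -- plane i occupies (i + 1) bits starting at plane_low(i)
    planes(plane_low(i) + i downto plane_low(i)) <= a(i downto 0) and b(i);
  end generate gen_planes;
end architecture rtl;
```

A matching plane_high function (plane_low(i) + i) would let other code slice a given plane back out of the flat vector.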

Related

Unsigned addition in VHDL resulting in incorrect length unsigned result

update
@user1155120's comment below is correct:
This is telling you the error is somewhere in the realm of -- other assignments here
I had multiplication operations which I mistakenly believed functioned in the same manner as addition. My mistake.
I am working on a rudimentary ALU using VHDL.
Here is the code which is throwing an error:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use ieee.std_logic_misc.all;
entity alu is
port (
A, B : in unsigned(31 downto 0);
sel : in unsigned(2 downto 0);
O : out unsigned(63 downto 0));
end entity alu;
architecture Behavioral of alu is
begin
O <= resize(A, 64) + resize(B, 64) when sel = "000"
-- other assignments here
end Behavioral;
My understanding of Unsigned addition in VHDL is that the length of the sum will be equal to the longest length of the operands. However, my code gives the following error:
ERROR: Array sizes do not match, left array has 64 elements, right array has 128 elements
Strange. However, if I change the resize values to be less than 64 bits, then the behavior follows my expectation (width=max width of operands). Like this:
O <= resize(A, 33) + resize(B, 33) when sel = "000"
I get the following error:
ERROR: Array sizes do not match, left array has 64 elements, right array has 33 elements
I end up being very confused. Why is the width of the output changing when I resize only to a certain value?
I am using the student license for Vivado 2020.
You probably have
O <= resize(A, 64) * resize(B, 64) when sel = whatever
somewhere further down in your code, and you're misreading the line number in the error message. Once you change the addition so that it is wrong, that becomes the first line on which analysis fails, and you get the error you see now.
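For reference, numeric_std's "+" on two unsigned operands returns a result as long as the longer operand, while "*" returns a result whose length is the sum of both operand lengths, which is where 128 comes from. A minimal sketch of how the two operations could be sized for a 64-bit output follows. The entity name alu_sketch and the opcode "001" for multiplication are assumptions, not taken from the question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity alu_sketch is
  port (
    A, B : in  unsigned(31 downto 0);
    sel  : in  unsigned(2 downto 0);
    O    : out unsigned(63 downto 0)
  );
end entity alu_sketch;

architecture rtl of alu_sketch is
begin
  -- "+" : result length = max(64, 64) = 64 bits, matches O
  -- "*" : result length = 32 + 32     = 64 bits, so no resize before multiplying
  O <= resize(A, 64) + resize(B, 64) when sel = "000" else
       A * B                         when sel = "001" else  -- assumed opcode
       (others => '0');
end architecture rtl;
```

Resizing both operands to 64 bits before multiplying would give 64 + 64 = 128 bits, which is the mismatch reported in the error.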

VHDL - Simultaneous addition of large 2D array. What is the syntax for this

I have reached a position in my design in which we need to massively increase parallelisation, but we have many resources to spare in the FPGA.
To that end, I have the type defined as
type LargeByteArray is array(0 to 10000) of std_logic_vector(7 downto 0);
I have two of these that I want to average byte-wise in as few operations as possible, using a right shift to divide by two. So for example, avg(0) should be an 8-bit std_logic_vector equal to (a_in(0) + b_in(0)) / 2, avg(1) should be (a_in(1) + b_in(1)) / 2, and so on. Assume for the moment we don't care that two 8-bit numbers add to a 9-bit number. And I want to be able to do all 10000 operations in parallel.
I think I need to use an intermediate step to be able to bitshift like this, using the Signal "inter".
entity Large_adder is
Port ( a_in : in LargeByteArray;
b_in : in LargeByteArray;
avg_out : out LargeByteArray);
end entity Large_adder;
architecture arch of Large_adder is
SIGNAL inter : LargeByteArray;
begin
My current code looks a bit like this:
inter(0) <= std_logic_vector((unsigned(a_in(0)) + unsigned(b_in(0))));
inter(1) <= std_logic_vector((unsigned(a_in(1)) + unsigned(b_in(1))));
10000 lines later...
inter(10000) <= std_logic_vector((unsigned(a_in(10000)) + unsigned(b_in(10000))));
And a similar story for finally assigning the output with the bit shift
avg_out(0) <= '0' & inter(0)(7 downto 1);
avg_out(1) <= '0' & inter(1)(7 downto 1);
All the way down to 10000.
Surely there is a more space efficient way to specify this.
I have tried
inter <= std_logic_vector((unsigned(a_in) + unsigned(b)));
but I get an error about found '0' matching definitions for <= operator.
Now obviously the number could be decreased from 10000 in case this question looks stupid in what I'm trying to achieve, but in general, how do you write these sort of operations elegantly without a line for every element of my Type?
If I had to guess I would say we can describe to the "<=" operator what to do when met with LargeByteArray types. But I do not know how to do so or where to define this behaviour.
Thanks
You have two choices. Either a for loop inside a process:
process (a_in, b_in)
begin
for I in 0 to 10000 loop
inter(I) <= std_logic_vector((unsigned(a_in(I)) + unsigned(b_in(I))));
end loop;
end process;
process (inter)
begin
for I in 0 to 10000 loop
avg_out(I) <= '0' & inter(I)(7 downto 1);
end loop;
end process;
or a generate loop outside a process:
G1: for I in 0 to 10000 generate
inter(I) <= std_logic_vector((unsigned(a_in(I)) + unsigned(b_in(I))));
end generate;
G2: for I in 0 to 10000 generate
avg_out(I) <= '0' & inter(I)(7 downto 1);
end generate;
https://www.edaplayground.com/x/3hJV
The simulator executes the lines inside the for loop (inside the process) sequentially, because simulators always execute lines inside a process sequentially (but concurrently with other processes and concurrent assignments). The simulator executes the lines inside the generate loop concurrently, because a generate loop is a language construct used to generate multiple concurrent things. Because of the topology of your circuit (everything is parallel), both methods will behave the same in simulation and in synthesis.
Use a regular process:
process(a_in, b_in)
variable tmp: unsigned(8 downto 0);
begin
for i in a_in'range loop
tmp := unsigned('0' & a_in(i)) + unsigned('0' & b_in(i));
avg_out(i) <= std_logic_vector(tmp(8 downto 1));
end loop;
end process;
It looks sequential but it is not, for reasons rooted in VHDL semantics that would take too long to explain here. Your synthesizer will do what you want.
And, by the way, the sum of two 8-bit unsigned numbers is a 9-bit unsigned number (which is why variable tmp is declared as unsigned(8 downto 0)). Dividing by two simply consists of shifting right by one position (if the least significant bit is the rightmost, which is usually the case). So, if you want an 8-bit result, just left-extend your operands by one bit, add them and drop the LSB of the result, as in the process above. If, instead, you add them without extension, you will run into overflow and severely inaccurate results.
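To put numbers on the overflow, here is a small simulation-only sketch with assumed values a = 200 and b = 100 (true average 150). The entity name avg_overflow_demo is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity avg_overflow_demo is
end entity avg_overflow_demo;

architecture sim of avg_overflow_demo is
begin
  process
    constant a : unsigned(7 downto 0) := to_unsigned(200, 8);
    constant b : unsigned(7 downto 0) := to_unsigned(100, 8);
    variable sum8 : unsigned(7 downto 0);
    variable sum9 : unsigned(8 downto 0);
  begin
    -- 8-bit add: 300 wraps modulo 256 to 44, so the "average" becomes 22
    sum8 := a + b;
    assert to_integer(sum8(7 downto 1)) = 22
      report "unexpected 8-bit result" severity error;

    -- 9-bit add: 300 fits, and dropping the LSB gives the correct 150
    sum9 := resize(a, 9) + resize(b, 9);
    assert to_integer(sum9(8 downto 1)) = 150
      report "unexpected 9-bit result" severity error;

    wait;  -- stop the process after one pass
  end process;
end architecture sim;
```

Both assertions hold, which shows that the unextended sum silently returns 22 instead of 150.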

VHDL - array of std_logic_vectors convert into std_logic_vector

INTENTION:
I am reading data from RAM on ZedBoard, the RAM consists of 32 bits long words so I use the following buffer
type mem_word is array (0 to 127) of std_logic_vector(31 downto 0);
signal buffer_word : mem_word;
but then, I would like to address data in a linear fashion, in an intermediary linear buffer
signal buffer_linear : std_logic_vector(4095 downto 0);
buffer_linear <= buffer_word; -- !!! PROBLEM
so I can easily address any bit in the buffer without recalculating the position in specific word (of the buffer_word).
QUESTION:
How do I get from array of std_logic_vectors into 1 long std_logic_vector ? Is there a way to avoid concatenating 128 words in a loop ? (something like above buffer_linear <= buffer_word;)
You need a function to convert from vector-vector to a 1-dimensional vector.
My following example uses the type name T_SLVV_32 to denote that it is a vector of vectors, wherein the inner vector is 32 bits long. (See my linked source file for a true 2-dimensional STD_LOGIC matrix type called T_SLM.) So T_SLVV_32 is equivalent to your mem_word type.
subtype T_SLV_32 is STD_LOGIC_VECTOR(31 downto 0);
type T_SLVV_32 is array(NATURAL range <>) of T_SLV_32;
function to_slv(slvv : T_SLVV_32) return STD_LOGIC_VECTOR is
variable slv : STD_LOGIC_VECTOR((slvv'length * 32) - 1 downto 0);
begin
for i in slvv'range loop
slv((i * 32) + 31 downto (i * 32)) := slvv(i);
end loop;
return slv;
end function;
Usage:
buffer_linear <= to_slv(buffer_word);
This function creates no logic, just wiring.
Note: Accessing all bits of a memory at once prevents synthesis tools from inferring RAM or ROM memory blocks!
Source: PoC.vectors
See my vector package on GitHub for more examples of transforming vectors and matrices back and forth.
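For the opposite direction, a minimal sketch of the inverse conversion could look like the following. It is written for this answer and is not copied from the PoC package. The package name slvv_sketch is hypothetical, and the input length is assumed to be a multiple of 32.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package slvv_sketch is
  subtype T_SLV_32 is std_logic_vector(31 downto 0);
  type T_SLVV_32 is array(natural range <>) of T_SLV_32;
  function to_slvv_32(slv : std_logic_vector) return T_SLVV_32;
end package slvv_sketch;

package body slvv_sketch is
  function to_slvv_32(slv : std_logic_vector) return T_SLVV_32 is
    -- View the input as (length - 1 downto 0), whatever its declared range
    alias v : std_logic_vector(slv'length - 1 downto 0) is slv;
    variable result : T_SLVV_32(0 to (slv'length / 32) - 1);
  begin
    assert slv'length mod 32 = 0
      report "to_slvv_32: length must be a multiple of 32" severity failure;
    for i in result'range loop
      -- word i takes bits (32*i + 31 downto 32*i), mirroring to_slv
      result(i) := v((i * 32) + 31 downto (i * 32));
    end loop;
    return result;
  end function;
end package body slvv_sketch;
```

Usage would mirror the forward case, for example buffer_word <= to_slvv_32(buffer_linear); provided the array type is the one declared in this package. Like to_slv, this is only wiring.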

VHDL simulation stuck in for loop

I'm doing simulation testing for some VHDL I wrote and when I run it in ModelSim it gets stuck. When I hit 'break' it has an arrow pointing to the For loop in the following function:
function MOD_3 (a, b, c : UNSIGNED (1023 downto 0)) return UNSIGNED is
VARIABLE x : UNSIGNED (1023 downto 0) := TO_UNSIGNED(1, 1024);
VARIABLE y : UNSIGNED (1023 downto 0) := a;
VARIABLE b_temp : UNSIGNED (1023 downto 0) := b;
begin
for I in 0 to 1024 loop
if b_temp > 0 then
if b_temp MOD 2 = 1 then
x := (x * y) MOD c;
end if;
y := (y * y) MOD c;
b_temp := b_temp / 2;
else
exit;
end if;
end loop;
return x MOD c;
end function;
I originally had this as a while loop which I realize is not good for synthesizing. So I converted it to a for loop with the condition that b_temp is greater than 0. b_temp is a 1024-bit unsigned and so if it is the largest number that could be represented by 1024 bits and divided in half (which I do in each iteration) 1024 times, shouldn't it definitely be 0?
I have a feeling my problem lies in the large multiplications...if I comment out x := (x * y) MOD c and y := (y * y) MOD c then it exits the loop. So the only thing I can think of is it takes too long to carry out these 1024-bit multiplications? If this is the case, is there any built-in way I can optimize this to make it faster, or is my only option to implement something like Karatsuba multiplication, etc...?
I'm of the opinion implementing a Montgomery multiplier in numeric_std function calls may not improve simulation as much as you'd like (while giving synthesis eligibility).
The issue is the number of dynamically elaborated subprogram calls vs. their operand sizes vs. fitting in your CPU-running-Modelsim's L1/L2/L3 caches.
It does do wonders for targeting synthesis in an FPGA or a SIMD GPU implementation.
See Subversion Repositories BasicRSA file modmult.vhd (which has a generic size). I successfully converted this to using numeric_std[_unsigned].
If I recall correctly, this appears to be inspired by a Master's thesis (Efficient Hardware Architectures for Modular Multiplication) by David Narh Amanor from 2005, outlining a Java and a VHDL implementation in various word sizes.
I found the OpenCores implementation mentioned in a Stack Overflow question (Montgomery multiplication VHDL Implementation), found the generic-sized version in the SVN repository (the downloadable version is 16 bit), and found the mention of the thesis in A 1024 – Bit Implementation of the Faster Montgomery Multiplier Using VHDL (by David Narh Amanor, the original link having expired). Note the quoted FPGA implementation performance of under 42 usec.
Notice that the length-1024 version specified by a generic would still be performing dynamically elaborated function calls with length-1024 operands (although not the "*"s, the "mod"s or the "/"s). You'd still be doing millions of function calls with dynamically elaborated (passed on an expression stack) 1024-bit parameters. We're simply changing how many millions of large-parameter subroutine calls there are and how long they can take.
And that also brings up the possibility of an integer vector implementation (bignum equivalent) in VHDL, which would potentially increase simulation performance even more (and you're likely in uncharted territory here).
A subprogram based version of the OpenCores model using variable parameters would be telling. (Whether or not you can impress anyone showing them a simulation model executing, or whether there's this looong pause interrupted by everyone taking furtive glances at the wall clock and looking bored).
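To make the idea of avoiding the wide "*" and "mod" calls concrete, here is a minimal sketch of bit-serial (shift-and-add) modular multiplication. It is not the OpenCores modmult.vhd code. The package name modmult_sketch and function name mod_mult are hypothetical, and it assumes that a, b and m have the same length and that b < m.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package modmult_sketch is
  function mod_mult(a, b, m : unsigned) return unsigned;
end package modmult_sketch;

package body modmult_sketch is
  -- Returns (a * b) mod m using only shifts, adds, compares and subtracts.
  -- Assumptions: a, b and m have the same length N, and b < m.
  function mod_mult(a, b, m : unsigned) return unsigned is
    constant N  : natural := m'length;
    constant ax : unsigned(N - 1 downto 0) := resize(a, N);
    constant bx : unsigned(N + 1 downto 0) := resize(b, N + 2);
    constant mx : unsigned(N + 1 downto 0) := resize(m, N + 2);
    -- two guard bits: the partial result stays below 3 * m inside the loop
    variable p  : unsigned(N + 1 downto 0) := (others => '0');
  begin
    for i in N - 1 downto 0 loop          -- scan a from MSB to LSB
      p := shift_left(p, 1);              -- p < m before, so 2p < 2m
      if ax(i) = '1' then
        p := p + bx;                      -- now p < 3m
      end if;
      if p >= mx then p := p - mx; end if;  -- p < 2m
      if p >= mx then p := p - mx; end if;  -- p < m again
    end loop;
    return p(N - 1 downto 0);
  end function;
end package body modmult_sketch;
```

As a hand-checked example with 4-bit operands, a = 7, b = 5 and m = 11 give 2, which is 35 mod 11. Each call still loops N times over N-bit operands, so the caveat above about the number of large-parameter subprogram calls in simulation still applies.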

$size, $bits, verilog

What is the difference between the $size and $bits operators in Verilog?
If I have the variables [9:0] a, [6:0] b, [31:0] c:
c <= [($size(a)+$size(b)-1]-:$bits(b)];
What will be the output at 'c' from the above expression?
$size() gives the number of bits for a single dimension. $bits() gives the number of bits to completely represent the variable.
For example:
reg [9:0] a;
reg [9:0] b [5:0];
initial begin
$display("a Size ", $size(a));
$display("a Bits ", $bits(a));
$display("b Size ", $size(b));
$display("b Bits ", $bits(b)) ;
end
Gives :
a Size 10
a Bits 10
b Size 6 // Depth of memory
b Bits 60 // Width * Depth
In your case you just have 1 dimensional arrays, not memories or structs so $size() and $bits() would be the same thing.
$size shall return the number of elements in the dimension, which is equivalent to $high - $low + 1. It is relative to the dimension, not only bit counts. If the type is 1D packed array or integral type, it is equal to $bits.
$bits system function returns the number of bits required to hold an expression as a bit stream.
$bits ( [expression|type_identifier] )
It returns 0 when called with a dynamically sized type that is currently empty. It is an error to use the $bits system function directly with a dynamically sized type identifier.
I have no idea about your question, c <= [($size(a)+$size(b)-1]-:$bits(b)];. Is it a valid expression in RHS? Are you talking about the array range expression, [n +: m] or [n -: m] ?
