Verilog, logic OR-ing an entire array

Suppose I have an array like this:
parameter n=100;
reg array[0:n-1];
How would one get the logic-OR value of each and every bit in the array?
The resulted circuit must be combinatorial.
This is a follow up question from this one.
(see discussion below the answer)

I don't know if this meets your design requirements, but you might have a much easier time with a hundred bit bus reg [n-1:0] array; than by using an array of 1 bit wires. Verilog does not have the greatest syntax to support arrays. If you had a bus instead you could just do assign result = |array;
If you must use an array, then I would consider first turning it into a bus with a generate loop, and then doing the same:
parameter n=100;
reg array[0:n-1];
wire [n-1:0] dummywire;
genvar i;
generate
for (i = 0; i < n; i = i+1) begin
assign dummywire[i] = array[i];
end
endgenerate
assign result = |dummywire;
I'm not aware of a more elegant way to do this on arrays.

Related

Hiding number in large pointer array by modifying least significant bit of elements

Straight off the bat I will say I barely know what I'm doing here: I'm having major trouble grasping bitwise operators in C. As an exercise in one of my courses I'm supposed to hide a number (unsigned int) in a large pointer array (unsigned char) containing numbers. I'm doing it using srand (with a key, so that I can decode it later) to choose specific elements of the array, and then taking one bit of the number I'm supposed to hide (iterating through all the bits) and changing the least significant bit of the array element. The choosing elements.
While I get the general idea I cannot, despite googling, figure out the bit operations. So having the size that I'm supposed to encode in i-th run of the loop (i-th bit of size) and randomly chosen current_element this is what I came up with to get the bit and then alter the element.
for (i=0; i<32; i++){
tmp = rand() % max;
current_element = array[tmp];
current_element ^= ((size >> i) & 0x01)<<7;
}
To decode I would write it analogously (where size is a wiped unsigned char that I'm trying to write the decoded number to):
for (i=0; i<32; i++){
tmp = rand() % max;
current_element = array[tmp];
size = size ^ ((current_pixel.blue<<0)<<7);
}
These two are in different functions and srand() is seeded anew in each of them beforehand.
But these are clearly not working and I don't even know which one (I can only check if it decoded correctly). Truth be told these are mostly copied from other things I found online as so far operating on individual bits eludes me. So I'd be grateful for some sort of advice on what is wrong here (and I'm aware probably everything is wrong here and it's all gibberish but I've been trying to fix it to no avail for quite a while now).
I'm only going to indirectly answer your question. I'm going to give you some gentle advice that many new programmers need.
If you want to learn to program, Stop googling and Think.
Break the problem down into steps. Write pseudocode:
encode:
    for each bit in message_word:
        select random array element
        if bit is set:
            toggle LSB of element.
decode:
    for each bit in message_word:
        select random element
        if LSB is toggled:
            set bit in message_word
Now, write the C code for each of those steps. You actually have most of the pieces.
// for each bit in message_word (sizeof counts bytes, so multiply by 8)
for (i = 0; i < 8 * sizeof(message_word); i++) {
    // select random array element
    tmp = rand() % max;
    current_element = array[tmp];
    // if bit is set:
    if ( bit_is_set(message_word, i) ) {
        // toggle LSB of element.
        toggle_lsb(current_element);
    }
}
Now that you are down to basic steps, maybe you can google "how to toggle a bit". But make sure you understand the answer before plugging it in.
int bit_is_set(unsigned word, int bit) { return (word >> bit) & 0x01; }
unsigned char toggle_lsb(unsigned char word) { return word ^ 1; }
However - this still won't work. Why? Time to think again. You made a copy of the value at the randomly selected array index. You toggled a bit in the copy. What effect is that going to have on the array?
Solve that and there's at least one more challenge. In the decode function, how will you implement is_lsb_toggled? How will you know if a given bit was supposed to be 1 or 0? You have all the information you need. Good Luck.
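One possible way to resolve both challenges is sketched below. It is a minimal sketch, not the answer's own solution: it writes back into the array instead of into a copy, and it swaps toggling for overwriting the LSB with the message bit, so that decoding only has to read the LSB. The names hide_u32, reveal_u32 and next_index are hypothetical, and skipping already-used indices is an added assumption so that two bits never land in the same element.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Draw the next index that has not been used yet. After srand(key) the
   sequence of draws is the same in the encoder and the decoder. */
static size_t next_index(unsigned char *used, size_t max)
{
    size_t idx;
    do {
        idx = (size_t)rand() % max;
    } while (used[idx]);
    used[idx] = 1;
    return idx;
}

/* Store the 32 bits of value in the LSBs of 32 key-selected elements. */
void hide_u32(unsigned char *carrier, size_t max, uint32_t value, unsigned key)
{
    unsigned char *used = calloc(max, 1);
    assert(used != NULL && max >= 32);  /* need 32 distinct elements */
    srand(key);
    for (int i = 0; i < 32; i++) {
        size_t idx = next_index(used, max);
        unsigned char bit = (value >> i) & 1u;
        /* Clear the LSB, then set it to the message bit, in the array itself. */
        carrier[idx] = (unsigned char)((carrier[idx] & ~1u) | bit);
    }
    free(used);
}

/* Read the 32 LSBs back in the same order and rebuild the value. */
uint32_t reveal_u32(const unsigned char *carrier, size_t max, unsigned key)
{
    unsigned char *used = calloc(max, 1);
    uint32_t value = 0;
    assert(used != NULL && max >= 32);
    srand(key);
    for (int i = 0; i < 32; i++) {
        size_t idx = next_index(used, max);
        value |= (uint32_t)(carrier[idx] & 1u) << i;
    }
    free(used);
    return value;
}
```

Both functions must be called with the same key and the same max, since the index sequence depends on both.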

Systemverilog localparam array with configurable size

I want to create and define a localparam array in SystemVerilog. The size of the array should be configurable, and the value of each localparam array cell calculated based on its location. Essentially this code:
localparam [7:0] [ADDR_BITS-1:0] ADDR_OFFSET = '{
7*PAGE_SIZE,
6*PAGE_SIZE,
5*PAGE_SIZE,
4*PAGE_SIZE,
3*PAGE_SIZE,
2*PAGE_SIZE,
1*PAGE_SIZE,
0
};
but where the first '7' is replaced with a parameter, and where the parameter initialization is extended to the generic case. So I need a way to loop from 0 to (N-1) and set ADDR_OFFSET(loop) = loop*PAGE_SIZE.
The "obvious" option in SystemVerilog would be generate, but I read that placing a parameter definition inside a generate block generates a new local parameter relative to the hierarchical scope within the generate block (source).
Any suggestions?
For background reference: I need to calculate an actual address based on a base address and a number. The calculation is simple:
real_address = base_address + number*PAGE_SIZE
However, I don't want to have the "*" in my code since I am afraid the synthesis tool will generate a multiplier, which it will then try to simplify since PAGE_SIZE is a constant value. I am guessing that this can lead to more logic than if I do all the calculations when generating the localparam array, since that will certainly not produce any multiplier in logic.
So with the above localparam definition, I perform the desired address calculation like this:
function [ADDR_BITS-1:0] addr_calc;
input [ADDR_BITS-1:0] base_addr;
input [NBITS-1:0] num;
addr_calc = base_addr + ADDR_OFFSET[num];
endfunction
I think perhaps I found a solution. Wouldn't I essentially accomplish the same by not defining a localparam array, but rather performing the address calculation inside a loop? Since systemverilog sees the loop variable as "constant" (when it comes to generating logic) that seems to accomplish the same? Like this (inside the function I wrote above):
for (int loop1 = 0; loop1 < MAXNUM ; loop1++) begin
if (num == loop1) begin
addr_offset = CSP_PAGE_SIZE*loop1;
end
addr_calc = base_addr + addr_offset;
end
You can set your localparam with the return value of a function.
localparam bit [7:0] [ADDR_BITS-1:0] ADDR_OFFSET = ADDR_CALC();
function bit [7:0] [ADDR_BITS-1:0] ADDR_CALC();
for(int ii=0;ii<$size(ADDR_CALC,1); ii++)
ADDR_CALC[ii] = ii * PAGE_SIZE;
endfunction

Efficient search for series of values in an array? Ideally OpenCL usable?

I have a massive array I need to search (actually it's a massive array of smaller arrays, but for all intents and purposes, let's consider it one huge array). What I need to find is a specific series of numbers. Obviously, a simple for loop will work:
Pseudocode:
for (x = 0; x < array_length; x++) {
    if (array[x] == searchfor[location])
        location++;
    else
        location = 0;
    if (location >= strlen(searchfor))
        return FOUND_IT;
}
Thing is I want this to be efficient. And in a perfect world, I do NOT want to return the prepared data from an OpenCL kernel and do a simple search loop.
I'm open to non-OpenCL ideas, but something I can implement across a work group size of 64 on a target array length of 1024 would be ideal.
I'm kicking around ideas (split the target across work items, compare each item, looped, against each target, if it matches, set a flag. After all work items complete, check flags. Though as I write that, that sounds very inefficient) but I'm sure I'm missing something.
Other idea was that since the target array is uchar, to lump it together as a double, and check 8 indexes at a time. Not sure I can do that in opencl easily.
Also toying with the idea of hashing the search target with something fast, MD5 likely, then grabbing strlen(searchtarget) characters at a time, hashing it, and seeing if it matches. Not sure how much the hashing will kill my search speed though.
Oh - code is in C, so no C++ maps (something I found while googling that seems like it might help?)
Based on comments above, for future searches, it seems a simple for loop scanning the range IS the most efficient way to find matches given an OpenCL implementation.
Create an index array with one entry per possible uchar value (UCHAR_MAX + 1 entries). For each uchar in the search string, set array[uchar] to the position of the last occurrence of that uchar in the search string. The rest of the array contains -1.
unsigned searchindexing[UCHAR_MAX + 1]; /* UCHAR_MAX is defined in <limits.h> */
memset(searchindexing, 0xFF, sizeof searchindexing); /* every entry becomes (unsigned)-1 */
for (i = 0; i < strlen(searchfor); i++)
    searchindexing[(unsigned char)searchfor[i]] = i;
The loop has to start at the beginning of the search string so that a uchar occurring more than once ends up with its last position in searchindexing; filling from the other end would enter the wrong position.
Then you search the array by stepping strlen(searchfor) unless finding an uchar from searchfor.
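/* i and len must be a signed type (e.g. long): the candidate start can be negative near the front */
len = (long)strlen(searchfor);
for (i = 0; i < MAXARRAYLEN; i += len)
    if ((unsigned)-1 != searchindexing[array[i]]) {
        i -= (long)searchindexing[array[i]]; /* assume the last occurrence, back up to the candidate start */
        if (i >= 0 && i + len <= MAXARRAYLEN && !memcmp(searchfor, &array[i], len))
            return FOUND_IT;
    }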
If most of the uchars in array aren't in searchfor, this is probably the fastest way. Note the code has not been optimized.
Example: searchfor = "banana". strlen is 6. searchindexing['a'] = 5, ['b'] = 0, ['n'] = 4, and the rest hold a value outside 0 to 5, like -1 or maxuint. If array[i] is something not in banana, like a space, i increments by 6. If array[i] now is 'a', you might be in banana and it can be any of the 3 'a's. So we assume the last 'a', move 5 places back and do a compare with searchfor. If it succeeds, we found it; otherwise we step 6 places forward.
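The skip search described above can be put together as one self-contained function. This is a sketch under assumptions: the name find_sequence and its signature are hypothetical, lengths are passed explicitly instead of using strlen and MAXARRAYLEN, and signed indices are used so that a candidate start before the beginning of the array is simply rejected.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first match of needle in hay, or -1 if absent. */
long find_sequence(const unsigned char *hay, long hay_len,
                   const unsigned char *needle, long needle_len)
{
    long last_pos[UCHAR_MAX + 1];

    if (needle_len <= 0 || needle_len > hay_len)
        return -1;

    /* -1 marks byte values that never occur in the needle. */
    for (int c = 0; c <= UCHAR_MAX; c++)
        last_pos[c] = -1;
    /* Filling from the front leaves the LAST position of each byte value. */
    for (long k = 0; k < needle_len; k++)
        last_pos[needle[k]] = k;

    for (long i = 0; i < hay_len; i += needle_len) {
        long k = last_pos[hay[i]];
        if (k < 0)
            continue;   /* byte not in needle: step a whole needle length */
        i -= k;         /* candidate start; may be negative near the front */
        if (i >= 0 && i + needle_len <= hay_len &&
            memcmp(hay + i, needle, (size_t)needle_len) == 0)
            return i;
    }
    return -1;
}
```

For the banana example, searching "say banana bandana" probes index 0 ('s', skipped), then index 6 ('n', candidate start 2, no match), then index 8 ('n', candidate start 4), where the compare succeeds.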

Efficient way to scan struct nested arrays

I have a struct that has several arrays members:
typedef struct someStruct {
    uint16_t array1[ARRAY_LENGTH];
    uint16_t array2[ARRAY_LENGTH];
} myData;
myData testData = {0}; // Global struct
At some point in my program I need to set the arrays to some set of predefined values, e.g., set array1 to all 0, array2 to all 0xFF, etc. My first instinct was to write out a for loop something like:
void someFunction (myData * test) {
for (uint16_t i = 0; i < ARRAY_LENGTH; ++i) {
test->array1[i] = 0xFF;
test->array2[i] = 0xCC;
}
}
However I then reasoned that the actions required by the program to do this would go something like:
load address of array1 first position
set value 0xFF;
load far address of array2 first position
set value 0xCC;
load far address of array1 second position
set value 0xFF;
// and so on...
Whereas if I used a separate loop for each array the addresses would be a lot nearer each other (as arrays and structs stored contiguously), so the address loads are only to the next byte each time, making the code actually more efficient as follows:
void someFunction (myData * test) {
uint16_t i = 0;
for (i; i < ARRAY_LENGTH; ++i)
test->array1[i] = 0xFF;
for (i = 0; i < ARRAY_LENGTH; ++i)
test->array2[i] = 0xCC;
}
Is my reasoning correct, is the second one better? Furthermore, would a compiler (say gcc, for e.g.) normally be able to make this optimization itself?
It's going to depend on your system architecture. For example, on, say, a SPARC system, the cache line size is 64 bytes, and there are enough cache slots for both arrays, so the first version would be efficient. The load of the first array element would populate the cache, and subsequent loads would be very fast. If the compiler is smart enough, it can use prefetch as well.
On ISAs that support offset addressing, it doesn't actually fetch the address of the array element each time, it just increments an offset. So it only fetches the base address of the array, once, and then uses a load instruction with the base and offset. Each time through the loop it increments the offset in a register. Some instruction sets even have auto-increment.
The best thing to do would be to write a sample program/function, and try it. Optimizations at this low a level require either a thorough knowledge of the CPU/system, or lots of trial and error.
My humble recommendation: try it and see. The one-loop solution saves the arithmetic around incrementing and testing i. Two loops will probably benefit from better cache behavior, especially if the arrays are aligned to memory pages; in that case each access in the single loop may cause a cache miss and cache reload. Personally, if speed really matters, I would prefer two loops with some unrolling.
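Both answers suggest measuring, so here is a minimal sketch of how the two variants could be timed. The ARRAY_LENGTH value, the function names and the time_fill helper are assumptions for illustration; clock() measures CPU time at coarse resolution, so a large repetition count is needed for meaningful numbers.

```c
#include <stdint.h>
#include <time.h>

#define ARRAY_LENGTH 4096  /* assumed size; the question does not give one */

typedef struct {
    uint16_t array1[ARRAY_LENGTH];
    uint16_t array2[ARRAY_LENGTH];
} myData;

/* Variant 1: one loop that alternates between the two arrays. */
void fill_one_loop(myData *d)
{
    for (uint16_t i = 0; i < ARRAY_LENGTH; ++i) {
        d->array1[i] = 0xFF;
        d->array2[i] = 0xCC;
    }
}

/* Variant 2: one loop per array, each walking contiguous memory. */
void fill_two_loops(myData *d)
{
    uint16_t i;
    for (i = 0; i < ARRAY_LENGTH; ++i)
        d->array1[i] = 0xFF;
    for (i = 0; i < ARRAY_LENGTH; ++i)
        d->array2[i] = 0xCC;
}

/* CPU seconds spent in reps calls of fill. */
double time_fill(void (*fill)(myData *), myData *d, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fill(d);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill(fill_one_loop, &testData, 100000) and time_fill(fill_two_loops, &testData, 100000) with the optimization flags of the real build gives a first comparison. An optimizing compiler may turn either variant into the same code, so it is also worth inspecting the generated assembly.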

Verilog Parallel Check and Assignment Across Dissimilar Sized Shift Registers

I'm looking to perform the cross-correlation* operation using an FPGA.
The specific part that I am currently struggling with is the multiplication piece. I want to multiply each 8-bit element of an nx8 shift register that uses excess or offset representation** against an nx1 shift register where I treat 0s as a -1 for the purposes of multiplication.
Now if I was doing that for a single element, I might do something like this for the operation:
input [7:0] dataIn;
input refIn;
output [7:0] dataOut;
wire [7:0] dataOut;
wire [7:0] invertedData;
assign invertedData = 8'd0 - dataIn;
assign dataOut <= refIn ? dataIn : invertedData;
What I'm wondering is how do I scale this to 4, 8, n elements?
My first thought was to use a for loop like this:
for(loop=0; loop < n; loop = loop+1)
begin
assign invertedData[loop*8+7:loop*8] = 8'd0 - dataIn[loop*8+7:n*8];
assign dataOut[loop*8+7:loop*8] <= refIn[loop] ? dataIn[loop*8+7:loop*8] : invertedData[loop*8+7:loop*8];
end
This doesn't compile, but that's more or less the idea, and I can't seem to find the right syntax to do what I want.
* https://en.wikipedia.org/wiki/Cross-correlation
** http://www.cs.auckland.ac.nz/~patrice/210-2006/210%20LN04_2.pdf
for(loop=0; loop < n; loop = loop+1)
begin
assign invertedData[n*8+7:n*8] = 8'd0 - dataIn[n*8+7:n*8];
assign dataOut[n*8+7:n*8] <= refIn[n] ? dataIn[n*8+7:n*8] : invertedData[n*8+7:n*8];
end
There's a few issues with this, but I think you can make this work.
You can't have 'assign' statements in a for loop. A for loop is meant to be used inside a begin/end block, so you need to change invertedData/dataOut from wire type to reg type, and remove the assign statements.
You generally can't have variable part-selects, unless you use the special constant-width selection operator (verilog-2001 support required). That would look like this: dataIn[n*8 +:8], which means: select 8 bits starting from n*8.
I don't know about your algorithm, but it looks like loop/n are backwards in your statement. You should be incrementing n, not loop variable (or else all statements will be operating on the same part-select).
So considering those points I believe this should compile for you:
always @* begin
    for (n = 0; n < max_loops; n = n + 1)
    begin
        invertedData[n*8 +: 8] = 8'd0 - dataIn[n*8 +: 8];
        dataOut[n*8 +: 8] = refIn[n] ? dataIn[n*8 +: 8] : invertedData[n*8 +: 8];
    end
end
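When simulating a block like this, a small software model can help check the per-lane results. The following is a sketch in C, not part of the answer: the function name conditional_negate is hypothetical, and it assumes one array entry per 8-bit lane and one entry per reference bit.

```c
#include <stddef.h>
#include <stdint.h>

/* For each lane: keep the sample when the reference bit is 1,
   otherwise output (0 - sample) modulo 256, as 8'd0 - dataIn does. */
void conditional_negate(const uint8_t *data_in, const uint8_t *ref_in,
                        uint8_t *data_out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t inverted = (uint8_t)(0u - data_in[i]);  /* wraps to 8 bits */
        data_out[i] = ref_in[i] ? data_in[i] : inverted;
    }
}
```

For example, a lane holding 200 with a reference bit of 0 yields 56, because 0 - 200 wraps modulo 256.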
