Formatting Dynamic Array of Bits as String in SystemVerilog

How can I format a dynamic array of bits (or more correctly, logics) as a string, e.g., for UVM's convert2string? For example, I would like to convert
logic vdyn[];
...
vdyn = new [16] ('{0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1});
to the string 0097.
I thought the following would work (the # are just to delimit the string for readability):
fmt = $sformatf("#%%%0dh#", (vdyn.size-1)/4 + 1); // "#%4h#"
vstr = $sformatf(fmt, { >> {vdyn}});
but it returns # x#, at least in Questa 10.3d (I suspect this is a bug - I'd be interested if it works on other simulators).
I tried converting it to a packed array first, but that runs into other problems. Without a size constraint on the result, the source value always gets left-justified in the destination variable, e.g.:
logic [63:0] v64;
...
v64 = {>> {vdyn}}; // 64'h0097000000000000
There's no way to print out just the part I want without using variable-size slices. The following works, but it requires that I know the size of the array at compile time:
v64 = 16'({>> {vdyn}}); // 64'h0000000000000097
The best thing I've found is the following "double-reverse" (note that I'm using << here, not >>):
v64 = {<< {vdyn}}; // 64'he900000000000000
v64 = {<< {v64}}; // 64'h0000000000000097
vstr = $sformatf(fmt, v64); // #0097#
It seems kind of hokey to have to do this, though. By the way, combining the first two statements into one doesn't work:
v64 = {<< {{<< {vdyn}}}}; // 64'hZ900000000000000
(v64[63] is z for some reason). Again, I suspect this is a bug in Questa 10.3d.

Try casting a slice of the array and looping through it. For example, take a 4-entry slice and cast it to a 4-bit value. A slice can be taken with the -: or +: operator (see IEEE Std 1800-2012 § 7.4.3 Operations on arrays and § 7.4.6 Indexing and slicing of arrays):
vstr = "";
for(int i=vdyn.size()-1; i>=0; i-=4) begin
vstr = $sformatf("%h%s", 4'({>>{vdyn[i -: 4]}}), vstr);
end
vstr = $sformatf("#%s#", vstr); // user formatting
The 4s in the code can be changed to something else depending on how many leading 0s you want or whether a non-power-of-two format is desired, but the value must be a numeric constant.
I tried your code on some other simulators. vstr = $sformatf(fmt, { >> {vdyn}}); sometimes gave me compile errors. Casting the array to something bigger than its expected max size seems to work:
fmt = $sformatf("#%%%0dh#", (vdyn.size-1)/4 + 1); // "#%4h#"
vstr = $sformatf(fmt, 128'({ >> {vdyn}})); // works iff 128>=vdyn.size

I think the problem may be that the width of the streaming operator using dynamic types is not defined in a self-determined context (e.g. an argument to a system task). I think the LRM should have treated this as an error.
A work-around is to shift the left-justified result to the right by the number of unused bits:
v64 = {>> {vdyn}};
v64 >>= 64-vdyn.size;
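For anyone who wants to check the arithmetic outside a simulator, here is a minimal C sketch of the same left-justify-then-shift idea. It is only an illustration and not SystemVerilog; the names pack_left_justified and bits_to_hex are made up for this example, and it assumes between 1 and 64 bits with element 0 as the most significant bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack n bits (bits[0] is the most significant) into a 64-bit word,
 * left-justified, mimicking what v64 = {>> {vdyn}} produces. */
static uint64_t pack_left_justified(const uint8_t *bits, size_t n)
{
    uint64_t v = 0;
    assert(n >= 1 && n <= 64);
    for (size_t i = 0; i < n; ++i)
        v |= (uint64_t)(bits[i] & 1u) << (63 - i);
    return v;
}

/* Right-justify by shifting out the 64-n unused low bits, then print
 * with one hex digit per 4 source bits, rounded up: (n-1)/4 + 1. */
static void bits_to_hex(const uint8_t *bits, size_t n, char *out, size_t out_size)
{
    uint64_t v = pack_left_justified(bits, n) >> (64 - n);
    int digits = (int)((n - 1) / 4 + 1);
    snprintf(out, out_size, "%0*llx", digits, (unsigned long long)v);
}
```

With the 16-bit example from the question, pack_left_justified gives 0x0097000000000000 and bits_to_hex writes "0097" into the buffer.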


Conditional SSE/AVX add or zero elements based on compare

I have the following __m128 vectors:
v_weight
v_entropy
I need to add v_entropy to v_weight only where elements in v_weight are not 0f.
Obviously _mm_add_ps() adds all elements regardless.
I can compile up to AVX, but not AVX2.
EDIT
I do know beforehand how many elements in v_weight will be 0 (there will always be either 0 or the last 1, 2, or 3 elements). If it's easier, how do I zero-out the corresponding elements in v_entropy?
The cmpeq/cmpgt instructions create a mask, all ones or all zeros. The overall process goes as follows:
auto mask=_mm_cmpeq_ps(_mm_setzero_ps(), w);
mask=_mm_andnot_ps(mask, entropy);
w = _mm_add_ps(w, mask);
Other option is to accumulate anyway, but use blendv to select between added/not added.
auto w2=_mm_add_ps(e,w);
auto mask=_mm_cmpeq_ps(zero,w);
w=_mm_blendv_ps(w2,w, mask);
Third option uses the fact that the result must stay 0 wherever w = 0, so you can add unconditionally and then mask those lanes back to zero:
m=(w==0); // make mask as in above
w+=e; // add
w&=~m; // revert adding for w==0
(I'm using cmpeq instead of cmpneq to make it usable for integers as well.)
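Put together as a compilable function, the first option might look like the sketch below. It uses only SSE intrinsics from xmmintrin.h; the function name add_where_nonzero is made up for this example.

```c
#include <xmmintrin.h>  /* SSE: _mm_cmpeq_ps, _mm_andnot_ps, _mm_add_ps */

/* Add e to w only in lanes where w is non-zero; lanes where w == 0
 * stay zero. */
static __m128 add_where_nonzero(__m128 w, __m128 e)
{
    __m128 is_zero  = _mm_cmpeq_ps(w, _mm_setzero_ps()); /* all ones where w == 0 */
    __m128 e_masked = _mm_andnot_ps(is_zero, e);         /* e where w != 0, else 0 */
    return _mm_add_ps(w, e_masked);
}
```

For w = (1, 2, 0, 0) and e = (0.5, 0.25, 0.75, 1) this returns (1.5, 2.25, 0, 0). Note that the comparison treats -0.0f as zero as well.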

Correct way to unpack a 32 bit vector in Perl to read a uint32 written in C

I am parsing a Photoshop raw, 16 bit/channel, RGB file in C and trying to keep a log of exceptional data points. I need a very fast C analysis of up to 36 MPix images with 16 bit quanta or 216 MB Photoshop .RAW files.
<1% of the points have weird skin tones and I want to graph them with PerlMagick or Perl GD to see where they are coming from.
The first 4 bytes of the C data file contain the unsigned image width as a uint32_t. In Perl, I read the whole file in binary mode and extract the first 32 bits:
Xres=1779105792l = 0x6a0b0000
It looks a lot like the C log file:
DA: Color anomalies=14177=0.229%:
DA: II=1) raw PIDX=0x10000b25, XCols=[0]=0x00000b6a
Dec(0x00000b6a) = 2922, the Exact X_Columns_Width of a small test file.
Clearly a case of Intel's 1972 8008 NUXI architecture. How hard could it possibly be to translate 0x6a0b0000 to 0x00000b6a; swap 2 bytes and 2 nibbles and you're done. Slicing the 8 characters and rearranging them could be done but that is the kind of ugly hack I am trying to avoid.
Grab the same 32 bit vector from file offset zero and unpack it as "VAX" unsigned long.
$xres = vec($bdat, 0, 32); # vec EXPR,OFFSET,BITS
$vul = unpack("V", vec($bdat, 0, 32));
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731
Every single hex character is mangled. Obviously wrong Endian, it is not VAX. The "Other" one is Network Big-endian
http://perldoc.perl.org/functions/pack.html
N An unsigned long (32-bit) in "network" (big-endian) order.
V An unsigned long (32-bit) in "VAX" (little-endian) order.
$nul = unpack("N", vec($bdat, 0, 32)); # Network Unsigned Long 32b
printf("Xres=0x%08x, NET ulong=%ul=0x%08x\n", $xres, $nul, $nul);
Xres=0x6a0b0000, NET ulong=825702201l=0x31373739
The $XRES still shows the right hex in the wrong order. The "NETWORK" long 32 bit uint extracted from the same bits is unrecognizable. Try Binary
$bits = unpack("b*", vec($bdat, 0, 32));
printf("bits=$bits, len=%d\n", length $bits);
bits=10001100111011001110110010011100100011000000110010101100111011001001110001001100, len=80
I clearly asked for 32 bits and got 80 bits. What gives?
Try for 4, unsigned, 8bit bytes which can NOT be swapped:
for($ii = 0; $ii < 4; $ii++) {
$bit_off=$ii*8; # Bit offset
$uc = unpack("C", vec($bdat, $bit_off, 8)); # C An unsigned char
printf("II $ii, bo $bit_off, d=%d, u=%u, x=0x%x\n",
$uc,$uc, $uc);
}
II 0, bo 0, d=49, u=49, x=0x31
II 1, bo 8, d=51, u=51, x=0x33
II 2, bo 16, d=49, u=49, x=0x31
II 3, bo 24, d=49, u=49, x=0x31
I am looking for hex 0, 6, a or b. There are no "3"s or "1"s in the right answer. Try pirating from a C file:
http://cpansearch.perl.org/src/MHX/Convert-Binary-C-0.76/tests/include/include/bits/byteswap.h
$x = $xres;
$x= (((($x) & 0xff000000) >> 24) | ((($x) & 0x00ff0000) >> 8) | ((($x) & 0x0000ff00) << 8) | ((($x) & 0x000000ff) << 24));
printf("\$xres=0x%08x -> \$x=0x%08x = %u\n", $xres, $x, $x);
$xres=0x6a0b0000 -> $x=0x00000b6a = 2922
It WORKS! But, this is uglier than converting the original, wrong order hex number to a string to untangle it:
$stupid_str = sprintf("%08x", $xres);
$stupid_num = join('', reverse ($stupid_str =~ m/../g));
printf("Stupid_num '%s'->0x%08x=%d\n", $stupid_num, $dec=hex $stupid_num, $dec);
Stupid_num '00000b6a'->0x00000b6a=2922
It's like judging the Ugliest Dog contest, but I would still rather have to maintain the text version than the even more abominable C version.
I know there are ways to do this in Java/Python/Go/Ruby/.....
I know there are command line utilities that do exactly this.
I must figure out how I am misusing either VEC or Unpack, both of which I have used a zillion times. It is the Brain Teasing aspect which is driving me nuts! EndianNess == EndianMess!!!
TYVM!
=================================================
Borodin,
Thanks for lookin' at this.
My intel processor is little-endian. When I read it back, it was trans-mutilated by vec to the "correct" big-endian, network format.
I just tried reading it VERBATIM from a BINARY file read and it works fine:
($b4 = $bdat) =~ s/^(....).*$/$1/msg; # Give me my 4 bytes back without mutilation!
printf("B4='%s'=>0x%08x=<0x%08x\n", $b4, unpack("L>", $b4), unpack("L<", $b4));
B4='j...' = >0x6a0b0000 = <0x00000b6a <<< THE RIGHT ANSWER!!!
If you try unpack 'V', $bdat then you will find that it works
That was my first attempt:
$vul = unpack("V", vec($bdat, 0, 32)); # UNPACK V!
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731 <<<< TOTALLY WRONG!
I had already verified that the $BDAT info was the right data in the wrong format. It just needed some rearrangement.
I just used vec() to generate 1 bit and 4 bit graphics files and it worked faithfully, returning the exact bits I wrote. It must have mistaken my Intel i7 for my IBM System/370. I7/37??? Easy mistake to make. :)
I read the [confusing] part about "converted to a number as with pack ...". That's why my number was backward. The >>unpack("V", vec($bdat"<< ... was my ill-fated attempt to byte-swap the backward number in $BDAT from the WRONG VEC()-preferred FORMAT to the native format supported by my architecture.
Now I understand why I saw so many examples of people extracting by the byte, to avoid Big Brother's helping hand!
Data::BitStream::Vec "uses a Perl vec to store the data. The vector is accessed in 1-bit units"
Thanks 1E6,
B
You are confusing things by combining vec with unpack
The correct way is simply
unpack 'V', $bdat
which returns a value of 0x00000B6A as you expect
vec($bdat, 0, 32) is equivalent to unpack 'N', $bdat as you can see from the value of $xres in your first code block, and the documentation for vec confirms this with
If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N
The line
$vul = unpack("V", vec($bdat, 0, 32))
is very wrong, because the decimal value of vec($bdat, 0, 32) is 1779105792, so you are then calling unpack on the string "1779105792" which doesn't do anything useful at all
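Since the file is written from C, the same point can be shown on the C side. This is a rough sketch, with made-up helper names read_u32_le and read_u32_be, of decoding the four header bytes explicitly instead of relying on host byte order. The little-endian reader corresponds to unpack 'V' and the big-endian one to unpack 'N' (which is what vec with BITS = 32 gives you).

```c
#include <stdint.h>

/* Decode 4 bytes as a little-endian 32-bit value (like Perl's unpack "V"). */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Decode 4 bytes as a big-endian 32-bit value (like Perl's unpack "N"). */
static uint32_t read_u32_be(const unsigned char *p)
{
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}
```

For the bytes 6a 0b 00 00 from the question, read_u32_le returns 0x00000b6a (2922) and read_u32_be returns 0x6a0b0000 (1779105792), matching the two values seen in the Perl output.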

Assign values to array requires memcpy

I've run into a weird bug.
I'm writing code for a bootloader so I don't have many fancy libraries and all.
The code itself is pretty simple, it's
int array[32] = { 1, 2, 3, [...snip...], 31, 32 };
This code leads to an unresolved external problem regarding memcpy not being linked.
However, this code compiles and links fine:
int array[12] = { 1, 2, 3, [...snip...], 11, 12 };
In fact, the error comes between
int array[12] = { 0 };
and
int array[13] = { 0 };
The first one links fine, but the second cannot link. I just don't get why at size 13, the compiler suddenly decides to rely on memcpy for the thing. I tried with both -O0 and -O3. My compiler is a Windows executable called cl470, not really sure where that comes from.
Another weird thing is that this is problematic when I put it inside a function, but if I declare the array globally, then there is no problem.
Your compiler is performing a time-space tradeoff.
For the smaller array, the compiler is emitting individual instructions to initialise each array slot on the stack:
mov [ebp-4], 1
mov [ebp-8], 2
mov [ebp-12], 3
...
For the larger array, the compiler is placing the data in the program's read-only data segment and copying it onto the stack using memcpy:
.rodata:
_array_initialiser = { 1, 2, 3, ... }
push ebp-4
push _array_initialiser
push 32
call memcpy
This is why making the array file-scope or static will eliminate the memcpy; the array can be placed directly in the data segment and initialised at compile time.
Using memcpy for larger arrays is more efficient because it reduces code size and so reduces instruction cache misses.
Some things you could try are moving the array to file scope or making it static yourself; if you need it reinitialised each time through the function you can copy it into a local array manually (although the compiler could also convert such a loop into a memcpy!)
static const int array_data[] = { 1, 2, 3, ... };
int array[sizeof(array_data) / sizeof(array_data[0])];
for (size_t i = 0; i < sizeof(array_data) / sizeof(array_data[0]); ++i)
array[i] = array_data[i];
Another option would be to generate the array programmatically; it looks like a simple for loop would work.
Third option would be to write and link in your own memcpy; it shouldn't require more than a few lines of code.
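A minimal sketch of that third option is below. It is named boot_memcpy here (a made-up name) so it can be compiled next to a normal C library; in the bootloader it would have to be called memcpy to satisfy the linker. It copies byte by byte and makes no attempt to be fast.

```c
#include <stddef.h>

/* Byte-wise copy with the same contract as the standard memcpy:
 * copies n bytes from src to dest (regions must not overlap) and
 * returns dest. */
void *boot_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}
```

It is worth checking the generated code afterwards: some compilers recognise a copy loop like this and turn it back into a memcpy call, which would make the function call itself.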
The following code copies data stored in your executable to stack.
int array[12] = { blah };
I guess the optimizer uses memcpy when the array size is greater than a certain number.
You probably want to do this:
static int array[12] = { blah };
By using the static keyword you prevent the compiler from generating code which copies static data to stack.

How can I create an s-function in Simulink with an input port that is a 2d array?

I am trying to create an s-function in Simulink using s-function builder that will accept a 2d array as an input. In the input ports I specify the dimensions: 2d, rows: 4, columns: 4. When I try to access the input port using f[x][y], it gives an error: "error C2109: subscript requires array or pointer type", for the lines where the input port is addressed.
How can I create an s-function in Simulink with an input port that is a 2d array?
Relevant code:
static void mdlInitializeSizes(SimStruct *S)
{
DECL_AND_INIT_DIMSINFO(inputDimsInfo);
DECL_AND_INIT_DIMSINFO(outputDimsInfo);
ssSetNumSFcnParams(S, NPARAMS);
if (ssGetNumSFcnParams(S) != ssGetSFcnParamsCount(S)) {
return; /* Parameter mismatch will be reported by Simulink */
}
ssSetNumContStates(S, NUM_CONT_STATES);
ssSetNumDiscStates(S, NUM_DISC_STATES);
if (!ssSetNumInputPorts(S, NUM_INPUTS)) return;
/*Input Port 0 */
inputDimsInfo.width = INPUT_0_WIDTH;
ssSetInputPortDimensionInfo(S, 0, &inputDimsInfo);
ssSetInputPortMatrixDimensions( S ,0, INPUT_0_WIDTH, INPUT_DIMS_0_COL);
ssSetInputPortFrameData(S, 0, IN_0_FRAME_BASED);
ssSetInputPortDataType(S, 0, SS_DOUBLE);
ssSetInputPortComplexSignal(S, 0, INPUT_0_COMPLEX);
ssSetInputPortDirectFeedThrough(S, 0, INPUT_0_FEEDTHROUGH);
ssSetInputPortRequiredContiguous(S, 0, 1); /*direct input signal access*/
if (!ssSetNumOutputPorts(S, NUM_OUTPUTS)) return;
ssSetNumSampleTimes(S, 1);
ssSetNumRWork(S, 0);
ssSetNumIWork(S, 0);
ssSetNumPWork(S, 0);
ssSetNumModes(S, 0);
ssSetNumNonsampledZCs(S, 0);
/* Take care when specifying exception free code – see sfuntmpl_doc.c */
ssSetOptions(S, (SS_OPTION_EXCEPTION_FREE_CODE |
SS_OPTION_USE_TLC_WITH_ACCELERATOR |
SS_OPTION_WORKS_WITH_CODE_REUSE));
}
In mdlOutputs I try to treat f (the port) as a normal array.
Example:
x=f[0][0];
This throws the error.
Edit:
Well, sort of figured it out.
You set the port dimensions according to the input parameters, then you can address the values using f[x*xw+y], where x and y are the x and y positions(starting with 0) and xw is the number of columns.
Haven't found a better way yet, but this works.
I'm guessing that the S-Function builder is generating code that looks like the following in mdlOutputs:
real_T *y0 = (real_T *)ssGetOutputPortSignal(S, 0);
// OR
real_T *y0 = ssGetOutputPortRealSignal(S, 0);
With either line y0 is a pointer to a 1-D array, so when you try to access it using 2 subscripts as if it were a 2-D array, the compiler complains.
You can fix it by changing the 2-D indexing to linear indexing as you've posted in the edit. This works perfectly fine, in-fact it is what the compiler would have to do behind the scenes anyway when you index into a 2-D array using 2 subscripts.
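The other option is to cast the return value of ssGetOutputPortSignal (or ssGetInputPortSignal for an input port) to a pointer-to-array type. A plain pointer-to-pointer cast (real_T **) would not work, because the signal is one contiguous block of values, not an array of row pointers. With 4 rows per column as in your port setup, and the first subscript selecting the column:
real_T (*y0)[4] = (real_T (*)[4])ssGetOutputPortSignal(S, 0);
y0[1][1] = 0; /* column 1, row 1 */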
As you mention in your edit, using linear indexing is in fact the correct way to access matrices in C MEX s-functions. Take a look at mdlOutputs in the sfun_matadd.c s-function example: http://www.ligo.caltech.edu/~rana/mat/Jenne/sfun_matadd.c. The comment in the example code explains it very neatly:
/*
* Note1: Matrix signals are stored in column major order.
* Note2: Access each matrix element by one index not two indices.
* For example, if the output signal is a [2x2] matrix signal,
* - -
* | y[0] y[2] |
* | y[1] y[3] |
* - -
* Output elements are stored as follows:
* y[0] --> row = 0, col = 0
* y[1] --> row = 1, col = 0
* y[2] --> row = 0, col = 1
* y[3] --> row = 1, col = 1
*/
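To make the indexing rule concrete, here is a small standalone C sketch (no Simulink headers). The helper names colmajor_index and trace_colmajor are made up for this example; the only thing taken from the comment above is that element (row, col) of a column-major matrix with nrows rows lives at index col * nrows + row.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of element (row, col) in a column-major matrix
 * that has nrows rows. */
static size_t colmajor_index(size_t row, size_t col, size_t nrows)
{
    assert(row < nrows);
    return col * nrows + row;
}

/* Example use: sum the main diagonal of an n-by-n column-major matrix
 * stored in a flat array, the way a matrix port signal is laid out. */
static double trace_colmajor(const double *m, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += m[colmajor_index(i, i, n)];
    return sum;
}
```

For the [2x2] case in the comment, colmajor_index(0, 1, 2) is 2 and colmajor_index(1, 0, 2) is 1, which matches the y[2] and y[1] entries listed there.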

hunting for a particular pair of bits '10' or '01' in a character array

This may be a slightly theoretical question. I have a char array of bytes containing network packets. I want to check for the occurrence of a particular pair of bits ('01' or '10') every 66 bits. That is to say, once I locate the first pair of bits I have to skip 66 bits and check for the presence of the same pair of bits again. I am trying to implement a program with masks and shifts and it is kind of getting complicated. I want to know if someone can suggest a better way to do the same thing.
The code I have written so far looks something like this. It is not complete though.
test_sync_bits(char *rec, int len)
{
uint8_t target_byte = 0;
int offset = 0;
int save_offset = 0;
uint8_t *pload = (uint8_t*)(rec + 24);
uint8_t seed_mask = 0xc0;
uint8_t seed_shift = 6;
uint8_t value = 0;
uint8_t found_sync = 0;
const uint8_t sync_bit_spacing = 66;
/*hunt for the first '10' or '01' combination.*/
target_byte = *(uint8_t*)(pload + offset);
/*Get all combinations of two bits from target byte.*/
while(seed_shift)
{
value = ((target_byte & seed_mask) >> seed_shift);
if((value == 0x01) || (value == 0x02)) /* binary 01 or 10; 0x10 is 16 and can never match a 2-bit value */
{
save_offset = offset;
found_sync = 1;
break;
}
else
{
seed_mask = (seed_mask >> 2) ;
seed_shift-=2;
}
}
offset = offset + 8;
seed_shift = (seed_shift - 4) > 0 ? (seed_shift - 4) : (seed_shift + 8 - 4);
seed_mask = (seed_mask >> (6 - seed_shift));
}
Another idea I came up with was to use a structure defined below
typedef struct
{
int remainder_bits;
int extra_bits;
int extra_byte;
}remainder_bits_extra_bits_map_t;
static remainder_bits_extra_bits_map_t sync_bit_check [] =
{
{6, 4, 0},
{5, 5, 0},
{4, 6, 0},
{3, 7, 0},
{2, 8, 0},
{1, 1, 1},
{0, 2, 1},
};
Is my approach correct? Can anyone suggest any improvements for the same?
Lookup Table Idea
There are only 256 possible bytes. That is few enough that you can construct a lookup table of all the possible bit combinations that can happen in one byte.
The lookup table value could record the bit position of the pattern and it could also have special values that mark possible continuation start or continuation finish values.
Edit:
I decided that continuation values would be silly. Instead, to check for a pattern that overlaps a byte, shift the byte and OR in the bit from the other byte, or manually check the end bits at each byte. Maybe (((bytes[i] & 0x01) << 1) | ((bytes[i+1] & 0x80) >> 7)) == 0x02 (for '10') and the same expression == 0x01 (for '01') would work for you.
You didn't say so I am also assuming that you are looking for the first match in any byte. If you are looking for every match, then checking for the end pattern at +66 bits, that's a different problem.
To create the lookup table, I would write a program to do it for me. It could be in your favorite script language or it could be in C. The program would write a file that looked something like:
/* each value is the bit position of a possible pattern OR'd with a pattern ID bit. */
/* 0 is no match */
#define P_01 0x00
#define P_10 0x10
const char byte_lookup[256] = {
/* 0: 0000_0000, 0000_0001, 0000_0010, 0000_0011 */
0, 2|P_01, 3|P_01, 3|P_01,
/* 4: 0000_0100, 0000_0101, 0000_0110, 0000_0111, */
4|P_01, 4|P_01, 4|P_01, 4|P_01,
/* 8: 0000_1000, 0000_1001, 0000_1010, 0000_1011, */
5|P_01, 5|P_01, 5|P_01, 5|P_01,
};
Tedious. That's why I would write a program to write it for me.
This is a variation of the classic de-blocking problem that often comes up when reading from a stream. That is, data comes in discrete units that don't match up to the unit size that you wish to scan. The challenges in this are 1) buffering (which doesn't affect you because you have access to the whole array) and 2) managing all of the state (as you found out). A good approach is to write a consumer function that acts something like fread() and fseek() which maintains its own state. It returns the requested data you're interested in, aligned properly to the buffers you give it.
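A rough sketch of that consumer idea, under a few assumptions: bit 0 is the most significant bit of the first byte, the whole buffer is in memory, and the function names (get_bit, get_bit_pair, count_sync) are made up for this example. Addressing the data by absolute bit offset means the byte-boundary cases need no special handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the bit at absolute bit offset pos (bit 0 = MSB of buf[0]). */
static unsigned get_bit(const uint8_t *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1u;
}

/* Return the two bits starting at pos as a value 0..3.
 * The pair may straddle a byte boundary. */
static unsigned get_bit_pair(const uint8_t *buf, size_t pos)
{
    return (get_bit(buf, pos) << 1) | get_bit(buf, pos + 1);
}

/* Starting at bit offset start, step through the buffer in units of
 * spacing bits and count consecutive positions holding '01' or '10'.
 * Stops at the first miss or when fewer than 2 bits remain. */
static size_t count_sync(const uint8_t *buf, size_t len_bytes,
                         size_t start, size_t spacing)
{
    size_t total_bits = len_bytes * 8;
    size_t count = 0;

    assert(spacing > 0);
    for (size_t pos = start; pos + 2 <= total_bits; pos += spacing) {
        unsigned pair = get_bit_pair(buf, pos);
        if (pair != 0x1 && pair != 0x2)   /* neither '01' nor '10' */
            break;
        ++count;
    }
    return count;
}
```

To hunt for the first sync position you would call count_sync with start = 0, 1, 2, ... up to 65 and keep the offset that yields the longest run; with spacing = 66 each later check lands on the next expected sync header.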
