Bitmask for obtaining the value in between X and Y bits - c

I have a char pointer which points to 16 bytes of an array (Therefore 128 bits).
These bits contain some valuable information for my task and I need to parse the values from their fixed locations. For example, there is time information between the 12th and 23rd bits. I know that I need a bitmask to retrieve such data, but I couldn't manage it.
Could anyone tell me how I can get this information?
Finally, I need to convert those retrieved bits to integer but this is already the easy part of the task.

A general approach would be to select the byte with the array index (the square brackets), mask off the bits you care about with the bitwise &, and then right-shift the result by the number of bits it sits above the LSB; that leaves you with the integer value.
e.g.
int i = (int)((a[1] & 0x6) >> 1);
This will take the 2nd and 3rd least significant bits of the second byte and put their value in an integer.
More specifically for your question:
between the 12th and 23rd bits
int i = ((a[1] & 0xF0) >> 4) | ((a[2] & 0xFF) << 4);
Assuming bit 0 is the LSB of a[0], bits 12..23 span the top nibble of a[1] and all of a[2]: the top four bits of a[1] become bits 0..3 of the result and a[2] supplies bits 4..11, giving the 12-bit value (masking a[2] with 0xFF guards against a signed char being sign-extended).
Another approach in C/C++ would be to define a packed struct with bitfields.
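For illustration, a sketch of that bitfield idea (the struct layout, field names and the read_time helper are made up here, and bitfield order and padding are implementation-defined, so this is not a portable way to parse an externally fixed layout):
#include <stdint.h>
#include <string.h>

struct record {
    uint32_t low  : 12;  /* bits 0..11  (whatever they hold)    */
    uint32_t time : 12;  /* bits 12..23 (the field of interest) */
    uint32_t high : 8;   /* bits 24..31                         */
};

uint32_t read_time(const unsigned char *buf)
{
    struct record r;
    memcpy(&r, buf, sizeof r);   /* copy the first 4 bytes of the buffer */
    return r.time;               /* 12-bit field, already shifted down   */
}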

Related

Is bit masking comparable to "accessing an array" in bits?

For all the definitions I've seen of bit masking, they all just dive right into how to bit mask, use bitwise, etc. without explaining a use case for any of it. Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
I will say the answer is no.
When you access an array of int you'll do:
int_array[index] = 42; // Write access
int x = int_array[42]; // Read access
If you want to write similar functions to read/write a specific bit of e.g. an unsigned int in an "array-like fashion", it could look like this:
unsigned a = 0;
set_bit(a, 4); // Set bit number 4
unsigned x = get_bit(a, 4); // Get bit number 4
The implementation of set_bit and get_bit will require (among other things) some bitwise mask operation.
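A minimal sketch of such helpers (here set_bit takes a pointer so the write actually sticks, i.e. the call above would be set_bit(&a, 4)):
void set_bit(unsigned *word, unsigned n)
{
    *word |= 1u << n;            /* OR in a mask with only bit n set */
}

unsigned get_bit(unsigned word, unsigned n)
{
    return (word >> n) & 1u;     /* shift bit n down to position 0 and mask the rest off */
}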
So yes - to access bits in an "array like fashion" you'll need masking but...
There are many other uses of bit level masking.
Example:
int buffer[64];
unsigned index = 0;
void add_to_cyclic_buffer(int n)
{
    buffer[index] = n;
    ++index;
    index &= 0x3f; // Masking by 0x3f ensures index is always in the range 0..63
}
Example:
unsigned a = some_func();
a |= 1; // Make sure a is odd
a &= ~1; // Make sure a is even
Example:
unsigned a = some_func();
a &= ~0xf; // Make sure a is a multiple of 16
These are just a few examples of using "masking" that have nothing to do with accessing bits as an array. Many other examples could be given.
So to conclude:
Masking can be used to write functions that access bits in an array-like fashion, but it is used for many other things as well.
So there are 3 (or 4) main uses.
One, as you say, is where you use the word as a set of true/false flags, where each flag is just indexed in a symmetric manner. I use 'word' here to be the piece of discrete memory that you are accessing in a single operation. So a byte holds 8 bit values, and a 'long long' holds 64 bits. With a bit more effort an array of words can be used as an array of more packed flags.
A second is where you are doing some manipulation of the value, but still consider the word to hold one value. There are many tricks, like setting or clearing bottom bits to ensure alignment, clearing top bits to get a modulus, or shifting to divide or multiply by powers of 2. A few are sketched below.
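For instance (a sketch; the constants assume 16-byte alignment and the names are arbitrary):
unsigned n = some_func();
unsigned down = n & ~0xFu;           /* clear the bottom 4 bits: round down to a multiple of 16 */
unsigned up   = (n + 0xFu) & ~0xFu;  /* round up to a multiple of 16 */
unsigned rem  = n & 0xFu;            /* n % 16, a power-of-two modulus */
unsigned dbl  = n << 1;              /* multiply by 2 */
unsigned half = n >> 1;              /* divide by 2 */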
A third use is where you want to pack lots of smaller-ranged values into a word. Each of the values has a particular meaning in context. This may be because you need to communicate with a device that has defined this as its protocol, or because you need to create so many objects that the saving in space per object outweighs the extra code size and the cost in speed (though that has to be weighed against the increased cache misses that would cause a slowdown if the object were bigger).
As a distinction the fourth case is where these fields are distinct 1-bit flags that have specific meanings in the context of the code. Data objects tend to collect a number of such flags, and it is simply more convenient sometimes to store them as bits in a single location, than to use separate bytes for each flag. Generally testing a particular fixed indexed bit, or a fixed masked bit is no more expensive in code size or speed than testing the whole byte, though writing can be more complex. The storage savings are clear, so often programmers will declare an enumeration of bit masks by default when faced with creating a number of flags in a structure, or when writing a function.
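As a sketch of that last pattern (the flag names and the struct are only for illustration):
enum {
    FLAG_VISIBLE = 1u << 0,
    FLAG_DIRTY   = 1u << 1,
    FLAG_LOCKED  = 1u << 2
};

struct object {
    unsigned flags;   /* several 1-bit flags packed into a single word */
};

void mark_dirty(struct object *o)     { o->flags |= FLAG_DIRTY; }               /* set   */
void clear_dirty(struct object *o)    { o->flags &= ~FLAG_DIRTY; }              /* clear */
int  is_dirty(const struct object *o) { return (o->flags & FLAG_DIRTY) != 0; }  /* test  */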

CRC - 16bit Lookup table using different polynomial notations

I'm using a 16-bit CRC and have a lookup table (LUT) generator which produces a LUT for a given polynomial. The generator code I'm using works with the Koopman notation (e.g. 0x8810 for CCITT) and therefore produces the first table row as:
0x0000, 0x8810, 0x9830, 0x1020, 0xB870, 0x3060, 0x2040, 0xA850,
However, I found an already computed CCITT table with an implementation on the internet which apparently uses a different notation, with the first line given as:
0x0000,0x1021,0x2042,0x3063,0x4084,0x50a5,0x60c6,0x70e7,
My question is: do the short and long notations (0x8810 vs. 0x11021) produce the same results with different tables (i.e. only the usage of the LUT differs), or are the CRCs different when the same polynomial is used in its different notations?
PS: As far as I know, 0x8810 and 0x11021 are the non-reflected Koopman/"normal" notations and 0x8408 and 0x10811 the reflected ones (for CCITT).
PPS: The "usage code" for the second table is given as:
uint16_t crc16_block(uint16_t crc, uint8_t *data, int len)
{
    int i;
    for (i = 0; i < len; i++)
        crc = (crc << 8) ^ crc16_tbl[(crc >> 8) ^ data[i]];
    return crc;
}
Koopman's notation represents a polynomial, but it is not a polynomial. You cannot use it as an input to the lookup table generator you used. Your first table is useless, since the implied polynomial does not have a low bit of 1.
Koopman's notation depends on the fact that all CRC polynomials end in a 1, i.e. they always have a +1 term. When written in binary they always start with a 1 (the highest power of x) and always end with a 1, e.g. 10001000000100001, or 0x11021, for the CCITT polynomial x^16 + x^12 + x^5 + 1.
The annoying thing about that number is that it takes 17 bits to represent. You would like to have a notation that only uses 16 bits to make it easier to specify a polynomial in a computer program with 16-bit integers (or similarly, needing 32 bits instead of 33 bits to specify a 32-bit CRC).
There are two solutions. Drop the high 1, or drop the low 1. Usually you will see the high 1 dropped. I.e. 0x1021, plus you then need to also provide the length of the CRC, 16 in this case. So the specification is 16, 0x1021. (There are other things you need to specify as well, but for now we will limit ourselves to the size of the CRC and the polynomial.)
Koopman realized that if you instead dropped the low 1, you wouldn't even need to specify the length, and could still specify a 16-bit CRC polynomial in 16 bits. You drop the low 1 by shifting down by one, so 0x11021 becomes 0x8810. The high 1 is still there, so it implicitly defines the length of the CRC.
However, to make use of a CRC in the Koopman notation, you must shift it up by one and add one to get the polynomial for the calculation and the table.
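In code, that conversion is just a shift and an OR (the variable names are only illustrative; the CCITT value is used as the example):
uint32_t koopman = 0x8810;              /* CCITT polynomial in Koopman notation */
uint32_t poly    = (koopman << 1) | 1;  /* restore the dropped low 1: 0x11021 = x^16 + x^12 + x^5 + 1 */
uint16_t normal  = (uint16_t)poly;      /* drop the high 1 instead: 0x1021, with the width (16) given separately */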

constructing key by bit shifting 3 integers in C

I want to construct a key composed of 3 values by using bit shifting operations:
According to my understanding, the C statement code I am starting from creates a hash table by constructing its keys from certain data variables:
uint64_t key = (uint64_t)c->pos<<32 | c->isize;
My interpretation is that key is a combination of the last 32 digits of c->pos, which must be a 64-bit unsigned integer, and c->isize, also a 64-bit unsigned integer. But I am not sure if that is the case, and maybe the | pipe operator has a different meaning when applied to bit shifting operations.
What I want to do next is to modify the way key is constructed and include a third c->barc element into the variable. Given the number of possibilities of c->barc and c->isize, I was thinking that instead of building key with 32+32 bits (pos+isize), I would build it with 32+16+16 bits (pos+isize+barc), splitting the last 32 bits between
isize and barc.
Any ideas how to do that?
What I think you need is a solid explanation of bitmasking.
For this particular case, you should use the & operator to keep only the lower 16 bits of c->isize before shifting it up, and then use the & operator again to keep only the lower 16 bits of c->barc.
Let's look at some diagrams.
let
c->pos = xxxx_xxxx_....._xxxx
c->isize = yyyy_yyyy_....._yyyy
c->barc = zzzz_zzzz_....._zzzz
where
x, y, and z are bits.
note: underscores are to identify groups of 4 bits.
If I understand correctly, you want a 64-bit number like this:
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_yyyy_yyyy_yyyy_yyyy_zzzz_zzzz_zzzz_zzzz
right?
As you already know, we get the upper 32 x's by doing
|-----32 bits of pos----|---32 0 bits--|
(uint64_t)c->pos<<32 = xxxx_xxxx_...._xxxx_xxxx_0000_...._0000
Now, we want to bitwise-or that with the following:
|----------32 0 bits----|
0000_0000_...._0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
To get that number there, we do this:
((c->isize & 0xffff) << 16)
because:
c->isize & 0xffff gives
yyyy_yyyy_yyyy_yyyy_yyyy_yyyy_yyyy_yyyy
& 0000_0000_0000_0000_1111_1111_1111_1111
---------------------------------------------
0000_0000_0000_0000_yyyy_yyyy_yyyy_yyyy
and then we shift it left by 16 to get
|--------32 0 bits------|
0000_0000_...._0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
Now, the final part, the
|-------48 0 bits-------|
0000_0000_...._0000_0000_zzzz_zzzz_zzzz_zzzz
is the result plain and simply of
(c->barc & 0xffff) =
zzzz_zzzz_zzzz_zzzz_zzzz_zzzz_zzzz_zzzz
& 0000_0000_0000_0000_1111_1111_1111_1111
-------------------------------------------------
0000_0000_0000_0000_zzzz_zzzz_zzzz_zzzz
So we take all of these expressions and bitwise-or them together.
uint64_t key = ((uint64_t)c->pos << 32) | ((uint64_t)(c->isize & 0xffff) << 16)
               | (c->barc & 0xffff);   /* the cast keeps the second shift in 64 bits */
if we diagram it out, we see
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_0000_0000_0000_0000_0000_0000_0000_0000
0000_0000_0000_0000_0000_0000_0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
or 0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_zzzz_zzzz_zzzz_zzzz
-----------------------------------------------------------------------------------
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_yyyy_yyyy_yyyy_yyyy_zzzz_zzzz_zzzz_zzzz
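For completeness, a sketch of the inverse, pulling the three fields back out of such a key (field widths as above; the local names are only illustrative):
uint32_t pos   = (uint32_t)(key >> 32);            /* upper 32 bits  */
uint16_t isize = (uint16_t)((key >> 16) & 0xffff); /* middle 16 bits */
uint16_t barc  = (uint16_t)(key & 0xffff);         /* lowest 16 bits */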
The "pipe operator" is actually the bitwise OR operator. The code takes two (presumably) 32-bit integers, shifts one of them left by 32 bits, and combines them together. Thus you get a single 64-bit number. See Wiki for more info about bitwise operations.
If you want to compose your key from three 32-bit integers, then you obviously have to manipulate them to fit them into 64 bits. You can do something like this:
uint64_t key = (uint64_t)c->pos<<32 | (c->isize & 0xFFFF0000) | (c->barc & 0xFFFF);
This code takes 32 bits from c->pos, shifts them into the higher 32 bits of the 64-bit key, then takes the upper 16 bits of c->isize and finally the lower 16 bits of c->barc.
I wouldn't do it. It is not safe if you are not designing the whole thing yourself. But let's explain some things.
My interpretation is that key is a combination of the last 32 digits of c->pos,
Generally, yes.
which must be a 64 bit unsigned integer, and c->isize, also a 64bit unsigned integer.
No. You know nothing about the size or type of pos and isize; pos is cast to uint64_t, so it might be any type that allows such a cast.
My bet is that both values are 32-bit. The first value is cast to a 64-bit type because a bit shift equal to or greater than the width of the type is undefined behaviour, so to stay safe it is widened.
The code probably packs two 32-bit values into a 64-bit one; otherwise it would lose information.
Moreover, if it wanted to construct the key from values which would overlap, it would most probably use XOR rather than OR. Your way is not a good approach unless you know precisely what you are doing. You should find out what types your operands are and then choose a method for creating keys out of them.

Control over array with bitmap

I've now created an array with 64 spots for 8-byte blocks.
How do I implement a bitmap that checks if these spots are used?
I created the array with 64 spots with this: uint64_t array[64]
Since this is (untagged) homework, I won't give code, but:
First: what's a bitmap? It's using the values of individual bits in a variable individually, rather than interpreting the value as a whole. So, if you need a bitmap to indicate which of 64 blocks are in use, you would need 64 bits - a 64-bit type would do well for this. If you need a bitmap for each of the individual bytes in a "block", the same idea holds, but you can use a smaller size map (one byte bitmap) and more of them, obviously.
Then, you need to access each bit individually - there are bitwise operators that make this easy. Bit 0 would indicate the state of block 0 (unset, or 0 = unused; set, or 1 = used), and bit 63 would indicate the state of block 63. A bitwise AND between the bitmap and a value that has only the bit for the block you'd like to check set will tell you whether that block is in use. It can be easy to visualize bits in a data type by opening a calculator program and setting it to "programmer" mode.
Setting a bit is easy - you can bitwise OR the set bit. Unsetting is a bit trickier if you're not familiar with it, but easy once you've grasped the concept of bitwise operations.
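A minimal sketch of that for the 64 spots, using a single uint64_t as the map (the names are only illustrative):
uint64_t used = 0;   /* bit i set means spot i of the array is in use */

void mark_used(int i) { used |=  (1ULL << i); }    /* set bit i */
void mark_free(int i) { used &= ~(1ULL << i); }    /* clear bit i: AND with the inverted mask */
int  is_used(int i)   { return (used >> i) & 1; }  /* test bit i */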
Assuming you are numbering the bits 0..4095, the upper 6 bits of the bit number select the array element and the lower 6 bits select the bit within that element.
if you have
unsigned int bit;
then
unsigned int index = (bit >> 6) & 63;
uint64_t mask = 1ULL << (bit & 63);   // 1ULL so the shift happens in 64 bits, not in int
if (array[index] & mask)
    // bit is set
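Setting or clearing that same bit then uses the same index/mask pair:
array[index] |= mask;    /* mark the bit as set   */
array[index] &= ~mask;   /* mark the bit as clear */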

Get byte - how is this wrong?

I want to get the designated byte from a 32 bit integer. I am getting wrong values but I don't know why.
The restrictions to this problem are:
Must use signed bits, and I can't use multiplication.
I specifically need to know what is wrong with the function as it's below.
Here is the function:
int retrieveByteFromWord(int word, int byte)
{
    return (word >> (byte << 3)) & 0xFF;
}
ex:
(3) (2) (1) (0) ------ byte number
In word: 10010011 11001100 00110011 10101000
I want to return byte 2 (1100 1100).
retrieveByteFromWord(word, 2) ---- gives: 1100 1100
But for some cases it's wrong and it won't tell me what case.
Any ideas?
Here is the problem:
You just started working for a company that is implementing a set of procedures to operate on a data structure where 4 signed bytes are packed into a 32 bit unsigned. Bytes within the word are numbered from 0 (LSB) to 3 (MSB). You have been assigned the task of implementing a function for a machine using 2's complement arithmetic and arithmetic right shifts with the following prototype:
typedef unsigned packed_t;
int xbyte(packed_t word, int bytenum);
This is the previous employee's attempt, which got him fired for being wrong:
int xbyte(packed_t word, int bytenum)
{
    return (word >> (bytenum << 3)) & 0xFF;
}
A) What is wrong with the code?
B) Write a correct implementation using only left and right shifts and one subtraction.
I have done B but still don't know why A is wrong. Is it because the decimal numbers going in like 12, 15, 19, 55 and then getting packed into a word and then when I extract them they aren't the same number anymore??? It might be so I am going to run some tests real fast...
As this is homework I won't give you a full answer, but I'll point you in the right direction. Your problem statement says that:
4 signed bytes are packed into a 32 bit unsigned.
When you bitwise & a 32 bit signed integer with 0xFF the most significant bit - i.e. the sign bit - of the result is always 0, so the original function never returns a negative value regardless of the input.
By way of example...
When you say "retrieveByteFromWord(word, 2) ---- gives: 11001100" you're wrong.
Your return type is a 32 bit integer - not an 8 bit integer. You're not returning 11001100; you're returning 00000000 00000000 00000000 11001100.
To work with numbers, use signed integer types such as int.
To work with bits, use unsigned integer types such as unsigned. I.e. let the word argument be of type unsigned. That is what the unsigned types are for.
To multiply by 8, write just *8 (this does not mean that that part of the code is technically wrong, just that it is artificially contrived and needlessly unreadable).
Even better, create a self-describing name for that magic number 8, e.g. *bitsPerByte (the standard library calls it CHAR_BIT, which is not particularly self-describing nor readable).
Finally, at the design level, think about designing your functions so that the code that uses a function of yours – each call – becomes clear and readable. E.g. like int const b = byteAt( 2, x );. That can prevent bugs by e.g. preventing wrong actual argument order, and since designing for readability makes the code easier to read, it reduces time spent on that. :-)
Cheers & hth.,
Works fine for positive numbers. You may want to cast word to unsigned to make it work for integers with the MSB set.
int retrieveByteFromWord(int word, int byte)
{
    return ((unsigned)word >> (byte << 3)) & 0xFF;
}
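For comparison, a sketch of the sign-extending version asked for in part B (one possible approach, assuming the 2's complement machine with arithmetic right shifts described in the problem; not necessarily the intended solution): shift the wanted byte up to the top of the word, then shift it back down so the sign bit is copied into the upper bits.
int xbyte(packed_t word, int bytenum)
{
    /* (3 - bytenum) << 3 positions the wanted byte in bits 24..31;
       the signed right shift by 24 then sign-extends it on the way down */
    return (int)(word << ((3 - bytenum) << 3)) >> 24;
}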

Resources