Bit Vector to Integer vectors of 0s and 1s - c

I have a char[] vector, representing a bitmap, and I want to convert this to an integer vector where the nth bit in the char array corresponds to the nth entry in the integer array.
I can only think of doing it this way:
iterate through each byte, and "extract the nth bit" every time. It's simple, but it seems unnecessarily slow.
I can think of a few optimisations where, if I use "<<" and check whether the rest of the byte is 0, I can skip a few entries, but that won't add much as I expect my vector to be dense.
Any thoughts on how to make this more efficient?
Thanks

I think that by checking whether the char you are iterating through is < 2^i, where i is the bit being checked, you are approaching a lower bound on the problem.
In the following code (1 << j) is equal to 2^j, and & is the bitwise AND operator, which checks whether the bit at that index is 1 or 0.
This runs in linear time, since the length of a byte is constant.
int i;
for (i = 0; i < sizeof charVector; i++) {   /* assumes charVector is an array in scope, not a pointer */
    int j;
    for (j = 0; j < 8; j++) {
        if (charVector[i] < (1 << j)) {
            break;   /* the remaining bits are all 0; requires intVector to be zero-initialized */
        } else {
            intVector[i*8 + j] = (charVector[i] >> j) & 1;   /* store 0 or 1 */
        }
    }
}
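Since the asker expects a dense bitmap, a hedged alternative sketch is to always expand all 8 bits of every byte with plain shifts and masks and no early exit (buffer names follow the question; the caller is assumed to size intVector as 8 * len):

#include <stddef.h>

/* Expand each byte of the bitmap into eight 0/1 entries. */
void expand_bits(const unsigned char *charVector, int *intVector, size_t len)
{
    for (size_t i = 0; i < len; i++)
        for (int j = 0; j < 8; j++)
            intVector[i * 8 + j] = (charVector[i] >> j) & 1;   /* 0 or 1 */
}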

Related

CRC calculation reduction

I have one math- and programming-related question about CRC calculations: how to avoid recomputing the full CRC for a block when only a small portion of it changes.
My problem is the following: I have a 1K block of 4-byte structures, each one representing a data field. The full 1K block has a CRC16 at the end, computed over the full 1K. When I change only one 4-byte structure, I would have to recompute the CRC of the full block, but I'm searching for a more efficient solution to this problem. Something where:
1. I take the current CRC16 of the full 1K block
2. I compute something on the old 4-byte block
3. I "subtract" what was obtained at step 2 from the full 1K CRC16
4. I compute something on the new 4-byte block
5. I "add" what was obtained at step 4 to the result obtained at step 3
To summarize, I am thinking about something like this:
CRC(new-full) = [CRC(old-full) - CRC(block-old) + CRC(block-new)]
But I'm missing the math behind it and what to do to obtain this result, ideally as a general formula.
Thanks in advance.
Take your initial 1024-byte block A and your new 1024-byte block B. Exclusive-or them to get block C. Since you only changed four bytes, C will be a bunch of zeros, four bytes which are the exclusive-or of the previous and new four bytes, and a bunch more zeros.
Now compute the CRC-16 of block C, but without any pre or post-processing. We will call that CRC-16'. (I would need to see the specific CRC-16 you're using to see what that processing is, if anything.) Exclusive-or the CRC-16 of block A with the CRC-16' of block C, and you now have the CRC-16 of block B.
At first glance, this may not seem like much of a gain compared to just computing the CRC of block B. However there are tricks to rapidly computing the CRC of a bunch of zeros. First off, the zeros preceding the four bytes that were changed give a CRC-16' of zero, regardless of how many zeros there are. So you just start computing the CRC-16' with the exclusive-or of the previous and new four bytes.
Now you continue to compute the CRC-16' on the remaining n zeros after the changed bytes. Normally it takes O(n) time to compute a CRC on n bytes. However if you know that they are all zeros (or all some constant value), then it can be computed in O(log n) time. You can see an example of how this is done in zlib's crc32_combine() routine, and apply that to your CRC.
Given your CRC-16/DNP parameters, the zeros() routine below will apply the requested number of zero bytes to the CRC in O(log n) time.
#include <stddef.h>
#include <stdint.h>

// Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
// polynomial, reflected. For speed, this requires that a not be zero.
uint16_t multmodp(uint16_t a, uint16_t b) {
    uint16_t m = (uint16_t)1 << 15;
    uint16_t p = 0;
    for (;;) {
        if (a & m) {
            p ^= b;
            if ((a & (m - 1)) == 0)
                break;
        }
        m >>= 1;
        b = b & 1 ? (b >> 1) ^ 0xa6bc : b >> 1;
    }
    return p;
}

// Table of x^2^n modulo p(x).
uint16_t const x2n_table[] = {
    0x4000, 0x2000, 0x0800, 0x0080, 0xa6bc, 0x55a7, 0xfc4f, 0x1f78,
    0xa31f, 0x78c1, 0xbe76, 0xac8f, 0xb26b, 0x3370, 0xb090
};

// Return x^(n*2^k) modulo p(x).
uint16_t x2nmodp(size_t n, unsigned k) {
    k %= 15;
    uint16_t p = (uint16_t)1 << 15;
    for (;;) {
        if (n & 1)
            p = multmodp(x2n_table[k], p);
        n >>= 1;
        if (n == 0)
            break;
        if (++k == 15)
            k = 0;
    }
    return p;
}

// Apply n zero bytes to crc.
uint16_t zeros(uint16_t crc, size_t n) {
    return multmodp(x2nmodp(n, 3), crc);
}
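For completeness, a hedged sketch of how these routines could be combined for the 4-byte update described above; crc16_dnp_raw() is a hypothetical helper that clocks the plain CRC-16' register (no pre- or post-processing) over a buffer:

// Hypothetical helper: raw CRC-16/DNP register, no pre- or post-processing.
uint16_t crc16_dnp_raw(uint16_t crc, const uint8_t *buf, size_t len);

// Update the CRC-16 of a 1024-byte block after replacing the 4 bytes at
// byte offset off (old4 -> new4), using zeros() from above.
uint16_t crc_update4(uint16_t crc_old, const uint8_t old4[4],
                     const uint8_t new4[4], size_t off) {
    uint8_t delta[4];
    for (int i = 0; i < 4; i++)
        delta[i] = old4[i] ^ new4[i];        // the non-zero bytes of block C
    uint16_t c = crc16_dnp_raw(0, delta, 4); // CRC-16' of the changed bytes
    c = zeros(c, 1024 - off - 4);            // fold in the trailing zero bytes
    return crc_old ^ c;                      // CRC-16 of the new block B
}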
CRC actually makes this an easy thing to do.
When looking into this, I'm sure you've started to read that CRCs are calculated with polynomials over GF(2), and probably skipped over that part to the immediately useful information. Well, it sounds like it's probably time for you to go back over that stuff and reread it a few times so you can really understand it.
But anyway...
Because of the way CRCs are calculated, they have a property that, given two blocks A and B, CRC(A xor B) = CRC(A) xor CRC(B)
So the first simplification you can make is that you just need to calculate the CRC of the changed bits. You could actually precalculate the CRCs of each bit in the block, so that when you change a bit you can just xor its CRC into the block's CRC.
CRCs also have the property that CRC(A * B) = CRC(A * CRC(B)), where that * is polynomial multiplication over GF(2). If you stuff the block with zeros at the end, then don't do that for CRC(B).
This lets you get away with a smaller precalculated table. "Polynomial multiplication over GF(2)" is binary convolution, so multiplying by 1000 is the same as shifting by 3 bits. With this rule, you can precalculate the CRC of the offset of each field. Then just multiply (convolve) the changed bits by the offset CRC (calculated without zero stuffing), calculate the CRC of those 8 bytes, and xor them into the block CRC.
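A minimal sketch of the per-bit precalculation, assuming a crc16_raw() helper (hypothetical name) that computes the CRC with zero initial value and no final XOR; since any init/xorout constants cancel between equal-length blocks, the same XOR update applies to the normal block CRC too:

#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 1024

// Hypothetical helper: CRC-16 with zero init and no final XOR.
uint16_t crc16_raw(const uint8_t *buf, unsigned len);

static uint16_t bit_crc[BLOCK_BYTES * 8];   // CRC contribution of each bit

void build_bit_table(void) {
    uint8_t buf[BLOCK_BYTES];
    memset(buf, 0, sizeof buf);
    for (unsigned k = 0; k < BLOCK_BYTES * 8; k++) {
        buf[k / 8] = (uint8_t)(1u << (k % 8));     // block with only bit k set
        bit_crc[k] = crc16_raw(buf, BLOCK_BYTES);
        buf[k / 8] = 0;
    }
}

// After flipping bit k of the data block:  block_crc ^= bit_crc[k];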
The CRC is the remainder of the division of the long integer formed by the input stream by the short integer corresponding to the polynomial, say p.
If you change some bits in the middle, this amounts to a perturbation of the dividend by n * 2^k, where n has the length of the perturbed section and k is the number of bits that follow it.
Hence, you need to compute the perturbation of the remainder, (n * 2^k) mod p. You can address this using
(n * 2^k) mod p = ((n mod p) * (2^k mod p)) mod p
The first factor is just the CRC16 of n. The other factor can be obtained efficiently in O(log k) operations by the power algorithm based on squarings.
The CRC depends on the CRC calculated over the data before it.
So the only optimization is to logically split the data into N segments and store the computed CRC state for each segment.
Then, when e.g. modifying segment 6 (of 0..9), take the CRC state after segment 5 and continue calculating the CRC from segment 6 through segment 9.
Anyway, CRC calculations are very fast, so consider whether it is worth it.
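A hedged sketch of this segment-caching idea; crc16_update() and CRC16_INIT are placeholder names for whatever incremental CRC routine and initial register value are actually in use:

#include <stddef.h>
#include <stdint.h>

#define NSEG 10                      // e.g. 10 segments of the 1K block
#define CRC16_INIT 0x0000            // placeholder: your CRC variant's init value

// Placeholder: process len bytes, returning the updated CRC register state.
uint16_t crc16_update(uint16_t crc, const uint8_t *buf, size_t len);

static uint16_t seg_state[NSEG];     // cached CRC state after each segment

uint16_t crc_after_change(const uint8_t *block, size_t seglen, int changed) {
    uint16_t crc = (changed == 0) ? CRC16_INIT : seg_state[changed - 1];
    for (int s = changed; s < NSEG; s++) {
        crc = crc16_update(crc, block + (size_t)s * seglen, seglen);
        seg_state[s] = crc;          // refresh the cache as we go
    }
    return crc;                      // register state; apply any final XOR your CRC variant requires
}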

bit comparison in loop on AVRs

I'm learning about bit logic in C on AVRs and I have a problem.
I want to test the i-th bit (from the right) of an int8_t variable, and if it is 1, execute the next instruction, but it doesn't work. Here's what I wrote:
if (variable & (1<<i)==(1<<i)) instruction;
For example, with the following data:
uint8_t dot=0101;
PORTC=1;
for (int i=0; i<4; i++)
{
    PORTB = fourDigit[i];
    if (dot & (1<<i)==(1<<i)) PORTB--;
    PORTC<<=1;
}
The dot (as it is connected to PB0) should illuminate on the first and third digits, but at present it lights up on every digit. What's the problem?
Thanks for your time.
It is done by bit masking. If you want to check whether or not an i'th bit of a is 1 you will do something like this:
if (a & (1 << i))
{
    // Do something
}
This way all of the bits of a except the i'th one will be ANDed with zeros, thus getting a value of zero. The i'th bit will be ANDed with 1, thus not changing its value. So the if condition will be true if the bit is not zero, and false otherwise.
The comparison code you are presenting has two problems. First, == binds more tightly than &, so dot & (1<<i)==(1<<i) is evaluated as dot & ((1<<i)==(1<<i)), i.e. dot & 1, which is why the dot lights on every digit; you would need parentheses, as in (dot & (1<<i)) == (1<<i). Second, the dot variable is probably not containing the value you think it is: uint8_t dot=0101; makes it equal to 101 in octal base (due to the leading zero), i.e. 65 in decimal, not 101 in binary.
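With the precedence fixed and the value written in binary, a corrected sketch of the loop might look like this, keeping the rest of the setup from the question (PORTC = 1 before the loop, fourDigit[] already defined); the 0b prefix is a GCC/avr-gcc extension (standard in C23):

uint8_t dot = 0b0101;            // bits 0 and 2 set
for (int i = 0; i < 4; i++)
{
    PORTB = fourDigit[i];
    if (dot & (1 << i))          // nonzero exactly when bit i is set
        PORTB--;
    PORTC <<= 1;
}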

String to very long sequence of length less than 1 byte

I can't figure out how to solve the following problem. Assume I have a string or an array of integer-type variables (uchar, char, int, whatever). Each of these data types is 1 byte long or more.
I would like to read from such an array, but read pieces that are smaller than 1 byte, e.g. 3 bits (values 0-7). I tried a loop like
cout << ( (tab[index] >> lshift & lmask) | (tab[index+offset] >> rshift & rmask) );
but figuring out how to set these variables is out of my reach. What is the methodology for solving such a problem?
Sorry if this question has been asked before, but searching gives no answer.
I am sure this is not the best solution, as there are some inefficiencies in the code that could be eliminated, but I think the idea is workable. I only tested it briefly:
#include <stdint.h>
#include <stdio.h>

void bits(uint8_t * src, int arrayLength, int nBitCount) {
    int idxByte = 0;        // byte index
    int idxBitsShift = 7;   // bit index: start at the high bit
    // walk through the array, computing bit sets
    while (idxByte < arrayLength) {
        // compute a single bit set
        int nValue = 0;
        for (int i = nBitCount - 1; i >= 0; i--) {
            // take the bit at idxBitsShift and place it at position i
            nValue |= ((src[idxByte] >> idxBitsShift) & 1) << i;
            if ((--idxBitsShift) < 0) {
                idxBitsShift = 7;   // wrap to the high bit of the next byte
                if (++idxByte >= arrayLength)
                    break;
            }
        }
        // print it
        printf("%d ", nValue);
    }
}

int main() {
    uint8_t a[] = {0xFF, 0x80, 0x04};
    bits(a, 3, 3);
}
The thing with collecting bits across byte boundaries is a bit of a PITA, so I avoided all that by doing this a bit at a time, and then collecting the bits together in the nValue. You could have smarter code that does this three (or however many) bits at a time, but as far as I am concerned, with problems like this it is usually best to start with a simple solution (unless you already know how to do a better one) and then do something more complicated.
In short, the way the data is arranged in memory strictly depends on :
the Endianess
the standard used for computation/representation (usually IEEE 754, for floating-point values)
the type of the given variable
Now, you can't "disassemble" a data structure with this rationale without destroying its meaning; simply put, if you subdivide your variable into bit fields you are just looking at an undefined value.
In computer science there are data structures and pieces of information organized in blocks, like many hashing algorithms/hash results, but a numerical value is not stored like that, and you are supposed to know what you are doing to prevent any data loss.
Another thing to note is that your definition of "pieces that are smaller than 1 byte" doesn't make much sense; it's also highly intrusive, you lose abstraction and can easily do something wrong.
Here's the best method I could come up with for setting individual bits of a variable:
Assume we need to set the first four bits of variable1 (a char or other byte long variable) to 1010
variable1 &= 0b00001111; //Zero the first four bits
variable1 |= 0b10100000; //Set them to 1010, its important that any unaffected bits be zero
This can be extended to whatever bits you desire by placing zeros in the first number at the positions of the bits you wish to set (the first four in the example's case) and placing ones in the second number only at the positions of the bits you wish to set, leaving the rest zero (the last four are zero in the example's case). The second number can also be derived by bit-shifting your desired value by the appropriate number of places (four in the example's case).
In response to your comment this can be modified as follows to accommodate for increased variability:
For this operation we will need two shifts, assuming you wish to be able to modify bits that are neither at the start nor at the end. There are two sets of unaffected bits in this case: the first set (from the left) and the second set. If you wish to modify four bits, skipping the first bit from the left (so a byte laid out as 1 xxxx 111, where x marks the modified bits), the first shift would be 7 and the second shift would be 5.
variable1 &= ( ( 0b11111111 << shift1 ) | 0b11111111 >> shift2 );
Next the value we wish to assign needs to be shifted and or'ed in.
However, we will need a third shift to account for how many bits we want to set.
This shift (we'll call it shift3) is shift1 minus the number of bits we wish to modify (four, as previously mentioned).
variable1 |= ( value << shift3 );
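Putting the pieces together, a small worked sketch under the same assumptions as above (0b literals are a GCC/Clang extension, standard in C23): write the 4-bit value 1010 into bits 6..3, i.e. shift1 = 7, shift2 = 5, shift3 = 7 - 4 = 3.

unsigned char variable1 = 0b11111111;   /* start with all bits set            */
unsigned char value     = 0b1010;       /* the four bits we want to write     */
int shift1 = 7, shift2 = 5;
int shift3 = shift1 - 4;                /* 4 = number of bits being modified  */

variable1 &= (unsigned char)((0b11111111 << shift1) | (0b11111111 >> shift2));
                                        /* mask 10000111: clears bits 6..3    */
variable1 |= (unsigned char)(value << shift3);
                                        /* variable1 is now 0b11010111        */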

How do I implement a bit array?

Current direction:
Start with an unsigned char, which is 1 byte on my system (per sizeof). Range is 0-255.
If length is the number of bits I need, then elements is the number of elements (bytes) I need in my array.
const unsigned int elements = length/8 + (length % 8 > 0 ? 1 : 0);
unsigned char bit_arr[elements];
Now I add basic functionality such as set, unset, and test, where h is the overall bit index, i is the byte index, and j is the bit-within-byte index. We have i = h / 8 and j = h % 8.
Pseudo-code:
bit_arr[i] |= (1 << j); // Set
bit_arr[i] &= ~(1 << j); // Unset
if( bit_arr[i] & (1 << j) ) // Test
Looks like you have a very good idea of what needs to be done. Though instead of pow(2, j), use 1 << j. You also need to change your test code. You don't want the test to do an assignment to the array.
pow() will give you floating-point values, which you don't want. At all. It might work for you, as you use powers of two, but it can get weird as j gets bigger.
You'd do a bit better to use 1 << j instead. Removes any chance of float weirdness, and it probably performs better, too.
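For reference, a minimal sketch of the pseudo-code above wrapped in macros (names are illustrative); i and j are derived from the overall bit index h exactly as described:

/* h is the overall bit index; h / 8 picks the byte and h % 8 the bit. */
#define BIT_SET(arr, h)    ((arr)[(h) / 8] |=  (unsigned char)(1u << ((h) % 8)))
#define BIT_UNSET(arr, h)  ((arr)[(h) / 8] &= (unsigned char)~(1u << ((h) % 8)))
#define BIT_TEST(arr, h)   (((arr)[(h) / 8] >> ((h) % 8)) & 1u)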

Large bit arrays in C

Our OS professor mentioned that to assign a process ID to a new process, the kernel incrementally searches for the first zero bit in an array whose size equals the maximum number of processes (~32,768 by default), where an allocated process ID has 1 stored in it.
As far as I know, there is no bit data type in C. Obviously, there's something I'm missing here.
Is there any such special construct from which we can build up a bit array? How is this done exactly?
More importantly, what are the operations that can be performed on such an array?
Bit arrays are simply byte arrays where you use bitwise operators to read the individual bits.
Suppose you have a 1-byte char variable. This contains 8 bits. You can test if the lowest bit is true by performing a bitwise AND operation with the value 1, e.g.
char a = /*something*/;
if (a & 1) {
    /* lowest bit is true */
}
Notice that this is a single ampersand. It is completely different from the logical AND operator &&. This works because a & 1 will "mask out" all bits except the first, and so a & 1 will be nonzero if and only if the lowest bit of a is 1. Similarly, you can check if the second lowest bit is true by ANDing it with 2, and the third by ANDing with 4, etc., for successive powers of two.
So a 32,768-element bit array would be represented as a 4096-element byte array, where the first byte holds bits 0-7, the second byte holds bits 8-15, etc. To perform the check, the code would select the byte from the array containing the bit that it wanted to check, and then use a bitwise operation to read the bit value from the byte.
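A minimal sketch of that selection step (array and function names are illustrative): pick the byte containing global bit n, then test the bit within it.

unsigned char bitmap[4096];                    /* 4096 bytes = 32,768 bits    */

int bit_is_set(unsigned int n) {
    return (bitmap[n / 8] >> (n % 8)) & 1;     /* pick the byte, then the bit */
}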
As far as what the operations are, like any other data type, you can read values and write values. I explained how to read values above, and I'll explain how to write values below, but if you're really interested in understanding bitwise operations, read the link I provided in the first sentence.
How you write a bit depends on if you want to write a 0 or a 1. To write a 1-bit into a byte a, you perform the opposite of an AND operation: an OR operation, e.g.
char a = /*something*/;
a = a | 1; /* or a |= 1 */
After this, the lowest bit of a will be set to 1 whether it was set before or not. Again, you could write this into the second position by replacing 1 with 2, or into the third with 4, and so on for powers of two.
Finally, to write a zero bit, you AND with the inverse of the position you want to write to, e.g.
char a = /*something*/;
a = a & ~1; /* or a &= ~1 */
Now, the lowest bit of a is set to 0, regardless of its previous value. This works because ~1 will have all bits other than the lowest set to 1, and the lowest set to zero. This "masks out" the lowest bit to zero, and leaves the remaining bits of a alone.
A struct can assign its members bit sizes, but that's the extent of a "bit type" in C.
struct int_sized_struct {
    int foo:4;
    int bar:4;
    int baz:24;
};
The rest of it is done with bitwise operations. For example, searching that PID bitmap can be done with:
#include <stdint.h>

extern uint32_t *process_bitmap;

uint32_t *p = process_bitmap;
uint32_t bit_offset = 0;
uint32_t bit_test;
uint32_t pid;

/* Scan pid bitmap 32 entries per cycle. */
while ((*p & 0xffffffff) == 0xffffffff) {
    p++;
}

/* Scan the 32-bit int block that has an open slot for the open PID */
bit_test = 0x80000000;
while ((*p & bit_test) == bit_test) {
    bit_test >>= 1;
    bit_offset++;
}
pid = (p - process_bitmap) * 32 + bit_offset;   /* 32 bits per array element */
This is roughly 32x faster than a simple for loop scanning an array with one byte per PID. (Actually, more than 32x, since more of the bitmap will stay in the CPU cache.)
see http://graphics.stanford.edu/~seander/bithacks.html
There is no bit type in C, but bit manipulation is fairly straightforward. Some processors have bit-specific instructions that the code below would optimize to nicely; even without them it should be pretty fast. It may or may not be faster to use an array of 32-bit words instead of bytes. Inlining instead of functions would also help performance.
If you have the memory to burn, you can just use a whole byte (or a whole 32-bit number, etc.) to store one bit, which greatly improves performance at the cost of memory.
#define SIZE 4096   /* example size in bytes: 4096*8 = 32768 bits */

unsigned char data[SIZE];

unsigned char get_bit ( unsigned int offset )
{
    //TODO: limit check offset
    if(data[offset>>3]&(1<<(offset&7))) return(1);
    else return(0);
}

void set_bit ( unsigned int offset, unsigned char bit )
{
    //TODO: limit check offset
    if(bit) data[offset>>3]|=1<<(offset&7);
    else data[offset>>3]&=~(1<<(offset&7));
}
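As a usage sketch tied back to the original question, the helpers above could be used to find and allocate the first free PID (illustrative only):

int find_free_pid ( void )
{
    for (unsigned int pid = 0; pid < sizeof(data) * 8; pid++) {
        if (!get_bit(pid)) {     /* first zero bit = first free PID */
            set_bit(pid, 1);     /* mark it as allocated            */
            return (int)pid;
        }
    }
    return -1;                   /* bitmap is full                  */
}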

Resources