I have one microcontroller sampling from a lot of ADCs and sending the measurements over a radio at a very low bitrate, and bandwidth is becoming an issue.
Right now, each ADC only gives us 10 bits of data, and it's being stored in a 16-bit integer. Is there an easy way to pack them in a deterministic way so that the first measurement is at bit 0, the second at bit 10, the third at bit 20, etc.?
To make matters worse, the microcontroller is little endian, and I have no control over the endianness of the computer on the other side.
EDIT: So far, I like #MSN's answer the best, but I'll respond to the comments
#EvilTeach: I'm not sure if the exact bit pattern would be helpful, or how to best format it with text only, but I'll think about it.
#Jonathan Leffler: Ideally, I'd pack 8 10-bit values into 10 8-bit bytes. If it makes processing easier, I'd settle for 3 values in 4 bytes or 6 values in 8 bytes (although the two are equivalent to me, with the same number of 'wasted' bits).
Use bit 0 and 31 to determine endianness and pack 3 10-bit values in the middle. One easy way to test matching endianness is to set bit 0 to 0 and bit 31 to 1. On the receiving end, if bit 0 is 1, assert that bit 31 is 0 and swap endianness. Otherwise, if bit 0 is 0, assert that bit 31 is 1 and extract the 3 values.
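A minimal sketch of this scheme in C, under two assumptions of my own: the three samples sit in bits 1-30, and "swap endianness" is read as a full bit-order reversal of the word (which is what makes the bit-0/bit-31 marker test exact). The names `pack3`, `unpack3`, and `bitrev32` are illustrative, not from the answer:

```c
#include <stdint.h>

/* Pack three 10-bit samples into bits 1-30, with the marker bits
 * bit 0 = 0 and bit 31 = 1 as described above. */
uint32_t pack3(uint16_t a, uint16_t b, uint16_t c)
{
    return (1u << 31)
         | ((uint32_t)(c & 0x3FF) << 21)
         | ((uint32_t)(b & 0x3FF) << 11)
         | ((uint32_t)(a & 0x3FF) << 1);
}

/* Reverse the bit order of a 32-bit word (bit i <-> bit 31-i). */
uint32_t bitrev32(uint32_t w)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        r |= ((w >> i) & 1u) << (31 - i);
    return r;
}

/* Returns 0 on success, -1 if the marker bits are inconsistent. */
int unpack3(uint32_t w, uint16_t v[3])
{
    if (w & 1u) {                /* bit 0 set: reversed bit order */
        if (w >> 31) return -1;  /* both markers set: corrupt */
        w = bitrev32(w);
    } else if (!(w >> 31)) {
        return -1;               /* neither marker pattern matches */
    }
    v[0] = (w >> 1)  & 0x3FF;
    v[1] = (w >> 11) & 0x3FF;
    v[2] = (w >> 21) & 0x3FF;
    return 0;
}
```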
You can use bitfields, but the ordering within machine words is not defined:
That said, it would look something like:
struct adc_data {
    unsigned first  :10;
    unsigned second :10;
    unsigned third  :10;
};
EDIT: Corrected, thanks to Jonathan.
The simplest thing to do about endianness is to simply pick one for your transmission. To pack the bits into the transmission stream, use an accumulator (of at least 17 bits in your case) into which you shift 10 bits at a time, keeping track of how many bits it holds. When you transmit a byte, you pull a byte out of the accumulator, subtract 8 from your count, and shift the accumulator by 8. I use "transmit" loosely here; you're probably storing into a buffer for later transmission.
For example, if transmission is little endian, you shift your 10 bits in at the top of the accumulator (its MS bits) and pull your bytes from the bottom. For two values a and b:
Accumulator      Count
(MS to LS bit)
aaaaaaaaaa        10    After storing a
aa                 2    After sending first byte
bbbbbbbbbbaa      12    After storing b
bbbb               4    After sending second byte
Reception is a similar unpacking operation.
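The accumulator described above might look like this in C (the `packer` struct and the names `put10` and `flush` are my own; "transmitting" is modeled as appending to a buffer, as the answer suggests):

```c
#include <stdint.h>
#include <stddef.h>

/* Bit accumulator for little-endian packing: new 10-bit samples go
 * in at the top, bytes come out of the bottom. */
struct packer {
    uint32_t acc;    /* holds up to 17 pending bits */
    unsigned count;  /* number of valid bits in acc */
    uint8_t *buf;    /* output buffer (the "transmit" target) */
    size_t len;      /* bytes written so far */
};

void put10(struct packer *p, uint16_t v)
{
    p->acc |= (uint32_t)(v & 0x3FF) << p->count;
    p->count += 10;
    while (p->count >= 8) {            /* a full byte is available */
        p->buf[p->len++] = (uint8_t)(p->acc & 0xFF);
        p->acc >>= 8;
        p->count -= 8;
    }
}

/* Zero-pad and emit any leftover bits at the end of the stream. */
void flush(struct packer *p)
{
    if (p->count > 0) {
        p->buf[p->len++] = (uint8_t)(p->acc & 0xFF);
        p->acc = 0;
        p->count = 0;
    }
}
```

With this layout, 4 samples occupy exactly 5 bytes, and 8 samples the 10 bytes mentioned in the question's edit.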
So I want to store random bits of length 1 to 8 (a BYTE) in memory. I know that computers can't address individual bits and that we must store at least a byte of data on most modern machines. I have been doing some research on this but haven't come across any useful material. I need to find a way to store these bits so that, for example, when reading the bits back from memory, 0 must NOT be evaluated as 00, 0000, or 00000000. To further explain: 010, for example, must NOT be read back or evaluated as 00000010. Numbers should be unique based on their value as well as their bit length.
Some more examples;
1 ≠ 00000001
10 ≠ 00000010
0010 ≠ 00000010
10001 ≠ 00010001
And so on...
Also, one thing I want to point out again is that the bit size is always between 1 and 8 (inclusive) and is NOT a fixed number. I'm using C for this problem.
So you want to store bits in memory and read them back without knowing how long they are. This is not possible. (It's not possible with bytes either)
Imagine if you could do this. Then we could compress a file by, for example, saying that "0" compresses to "0" and "1" compresses to "00". After this "compression" (which would actually make the file bigger) we have a file with only 0's in it. Then, we compress the file with only 0's in it by writing down how many 0's there are. Amazing! Any 2GB file compresses to only 4 bytes. But we know it's impossible to compress every 2GB file into 4 bytes. So something is wrong with this idea.
You can read several bits from memory but you need to know how many you are reading. You can also do it if you don't know how many bits you are reading, but the combinations don't "overlap". So if "01" is a valid combination, then you can't have "010" because that would overlap "01". But you could have "001". This is called a prefix code and it is used in Huffman coding, a type of compression.
Of course, you could also save the length before each number. So you could save "0" as "0010", where the "001" says how many bits long the number is. With 3-bit lengths you could only have up to 7-bit numbers, or 8-bit numbers if you store the length minus 1, in which case you can't have zero-bit numbers (so "0" becomes "0000", "101" becomes "010101", etc.).
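A rough sketch of that length-prefix idea: a 3-bit (length minus 1) field followed by the value's bits, so lengths 1 to 8 all fit and "0" stays distinct from "00000000". Everything is packed into a single `uint64_t` here just to keep the demo small, and the helper names are hypothetical:

```c
#include <stdint.h>

/* Append a (len, value) pair to a bit buffer held in 'buf'.
 * 'used' counts bits already in the buffer; len must be 1..8. */
uint64_t put_coded(uint64_t buf, unsigned *used, uint8_t value, unsigned len)
{
    buf |= (uint64_t)(len - 1) << *used;              /* 3-bit prefix */
    *used += 3;
    buf |= (uint64_t)(value & ((1u << len) - 1)) << *used;
    *used += len;
    return buf;
}

/* Read back the next value; its bit length comes out in *len_out. */
uint8_t get_coded(uint64_t buf, unsigned *pos, unsigned *len_out)
{
    unsigned len = (unsigned)((buf >> *pos) & 7u) + 1;
    *pos += 3;
    uint8_t v = (uint8_t)((buf >> *pos) & ((1u << len) - 1));
    *pos += len;
    *len_out = len;
    return v;
}
```

Note how the 3-bit value 010 and the 8-bit value 00000010 decode to the same number but different lengths, which is exactly the distinction the question asks for.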
You can manipulate bits using the bit-shift operators or bit-fields.
Make sure you understand the concept of endianness, which is machine dependent. Also keep in mind that bit-fields need a struct, and a struct may be padded to more bytes than its fields strictly require.
And bit-fields can be very tricky.
Good luck!
If you just need to make sure a given binary number is evaluated properly, then there are two choices I can think of. You could store the bit length of each number alongside the number itself, which wouldn't be very efficient.
But you could also store all the binary numbers as 8-bit values, then, when processing each individual number, scan through its digits to find its length. That way you only store the length of a single number at a time.
Here is some quick code, hopefully it's clear:
#include <stdint.h>
#include <stdio.h>

uint8_t rightNumber = 2;   /* which is 10 in binary, or 00000010 */
int rightLength = 2;       /* since it is 2 bits long */

uint8_t bn = mySuperbBinaryValueIWantToTest;
int i;

/* find the highest set bit; stops at i == 0 for the values 0 and 1 */
for (i = 7; i > 0; i--)
{
    if ((bn & (1 << i)) != 0) break;
}
int length = i + 1;

if (bn == rightNumber && length == rightLength) printf("Correct number");
else printf("Incorrect number");
Keep in mind you can also use the same technique to calculate the number of bits in the right value instead of precomputing it. If you are comparing against arbitrary values, the same technique works as well.
Hope this helped; if not, feel free to criticize or re-explain your problem.
I am trying to write 9-bit numbers to a binary file.
For example, I want to write the integer value 275 as 100010011, and so on. fwrite only allows whole bytes to be written at a time, and I am not sure how to manipulate the bits to be able to do this.
You have to write a minimum of two bytes to store a 9-bit value. An easy solution is to use 16 bits per 9-bit value.
Choose a 16-bit unsigned type, e.g. uint16_t, and store the 2 bytes:
uint16_t w = 275;
fwrite(&w, 1, 2, myfilep);
When reading the word w back, ensure it actually uses only its 9 low bits (bits 0-8):
w &= 0x1FF;
Note that you might have endianness issues if you read the file on another system that doesn't have the same endianness as the system that wrote the word.
You could also optimize that solution by using 9 bits of a 16-bit word, then using the remaining 7 bits to store the first 7 bits of the next 9-bit value, etc.
See this answer that explains how to work with bit shifting in C.
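The packed variant could be sketched like this, accumulating 9-bit values into whole bytes so 8 values occupy 9 bytes instead of 16; the resulting buffer would then go to fwrite in one call (the function name and buffer-based interface are my own assumptions):

```c
#include <stdint.h>
#include <stddef.h>

/* Pack n 9-bit values (low 9 bits of each vals[i]) into 'out'.
 * Returns the number of bytes produced; the final partial byte
 * is zero-padded. */
size_t pack9(const uint16_t *vals, size_t n, uint8_t *out)
{
    uint32_t acc = 0;     /* pending bits, LSB-first */
    unsigned bits = 0;    /* how many bits acc holds */
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        acc |= (uint32_t)(vals[i] & 0x1FF) << bits;
        bits += 9;
        while (bits >= 8) {            /* emit completed bytes */
            out[len++] = (uint8_t)(acc & 0xFF);
            acc >>= 8;
            bits -= 8;
        }
    }
    if (bits)                          /* pad the last partial byte */
        out[len++] = (uint8_t)(acc & 0xFF);
    return len;
}
```

Because the packing is defined byte by byte, the file format is independent of the writing machine's endianness, which also addresses the caveat above.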
So, I can understand that a word of 0x1234, when stored as little-endian, becomes 0x3412 in memory. I am also seeing that byte 0x12, as a bitfield with members a:4 and b:4, would be stored as 0x21. But what if I have something more complex? Data like 0x1700581001FFFFFF with the following struct ordering? I'm seeing the data stored as 0x7180051001FFFFFF, which is making very little sense to me. It seems 'a' and 'b' got swapped but remained at the beginning of the struct, and g remained at the end, along with other seemingly random swaps. Why? Also, I left the "LONGWORD" designation because that is what is in the code. I'm not sure how 4 bits can be a longword, but perhaps that has something to do with this craziness?
LONGWORD a: 4
LONGWORD b: 4
LONGWORD c: 4
LONGWORD d: 12
LONGWORD e: 8
LONGWORD f: 8
LONGWORD g: 24
In an "implementation-defined manner". Per 6.7.2.1 Structure and union specifiers, paragraph 11, of the C Standard:
An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains, a bit-field
that immediately follows another bit-field in a structure shall be
packed into adjacent bits of the same unit. If insufficient space
remains, whether a bit-field that does not fit is put into the next
unit or overlaps adjacent units is implementation-defined. The order
of allocation of bit-fields within a unit (high-order to low-order or
low-order to high-order) is implementation-defined. The alignment of
the addressable storage unit is unspecified.
To answer your question "But what if I have something more complex? Data like 0x1700581001FFFFFF with the following struct ordering?":
The proper answer in that case, if you want portable and reliable code, is to not use bit-fields. The fact that you have failed to provide enough information in your question for anyone to provide an answer as to how that data will be placed into the bit-fields you described should inform you what the problems are when using bit-fields.
For example, given your bit-fields
LONGWORD a: 4
LONGWORD b: 4
LONGWORD c: 4
LONGWORD d: 12
LONGWORD e: 8
LONGWORD f: 8
LONGWORD g: 24
If one assumes 16-bit int-type values are used for bit-fields, it would be perfectly proper to lay out the data thus:
16-bit `int` with `c`,`b`,`a` - in that order
16-bit `int` with `d`
16-bit `int` with `f`,`e` - in that order
16-bit `int` with first 16 bits of `g`
16-bit `int` with last 8 bits of `g` - **after** 8 bits of padding.
And that's not even getting into endianness of the storage.
Questions like this one (and the point made in a comment about how to "designate, in order, the meaning of bits to data") inevitably boil down to: what are you trying to do with the data?
If you're declaring a data structure so that some C code can write to it, and other C code can read from it, you rarely if ever care about the byte order, or the bitfield order (or the padding, or the alignment, or any of that).
Where it gets tricky -- very tricky -- is when you try to take that data structure, as your C compiler laid it out in memory, and write it out to or read it in from the outside world. When you try to do that, you end up having to worry forever about type sizes, and byte order, and padding, and alignment, and the order in which bitfields are assigned.
In fact there are so many things to worry about, and nailing them all down is such a nuisance, that many people (myself included) recommend simply not trying to define data structures which can be directly read and written in this way, at all.
My memory is that compilers for big-endian machines tend to lay out the bits in bitfields one way, and compilers for little-endian machines the other way. But I can never remember which way is which. (And even if I thought I could remember, you shouldn't trust me.) If for some reason you care, you're going to have to do what I always do, which is to write some little test programs that construct some binary data and print it out in hex, and figure out how it's done for the machine/compiler combination you're using today. (And of course you also have to decide what you're going to do about the possibility that your machine/compiler combination might change next week.)
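Such a test program might look like the following. The byte layout it exposes is implementation-defined, which is exactly what you'd be probing; the struct and function names here are made up for the experiment:

```c
#include <string.h>
#include <stddef.h>

/* Give each bit-field a distinct value and expose the raw bytes
 * so they can be hex-dumped and inspected. */
struct probe {
    unsigned a : 4;
    unsigned b : 4;
    unsigned c : 8;
};

/* Copies the raw object representation into 'raw' (which must hold
 * at least sizeof(struct probe) bytes) and returns its size. */
size_t dump_probe(unsigned char *raw)
{
    struct probe p = { 0x1, 0x2, 0x34 };
    memcpy(raw, &p, sizeof p);
    return sizeof p;   /* print these bytes as hex and compare */
}
```

On a typical little-endian gcc target the first byte comes out as 0x21 (b above a), while a big-endian compiler would usually give 0x12, but only the dump tells you for sure.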
Upon re-reading the documentation, I do not see any allowance for packing the bit-fields in simply any order. There is indeed a specified ordering, but it is implementation dependent which way it goes; it is still quite determinable. In short, from what I am seeing, it packs up the bits in groups of 8, in order. The difference for our little-endian compiler (or maybe some option somewhere) is that the concatenation puts the first-defined bits AFTER the next-defined bits (i.e., it makes the first-defined bits less significant than the next-defined). For example:
a:3 = 111 (binary)
b:4 = 0000 (binary)
c:9 = 011111111 (binary)
Our little-endian compiler (or, again, perhaps some other option) will take the 3 bits of 'a' and concatenate them with 'b' by appending 'a' after 'b'. This, I believe, is the opposite of what big-endian compilers would do, which is to put 'a' before 'b'. So I'm speculating it's the endianness that does this, but ours gets the 7 bits 0000111 by forming 'ba' rather than 'ab'.

It then needs one more bit from 'c' to complete a full 8. It takes the least significant bit of 'c', which is 1, and again the previous bits get tacked on after that new bit. So we have 10000111. This byte, 0x87, is stored to memory, and the compiler gathers another 8 bits. In this example the next 8 bits are 01111111, so it stores that byte, 0x7F, after the 0x87. In memory we now have 0x877F.

Another scheme (likely big-endian) would have ended up with 0xE0FF. The 0x877F now in memory, interpreted as a little-endian word, has the value 0x7F87, or 0111111110000111 in binary. This happens to be exactly the concatenation 'cba' of the data structure above.
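The walk-through above can be reproduced with plain shifts, assuming fields are allocated low-order first (a in bits 0-2, b in bits 3-6, c in bits 7-15), which is what this little-endian compiler apparently does; `pack_abc` is an illustrative name:

```c
#include <stdint.h>

/* Build the 16-bit word by placing a, b, c at increasing bit
 * offsets, i.e. first-defined field in the least significant bits. */
uint16_t pack_abc(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)((a & 0x7)
                    | ((b & 0xF) << 3)
                    | ((c & 0x1FF) << 7));
}
```

With a = 111, b = 0000, c = 011111111 this yields 0x7F87, whose little-endian byte image is 0x87 0x7F, matching the 0x877F observed in memory.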
So let's do that same reverse ordering of the data I provided earlier:
(0x1700581001FFFFFF was meant to be parsed as below, but I guess that might not have been obvious, since it is a big-endian construct I assumed)
LONGWORD a: 4 = 0x1
LONGWORD b: 4 = 0x7
LONGWORD c: 4 = 0x0
LONGWORD d: 12 = 0x058
LONGWORD e: 8 = 0x10
LONGWORD f: 8 = 0x01
LONGWORD g: 24 = 0xFFFFFF
With the little-endian configuration we have, this could be interpreted as one giant structure with a value of 0xFFFFFF0110058071, concatenating in the order gfedcba. If we store this back to memory in little-endian format, we get 0x7180051001FFFFFF, which is exactly the data I said we were seeing. Big-endian, in theory, would have done it in the order I assumed was obvious (0x1700581001FFFFFF), both as interpreted and as stored.
Well, it makes sense to me. Hopefully it makes sense to someone else trying to figure out the same thing! I still don't get why they all say LONGWORD before them though...
I am trying to write binary data that does not fit in 8 bits to a file. From what I understand, you can write binary data of any length if you can group it into a predefined length of 8, 16, 32, or 64.
Is there a way to write just 9 bits to a file? Or two 9-bit values?
I have one value in the range ±32768 and 3 values in the range ±256. What would be the way to save the most space?
Thank you
No, I don't think there's any way using C's file I/O APIs to express storing less than one char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need:
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits can be put, each time incrementing i. When i reaches CHAR_BIT, write it to s and reset i to zero.
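Those three ingredients translate almost directly into C. This is only a sketch; a real implementation would also need a flush step for the final partial byte, and the names are mine:

```c
#include <stdio.h>
#include <limits.h>

/* The byte stream s, bit index i, and current byte x listed above. */
struct bitwriter {
    FILE *s;          /* something you can put bytes into */
    unsigned i;       /* how many bits of x are filled */
    unsigned char x;  /* the current, partially filled byte */
};

/* Put one bit into the stream, LSB of x first. */
void put_bit(struct bitwriter *w, int bit)
{
    w->x |= (unsigned char)((bit & 1) << w->i);
    if (++w->i == CHAR_BIT) {   /* byte complete: emit and reset */
        fputc(w->x, w->s);
        w->x = 0;
        w->i = 0;
    }
}
```

A 9-bit value is then just nine calls to put_bit (or a loop), and consecutive values share bytes with no padding.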
You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
    int a : 16;
    int b : 9;
    int c : 9;
    int d : 9;
};
Objects such as this will still be rounded up to a whole number of bytes, so the above will occupy six bytes on typical systems: it uses 43 bits in total, and the next whole number of eight-bit bytes is 48 bits.
You can either accept this padding from 43 to 48 bits or use more complicated code to concatenate bits further before writing to a file. That requires additional code to assemble bits into sequences of bytes and is rarely worth the effort, since storage space is currently cheap.
You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value will be written into two bytes and combined with the last/next byte by shift and OR operations.
I hope this very abstract description helps you.
I need to allocate an array uint64_t[1e9] to count something, and I know the items are between (0, 2^39).
So I want to calloc 5*1e9 bytes for the array.
Then I found that, if I want to make the uint64_t meaningful, it is difficult to bypass the byte order.
There should be 2 ways.
One is to check the endianness first, so that we can memcpy the 5 bytes to either the first or the last 5 of the whole 8 bytes.
The other is to use 5 bit-shifts and then bit-OR them together.
I think the former should be faster.
So, under GCC or libc or a GNU system, is there any header file that indicates whether the current system is little endian or big endian? I know x86_64 is little endian, but I don't want to write unportable code.
Of course any other ideas are welcome.
Add:
I need to use the array to count many strings using d-left hashing. I plan to use 21 bits for the key and 18 bits for counting.
When you say "faster"... how often is this code executed? Five shifts by 8 plus an OR probably cost less than 100 ns, so the code would have to run about ten million times before that adds up to 1 (one) second.
If the code is executed fewer times than that and you need more than a second to implement an endian-clean solution, you're wasting everyone's time.
That said, the solution to figure out the endianess is simple:
int a = 1;
char *ptr = (char *)&a;
int littleEndian = (*ptr == 1);
Now all you need is a big-endian machine and a couple of test cases to make sure your memcpy solution works. Note that you need to call memcpy five times in one of the two cases to reorder the bytes.
Or you could simply shift and or five times...
EDIT: I guess I misunderstood your question a bit. You're saying that you want to use the lowest 5 bytes (= 40 bits) of the uint64_t as a counter, yes?
So the operation will be executed many, many times. Again, memcpy is utterly useless here. Let's take the number 0x12345678 (32 bits). In memory, it looks like this:
0x12 0x34 0x56 0x78 big endian
0x78 0x56 0x34 0x12 little endian
As you can see, the bytes are swapped. So to convert between the two, you must use either bit-shifting or byte swapping; memcpy doesn't work.
But that doesn't actually matter, since the CPU will do the decoding for you. All you have to do is shift the bits into the right place:
key = item & 0x1FFFFF;
count = item >> 21;
to read and
item = (count << 21) | key;
to write. Now you just need to build the key from the five bytes and you're done:
key = hash[0] | (hash[1] << 8) | (hash[2] << 16) | ....
EDIT 2
It seems you have an array of 40bit ints and you want to read/write that array.
I have two solutions: Using memcpy should work as long as the data isn't copied between CPUs of different endianness (read: when you save/load the data to/from disk). But the function call might be too slow for such a huge array.
The other solution is to use two arrays:
uint32_t lower[];
uint8_t  upper[];
that is: save bits 33-40 in a second array. To read/write the values, one shift plus an OR is necessary.
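A sketch of that two-array layout (the array size and helper names are made up for the demo; the real arrays would of course be calloc'd with 1e9 entries):

```c
#include <stdint.h>
#include <stddef.h>

#define N 4   /* tiny demo size; the question needs 1e9 entries */

/* Low 32 bits in one array, high 8 bits in another: 5 bytes per
 * counter total, and no byte-order tricks anywhere. */
static uint32_t lower[N];
static uint8_t  upper[N];

uint64_t get40(size_t idx)
{
    return ((uint64_t)upper[idx] << 32) | lower[idx];
}

void set40(size_t idx, uint64_t v)   /* v must fit in 40 bits */
{
    lower[idx] = (uint32_t)v;
    upper[idx] = (uint8_t)(v >> 32);
}
```

Because the values are recombined with shifts rather than memcpy, the code behaves identically on little- and big-endian hosts.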
If you treat numbers as numbers, and not an array of bytes, your code will be endianess-agnostic. Hence, I would go for the shift and or solution.
Having said that, I really didn't catch what you are trying to do. Do you really need one billion entries, each five bytes long? If the data you are sampling is sparse, you might get away with allocating far less memory.
Well, I just found that the kernel headers come with <asm/byteorder.h>.
An inlined memcpy along the lines of while (i < x + 3) { *i++ = *j++; } may still be slower, since memory operations are slower than registers.
Another way to do the memcpy is through a union:
union dat {
    uint64_t a;
    char     b[8];
} d;