Write 9 bits binary data in C

I am trying to write binary data to a file that does not fit in 8 bits. From what I understand, you can write binary data of any length as long as you can group it into a predefined length of 8, 16, 32, or 64 bits.
Is there a way to write just 9 bits to a file? Or two values of 9 bits?
I have one value in the range ±32768 and 3 values in the range ±256. What would be the way to save the most space?
Thank you

No, I don't think there's any way using C's file I/O APIs to express storing less than one char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need:
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits can be put, each time incrementing i. When i reaches CHAR_BIT, write x to s and reset i to zero.
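As a rough illustration, here is a minimal sketch of such a bit writer; the names BitWriter, bw_put_bits and bw_flush are invented for this example:

#include <limits.h>
#include <stdio.h>

typedef struct {
    FILE *s;          /* byte stream s: something you can put bytes into */
    unsigned i;       /* bit index i: how many bits of x are filled */
    unsigned char x;  /* current byte x being assembled */
} BitWriter;

/* Emit the low n bits of v, most significant bit first. */
static void bw_put_bits(BitWriter *bw, unsigned v, unsigned n)
{
    while (n--) {
        bw->x = (unsigned char)((bw->x << 1) | ((v >> n) & 1u));
        if (++bw->i == CHAR_BIT) {  /* byte full: write it to s, reset i */
            fputc(bw->x, bw->s);
            bw->i = 0;
            bw->x = 0;
        }
    }
}

/* Pad the final partial byte with zero bits and write it out. */
static void bw_flush(BitWriter *bw)
{
    if (bw->i)
        bw_put_bits(bw, 0, CHAR_BIT - bw->i);
}

Initialize with BitWriter bw = { f, 0, 0 };, then for the values in the question you could call bw_put_bits once with 16 bits and three times with 9 bits (43 bits in total, masking negative values first, e.g. b & 0x1FF), and finish with bw_flush.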

You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
    int a : 16;
    int b : 9;
    int c : 9;
    int d : 9;
};
Objects such as this will still be rounded up to a whole number of bytes, so the above needs at least six bytes, since it uses 43 bits in total and the next whole number of eight-bit bytes is 48 bits. (In practice, many compilers will not let a bit-field straddle an int boundary, so sizeof this structure may well be eight bytes.)
You can either accept this padding of 43 bits to 48 or use more complicated code to concatenate bits further before writing to a file. This requires additional code to assemble bits into sequences of bytes. It is rarely worth the effort, since storage space is currently cheap.
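If you accept the padding, writing the structure out is a one-liner with fwrite. A sketch (note that bit-field layout, and whether a plain int bit-field is signed, are implementation-defined, so a file written this way is only readable by code built with the same compiler and ABI; values.bin is just a placeholder name):

#include <stdio.h>

struct MyStruct
{
    signed int a : 16;  /* 'signed' spelled out: plain int bit-fields may be unsigned */
    signed int b : 9;
    signed int c : 9;
    signed int d : 9;
};

int main(void)
{
    struct MyStruct m = { -32768, -256, 0, 255 };
    FILE *f = fopen("values.bin", "wb");
    if (f) {
        fwrite(&m, sizeof m, 1, f);  /* writes sizeof m bytes, e.g. 6 or 8 */
        fclose(f);
    }
    return 0;
}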

You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value will be written across two bytes and combined with the previous/next byte by shift and OR operations.
I hope this very abstract description helps you.
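To make that abstract description concrete, here is a sketch that packs two 9-bit values a and b into three bytes with shifts and ORs (the function name and the exact bit layout are arbitrary choices for this example):

#include <stdint.h>

/* Layout: aaaaaaaa abbbbbbb bb000000 (most significant bit first) */
static void pack_two(uint16_t a, uint16_t b, uint8_t out[3])
{
    a &= 0x1FF;  /* keep 9 bits of each value */
    b &= 0x1FF;
    out[0] = (uint8_t)(a >> 1);                     /* top 8 bits of a */
    out[1] = (uint8_t)(((a & 1u) << 7) | (b >> 2)); /* last bit of a, top 7 of b */
    out[2] = (uint8_t)((b & 3u) << 6);              /* last 2 bits of b */
}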

Related

Data layouts used by C compilers (the alignment concept)

Below is an excerpt from the Red Dragon Book (Compilers: Principles, Techniques, and Tools).
Example 7.3. Figure 7.9 is a simplification of the data layout used by C compilers for two machines that we call Machine 1 and Machine 2.
Machine 1 : The memory of Machine 1 is organized into bytes consisting of 8 bits each. Even though every byte has an address, the instruction set favors short integers being positioned at bytes whose addresses are even, and integers being positioned at addresses that are divisible by 4. The compiler places short integers at even addresses, even if it has to skip a byte as padding in the process. Thus, four bytes, consisting of 32 bits, may be allocated for a character followed by a short integer.
Machine 2: each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word. The strong word orientation of the instruction set of Machine 2 has led the compiler to allocate a complete word at a time, even when fewer bits would suffice to represent all possible values of that type; e.g., only 8 bits are needed to represent a character. Hence, under alignment, Fig. 7.9 shows 64 bits for each type. Within each word, the bits for each basic type are in specified positions. Two words consisting of 128 bits would be allocated for a character followed by a short integer, with the character using only 8 of the bits in the first word and the short integer using only 24 of the bits in the second word. □
I found out about the concept of alignment here, here and here. What I could understand from them is as follows: in word-addressable CPUs (where the word size is more than a byte), certain padding is introduced into data objects so that the CPU can efficiently retrieve data from memory with the minimum number of memory cycles.
Now, Machine 1 here is actually a byte-addressable one, and the conditions in the Machine 1 specification are more involved than those of a simple word-addressable machine with a word size of, say, 4 bytes. In such a machine we need only make sure that our data items are word-aligned, nothing more. But how do we find the alignment in systems like Machine 1 (as given in the table above), where the simple concept of word alignment does not work, because it is byte-addressable and has much more intricate requirements?
Moreover, I find it quite weird that in the row for double, the size of the type is more than what is given in the alignment field. Shouldn't alignment (in bits) ≥ size (in bits)? Because alignment refers to the memory actually allocated for the data object (?)
"each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word." - Moreover how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)
First of all, Machine 1 is not special at all; it is exactly like x86-32 or 32-bit ARM.
Moreover, I find it quite weird that in the row for double, the size of the type is more than what is given in the alignment field. Shouldn't alignment (in bits) ≥ size (in bits)? Because alignment refers to the memory actually allocated for the data object (?)
No, this isn't true. Alignment means that the address of the lowest addressable byte in the object must be divisible by the given number of bytes.
Additionally, in C it is also true that within arrays, sizeof (ElementType) must be greater than or equal to the alignment of the element type, and sizeof (ElementType) must be divisible by that alignment; hence footnote a. Therefore, on the latter computer:
struct { char a, b; }
might have sizeof 16 because the characters are in distinct addressable words, whereas
struct { char a[2]; }
could be squeezed into 8 bytes.
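On a present-day machine you can inspect both quantities directly with C11's _Alignof, e.g.:

#include <stdio.h>

int main(void)
{
    /* On many 32-bit ABIs double has size 8 but alignment only 4,
       which is exactly the size > alignment case asked about. */
    printf("sizeof(double)   = %zu\n", sizeof(double));
    printf("_Alignof(double) = %zu\n", _Alignof(double));
    return 0;
}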
how should this statement about pointers, based on alignment, be visualized? (2^6 = 64 is fine, but how do these 6 bits correlate with the alignment concept?)
As for the character pointers, the 6 bits figure is bogus: only 3 bits are needed to choose one of the 8 bytes within an 8-byte word, so this is an error in the book. An ordinary word pointer would select a word with just 24 bits, and a character (byte) pointer would select the word with 24 bits plus one of the eight 8-bit bytes inside the word with 3 bits, for 27 bits in total.

Storing individual bits in memory

So I want to store random bits of length 1 to 8 (a BYTE) in memory. I know that computers can't address individual bits in memory and that we must store at least a BYTE of data on most modern machines. I have been doing some research on this but haven't come across any useful material. I need to find a way to store these bits so that, for example, when reading the bits back from memory, 0 must NOT be evaluated as 00, 0000 or 00000000. To further explain: 010, for example, must NOT be read back or evaluated as 00000010. Numbers should be unique based on their value as well as their bit length.
Some more examples;
1 ≠ 00000001
10 ≠ 00000010
0010 ≠ 00000010
10001 ≠ 00010001
And so on...
Also, one thing I want to point out again is that the bit size is always between 1 and 8 (inclusive) and is NOT a fixed number. I'm using C for this problem.
So you want to store bits in memory and read them back without knowing how long they are. This is not possible. (It's not possible with bytes either)
Imagine if you could do this. Then we could compress a file by, for example, saying that "0" compresses to "0" and "1" compresses to "00". After this "compression" (which would actually make the file bigger) we have a file with only 0's in it. Then, we compress the file with only 0's in it by writing down how many 0's there are. Amazing! Any 2GB file compresses to only 4 bytes. But we know it's impossible to compress every 2GB file into 4 bytes. So something is wrong with this idea.
You can read several bits from memory, but you need to know how many you are reading. You can also do it if you don't know how many bits you are reading, provided the combinations don't "overlap": no valid combination may be a prefix of another. So if "01" is a valid combination, then you can't have "010", because "01" is a prefix of it. But you could have "001". This is called a prefix code, and it is used in Huffman coding, a type of compression.
Of course, you could also save the length before each number. So you could save "0" as "0010" where the "001" means how many bits long the number is. With 3-digit lengths, you could only have up to 7-bit numbers. Or 8-bit numbers if you subtract 1 from the length, in which case you can't have zero-bit numbers. (so "0" becomes "0000", "101" becomes "010101", etc)
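A sketch of that length-prefix scheme, storing the length minus 1 in 3 bits followed by the value's bits (the helper names and the MSB-first buffer layout are invented for this example; buf must be zero-initialized):

#include <stdint.h>

/* Append the low n bits of v (MSB first) to buf at bit position *pos. */
static void put_bits(uint8_t *buf, unsigned *pos, unsigned v, unsigned n)
{
    while (n--) {
        if ((v >> n) & 1u)
            buf[*pos / 8] |= (uint8_t)(0x80u >> (*pos % 8));
        (*pos)++;
    }
}

/* Store a value that is 1..8 bits long: 3 length bits (len - 1), then the value. */
static void put_number(uint8_t *buf, unsigned *pos, unsigned v, unsigned len)
{
    put_bits(buf, pos, len - 1, 3);
    put_bits(buf, pos, v, len);
}

For instance, put_number(buf, &pos, 0x5, 3) stores "101" as "010101", matching the example above.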
You can control bits using bit shift operators or bit fields.
Make sure you understand the endianness concept, which is machine-dependent. And keep in mind that bit fields need a struct, and the compiler may pad the struct to more bytes than the bits alone require.
And bit fields can be very tricky.
Good luck!
If you just need to make sure a given binary number is evaluated properly, then you have two choices I can think of. You could store the bit count of each number alongside the number itself, which wouldn't be so efficient.
But you could also store all the binary numbers as 8-bit values, then, when processing each individual number, scan through its bits to find its length. That way you only store the length of a single number at a time.
Here is some quick code, hopefully it's clear:
#include <stdint.h>
#include <stdio.h>

uint8_t rightNumber = 2;  /* which is 10 in binary, or 00000010 */
int rightLength = 2;      /* since it is 2 bits long */

uint8_t bn = mySuperbBinaryValueIWantToTest;
int i;
/* Scan down from the high bit to find the highest set bit. */
for (i = 7; i > 0; i--)
{
    if ((bn & (1 << i)) != 0)
        break;
}
int length = i + 1;  /* bn == 0 comes out as length 1 */

if (bn == rightNumber && length == rightLength)
    printf("Correct number");
else
    printf("Incorrect number");
Keep in mind you can also use the same technique to calculate the number of bits in the right value instead of precomputing it. The same approach works if you are comparing against arbitrary values, too.
Hope this helped, if not, feel free to criticize/re-explain your problem

Why does an int take up 4 bytes in C or any other language? [duplicate]

This is a bit of a general question and not completely related to the C programming language, but it's what I'm studying at the moment.
Why does an integer take up 4 bytes, or however many bytes depending on the system?
Why does it not take up 1 byte per integer?
For example, why does the following take up 8 bytes:
int a = 1;
int b = 1;
Thanks
I am not sure whether you are asking why int objects have fixed sizes instead of variable sizes or whether you are asking why int objects have the fixed sizes they do. This answers the former.
We do not want the basic types to have variable lengths. That makes it very complicated to work with them.
We want them to have fixed lengths, because then it is much easier to generate instructions to operate on them. Also, the operations will be faster.
If the size of an int were variable, consider what happens when you do:
b = 3;
b += 100000;
scanf("%d", &b);
When b is first assigned, only one byte is needed. Then, when the addition is performed, more space is needed. But b might have neighbors in memory, so it cannot simply be grown in place. The old memory has to be released and new memory allocated somewhere else.
Then, when we do the scanf, we do not know how much data is coming. scanf will have to do some very complicated work to grow b over and over again as it reads more digits. And, when it is done, how does it let you know where the new b is? There has to be some mechanism to update the location of b. This is hard and complicated and will cause additional problems.
In contrast, if b has a fixed size of four bytes, this is easy. For the assignment, write 3 to b. For the addition, add 100000 to the value in b and write the result to b. For the scanf, pass the address of b to scanf and let it write the new value to b. This is easy.
The basic integral type int is guaranteed to have at least 16 bits; "at least" means that compilers/architectures may also provide more bits. On 32- and 64-bit systems int will almost always comprise 32 bits (i.e. 4 bytes), though some platforms have used 64 (cf., for example, cppreference.com):
Integer types
... int (also accessible as signed int): This is the most optimal
integer type for the platform, and is guaranteed to be at least 16
bits. Most current systems use 32 bits (see Data models below).
If you want an integral type with exactly 8 bits, use int8_t or uint8_t from <stdint.h>.
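A quick way to see what your own platform uses (the output is, of course, platform-dependent):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("int range   = %d..%d\n", INT_MIN, INT_MAX);
    return 0;
}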
It doesn't. It's implementation-defined. A signed int in gcc on an Atmel 8-bit microcontroller, for example, is a 16-bit integer. An unsigned int is also 16 bits, but ranges from 0 to 65535 since it's unsigned.
The fact that an int uses a fixed number of bytes (such as 4) is a compiler/CPU efficiency and limitation, designed to make common integer operations fast and efficient.
There are types (such as BigInteger in Java) that take a variable amount of space. These types would have 2 fields, the first being the number of words being used to represent the integer, and the second being the array of words. You could define your own VarInt type, something like:
struct VarInt {
    unsigned char length;               /* number of payload bytes used */
    unsigned char bytes[];              /* variable length, least significant byte first */
};

struct VarInt one    = {1, {1}};        /* 2 bytes */
struct VarInt v257   = {2, {1, 1}};     /* 3 bytes: 1 + 1*256 */
struct VarInt v65537 = {3, {1, 0, 1}};  /* 4 bytes: 1 + 1*65536 */

(Initializing a flexible array member like this is a compiler extension, shown here only for illustration.)
and so on, but this would not be very fast to perform arithmetic on. You'd have to decide how you would want to treat overflow; resizing the storage would require dynamic memory allocation.

Writing integer values to a file in binary using C

I am trying to write 9 bit numbers to a binary file.
For example, I want to write the integer value 275 as 100010011, and so on. fwrite only allows whole bytes to be written, and I am not sure how to manipulate the bits to be able to do this.
You have to write a minimum of two bytes to store a 9-bit value. An easy solution is to use 16 bits per 9-bit value.
Choose a 16-bit unsigned type, e.g. uint16_t, and store the 2 bytes:
uint16_t w = 275;
fwrite(&w, 1, 2, myfilep);
When reading the word w back, ensure it actually uses only its first 9 bits (bits 0–8):
w &= 0x1FF;
Note that you might have endianness issues if you read the file on another system that doesn't have the same endianness as the system that wrote the word.
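One way to sidestep the endianness issue is to write and read the two bytes in an explicit order yourself; a sketch (the little-endian-on-disk layout is an arbitrary choice, and error handling is omitted):

#include <stdint.h>
#include <stdio.h>

/* Write w low byte first, high byte second, regardless of host endianness. */
static void write_u16_le(uint16_t w, FILE *f)
{
    fputc(w & 0xFF, f);
    fputc((w >> 8) & 0xFF, f);
}

/* Read the two bytes back and keep only the 9 low bits. */
static uint16_t read_u9_le(FILE *f)
{
    int lo = fgetc(f);
    int hi = fgetc(f);
    return (uint16_t)((lo | (hi << 8)) & 0x1FF);
}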
You could also optimize that solution by using 9 bits of a 16-bit word, then using the remaining 7 bits to store the first 7 bits of the next 9-bit value, and so on, as sketched below.
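A sketch of that denser scheme for a whole array of 9-bit values (the helper name pack9 and the MSB-first layout are invented for this example):

#include <stddef.h>
#include <stdint.h>

/* Pack n 9-bit values contiguously into out; returns the number of bytes used.
   out must hold at least (n * 9 + 7) / 8 bytes and be zero-initialized. */
static size_t pack9(const uint16_t *vals, size_t n, uint8_t *out)
{
    size_t bitpos = 0;
    for (size_t k = 0; k < n; k++) {
        uint16_t v = vals[k] & 0x1FF;   /* keep 9 bits */
        for (int b = 8; b >= 0; b--) {  /* most significant bit first */
            if ((v >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(0x80u >> (bitpos % 8));
            bitpos++;
        }
    }
    return (bitpos + 7) / 8;  /* round up to whole bytes */
}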
See this answer that explains how to work with bit shifting in C.

C - Type Name : Number?

I was wondering what the following code is doing exactly? I know it's something to do with memory alignment, but when I ask for sizeof(vehicle) it prints 20, while I calculated the struct's size as 22. I just need to understand how this works, thanks!
struct vehicle {
    short wheels : 8;
    short fuelTank : 6;
    short weight;
    char license[16];
};

printf("\n%zu", sizeof(struct vehicle));

20
Memory will be allocated as follows (assuming the addressable unit is an 8-bit byte):
struct vehicle {
    short wheels : 8;    // 1 byte
    short fuelTank : 6;  // padded with 2 bits, so fuelTank also takes 1 byte
    short weight;        // 2 bytes
    char license[16];    // 16 bytes
};
1 + 1 + 2 + 16 = 20 bytes.
Consider a machine with a word size of 32 bits. The first two fields fit in a whole 16-bit word, as they occupy 8 + 6 = 14 bits. The third field, while not a bit-field (it doesn't have the :<number> part that allocates space in bits), can fill another 16 bits to complete a 32-bit word, so the first three fields pack into one 32-bit word (4 bytes) if the architecture allows memory to be accessed in 16-bit quantities. Finally, adding the 16 characters gives the 20 bytes that the sizeof operator passes to printf.
Why do you assume sizeof (struct vehicle) is 22 bytes? You asked the compiler to print it, and it said it's 20. Compilers are free to pad (or not pad) structures to achieve better performance. That is architecture-dependent, and as you have not said which architecture and compiler you are using, it is not possible to go further.
For example, the 32-bit Intel architecture allows 16-bit quantities to be aligned at even addresses without performance penalties, so packing them that way is a good choice for saving memory. On other architectures, 16-bit accesses may not be allowed at all, and the data must be padded out to full words to fit the third field (leading to 22 bytes for the whole structure).
The only guarantee you have when sizing data is that the compiler must allocate enough space to fit everything in an efficient way, so the only thing you can assume from that declaration is that it will occupy at least the minimum space needed to represent one field of 8 bits, another of 6, a complete short (I'll assume a short is 16 bits) and 16 characters (assuming 8 bits per char), which amounts to 8 + 6 + 16 + 16*8 = 158 bits minimum.
Suppose we are writing a compiler for D. Knuth's MIX machine. As stated in his book Fundamental Algorithms, this machine has an unspecified byte size capable of holding between 64 and 100 distinct values, with five bytes (plus a binary sign) making up one addressable word. If you had a byte-size-independent compiler (one that compiles for any MIX machine, without assumptions about byte size), you could use no more than 64 possible values per byte, i.e. 6 bits per byte. You would then assume the second field fills one complete byte (with its sign drawn from the word it belongs to) and the first field needs two complete bytes (using half of the values for negative numbers). The third field might go in the second word, filling three complete bytes (6*3 = 18) and the sign of that word. The next 16 chars can begin on the next word, taking four complete words, so the whole structure will occupy 1 + 1 + 4 = 6 words, or 30 bytes. But if you want to handle the three signed fields effectively, you'll need three complete words for them (as each word has only one sign field), leading to 7 words, or 35 bytes.
I have suggested this example because of the particular characteristics of this architecture, which makes one think of the not-so-uncommon architectures that were in common use some time ago (the first machines ever built were not binary-based, like some of these MIX machines).
Note
You can try to print the actual offsets of the fields, to see where in the structure are located and see where the compiler is padding.
#define OFFSET(Typ, field) ((int)&((Typ *)0)->field)
This macro will tell you the offset as an int. Use it as OFFSET(struct vehicle, weight) or OFFSET(struct vehicle, license[3])
Note
I had to edit the last macro definition, as it fails on some architectures where the pointer-to-int conversion is not always possible (on 64-bit architectures it loses some bits), so it's better to compute the difference of two pointers, which is a proper ptrdiff_t value, than to convert a pointer directly to int.
#define OFFSET(Typ, field) ((char *)&((Typ *)0)->field - (char *)0)
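For reference, standard C already provides this as the offsetof macro in <stddef.h>, which avoids writing the null-pointer arithmetic yourself. A quick usage sketch (note that offsetof cannot be applied to the bit-field members):

#include <stddef.h>
#include <stdio.h>

struct vehicle {
    short wheels : 8;
    short fuelTank : 6;
    short weight;
    char license[16];
};

int main(void)
{
    printf("weight  at offset %zu\n", offsetof(struct vehicle, weight));
    printf("license at offset %zu\n", offsetof(struct vehicle, license));
    return 0;
}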
