Reading a binary file bit by bit - c

I know the function below:
size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
It only reads whole bytes. My goal is to read 12 bits at a time and store them in an array. Any help or pointers would be greatly appreciated!

Adding to the first comment, you can try reading one byte at a time (declare an unsigned char variable and read into it), and then use the bitwise operators >> and << to go through it bit by bit. Read more here: http://www.cprogramming.com/tutorial/bitwise_operators.html
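As a rough sketch of the shifting idea above: once a byte has been read (with fread or fgetc), each bit can be picked out with a shift and a mask. The names bit_at and byte_to_bits are made up for this illustration.

```c
#include <assert.h>

/* Return bit i of byte, where bit 0 is the least significant. */
static int bit_at(unsigned char byte, int i)
{
    assert(i >= 0 && i < 8);
    return (byte >> i) & 1;
}

/* Write the eight bits of byte into out as '0'/'1' characters, most
   significant bit first, followed by a terminating '\0'. */
static void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = bit_at(byte, 7 - i) ? '1' : '0';
    out[8] = '\0';
}
```

Calling byte_to_bits on every byte returned by fgetc gives a bit-by-bit view of the file without any special I/O function.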

Many years ago, I wrote some I/O routines in C for a Huffman encoder. This needs to be able to read and write on the granularity of bits rather than bytes. I created functions analogous to read(2) and write(2) that could be asked to (say) read 13 bits from a stream. To encode, for example, bytes would be fed into the coder and variable numbers of bits would emerge the other side. I had a simple structure with a bit pointer into the current byte being read or written. Every time it went off the end, it flushed the completed byte out and reset the pointer to zero. Unfortunately that code is long gone, but it might be an idea to pull apart an open-source Huffman coder and see how the problem was solved there. Similarly, base64 coding takes 3 bytes of data and turns them into 4 (or vice versa).
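The original code is gone, so the following is only a guess at the general shape of such a reader, not a reconstruction of it. The struct keeps the current byte and a count of unread bits, and fetches the next byte whenever the count runs out. The names bit_reader, br_init and br_read are hypothetical, and the most-significant-bit-first order is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit-level reader: the byte being consumed plus the number
   of its bits that have not been handed out yet. */
typedef struct {
    FILE *fp;
    unsigned char cur;  /* byte currently being consumed */
    int left;           /* unread bits remaining in cur (0..8) */
} bit_reader;

static void br_init(bit_reader *br, FILE *fp)
{
    br->fp = fp;
    br->cur = 0;
    br->left = 0;
}

/* Read nbits (1..32), most significant bit first, into *out.
   Returns 1 on success, 0 if the stream ended before nbits were read. */
static int br_read(bit_reader *br, int nbits, uint32_t *out)
{
    uint32_t v = 0;
    assert(nbits >= 1 && nbits <= 32);
    while (nbits > 0) {
        if (br->left == 0) {            /* went off the end: fetch a byte */
            int c = fgetc(br->fp);
            if (c == EOF)
                return 0;
            br->cur = (unsigned char)c;
            br->left = 8;
        }
        v = (v << 1) | ((br->cur >> (br->left - 1)) & 1u);
        br->left--;
        nbits--;
    }
    *out = v;
    return 1;
}
```

With this, reading 12 (or 13) bits at a time is just br_read(&br, 12, &value) in a loop until it returns 0.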

I've implemented a couple of methods to read/write files bit by bit. Here they are. Whether they are viable for your use case, you have to decide for yourself. I've tried to make the most readable, optimized code I could, not being a seasoned C developer (for now).
Internally, it uses a "bitCursor" to store information about previous bits that don't yet fill a full byte. It has two data fields: d stores data and s stores size, that is, the number of bits stored in the cursor.
You have four functions:
newBitCursor(), which returns a bitCursor object with default values {0,0}. Such a cursor is needed at the beginning of a sequence of read/write operations to or from a file.
fwriteb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which writes the sizeb rightmost bits of the value stored in ptr to fp.
fcloseb(FILE *fp, bitCursor *c), which writes a remaining byte if the previous writes did not add up to a whole number of bytes, which is probably almost always the case.
freadb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which reads sizeb bits and bitwise ORs them into *ptr. (It is therefore your responsibility to initialize *ptr to 0.)
More info is provided in the comments. Have fun!
Edit: It has come to my knowledge today that when I made that I assumed little-endian! :P Oops! It's always nice to realize how much of a noob I still am ;D.
Edit: GNU's Binary File Descriptors.

Read the first two bytes from your a_file file pointer and check the bits in the least or greatest byte — depending on the endianness of your platform (x86 is little-endian) — using bitshift operators.
You can't really put bits into an array, as there isn't a datatype for bits. Rather than keeping 1's and 0's in an array, which is inefficient, it seems cheaper just to keep the two bytes in a two-element array (say, of type unsigned char) and write functions to map those two bytes to one of 4096 (2^12) values-of-interest.
As a complication, on subsequent reads, if you want to fread through the pointer every 12 bits, you would read only one byte, using the left-over bits from the previous read to build a new 12-bit value. If there are no leftovers, you would need to read two bytes.
Your mapping functions would need to address the second case where bits were used from previous read, because the two bytes would need different mapping. To do this efficiently, a modulus on a read-counter could be used to swap between two mappings.
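A small sketch of the two alternating mappings, working on a buffer that has already been read with fread. It assumes the 12-bit values are packed back to back, most significant bit first, so every three bytes hold two values; the real bit order depends on how the file was written. The name get12 is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the index-th 12-bit value from a packed buffer, assuming values
   are stored back to back, most significant bit first. */
static uint16_t get12(const unsigned char *buf, size_t buflen, size_t index)
{
    size_t byte = index * 3 / 2;    /* first byte touched by this value */
    assert(byte + 1 < buflen);
    if (index % 2 == 0)             /* even: a whole byte + the next high nibble */
        return (uint16_t)((buf[byte] << 4) | (buf[byte + 1] >> 4));
    /* odd: the low nibble left over from the previous value + a whole byte */
    return (uint16_t)(((buf[byte] & 0x0F) << 8) | buf[byte + 1]);
}
```

The index % 2 test is the "modulus on a read-counter" from the answer: even-numbered values start on a byte boundary, odd-numbered ones start in the middle of a byte.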

Read 2 bytes and apply bitwise operations to extract the first 12-bit value. For the next value, start from the second byte and apply the bitwise operations again to get the expected result.

For your problem, see this demo program, which reads a whole struct (sizeof(NODE) bytes) per value although only 12 bits carry information. Bit-fields like this are commonly used for bit-wise access.
fread() and fwrite() are standard library functions whose size argument is a byte count of type size_t, so it is not possible to read exactly 12 bits. If you create the file yourself, write it as shown below and read it back the same way, and that solves your problem.
If the file is in a special format that you did not write, follow the specification for that format when reading it; I think such files are written in a similar way. Or you can post the exact format and I may be able to help.
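#include <stdio.h>
#include <stdlib.h>

/* One 12-bit value stored in a bit-field; the struct itself is still
   padded to a whole number of bytes. */
struct node
{
    unsigned int data : 12;   /* unsigned so that 2048 fits (range 0..4095) */
} NODE;

int main(void)
{
    FILE *fp;

    fp = fopen("t", "wb");    /* binary mode: no newline translation */
    if (fp == NULL)
        return EXIT_FAILURE;
    NODE.data = 1024;
    printf("%u\n", (unsigned)NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    NODE.data = 2048;
    printf("%u\n", (unsigned)NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    fclose(fp);

    fp = fopen("t", "rb");
    if (fp == NULL)
        return EXIT_FAILURE;
    if (fread(&NODE, sizeof(NODE), 1, fp) == 1)
        printf("%u\n", (unsigned)NODE.data);
    if (fread(&NODE, sizeof(NODE), 1, fp) == 1)
        printf("%u\n", (unsigned)NODE.data);
    fclose(fp);
    return 0;
}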

Related

C How fread reads different data blocks in a binary file?

I'm porting some C code to C#. I know little about C, but I'm flexible and can learn new programming languages. Anyway, I wasn't able to figure out the exact behaviour of the code I'm porting.
I've read about fread() on the web.
fread(&(targetObj->data), sizeof(TestObj), 1, file);
Now, file is a big binary file with lots of data in it.
What I want to know is how I can do this in C#.
Let me explain:
I think that line of code does this:
TestObj is an unsigned short
reads 1 time a chunk of data of the size of TestObj(unsigned short)
reads it from file (which is pointer to a binary file on filesystem) into targetObj->data
What I don't understand is:
I have a big binary file; what does it actually read? Are there headers somewhere that define where an unsigned-short-sized chunk of data is written?
Where in the binary file does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick that single unsigned short, but I don't in C#.
For example, if that binary file has 40 unsigned shorts saved in it, does the C code line above read just the first one?
And if I do
fread(&(targetObj->data), sizeof(TestObj), 5, file);
is it expected that targetObj->data is an array of 5 unsigned shorts?
And will the code read the first 5 unsigned shorts that it finds in the whole binary file?
I can't wrap my head around this. I need to know how C recognizes that unsigned short in a big binary file whose content I don't know, and I can't think of how to say in C# "read the first C unsigned short from that file".
fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
So if sizeof(TestObj) is 2, it will read two bytes and place them into the location pointed to by &(targetObj->data), with no bounds checking, and regardless of any difference between your architecture's endianness and the file format's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since the number might be stored differently on your machine compared to how it is stored inside the file (by whoever designed the binary format you are trying to read).
In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
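To make the cursor behaviour and the endianness point concrete, here is a minimal C sketch that reads one 16-bit value from the current position and assembles it byte by byte instead of copying raw memory. It assumes the file stores the value little-endian, which is something only the file format's documentation can confirm; read_u16_le is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read one 16-bit unsigned value, assumed to be stored little-endian,
   starting at the current file position. The position advances by two
   bytes. Returns 1 on success, 0 on a short read. */
static int read_u16_le(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    assert(fp != NULL && out != NULL);
    if (fread(b, 1, 2, fp) != 2)
        return 0;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}
```

Calling it repeatedly walks through the file two bytes at a time; there are no headers or markers involved, only the current position.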
fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
From man fread
size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
sizeof(short) is resolved by compiler, as per https://stackoverflow.com/a/14171152/6204612
And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are put into the target variable. Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes etc. on your own.

fread and endianness: will fread(pointer,sizeof(some),1,file pointer) have the same result?

Suppose I have:
FILE* fp = fopen("myfile.bin", "r");
char something[30];
fread(something,sizeof(char)*30,1,fp);
For fread(something,sizeof(char),30,fp), endianness matters, but for the above code, does endianness matter?
For fread(something,sizeof(char),30,fp), endianness matters, but for
the above code, does endianness matter?
Why do you think those two snippets are different as far as endianness is concerned? To me they both read 30 bytes of data - albeit slightly differently, one specifies in arguments to read one element of size 30 and the other one, specifies it other way around.
But so far you have only read some number of bytes. How you interpret these bytes is where endianness might come in. If you just read ASCII text, endianness does not apply; if you read an integer written in binary form, endianness might be a concern.
P.S. You might also want to specify rb in fopen.
Endianness won't matter for either case when reading into a char array. For cases where endianness does matter, how you specify the total size won't matter.
It's more of a historical artifact that fread() takes both a size and an nmemb parameter. The only difference you're likely to see is in the return value, which is a count in size units.
Excuse the self-plug, but you can find some source code analysis for glibc in this answer. The gist of it is that size and nmemb are simply multiplied together, with the separated values only being used to calculate the return value.
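A quick way to see the return-value difference for yourself. This is a small sketch, not taken from the linked analysis, and compare_fread is a name made up for it: the same stream is read once as 30 one-byte items and once as a single 30-byte item.

```c
#include <assert.h>
#include <stdio.h>

/* Read up to 30 bytes in two ways and report what fread returns each
   time. The stream is rewound before each call, so both see the same
   data. */
static void compare_fread(FILE *fp, size_t *as_bytes, size_t *as_block)
{
    char something[30];
    assert(fp != NULL);
    rewind(fp);
    *as_bytes = fread(something, sizeof(char), 30, fp);     /* 30 items of 1 byte */
    rewind(fp);
    *as_block = fread(something, sizeof(char) * 30, 1, fp); /* 1 item of 30 bytes */
}
```

With a file of 30 bytes or more the two calls return 30 and 1. With a shorter file the first returns the number of bytes actually read, while the second returns 0, because no complete 30-byte item was read.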

Printing bits in a buffer with C?

What is the best way in C to write and read a specific number of bits at a time to/from a file, say the first 16 bits, or 12 bits of the lower half of an integer? I can't seem to find any threads or documentation on it other than to use fwrite. I'm not sure, but I don't think I can write a specific number of bits directly and would need a buffer. Can anyone point me to the correct way to do this?
With the available APIs, the smallest amount of info you can write in a file at a time is 1 byte. To achieve what you want, you have to read the byte from the file, modify it using bitwise operators and write it back to the file. In case you are writing data as a stream, you would have to call fwrite once each byte is complete or if you are done. You would then have to pad the last byte with zeros or ones whichever is more appropriate in view of the fact that the file system will keep track of the file size in bytes. To do otherwise would require a file system that provides bit level operations and the corresponding support at the operating system level.
In fact, the smallest physical amount of data that can be written to a disk is a sector of 512 bytes and more recently 4096 bytes. At the file system level, several sectors are bundled together into a block. The operating system "hides" this fact and allows us to deal with individual bytes.
What seems to make this question sound stupid is the fact that we are so used to the current file abstraction that it has become like second nature. However, behind the scenes a lot is going on to support this illusion.
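One possible shape for the "collect bits until a byte is complete, then write it" approach described in this answer is sketched below. The names bit_writer, bw_init, bw_write and bw_flush are hypothetical, bits are written most significant first, and the final partial byte is padded with zeros; padding with ones would work the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    FILE *fp;
    unsigned char cur;  /* bits collected so far, in the low bits */
    int used;           /* number of bits collected in cur (0..7) */
} bit_writer;

static void bw_init(bit_writer *bw, FILE *fp)
{
    bw->fp = fp;
    bw->cur = 0;
    bw->used = 0;
}

/* Append the low nbits of value, most significant bit first.
   Returns 1 on success, 0 on a write error. */
static int bw_write(bit_writer *bw, uint32_t value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    for (int i = nbits - 1; i >= 0; i--) {
        bw->cur = (unsigned char)((bw->cur << 1) | ((value >> i) & 1u));
        if (++bw->used == 8) {          /* byte complete: write it out */
            if (fputc(bw->cur, bw->fp) == EOF)
                return 0;
            bw->cur = 0;
            bw->used = 0;
        }
    }
    return 1;
}

/* Pad the last partial byte with zero bits and write it, if there is one. */
static int bw_flush(bit_writer *bw)
{
    if (bw->used > 0) {
        bw->cur = (unsigned char)(bw->cur << (8 - bw->used));
        if (fputc(bw->cur, bw->fp) == EOF)
            return 0;
        bw->cur = 0;
        bw->used = 0;
    }
    return 1;
}
```

bw_flush has to be called once before closing the file, otherwise the trailing bits that never filled a byte are lost.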

What's a good coding style for reading different bits of data from a binary file in C?

I'm a novice programmer and am writing a simple wav-player in C as a pet project. Part of the file loading process requires reading specific data (sampling rate, number of channels,...) from the file header.
Currently what I'm doing is similar to this:
Scan for a sequence of bytes and skip past it
Read 2 bytes into variable a
Check value and return on error
Skip 4 bytes
Read 4 bytes into variable b
Check value and return on error
...and so on. (code see: https://github.com/qgi/Player/blob/master/Importer.c)
I've written a number of helper functions to do the scanning/skipping/reading bit. Still, I'm repeating the reading, checking, skipping part several times, which seems neither very efficient nor very smart. It's not a real issue for my project, but as this seems to be quite a common task when handling binary files, I was wondering:
Is there some kind of a pattern on how to do this more effectively with cleaner code?
Most often, people define structs (often with something like #pragma pack(1) to guard against padding) that match the file's structures. They then read data into an instance of such a struct with something like fread, and use the values from the struct.
The cleanest option that I've come across is the scanf-like function unpack presented by Kernighan & Pike on page 219 of The Practice of Programming, which can be used like
// assume we read the file header into buf
// and the header consists of magic (4 bytes), type (2) and length (4).
// "l" == 4 bytes (long)
// "s" == 2 bytes (short)
unpack(buf, "lsl", &magic, &type, &length);
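For illustration only, a much simplified function in the same spirit could look like the sketch below. This is not the code from the book: unpack_le is a made-up name, it handles only the format characters c, s and l, and it assumes little-endian fields (as in a WAV header) and fixed-width output types.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical simplified unpack: 'c' = 1 byte into uint8_t *,
   's' = 2 bytes into uint16_t *, 'l' = 4 bytes into uint32_t *, all
   little-endian. Returns the number of bytes consumed from buf, or -1
   if fmt contains an unknown character. */
static int unpack_le(const unsigned char *buf, const char *fmt, ...)
{
    const unsigned char *p = buf;
    va_list ap;

    assert(buf != 0 && fmt != 0);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        switch (*fmt) {
        case 'c': {
            uint8_t *out = va_arg(ap, uint8_t *);
            *out = p[0];
            p += 1;
            break;
        }
        case 's': {
            uint16_t *out = va_arg(ap, uint16_t *);
            *out = (uint16_t)(p[0] | (p[1] << 8));
            p += 2;
            break;
        }
        case 'l': {
            uint32_t *out = va_arg(ap, uint32_t *);
            *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
            p += 4;
            break;
        }
        default:
            va_end(ap);
            return -1;
        }
    }
    va_end(ap);
    return (int)(p - buf);
}
```

The returned byte count makes it easy to check that the header had the expected length before trusting the values.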
For efficiency, read into a buffer of, say, 4096 bytes and then do your parsing on the data in the buffer. And of course a single-pass parse, where you only ever move forward, is the most efficient.

Understanding `read, write` system calls in Unix

My Systems Programming project has us implementing a compression/decompression program to crunch down ASCII text files by removing the zero top bit and writing the output to a separate file, depending on whether the compression or decompression routine is working. To do this, the professor has required us to use the binary files and Unix system calls, which include open, close, read, write, etc.
From my understanding of read and write, it reads the binary data by defined byte chunks. However, since this data is binary, I'm not sure how to parse it.
This is a stripped down version of my code, minus the error checking:
void compress(char readFile[]){
char buffer[BUFFER]; //buffer size set to 4096, but tunable to system preference
int openReadFile;
openReadFile= open(readFile, O_RDONLY);
}
If I use read to read the data into buffer, will the data in buffer be in binary or character format? Nothing I've come across addresses that detail, and it's very relevant to how I parse the contents.
read() will read the bytes in without any interpretation (so "binary" mode).
Since the data is binary and you want to access the individual bytes, you should use a buffer of unsigned char:
unsigned char buffer[BUFFER]. You can regard char/unsigned char as bytes; they'll be 8 bits on Linux.
Now, since what you're dealing with is 8 bit ascii compressed down to 7 bit, you'll have to convert those 7 bits into 8 bits again so you can make sense of the data.
To explain what's been done, consider the text Hey. That's 3 bytes. The bytes have 8 bits each, and in ASCII these are the bit patterns:
01001000 01100101 01111001
Now, removing the most significant bit from this, you shift the remaining bits one bit to the left.
X1001000 X1100101 X1111001
Above, X is the bit to be removed. Removing those and shifting the others, you end up with bytes with this pattern:
10010001 10010111 11001000
The rightmost 3 bits are just filled in with 0. So far, no space is saved though. There are still 3 bytes.
With a string of 8 bytes, we'd save 1 byte, as that would compress down to 7 bytes.
Now you have to do the reverse on the bytes you've read back in.
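Here is a sketch of both directions on in-memory buffers, following the layout shown above (top bit dropped, remaining bits packed most significant first, last byte padded with zeros). The function names pack7 and unpack7 are invented for this example, and the buffers would be filled and emptied with read() and write() in the actual program.

```c
#include <assert.h>
#include <stddef.h>

/* Pack n ASCII characters from in into out, 7 bits each, most significant
   bit first. out must hold (n * 7 + 7) / 8 bytes. Returns bytes written. */
static size_t pack7(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned int acc = 0;   /* bit accumulator */
    int nbits = 0;          /* number of valid bits in acc */
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        assert(in[i] < 0x80);               /* input must be plain ASCII */
        acc = (acc << 7) | in[i];
        nbits += 7;
        while (nbits >= 8) {                /* a full byte is available */
            out[o++] = (unsigned char)(acc >> (nbits - 8));
            nbits -= 8;
        }
        acc &= (1u << nbits) - 1;           /* keep only the unwritten bits */
    }
    if (nbits > 0)                          /* pad the last byte with zeros */
        out[o++] = (unsigned char)(acc << (8 - nbits));
    return o;
}

/* Reverse of pack7: expand n characters from the packed bytes in in. */
static void unpack7(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned int acc = 0;
    int nbits = 0;
    size_t p = 0;
    for (size_t i = 0; i < n; i++) {
        if (nbits < 7) {                    /* need more bits: fetch a byte */
            acc = (acc << 8) | in[p++];
            nbits += 8;
        }
        out[i] = (unsigned char)((acc >> (nbits - 7)) & 0x7F);
        nbits -= 7;
        acc &= (1u << nbits) - 1;
    }
}
```

Packing "Hey" with pack7 gives exactly the three bytes 10010001 10010111 11001000 from the example above. Note that unpack7 needs to be told the original character count, since the zero padding in the last byte is otherwise indistinguishable from data.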
I'll quote the manual of the fopen function (that is based on the open function/primitive) from http://www.kernel.org/doc/man-pages/online/pages/man3/fopen.3.html
The mode string can also include the
letter 'b' either as a last character
or as a character between the
characters in any of the two-character
strings described above. This is
strictly for compatibility with C89
and has no effect; the 'b' is ignored
on all POSIX conforming systems,
including Linux
So even the high level function ignores the mode :-)
It will read the binary content of the file and load it into the memory that buffer points to. On Linux a byte is 8 bits, and so is a char, so if the file is a regular plain text document you'll end up with a printable string. Be careful with how it ends: read returns the number of bytes read (characters, in an ASCII-encoded plain text file) and does not add a terminating null.
Edit: in case the file you're reading isn't a text file, and is a collection of binary representations, you can make the type of the buffer the one of the file, even if it's a struct.

Resources