Writing number as raw binary to file - c

So I'm trying to write the value of a long as raw binary to a file with fwrite(), here's a test file:
#include <stdio.h>
int main() {
    FILE* stream;
    stream = fopen("write", "wb");
    long number = 0xfff;
    fwrite(&number, sizeof(long), 1, stream);
}
When opening the written file in a text editor, this is the output:
ff0f 0000 0000 0000
while I was expecting something like 0000 0000 0000 0fff.
How do I write the desired result correctly as raw binary?

Read about endianness, which describes how the bytes of a word (or double/quad word, et cetera) are arranged in a computer system.
I'm assuming you've coded and compiled this example on an x86 system, which is little-endian, so the least significant byte COMES FIRST. The opposite of that arrangement is called big-endian.
Now, it is clear that your objective in this exercise is to marshal (or pickle, depending on how you prefer your jargon) some bytes to be later retrieved, possibly by another program.
If you develop a program that uses fread() and reads the data in the same way (using sizeof(long) so you don't read too much data) and on a machine with the same endianness, it will magically work, and the number you expect will come back. But if you compile and run the "read" tool on a machine with the opposite endianness, reading the same input file, your number will be garbled.
If your objective is to marshal data, you would be better off with a tool that helps you marshal your bytes in a way that is endianness-agnostic, that is, a library that helps you store the bytes in a well-defined order. There are libraries out there that take care of that for you.
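As an illustration of the idea (my own sketch, not part of the original answer), you can write the value byte by byte in a fixed order, most significant byte first, so the file looks the same no matter what the host's endianness is:
#include <stdio.h>
#include <stdint.h>

/* Write a 64-bit value as 8 bytes, most significant byte first,
 * so the on-disk layout does not depend on the host's endianness. */
static int write_u64_be(uint64_t value, FILE *stream) {
    unsigned char bytes[8];
    for (int i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(value >> (56 - 8 * i));
    return fwrite(bytes, 1, sizeof bytes, stream) == sizeof bytes;
}

int main(void) {
    FILE *stream = fopen("write", "wb");
    if (!stream)
        return 1;
    write_u64_be(0xfff, stream);   /* file contents: 00 00 00 00 00 00 0f ff */
    fclose(stream);
    return 0;
}
This produces exactly the 0000 0000 0000 0fff layout you were expecting, at the cost of deciding on a byte order yourself instead of letting the CPU decide for you.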

There's no problem. You're seeing ff0f 0000 0000 0000 because of the endianness of the machine!
Try reading it back with fread() instead of a text editor!
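For example (a minimal sketch of my own, assuming the file written by the question's code), reading the value back with fread() on the same machine gives you the original number again:
#include <stdio.h>

int main(void) {
    FILE *stream = fopen("write", "rb");
    if (!stream)
        return 1;
    long number = 0;
    /* Same size, same machine, same byte order: the value round-trips. */
    if (fread(&number, sizeof(long), 1, stream) == 1)
        printf("0x%lx\n", number);   /* prints 0xfff */
    fclose(stream);
    return 0;
}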

As others have pointed out in the comments, this is an endianness "issue". That is, it is only an issue if you are going to run your software on systems with a different endianness.
This very useful resource is the first result for the "c endian" google search, at least for me. I hope it helps you.
Edit: I will dig a bit more into the details below.
To determine the endianness of the machine you are currently running on, write a known value (for example, 0xAABBCCDD) into memory, then take a pointer to that value and cast it to a char* or another 1-byte data type. Read the same value again, byte by byte, and compare the ordering with what you wrote. If the first byte you read back is the most significant one (0xAA), you are on a big-endian machine. If not... then you could be on a little-endian (more probable) or middle-endian (less probable) machine. In either case you can generate a "swap map" for reordering the bytes, which is the reason why I chose four different byte values in the above example.
But how do you swap those values? An easy way, as indicated here, is to use bit shifts with masks. You can also use lookup tables and swap their values around.
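A minimal sketch of both steps (my own illustration, using a 32-bit value):
#include <stdio.h>
#include <stdint.h>

/* Detect the host's byte order by inspecting a known value byte by byte. */
static int is_big_endian(void) {
    uint32_t probe = 0xAABBCCDDu;
    unsigned char *bytes = (unsigned char *)&probe;
    return bytes[0] == 0xAA;   /* most significant byte stored first */
}

/* Reverse the byte order of a 32-bit value with shifts and masks. */
static uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

int main(void) {
    printf("big endian: %s\n", is_big_endian() ? "yes" : "no");
    printf("%08X\n", (unsigned)swap32(0xAABBCCDDu));   /* prints DDCCBBAA */
    return 0;
}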

Related

Appending bits to a file in C

I'm working on a CRC32 program for a project and I've hit another stumbling block. We get a 32-bit UINT back from the ASM code we have, and in order to test the algorithm, we need to append those exact bits to the end of the text file we threw into the algorithm, and we're kind of at a loss as to how to do that. We tried fprintf, but that transformed the int into characters and changed the bits. Same deal with fwrite. Is there some way to do this with fwrite that we're missing? Any help would be appreciated.
You have to open the file in binary mode.
It's also possible that you have to flip the bytes (if the ASM code returns them in a different endianness than expected; if the target format is big-endian, then htonl will work).
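A minimal sketch of that (my own illustration; the function name is hypothetical, and whether you need htonl depends on what byte order the verifier expects):
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl */

/* Append a 32-bit CRC to the end of a file as exactly 4 raw bytes. */
static int append_crc(const char *path, uint32_t crc) {
    FILE *fp = fopen(path, "ab");   /* "b": binary mode, no newline translation */
    if (!fp)
        return -1;
    uint32_t wire = htonl(crc);     /* optional: force a big-endian on-disk layout */
    size_t written = fwrite(&wire, sizeof wire, 1, fp);
    fclose(fp);
    return written == 1 ? 0 : -1;
}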

embedding chars in int and vice versa

I have a smart card on which I can store bytes (in multiples of 16).
If I do Save(byteArray, length), then I can do Receive(byteArray, length)
and I think I will get the byte array back in the same order I stored it.
Now I have an issue: I realized that if I store an integer on this card
and some other machine (with a different endianness) reads it, it may get wrong data.
So I thought maybe the solution is to always store data on this card in a little-endian
way, and always retrieve the data in a little-endian way (I will write the apps to read and write, so I am free to interpret the numbers as I like). Is this possible?
Here is something I have come up with:
Embed integer in char array:
int x;
unsigned char buffer[250];
buffer[0] = LSB(x);
buffer[1] = LSB(x>>8);
buffer[2] = LSB(x>>16);
buffer[3] = LSB(x>>24);
The important thing, I think, is that the LSB function should return the least significant byte regardless of the endianness of the machine. What would such an LSB function look like?
Now, to reconstruct the integer (something like this):
int x = buffer[0] | (buffer[1]<<8) | (buffer[2]<<16) | (buffer[3]<<24);
As I said, I want this to work regardless of the endianness of the machine that reads it and writes it. Will this work?
The LSB function may be implemented as a macro, as below:
#define LSB(x) ((x) & 0xFF)
Provided x is unsigned.
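Putting the macro together with the approach from the question, a minimal sketch (my own illustration; the pack/unpack helper names are made up) could look like this:
#include <stdint.h>

#define LSB(x) ((x) & 0xFF)

/* Store a 32-bit value into 4 bytes, least significant byte first,
 * regardless of the host's endianness. */
static void pack_u32_le(uint32_t x, unsigned char *buffer) {
    buffer[0] = LSB(x);
    buffer[1] = LSB(x >> 8);
    buffer[2] = LSB(x >> 16);
    buffer[3] = LSB(x >> 24);
}

/* Rebuild the value from those 4 bytes, again independent of host byte order. */
static uint32_t unpack_u32_le(const unsigned char *buffer) {
    return (uint32_t)buffer[0]
         | ((uint32_t)buffer[1] << 8)
         | ((uint32_t)buffer[2] << 16)
         | ((uint32_t)buffer[3] << 24);
}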
If your C library is POSIX compliant, then you have standard functions available to do exactly what you are trying to code: ntohl, ntohs, htonl, htons (network to host long, network to host short, ...). That way you don't have to change your code when you compile it for a big-endian or a little-endian architecture. The functions are declared in arpa/inet.h (see http://linux.die.net/man/3/ntohl).
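Note that these functions give you a fixed big-endian (network) layout rather than the little-endian one described in the question, but the important part is simply that both sides agree on one order. A small sketch of how they might be used (my own illustration):
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Writer side: convert to a fixed (network, big-endian) order before storing. */
static void store_u32(uint32_t x, unsigned char *buffer) {
    uint32_t wire = htonl(x);
    memcpy(buffer, &wire, sizeof wire);
}

/* Reader side: convert back to whatever the host's native order happens to be. */
static uint32_t load_u32(const unsigned char *buffer) {
    uint32_t wire;
    memcpy(&wire, buffer, sizeof wire);
    return ntohl(wire);
}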
I think the answer to your question is YES: you can write data on a smart card such that it is universally (and correctly) read by readers of both big-endian AND little-endian orientation. With one big caveat: it would be incumbent on the reader to do the interpretation, not your smart card interpreting the reader, would it not? That is, as you know, there are many routines to determine endianness (1, 2, 3). But it is the readers that would have to contain code to test endianness, not your card.
Your code example works, but I am not sure it is necessary given the nature of the problem as it is presented.
By the way, HERE is a related post.

GIF LZW decompression hints?

I've read through numerous articles on GIF LZW decompression, but I'm still confused as to how it works and how to handle, in code, the more fiddly parts of it.
As I understand it, when I get to the byte stream in the GIF for the LZW compressed data, the stream tells me:
Minimum code size, AKA number of bits the first byte starts off with.
Now, as I understand it, I have to either add one to this for the clear code, or add two for clear code and EOI code. But I'm confused as to which of these it is?
So say I have 3 colour codes (01, 10, 11), with EOI code assumed (as 00) will the byte that follows the minimum code size (of 2) be 2 bits, or will it be 3 bits factoring in the clear code? Or is the clear code/EOI code both already factored into the minimum size?
The second question is: what is the easiest way to read in dynamically sized bits from a file? Because reading an odd number of bits (3 bits, 12 bits, etc.) out of even-sized bytes (8 bits) sounds like it could be messy and buggy.
To start with your second question: yes, you have to read the dynamically sized bits out of an 8-bit byte stream. You have to keep track of the size you are reading and of the number of unused bits left over from previous read operations (used for correctly placing the bits of the 'next byte' from the file).
IIRC, for an 8-bit image the minimum code size is 8 bits, which gives you a clear code of 256 (base 10) and an End Of Information code of 257. The first stored code is then 258.
I am not sure why you have not looked up the source of one of the public domain graphics libraries. I know I did not, because back in 1989 (!) there were no libraries to use and no internet with complete descriptions. I had to implement a decoder from an example executable (for MS-DOS, from CompuServe) that could display images, and a few GIF files, so I know it can be done (but it is not the most efficient way of spending your time).
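To make the bit-reading part concrete, here is a small sketch of my own (not the answerer's code, and it ignores GIF's sub-block framing): a reader that delivers codes of any width, least significant bit first, which is how GIF's LZW stream packs them.
#include <stdio.h>
#include <stdint.h>

/* Bit reader that returns codes of 1..24 bits, least significant bit first. */
typedef struct {
    FILE *fp;
    uint32_t buffer;   /* bits read from the file but not yet handed out */
    int count;         /* number of valid bits currently in buffer */
} BitReader;

/* Returns the next `width`-bit code, or -1 at end of file. */
static int read_code(BitReader *br, int width) {
    while (br->count < width) {
        int byte = fgetc(br->fp);
        if (byte == EOF)
            return -1;
        br->buffer |= (uint32_t)byte << br->count;   /* append above existing bits */
        br->count += 8;
    }
    int code = (int)(br->buffer & ((1u << width) - 1));
    br->buffer >>= width;
    br->count -= width;
    return code;
}
In a real decoder you would start width at the minimum code size plus one, bump it each time the dictionary crosses a power of two, and reset it on a clear code.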

Reading a binary file bit by bit

I know the function below:
size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
It only reads byte by byte; my goal is to be able to read 12 bits at a time and put them into an array. Any help or pointers would be greatly appreciated!
Adding to the first comment, you can try reading one byte at a time (declare a char variable and write there), and then use the bitwise operators >> and << to read bit by bit. Read more here: http://www.cprogramming.com/tutorial/bitwise_operators.html
Many years ago, I wrote some I/O routines in C for a Huffman encoder. This needs to be able to read and write on the granularity of bits rather than bytes. I created functions analogous to read(2) and write(2) that could be asked to (say) read 13 bits from a stream. To encode, for example, bytes would be fed into the coder and variable numbers of bits would emerge the other side. I had a simple structure with a bit pointer into the current byte being read or written. Every time it went off the end, it flushed the completed byte out and reset the pointer to zero. Unfortunately that code is long gone, but it might be an idea to pull apart an open-source Huffman coder and see how the problem was solved there. Similarly, base64 coding takes 3 bytes of data and turns them into 4 (or vice versa).
I've implemented a couple of methods to read/write files bit by bit. Here they are. Whether they are viable or not for your use case, you have to decide for yourself. I've tried to make the most readable, optimized code I could, not being a seasoned C developer (for now).
Internally, it uses a "bitCursor" to store information about previous bits that don't yet fit a full byte. It has two data fields: d stores data and s stores size, i.e. the number of bits currently held in the cursor.
You have four functions:
newBitCursor(), which returns a bitCursor object with default values {0,0}. Such a cursor is needed at the beginning of a sequence of read/write operations to or from a file.
fwriteb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which writes the sizeb rightmost bits of the value stored in ptr to fp.
fcloseb(FILE *fp, bitCursor *c), which writes a remaining byte if the previous writes did not exactly encapsulate all data needed to be written, which is probably almost always the case...
freadb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which reads sizeb bits and bitwise ORs them into *ptr (it is, therefore, your responsibility to initialize *ptr to 0).
More info is provided in the comments. Have Fun!
Edit: It has come to my knowledge today that when I made that, I assumed little-endian! :P Oops! It's always nice to realize how much of a noob I still am ;D
Edit: GNU's Binary File Descriptors.
Read the first two bytes from your a_file file pointer and check the bits in the least or greatest byte — depending on the endianness of your platform (x86 is little-endian) — using bitshift operators.
You can't really put bits into an array, as there isn't a datatype for bits. Rather than keeping 1's and 0's in an array, which is inefficient, it seems cheaper just to keep the two bytes in a two-element array (say, of type unsigned char) and write functions to map those two bytes to one of 4096 (2^12) values of interest.
As a complication, on subsequent reads, if you want to fread through the pointer every 12 bits, you would read only one byte, using the left-over bits from the previous read to build a new 12-bit value. If there are no leftovers, you would need to read two bytes.
Your mapping functions would need to address the second case, where bits were used from the previous read, because the two bytes would need a different mapping. To do this efficiently, a modulus on a read counter could be used to swap between the two mappings.
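A small sketch of that bookkeeping (my own illustration, not the answerer's code), assuming the 12-bit values are packed back to back, most significant bit first: it alternates between reading two bytes and one byte, carrying the leftover 4 bits across calls.
#include <stdio.h>

/* Read the next 12-bit value from fp. Returns -1 at end of file. */
static int read12(FILE *fp, unsigned *leftover, int *have_leftover) {
    int b1, b2;
    if (!*have_leftover) {
        /* No spare bits: take two bytes, keep the low 4 bits of the second. */
        if ((b1 = fgetc(fp)) == EOF || (b2 = fgetc(fp)) == EOF)
            return -1;
        *leftover = (unsigned)b2 & 0x0F;
        *have_leftover = 1;
        return (b1 << 4) | (b2 >> 4);
    } else {
        /* 4 spare bits from last time: one more byte completes the value. */
        if ((b1 = fgetc(fp)) == EOF)
            return -1;
        *have_leftover = 0;
        return (int)((*leftover << 8) | (unsigned)b1);
    }
}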
Read 2 bytes and do bitwise operations on them to get the 12 bits you want; from the next read onwards, apply the bitwise operations again and you will get back what you expect.
For your problem you can look at the demo program below, which reads 2 bytes even though the actual information is only 12 bits wide. This kind of thing is exactly what bitwise access is used for.
fwrite() is a standard library function that takes its size argument in bytes, so reading exactly 12 bits is not possible directly. If you are the one creating the file, create it as below and read it back as below, and that solves your problem.
If the file is a special file that was not written by you, then follow the standard provided for that file format; I think it is written in a similar way. Or post the exact format and I may be able to help you.
#include <stdio.h>
#include <stdlib.h>

/* A 12-bit wide bit-field. Note that the struct is still padded to a whole
 * number of bytes, so fwrite() writes sizeof(NODE) bytes, not 12 bits. */
struct node
{
    unsigned int data:12;
} NODE;

int main()
{
    FILE *fp;

    fp = fopen("t", "wb");          /* binary mode */
    if (fp == NULL)
        return EXIT_FAILURE;

    NODE.data = 1024;
    printf("%d\n", NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);

    NODE.data = 2048;               /* fits: an unsigned 12-bit field holds 0..4095 */
    printf("%d\n", NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    fclose(fp);

    fp = fopen("t", "rb");
    if (fp == NULL)
        return EXIT_FAILURE;

    fread(&NODE, sizeof(NODE), 1, fp);
    printf("%d\n", NODE.data);
    fread(&NODE, sizeof(NODE), 1, fp);
    printf("%d\n", NODE.data);
    fclose(fp);

    return 0;
}

Understanding `read, write` system calls in Unix

My Systems Programming project has us implementing a compression/decompression program that crunches down ASCII text files by removing the (always zero) top bit of each byte, writing the output to a separate file depending on whether the compression or decompression routine is running. To do this, the professor has required us to use binary files and the Unix system calls, which include open, close, read, write, etc.
From my understanding of read and write, they read and write binary data in byte-sized chunks. However, since this data is binary, I'm not sure how to parse it.
This is a stripped down version of my code, minus the error checking:
void compress(char readFile[]) {
    char buffer[BUFFER];   /* buffer size set to 4096, but tunable to system preference */
    int openReadFile;

    openReadFile = open(readFile, O_RDONLY);
}
If I use read to read the data into buffer, will the data in buffer be in binary or character format? Nothing I've come across addresses that detail, and it's very relevant to how I parse the contents.
read() will read the bytes in without any interpretation (so "binary" mode).
Since the data is binary and you want to access the individual bytes, you should use a buffer of unsigned char:
unsigned char buffer[BUFFER]. You can regard char/unsigned char as bytes; they'll be 8 bits on Linux.
Now, since what you're dealing with is 8-bit ASCII compressed down to 7 bits, you'll have to convert those 7 bits back into 8 bits so you can make sense of the data.
To explain what's being done, consider the text "Hey". That's 3 bytes. The bytes have 8 bits each, and in ASCII those are the bit patterns:
01001000 01100101 01111001
Now, removing the most significant bit from this, you shift the remaining bits one bit to the left.
X1001000 X1100101 X1111001
Above, X is the bit to be removed. Removing those and packing the remaining bits together, you end up with bytes with this pattern:
10010001 10010111 11001000
The rightmost 3 bits are just filled in with 0. So far no space is saved, though; there are still 3 bytes.
With a string of 8 bytes, we'd save 1 byte, as that would compress down to 7 bytes.
Now you have to do the reverse on the bytes you've read back in.
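As an illustration of the packing step only (a sketch of my own, working on in-memory buffers rather than the read/write system calls the assignment requires):
#include <stddef.h>

/* Pack n bytes of 7-bit ASCII into ceil(n * 7 / 8) output bytes by dropping
 * each byte's (always zero) top bit and concatenating the remaining bits. */
static size_t pack7(const unsigned char *in, size_t n, unsigned char *out) {
    size_t outlen = 0;
    unsigned bits = 0;   /* bit accumulator */
    int count = 0;       /* number of bits currently in the accumulator */

    for (size_t i = 0; i < n; i++) {
        bits = (bits << 7) | (in[i] & 0x7F);    /* append the low 7 bits */
        count += 7;
        if (count >= 8) {                       /* a full output byte is ready */
            count -= 8;
            out[outlen++] = (unsigned char)(bits >> count);
            bits &= (1u << count) - 1u;         /* keep only the unemitted bits */
        }
    }
    if (count > 0)                              /* flush the zero-padded tail */
        out[outlen++] = (unsigned char)(bits << (8 - count));
    return outlen;
}
Running this over "Hey" gives the three bytes 10010001 10010111 11001000 from the example above; decompression does the mirror image, pulling 7 bits at a time and writing them out as whole bytes.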
I'll quote the manual of the fopen function (which is built on top of the open primitive), from http://www.kernel.org/doc/man-pages/online/pages/man3/fopen.3.html:
"The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux."
So even the high-level function ignores the 'b' in the mode :-)
It will read the binary content of the file and load it into the memory that buffer points to. Of course, a byte is 8 bits, and a char is one byte (8 bits on Linux), so if the file was a regular plain-text document you'll end up with a printable string (be careful with how it ends; read returns the number of bytes read, which is the number of characters in an ASCII-encoded plain-text file).
Edit: in case the file you're reading isn't a text file but a collection of binary representations, you can give the buffer the type of the data in the file, even if it's a struct.
