Difference between binary and text file in terms of storage? - c

I want to ask a few questions about bits and bytes, as I am very confused.
For example, suppose I have short int x = 345;. I know that a short takes 16 bits, but when I write the short to a file it is written as the chars '3', '4', '5', each taking 1 byte, for a total of 24 bits.
Is there a way to write the number (a short integer) to the file as a short integer taking 16 bits?
Also, am I right in my understanding? What will be the difference in writing to my file if the file is binary or text in terms of bytes?

Yes, there is a way.
uint16_t sh = 345;
fwrite(&sh, sizeof(sh), 1, fp);
In the case you mentioned 345 is written as text (for example ASCII if that's what you use). In the example I posted, the binary representation of sh is written in the file and it will take only 2 bytes.
What will be the difference in writing to my file if the file is
binary or text in terms of bytes?
Text write (fprintf)
0000000: 00110011 00110100 00110101
         3        4        5
Binary write (fwrite)
0000000: 01011001 00000001
#Little endian. Read as: 00000001 01011001 = 345
If interoperability is an issue (i.e. you want to send the file to another machine), the text format is the safer choice because it is portable.

If you write the value as a string, it will occupy at least three bytes for the three digits; there would often be a newline or space as well to mark the end of the value.
Yes, you can write the value as 2 bytes. One way would be:
fwrite(&x, sizeof(x), 1, fp);
The difference between binary and text is that you can transport the text between different types of machine with almost complete impunity, and all the machines will interpret the data the same way. The binary file can only be interpreted directly on machines that have the same endianness (big-endian vs little-endian, the latter being what Intel x86 uses). On the other class of machines, you have to swap the order of the bytes to get the binary data interpreted correctly.

Related

Writing integer values to a file in binary using C

I am trying to write 9 bit numbers to a binary file.
For example, I want to write the integer value 275 as 100010011, and so on. fwrite only allows one byte to be written at a time, and I am not sure how to manipulate the bits to be able to do this.
You have to write a minimum of two bytes to store a 9-bit value. An easy solution is to use 16 bits per 9-bit value.
Choose a 16-bit unsigned type, e.g. uint16_t, and store the 2 bytes:
uint16_t w = 275;
fwrite(&w, 1, 2, myfilep);
When reading the word w back, make sure it actually uses only its low 9 bits (bits 0~8):
w &= 0x1FF;
Note that you might have endianness issues if you read the file on another system that doesn't have the same endianness as the system that wrote the word.
You could also optimize that solution by using 9 bits of a 16-bit word, then using the remaining 7 bits to store the first 7 bits of the next 9-bit value, and so on.
See this answer that explains how to work with bit shifting in C.

Write hex bytes to file using fwrite (little endian issue)

I am trying to write hex bytes to file using fwrite but it's in wrong order due to little endian. Does anyone know an easy way to fix the problem without writing another function to swap the bytes? Thanks.
My code actually converts every 8-byte binary sequence to a 2-byte hexadecimal value and writes it to the file. For example, 00000000 -> 00, 00010010 -> 12, ..., and each 2-byte result is written to the file.
What I used to write to the file is:
unsigned char hex;
char two_bytes[9]=""; // 8-byte binary sequence plus the terminating '\0' that strtol needs
hex = strtol(two_bytes, NULL, 2);
fwrite(&hex, sizeof(hex), 1, fd); //but write in the wrong order
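You can convert from whatever your host endianness is to "network endian" (big endian) with htonl. It takes and returns a 32-bit value, so store the result in a uint32_t rather than an unsigned char (for 16-bit values, use htons). A single byte has no byte order, so a lone unsigned char never needs swapping.
uint32_t be = htonl((uint32_t)strtol(two_bytes, NULL, 2));
... if you don't mind pulling in some fancy headers (<arpa/inet.h> on POSIX systems).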

fwrite not behaving as it should be

I have a C program that writes to a file using fwrite(..) and the result is not consistent with the function's arguments I provide.
uint32_t x = 1009716161;
FILE * file = fopen("helloop.txt", "wb+");
rewind(file);
fwrite( &x, sizeof(uint32_t), 1, file);
fclose(file);
When I check the file afterward, it seems to contain symbols that do not translate into anything:
>cat helloop.txt
>Á/<
as I should be getting this
>cat helloop.txt
>000000003C2F0BC1
I checked the file's permissions and I chmodded it
chmod 777 helloop.txt
The way I see it I have a 1 element of size 32 bit integer that I want to write to file,
what am I doing wrong ?
Your program did exactly what you told it to.
In general, if a program you've written seems to be misbehaving, you shouldn't jump to the conclusion that the operating system, the compiler, or the runtime library is at fault. It's far more likely that there's a bug in your program, or that you've misunderstood something. That remains true no matter how many years of experience you have.
The value 1009716161, expressed in hexadecimal, is 0x3c2f0bc1. When you write that value to a binary file, you write 4 8-bit bytes, with values 0x3c, 0x2f, 0x0b, and 0xc1. In ASCII, those are '<', '/', and two characters outside the printable ASCII range. The order in which they're written depends on the endianness of your system, but the contents you report seem consistent with that.
I'm not sure why you'd expect to see 000000003C2F0BC1. That's 16 bytes when you only wrote 4 bytes to the file. Furthermore, binary files do not contain an ASCII rendering of the hexadecimal representation of the data you wrote -- they just contain the data.
If you examine the file by converting it from raw binary to hexadecimal (by using the hexdump or od -x command if your system has it), you should see something recognizable.
And if you open the file in binary mode and use fread to read the data back into a uint32_t object, you should get the original value 1009716161 back -- which is the whole point.
cat helloop.txt
Á/<
cat prints character data. It doesn't print a 4-byte value in a file as a 32-bit integer.
as I should be getting this
cat helloop.txt
000000003C2F0BC1
No you shouldn't, not with cat. You'd have to write the character string "000000003C2F0BC1" to the file if you expect to get that. The file would probably be 16 characters long. I'd bet right now if you run ls -l helloop.txt you'll see size 4 because you wrote a uint32_t binary integer to the file.
what am I doing wrong ?
As far as I can tell, the only thing you've done wrong is expecting cat to print out your uint32_t as a hexadecimal representation. (Though I did not check your hex value, so that may be incorrect as well)
See if you have hexdump on your Linux machine, it may give you better results.
EDIT: If you actually DO want to print a uint32_t as a hex string, you can use fprintf(..., "%x", x) with the x or X format specifier (or, to be strictly portable for uint32_t, the PRIx32 / PRIX32 macros from <inttypes.h>), but keep in mind this is not compatible with fwrite/fread, so to read it back in you'll have to read in the string and convert back from hex to int.

Write 9 bits binary data in C

I am trying to write binary data that does not fit in 8 bits to a file. From what I understand, you can write binary data of any length if you can group it in a predefined length of 8, 16, 32, or 64 bits.
Is there a way to write just 9 bits to a file? Or two values of 9 bits?
I have one value in the range -+32768 and 3 values in the range +-256. What would be the way to save most space?
Thank you
No, I don't think there's any way using C's file I/O APIs to express storing less than one char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need:
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits can be put, each time incrementing i. When i reaches CHAR_BIT, write x to s and reset i to zero.
You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
int a : 16;
int b : 9;
int c : 9;
int d : 9;
};
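Objects such as this will still be rounded up to a whole number of bytes. The fields use 43 bits in total, and the next whole number of eight-bit bytes is 48 bits, so the structure needs at least six bytes. Many compilers also keep each bit-field inside an int-sized unit and pad the structure to the alignment of int, in which case sizeof reports eight bytes.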
You can either accept this padding of 43 bits to 48 or use more complicated code to concatenate bits further before writing to a file. This requires additional code to assemble bits into sequences of bytes. It is rarely worth the effort, since storage space is currently cheap.
You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value will be written to two bytes and combined with the last/next byte by shift and OR operations.
I hope this very abstract description helps you.

read signed int from a bin file in C gives me wrong results

I want to read data from a .bin file. If I preview the data in my bin file, I see something like this:
0000000 3030 3030 3030 3730 300a 3030 3030 3030
0000010 0a35 3330 3030 3030 3030 300a 3031 3030
So I just want to read the first two 32-bit signed ints.
My code is this:
int data,data2;
fread(&data,4,1, ptr_myfile);
printf("First Data read in hex is: %x\n",data);
/*read the second 32 bit integer*/
fread(&data2,sizeof(int),1, ptr_myfile);
printf("Second data read in hex is: %x\n",data2);
My output is this:
First Data read in hex is: 30303030
Second data read in hex is: 37303030
So my first question is: why are they read in this order? Why is the second one not 30303730? And which one is correct, assuming that I have to read the first two signed 32-bit integers?
And more important
The second number declares how many more 32-bit signed ints should exist in the bin file. There are some notes that describe this bin file, and I know that the second number should be equal to 4, or a little bit bigger, but in no case 37303030, which is an extremely large number.
I think there is something wrong with my conversion or the way I read the bin file.
The bin file is supposed to contain:
EDIT:
The bin file is ASCII text with UNIX-style line-endings. It
consists of a series of 32-bit signed integers in hexadecimal only
Any help on what I am missing here?
When comparing multi-byte binaries in a raw format, you might have to consider endianness. At least that's the first thing that comes to my mind when the number's bytes appear "reversed" when you display them after reading them and printing.
This does look like an encoding issue.
That file seems to contain plain ASCII values of the numbers - at least judging from the sample you gave us.
The ASCII hex code 30 is the character 0, 31 is 1, ..., 39 is 9.
The ASCII hex code 0A is a linefeed.
I would not be too surprised if you also found the values 41 through 46, which would resolve to A - F, or maybe 61 through 66, which would be the lowercase variants a - f.
As the description claims those numbers are 32 bits wide, I would suggest you try to read up to 8 characters (bytes) per number and convert those ASCII values into numeric values, e.g. by using sscanf. As said, the numbers are most likely delimited by that linefeed.
At first glance, the confusing part is that your sample seems to contain numbers made from more than 8 characters:
35 3330 3030 3030 3030 30
That would resolve to
5300 0000 00, which is more than 32 bits wide and would not fit the description.
However, plain hexdump and od -x print the file as 16-bit words, and on a little-endian machine the two bytes of each word appear swapped. Undoing the swap, the sample reads 00000007, 00000005, 03000000, ..., one 8-digit number per line, which does fit the description. It also matches your output: the bytes 30 30 30 37, read as a little-endian int, print as 37303030.
Well, I would start by parsing the file line by line and converting each line to a binary value using sscanf.
It's very likely what Henrik said, and using printf is adding additional murkiness to the waters.
Try this:
unsigned char data[4],data2[4];
fread(&data,4,1, ptr_myfile);
printf("First Data read in hex is: %x, %x, %x, %x\n",data[0], data[1], data[2], data[3]);
/*read the second 32 bit integer*/
fread(&data2,4,1, ptr_myfile);
printf("Second data read in hex is: %x, %x, %x, %x\n",data2[0], data2[1], data2[2], data2[3]);
This will give you the bytes as you read them, without any byte swapping due to endianness issues.
