I'm trying to read the bitmap header. I have defined the following struct:
typedef struct {
char magic[2];
char size[4];
char reserved[4];
char offset[4];
char dibbytes[4];
char width[4];
char height[4];
char colorplanes[2];
char bpp[2];
char rawsize[4];
char hor_res[4];
char ver_res[4];
char colors[4];
char important[4];
} bmp_t;
And I open a bitmap image with this function:
void open(char * filename) {
long filesize;
char * buffer;
FILE * file = fopen(filename, "rb");
fseek(file, 0, SEEK_END);
filesize = ftell(file);
rewind(file);
buffer = (char *) malloc(sizeof(char) * filesize);
fread(buffer, 1, filesize, file);
bmp_t * bmp = (bmp_t *) buffer;
printf("Size in hex: %02x %02x %02x %02x\n", bmp->size[0], bmp->size[1], bmp->size[2], bmp->size[3]);
fclose(file);
}
To test this, I made a new bitmap, width = 1000 pixels, height = 3 pixels. The filesize is 9054 bytes. However, the output of my program is:
Size in hex: 5e 23 00 00
That's a bit strange because 0x5e23 reversed is 0x235e, which is 9054 in decimal (the correct filesize). So the values are saved in reverse. For example, if I make a 1000x1000 bitmap, I should get a filesize of 3000054 in decimal, but I get 0xf6c62d. And the reverse, 0x2dc6f6, is 3000054 (again, the correct filesize).
So I thought I could just sprintf the string in reverse and then use atoi to convert it to int:
sprintf(bmp->size, "%c%c%c%c", bmp->size[3], bmp->size[2], bmp->size[1], bmp->size[0]);
printf("Size reversed: %02x %02x %02x %02x\n", bmp->size[0] & 0xFF, bmp->size[1] & 0xFF, bmp->size[2] & 0xFF, bmp->size[3] & 0xFF);
int size = atoi(bmp->size);
printf("Size: %d\n", size);
For the image I get the following result:
Size reversed: 00 00 23 5e
Size: 0
So the sprintf is working great, my string is reversed and 0x235e is the correct answer. But the atoi doesn't work on the string, and I don't know why. Besides that, I think this method of printing a reverse string is a very strange way of reading the values.
What am I doing wrong here and what is the correct way of doing this? Thanks in advance.
Update!
Turns out it is an endianness problem. But how should I read it then? I just want to get an integer with the filesize (or width, height, anything).
This happens because multi-byte values are stored in memory with the least-significant byte first, at least on little-endian machines (x86s fall into this category). Check here for details about endianness and here if you want to know why little endian is used.
Welcome to the real world, where different computers have different endianness. Reading binary fields directly is never a good idea; you need to be endian-aware.
There probably is a well-defined byte order for the BMP file format (probably little-endian due to its roots on x86), and you need to make sure you always read any multi-byte fields in the proper way.
UPDATE: Wikipedia states:
All of the integer values are stored in little-endian format (i.e. least-significant byte first).
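In practice that means assembling each field from its individual bytes instead of reversing strings. As a minimal sketch (le32 is just a hypothetical helper name; it reuses the bmp_t fields from the question and assumes the header was read into bmp as shown):
#include <stdint.h>
/* Assemble a 4-byte little-endian field into an integer,
   regardless of the machine's own byte order. */
uint32_t le32(const char *p)
{
    const unsigned char *b = (const unsigned char *) p; /* avoid sign extension */
    return (uint32_t) b[0]
         | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16)
         | ((uint32_t) b[3] << 24);
}
Then, inside open(), printf("Size: %u\n", le32(bmp->size)); would print 9054 for the 1000x3 image.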
The storing of things in "reverse" is called the endianness of the machine. X86 machines store their bytes in little-endian order.
You're trying to convert raw binary bytes with a function that parses an ASCII decimal string into an int. This is the wrong tool for the job.
You're not doing it wrong.
This is called little-endian data.
Read this:
http://en.wikipedia.org/wiki/Endianness
There is more than one way to store data in files :-)
Related
Please explain each case in detail, what is happening under the hood and why I am getting 55551 and -520103681 specifically.
typedef uint_8 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :255
typedef uint_16 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :55551
typedef uint_32 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :-520103681
I am reading from a file having first four bytes as 255 216 255 244.
In your 3 cases, before the printf statement, the 4 first bytes of the arr array (in hexadecimal format) are: FF D8 FF E0, which corresponds to 255 216 255 224.
Here are explanations of each case:
arr[0] has uint8_t type, so its 1-byte value is 0xFF, and printf("%i", arr[0]); prints 255: the single byte is promoted to a (4-byte) int, which %i prints as a signed decimal integer.
arr[0] has uint16_t type, so its 2-byte value is 0xD8FF, and printf("%i", arr[0]); prints 55551: the bytes FF D8 are interpreted with little-endianness (the byte 0xD8 becomes the MSB), giving 0xD8FF, which %i prints as a signed decimal integer.
arr[0] has uint32_t type, so its 4-byte value is 0xE0FFD8FF, and printf("%i", arr[0]); prints -520103681: the bytes FF D8 FF E0 are interpreted with little-endianness (the byte 0xE0 becomes the MSB), giving 0xE0FFD8FF, whose sign bit is set, so %i shows it as a negative signed decimal integer.
Note: I deliberately changed "255 216 255 244" from your post into "255 216 255 224". I think you made a typo.
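To see all three interpretations side by side, here is a small self-contained sketch (hypothetical: it copies the four file bytes into the variables with memcpy instead of calling fread, and it assumes a little-endian machine such as x86):
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    const uint8_t bytes[4] = { 0xFF, 0xD8, 0xFF, 0xE0 }; /* 255 216 255 224 */
    uint8_t  as8;
    uint16_t as16;
    uint32_t as32;

    memcpy(&as8,  bytes, sizeof as8);  /* first byte only */
    memcpy(&as16, bytes, sizeof as16); /* first two bytes, little-endian order */
    memcpy(&as32, bytes, sizeof as32); /* all four bytes, little-endian order */

    printf("%" PRIu8 "\n",  as8);  /* 255 */
    printf("%" PRIu16 "\n", as16); /* 55551, i.e. 0xD8FF */
    printf("%" PRIu32 "\n", as32); /* 3774863615, i.e. 0xE0FFD8FF; printed with %i it shows up as -520103681 */
    return 0;
}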
You seem to be mixing up several misunderstandings. Three out of the four lines in your examples are questionable, so let's dissect them.
typedef uint_8 BYTE; — why do you have this typedef, and where is uint_8 coming from? I suggest you just use uint8_t from stdint.h; for a minimal example you could skip the typedef to BYTE completely.
The documentation of fread tells us that:
the second parameter is the size of each element to be read
the third parameter is the number of elements to be read
There are other ways to get the values into memory, and to make the program reproducible by copy and paste we can just put the corresponding values into memory directly. If you have problems with fread, that would be a different question.
So it would be one of these lines (your last value has to be 224, not 244, to get -520103681):
uint8_t arr[512] = {0xFF, 0xD8, 0xFF, 0xE0}; // {255, 216, 255, 224}
uint16_t arr[512] = {0xD8FF, 0xE0FF}; // {216<<8 | 255, 224<<8 | 255}, bytes swapped within each element because of the endianness
uint32_t arr[512] = {0xE0FFD8FF}; // 224<<24 | 255<<16 | 216<<8 | 255
Now you can see that the arrays are of different sizes, and 16/32-bit elements hardly qualify as a BYTE.
In the last line you use printf() incorrectly. If you look up the conversion and length specifiers for printf() you can see that %i is used for int (which is probably 32 bits).
Basically you tell it to read arr[0] (whatever its type) as a signed int.
This results in the values you see above. The (nearly) correct specifiers would be %hhu for unsigned char, %hu for unsigned short and %u for unsigned int.
But as you use fixed-width types it would be better to use inttypes.h and the corresponding specifiers
PRIu8, PRIu16 and PRIu32, like this:
printf("%"PRIu8, arr[0]);
putting them all together yields:
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
uint32_t arr[512] = {0xE0FFD8FF};
printf("%"PRIu32,arr[0]);
return 0;
}
Once you eliminate all those problems in the code, we get closer to the actual issue.
Problem 1 might have been that you forgot about endianness, so you expected the bytes in a different order.
Also, if you use the signed specifier for printf and the MSB is 1, you get a negative value; that won't happen if you use the correct specifier for unsigned values.
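As a short illustration of that last point (a hypothetical snippet using the 0xE0FFD8FF pattern from above):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint32_t v = 0xE0FFD8FFu;             /* most significant bit is 1 */
    printf("%" PRIu32 "\n", v);           /* 3774863615 */
    printf("%" PRId32 "\n", (int32_t) v); /* -520103681: same bits read as signed
                                             (two's-complement wrap, as on common platforms) */
    return 0;
}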
I'm struggling with a problem that requires I perform a hex dump to an object file I've created with the function fopen().
I've declared the necessary integer variable (in HEX) as follows:
//Declare variables
int code = 0xCADE;
The output must be big Endian so I've swapped the bytes in this manner:
//Swap bytes
int swapped = (code>>8) | (code<<8);
I then opened the file for binary output in this manner:
//Open file for binary writing
FILE *dest_file = fopen(filename, "wb");
Afterwards, I write the variable code (which corresponds to a 16 bit word) to the file in the following manner using fwrite():
//Write out first word of header (0xCADE) to file
fwrite(&swapped, sizeof(int), 1, dest_file);
After compiling, running, and performing a hexdump on the file in which the contents have been written to, I observe the following output:
0000000 ca de ca 00
0000004
Basically everything is correct up until the extra "ca 00". I am unsure why that is there and need it removed so that my output is just:
0000000 ca de
0000004
I know the endianness problem has been addressed extensively on this site, but after performing a search, I am unclear as to how to classify this problem. How can I approach this problem so that "ca 00" is removed?
Thanks very much.
EDIT:
I've changed both:
//Declare variables
int code = 0xCADE;
//Swap bytes
int swapped = (code>>8) | (code<<8);
to:
//Declare variables
unsigned short int code = 0xCADE;
//Swap bytes
unsigned short int swapped = (code>>8) | (code<<8);
And I observe:
0000000 ca de 00 00
0000004
Which gets me closer to what I need but there's still that extra "00 00". Any help is appreciated!
You are telling fwrite to write sizeof(int) bytes, which on your system evaluates to 4 bytes (the size of int is 4). If you want to write two bytes, just do:
fwrite(&swapped, 2, 1, dest_file);
To reduce confusion, code that reorders bytes should use bytes (uint8 or char) and not multi-byte types like int.
To swap two bytes:
char bytes[2];
char temp;
fread(bytes, 2, 1, file1);
temp = bytes[0];
bytes[0] = bytes[1];
bytes[1] = temp;
fwrite(bytes, 2, 1, file2);
If you use int, you probably deceive yourself assuming that its size is 2 (while it's most likely 4), and assuming anything about how your system writes int to files, which may be incorrect. While if you work with bytes, there cannot be any surprises - your code does exactly what it looks like it does.
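Applied to the original question (writing 0xCADE so that the hex dump shows exactly ca de), a minimal sketch along the same lines; the output file name is just a placeholder:
#include <stdio.h>

int main(void)
{
    unsigned short code = 0xCADE;
    unsigned char bytes[2];

    bytes[0] = (code >> 8) & 0xFF; /* most significant byte first: 0xCA */
    bytes[1] = code & 0xFF;        /* then the least significant byte: 0xDE */

    FILE *dest_file = fopen("out.bin", "wb");
    if (dest_file == NULL)
        return 1;
    fwrite(bytes, 1, 2, dest_file); /* exactly two bytes land in the file */
    fclose(dest_file);
    return 0;
}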
What I am doing:
I am trying to read Byte-By-Byte from a .wav file and trying to show some information about the header present in the file. (For my project work).
My Code (A small Part) :-
#include<stdio.h>
void main()
{
char a[4],temp[4],i,j;
int test;
long value=0;
FILE *file;
file=fopen("hellomono.wav","rb");
//
//4 bytes - chunkID # "RIFF"
fread(a,4,1,file);
printf("ChunkID is: %s\n", a);
//4 bytes - ChunkSize
fread(a,4,1,file);
for (i=0;i<4;i++) value=value+((long)a[i]<<(8*i));
printf("ChunkSize is: %ld bits \n", value);
printf("%02x:%02x:%02x:%02x\n",a[0],a[1],a[2],a[3] );
value=0;
}
My Problems:-
Now, as the ChunkSize is 4 bytes long and in little-endian format, I am converting it into a long value to print the correct number.
The printf statement with hex output shows ffffffb4:4f:02:00, but I have specified the format as %02x so it should show at most 2 hex digits per value, which is true for the 4f:02:00 part but not for the first part, ffffffb4. Why?
Let's assume the 4 bytes read are b4:4f:02:00 in little endian; that's 0x00024FB4, which is 151476, but in the code the second-to-last printf prints: ChunkSize is: 151220 bits. Why?
Thank You.
but I have specified the format as %02x so it should show at most 2 hex digits per value
It does not mean that; it means at least 2 digits, padded with 0 if necessary. The extra f digits come from sign extension when the (signed) char value is negative, so cast the argument to unsigned char to get rid of them.
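Concretely, a sketch of both fixes in the question's own code (the cast also explains the wrong ChunkSize: the sign-extended 0xB4 contributed -76 instead of +180, making the sum 256 too small):
/* hex dump without the ffffff prefix: go through unsigned char */
printf("%02x:%02x:%02x:%02x\n",
       (unsigned char) a[0], (unsigned char) a[1],
       (unsigned char) a[2], (unsigned char) a[3]);

/* accumulate the little-endian size without sign extension */
value = 0;
for (i = 0; i < 4; i++)
    value += (long) (unsigned char) a[i] << (8 * i);
printf("ChunkSize is: %ld bytes\n", value); /* 151476 */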
fread(cur, 2, 1, fin)
I am sure I will feel stupid when I get an answer to this, but what is happening?
cur is a pointer to code_cur, a short (2 bytes); fin is a stream open for binary reading.
If my file is 00101000 01000000
what I get in the end is
code_cur = 01000000 00101000
Why is that? I am not providing any context yet because the problem really boils down to this (at least to me) unexpected behaviour.
And, in case this is the norm, how can I obtain the desired effect?
P.S.
I should probably add that, in order to 'view' the bytes, I am printing their integer value.
printf("%d\n",code_cur)
I tried it a couple times and it seemed reliable.
As others have pointed out you need to learn more on endianness.
You don't know it but your file is (luckily) in Network Byte Order (which is Big Endian). Your machine is little endian, so a correction is needed. Needed or not, this correction is always recommended as this will guarantee that your program runs everywhere.
Do something similar to this (ntohs() comes from <arpa/inet.h>):
{
uint16_t tmp;
if (1 == fread(&tmp, 2, 1, fin)) { /* Check fread finished well */
code_cur = ntohs(tmp);
} else {
/* Treat error however you see fit */
perror("Error reading file");
exit(EXIT_FAILURE); // requires #include <stdlib.h>
}
}
ntohs() will convert your value from file order to your machine's order, whatever it is, big or little endian.
This is why htonl and htons (and friends) exist. They're not part of the C standard library, but they're available on pretty much every platform that does networking.
"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*"
You can permutate those function names in whatever way you want, except for the silly ones (no, there is no ntons, and no stonl either)
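A small sketch of that read/write pairing (hypothetical file name; the value passes through htons on the way out and ntohs on the way back in, so the file is big-endian regardless of the host; error checking omitted for brevity):
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons / ntohs */

int main(void)
{
    uint16_t value = 0x2840;         /* 00101000 01000000 */
    uint16_t on_disk = htons(value); /* host order -> network (big-endian) order */

    FILE *f = fopen("data.bin", "wb");
    fwrite(&on_disk, sizeof on_disk, 1, f);
    fclose(f);

    uint16_t raw;
    f = fopen("data.bin", "rb");
    fread(&raw, sizeof raw, 1, f);
    fclose(f);

    printf("0x%04x\n", ntohs(raw)); /* 0x2840 on any host */
    return 0;
}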
As others have pointed out, this is an endianness issue.
The Most Significant Byte differs in your file and your machine. Your file has big-endian (MSB first) and your machine is little-endian (MSB last or LSB first).
To understand what's happening, let's create a file with some binary data:
uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
fwrite(buffer, 1, 2, fp); // write two bytes to file
fclose(fp);
The file.bin was created and holds the binary value 00101000 01000000, let's read it:
uint8_t buffer[2] = {0, 0};
FILE * fp = fopen("file.bin", "rb");
fread(buffer, 1, 2, fp); // read two bytes from file
fclose(fp);
printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
// The above prints 0x28, 0x40, as expected and in the order we wrote previously
So everything works well because we are reading byte-by-byte, and bytes don't have endianness (technically bit order within a byte exists, but it is not visible at this level, so you can think of bytes as endianness-free to simplify the understanding).
Anyways, as you noticed, here's what happens when you try to read the short directly:
FILE * fp_alt = fopen("file.bin", "rb");
short incorrect_short = 0;
fread(&incorrect_short, 1, 2, fp_alt);
fclose(fp_alt);
printf("Read short as machine endianess: %hu\n", incorrect_short);
printf("In hex, that is 0x%04x\n", incorrect_short);
// We get the incorrect decimal of 16424 and hex of 0x4028!
// The machine inverted our short because of the way the endianess works internally
The worst part is that on a big-endian machine the above would not return an incorrect number, leaving you unaware that your code is endian-specific and not portable between processors!
It's nice to use ntohs from arpa/inet.h to convert the endianness, but I find it strange that a whole (non-standard) library made for network communication is used to solve an issue that comes from reading files, and that it solves it by reading the value incorrectly from the file and then 'translating' it, instead of just reading it correctly in the first place.
In higher-level languages we often see functions that handle reading a given endianness from a file instead of converting the value, because we (usually) know how a file format is laid out and what its endianness is; just look at the readInt16BE method of JavaScript's Buffer, straight to the point and easy to use.
Motivated by this simplicity, I created a function that reads a 16-bit integer below (but it's very easy to change to 8, 32 or 64 bits if you need to):
#include <stdint.h> // necessary for specific int types
// Advances and reads a single signed 16-bit integer from the file descriptor as Big Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16BE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (buffer[0] << 8) | buffer[1];
return 1;
}
Usage is simple (error handling omitted for brevity):
FILE * fp = fopen("file.bin", "rb"); // Open file as binary
short code_cur = 0;
freadInt16BE(&code_cur, fp);
fclose(fp);
printf("Read Big-Endian (MSB first) short: %hu\n", code_cur);
printf("In hex, that is 0x%04x\n", code_cur);
// The above code prints 0x2840 correctly (decimal: 10304)
The function will fail (return 0) if the file could not be opened (f is NULL), or if it did not contain the 2 bytes to be read at the current position.
As a bonus, if you happen to find a file that is little-endian, you can use this function:
// Advances and reads a single signed 16-bit integer from the file descriptor as Little Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16LE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
*result = (buffer[1] << 8) | buffer[0];
return 1;
}
I have one encrypted file named encrypt.
I calculated a CRC-16 for this file and stored the result in an unsigned short, which is 2 bytes (16 bits) in size.
Now I want to append the 2-byte CRC value at the end of this file, then later read back those last 2 bytes and compare them against the CRC. How can I achieve this?
I used this code
fseek(readFile, filesize, SEEK_SET);
fprintf(readFile,"%u",result);
Here filesize is the size of my original encrypted file, and after seeking there I write result, which is an unsigned short, but it ends up writing 5 bytes to the file.
file content after this
testsgh
30549
The original file data is testsgh, but the CRC got written as the text 30549. I want to store this value in 2 bytes, so how can I do that?
You should open the file in binary append mode:
FILE *out = fopen("myfile.bin", "ab");
This will eliminate the need to seek to the end.
Then, you need to use a direct write, not a print which converts the value to a string and writes the string. You want to write the bits of your unsigned short checksum:
const size_t wrote = fwrite(&checksum, sizeof checksum, 1, out);
This succeeded if and only if the value of wrote is 1.
However, please note that this risks introducing endianness errors, since it writes the value using your machine's local byte order. To be on the safe side, it's cleaner to decide on a byte ordering and implement it directly. For big-endian:
const unsigned char check_bytes[2] = { checksum >> 8, checksum & 255 };
const size_t wrote = fwrite(check_bytes, sizeof check_bytes, 1, out);
Again, we expect wrote to be 1 after the call to indicate that both bytes were successfully written.
Use fwrite(), not fprintf. I don't have access to a C compiler atm but fwrite(&result, sizeof(result), 1, readFile); should work.
You could do something like this:
unsigned char c1, c2;
c1 = (unsigned char)(result >> 8);
c2 = (unsigned char)( (result << 8) >> 8);
and then append c1 and c2 at the end of the file. When you read the file back, just do the opposite:
result = ( (unsigned)c1 << 8 ) + (unsigned)c2;
Hope that helps.
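And for the read-back-and-compare part of the question, a minimal sketch (assuming the two CRC bytes were appended high byte first as above; stored_crc is just an illustrative name, and result is the freshly computed CRC from the question):
unsigned char c1, c2;
unsigned short stored_crc;

FILE *in = fopen("encrypt", "rb");
/* error handling omitted for brevity */
fseek(in, -2L, SEEK_END); /* the last two bytes hold the appended CRC */
c1 = (unsigned char) fgetc(in);
c2 = (unsigned char) fgetc(in);
fclose(in);

stored_crc = ((unsigned short) c1 << 8) | c2;
if (stored_crc == result)
    puts("CRC matches");
else
    puts("CRC mismatch");
Note that when verifying, result has to be recomputed over the file contents excluding those final two bytes.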
You can write single characters with %c formatting, e.g.
fprintf(readfile, "%c%c", result % 256, result / 256);
btw: readfile is a misleading name when you write to it :-)