C - binary reading, fread is inverting the order - c

fread(cur, 2, 1, fin)
I am sure I will feel stupid when I get an answer to this, but what is happening?
cur is a pointer to a code_cur, a short (2 bytes), fin is a stream open for binary reading.
If my file is 00101000 01000000
what I get in the end is
code_cur = 01000000 00101000
Why is that? I am not providing any context yet because the problem really boils down to this (at least for me) unexpected behaviour.
And, in case this is the norm, how can I obtain the desired effect?
P.S.
I should probably add that, in order to 'view' the bytes, I am printing their integer value.
printf("%d\n",code_cur)
I tried it a couple times and it seemed reliable.

As others have pointed out you need to learn more on endianness.
You don't know it but your file is (luckily) in Network Byte Order (which is Big Endian). Your machine is little endian, so a correction is needed. Needed or not, this correction is always recommended as this will guarantee that your program runs everywhere.
Do something similar to this:
{
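uint16_t tmp; /* requires #include <stdint.h> */
if (1 == fread(&tmp, sizeof tmp, 1, fin)) { /* Check fread finished well */
code_cur = ntohs(tmp); /* requires #include <arpa/inet.h> (POSIX) */
} else {
/* Treat error however you see fit; note that hitting end-of-file does not set errno */
perror("Error reading file");
exit(EXIT_FAILURE); /* requires #include <stdlib.h> */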
}
}
ntohs() will convert your value from file order to your machine's order, whatever it is, big or little endian.
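To see the conversion in isolation, here is a minimal sketch that skips the file and starts from the two bytes as they appear on disk. The helper name be16_from_bytes_ntohs is made up for this illustration, and the code assumes a POSIX system that provides <arpa/inet.h>.

```c
#include <arpa/inet.h> /* ntohs (POSIX, not ISO C) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interprets two bytes stored in file (big-endian)
   order as a 16-bit value in host order. */
uint16_t be16_from_bytes_ntohs(const unsigned char *bytes)
{
    uint16_t raw;
    assert(bytes != NULL);
    /* Copying the raw bytes has the same effect as fread() into a uint16_t. */
    memcpy(&raw, bytes, sizeof raw);
    /* Network (big-endian) order to host order; a no-op on big-endian hosts. */
    return ntohs(raw);
}
```

With the bytes 0x28 0x40 from the question, this returns 0x2840 whether the host is little or big endian.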

This is why htonl and htons (and friends) exist. They're not part of the C standard library, but they're available on pretty much every platform that does networking.
"htonl" means "host to network, long"; "htons" means "host to network, short". In this context, "long" means 32 bits, and "short" means 16 bits (even if the platform declares "long" to be 64 bits). Basically, whenever you read something from the "network" (or in your case, the stream you're reading from), you pass it through "ntoh*". When you're writing out, you pass it through "hton*"
You can permute those function names in whatever way you want, except for the silly ones (no, there is no ntons, and no stonl either).
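For the writing direction, a small sketch of the "pass it through hton*" rule might look like this. The helper name store_u32_network_order is an assumption for the example, and <arpa/inet.h> is assumed to be available (POSIX).

```c
#include <arpa/inet.h> /* htonl (POSIX, not ISO C) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores a 32-bit host value into out[0..3]
   in network (big-endian) byte order, ready to be written out. */
void store_u32_network_order(uint32_t host_value, unsigned char *out)
{
    uint32_t net;
    assert(out != NULL);
    net = htonl(host_value);       /* host order to network order */
    memcpy(out, &net, sizeof net); /* the bytes now sit most significant first */
}
```

The resulting four bytes can then be passed to fwrite() as-is; reading them back is the mirror image with ntohl().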

As others have pointed out, this is an endianness issue.
The byte order differs between your file and your machine. Your file is big-endian (MSB first) and your machine is little-endian (LSB first).
To understand what's happening, let's create a file with some binary data:
uint8_t buffer[2] = {0x28, 0x40}; // hexadecimal for 00101000 01000000
FILE * fp = fopen("file.bin", "wb"); // opens or creates file as binary
fwrite(buffer, 1, 2, fp); // write two bytes to file
fclose(fp);
file.bin now exists and holds the binary value 00101000 01000000. Let's read it:
uint8_t buffer[2] = {0, 0};
FILE * fp = fopen("file.bin", "rb");
fread(buffer, 1, 2, fp); // read two bytes from file
fclose(fp);
printf("0x%02x, 0x%02x\n", buffer[0], buffer[1]);
// The above prints 0x28, 0x40, as expected and in the order we wrote previously
So everything works well because we are reading byte-by-byte, and single bytes don't have endianness (the bit order inside a byte is not something your C code can observe, so you can ignore it here).
Anyway, as you noticed, here's what happens when you try to read the short directly:
FILE * fp_alt = fopen("file.bin", "rb");
short incorrect_short = 0;
fread(&incorrect_short, 1, 2, fp_alt);
fclose(fp_alt);
printf("Read short as machine endianness: %hd\n", incorrect_short);
printf("In hex, that is 0x%04x\n", (unsigned) incorrect_short);
// On a little-endian machine we get the incorrect decimal 16424 and hex 0x4028!
// The bytes land in memory in file order, and the machine treats the first byte as the least significant one
The worst part is that on a big-endian machine the code above would produce the expected number, leaving you unaware that your code is endian-specific and not portable between processors!
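If you want to see which kind of machine you are on, a rough sketch like the following can help. Both function names are invented for this example; the check simply looks at which byte of a known 16-bit value is stored first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1); /* look at the byte stored first in memory */
    return first_byte == 1;         /* LSB first means little-endian */
}

/* Hypothetical helper: what a direct fread() into a 16-bit variable produces. */
uint16_t load_native_u16(const unsigned char *bytes)
{
    uint16_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

For the bytes 0x28 0x40, load_native_u16 yields 0x4028 where host_is_little_endian() returns 1 and 0x2840 where it returns 0, which is exactly the portability trap described above.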
It's nice to use ntohs from arpa/inet.h to convert the endianness, but I find it strange: it pulls in a (non-standard) networking header to solve a problem that comes from reading files, and it solves it by reading the value incorrectly from the file and then 'translating' it instead of just reading it correctly.
In higher-level languages we often see functions that read a value from a file with a given endianness instead of converting it afterwards, because we (usually) know the file's structure and its endianness. Just look at the readInt16BE method of Node.js's Buffer: straight to the point and easy to use.
Motivated by this simplicity, I created a function that reads a 16-bit integer below (but it's very easy to change to 8, 32 or 64 bits if you need to):
#include <stdint.h> // necessary for specific int types
#include <stdio.h> // FILE, fread
// Advances and reads a single signed 16-bit integer from the file stream as Big Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16BE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
// Parentheses are required: '+' binds tighter than '<<'
*result = (int16_t) ((buffer[0] << 8) | buffer[1]);
return 1;
}
Usage is simple (error handling omitted for brevity):
FILE * fp = fopen("file.bin", "rb"); // Open file as binary
short code_cur = 0;
freadInt16BE(&code_cur, fp);
fclose(fp);
printf("Read Big-Endian (MSB first) short: %hd\n", code_cur);
printf("In hex, that is 0x%04x\n", code_cur);
// The above code prints 0x2840 correctly (decimal: 10304)
The function fails (returns 0) if the FILE pointer is NULL (for example because the file doesn't exist or can't be opened) or if 2 bytes could not be read at the current position.
As a bonus, if you happen to find a file that is little-endian, you can use this function:
// Advances and reads a single signed 16-bit integer from the file stream as Little Endian
// Writes the value to 'result' pointer
// Returns 1 if succeeds or 0 if it fails
int8_t freadInt16LE(int16_t * result, FILE * f) {
uint8_t buffer[sizeof(int16_t)];
if (!result || !f || sizeof(int16_t) != fread((void *) buffer, 1, sizeof(int16_t), f))
return 0;
// Parentheses are required: '+' binds tighter than '<<'
*result = (int16_t) ((buffer[1] << 8) | buffer[0]);
return 1;
}
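Since the answer mentions that the same idea extends to other widths, here is one possible 32-bit variant as a sketch. The name freadUint32BE is hypothetical and not part of the answer above. Unlike the functions above, it treats NULL arguments as a programming error and asserts on them rather than returning 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 32-bit sibling of freadInt16BE.
   Reads four bytes from the stream as an unsigned big-endian integer.
   Returns 1 on success, 0 if four bytes could not be read. */
int freadUint32BE(uint32_t *result, FILE *f)
{
    unsigned char buffer[4];
    assert(result != NULL && f != NULL);
    if (fread(buffer, 1, sizeof buffer, f) != sizeof buffer)
        return 0;
    /* Cast before shifting so the shifts happen in 32-bit unsigned arithmetic. */
    *result = ((uint32_t) buffer[0] << 24)
            | ((uint32_t) buffer[1] << 16)
            | ((uint32_t) buffer[2] << 8)
            |  (uint32_t) buffer[3];
    return 1;
}
```

The casts matter here: without them buffer[0] << 24 would be computed as a signed int and could overflow for bytes of 0x80 and above.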

Related

memcpy misses one byte when copying to struct

I am trying to extract a particular region of my message and interpret it as a struct.
void app_main(void)
{
esp_err_t err;
uint8_t injected_input[]={0xCE,0x33,0xE1,0x00,0x11,0x22,0x33,0x44,0x55,0x66};
model_sensor_data_t stuff = {0};
model_sensor_data_t* sensor_buf = &stuff;
if (extract_sensor_data_msgA(injected_input, sensor_buf) == -1)
{
ESP_LOGE(TAG, "Error in extract_sensor_data_msgA");
}
ESP_LOGI(TAG, "extracted sensor data is 0x%12x", *sensor_buf);
}
typedef struct __attribute__((packed))
{
uint8_t byte0;
uint8_t byte1;
uint8_t byte2;
uint8_t byte3;
uint8_t byte4;
} model_sensor_data_t;
int32_t extract_sensor_data_msgA(uint8_t *buf, model_sensor_data_t *sensor_buf)
{
if (buf == NULL || sensor_buf == NULL)
{
return -1;
}
//do other checks, blah blah
memcpy(sensor_buf, buf + 5, sizeof(model_sensor_data_t)); //problem lies here
return 0;
}
I expect to get CLIENT: extracted sensor data is 0x2233445566 but I am getting CLIENT: extracted sensor data is 0x 55443322
It seems to me there are two problems I need to fix. The first is an endianness issue, as the extracted values are all flipped. The second is a memcpy-with-padding(?) concern. I thought the second problem would be fixed by using __attribute__((packed)), but it doesn't seem to be. Can any kind soul suggest an alternative way to go about this? I have referred to https://electronics.stackexchange.com/questions/617711/problems-casting-a-uint8-t-array-to-a-struct and "C memcpy copies bytes with little endianness" but I am still unsure how to resolve the issue.
ESP_LOGI(TAG, "extracted sensor data is 0x%12x", *sensor_buf)
Assuming this goes to a printf-family function (seems likely), %x expects an unsigned int as the argument, but you're passing a model_sensor_data_t, so you get undefined behavior.
What is probably happening is that an unsigned int is a 32-bit little-endian value being accessed in the bottom 32 bits of a register, while your calling convention will pass the model_sensor_data_t in a 64-bit register, so you're seeing the first 4 bytes as a little-endian unsigned. Alternately, printf is expecting a 32-bit value on the stack, and you are passing a 40-bit value (probably padded out to 8 bytes for alignment). Either way, it seems almost certain you're using a little-endian machine, such as an x86 of some flavor.
To print this properly, you need to print each byte. Something like
ESP_LOGI(TAG, "extracted sensor data is 0x%02x%02x%02x%02x%02x", sensor_buf->byte0,
sensor_buf->byte1, sensor_buf->byte2, sensor_buf->byte3, sensor_buf->byte4);
will print the extracted data as a 40-bit big-endian hex value.
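If the struct grows or you need this in several places, a loop over the bytes may be easier to maintain than one format specifier per field. The sketch below assumes a made-up helper called format_hex; it only uses standard C and is not part of ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes as 2*len lowercase hex digits
   followed by a terminating '\0', in memory order (first byte first).
   Returns 0 on success, -1 if the output buffer is too small. */
int format_hex(const uint8_t *data, size_t len, char *out, size_t out_size)
{
    size_t i;
    assert(data != NULL && out != NULL);
    if (out_size < 2 * len + 1)
        return -1;
    out[0] = '\0'; /* keeps the result valid when len is 0 */
    for (i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned) data[i]);
    return 0;
}
```

The resulting string can be passed to the logging macro with a plain %s, for example after calling format_hex((const uint8_t *) sensor_buf, sizeof *sensor_buf, text, sizeof text) with a char text[11] buffer.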

Issue in converting little Endian hexdump output to Big Endian (C-programming)

I'm struggling with a problem that requires I perform a hex dump to an object file I've created with the function fopen().
I've declared the necessary integer variable (in HEX) as follows:
//Declare variables
int code = 0xCADE;
The output must be big Endian so I've swapped the bytes in this manner:
//Swap bytes
int swapped = (code>>8) | (code<<8);
I then opened the file for binary output in this manner:
//Open file for binary writing
FILE *dest_file = fopen(filename, "wb");
Afterwards, I write the variable code (which corresponds to a 16 bit word) to the file in the following manner using fwrite():
//Write out first word of header (0xCADE) to file
fwrite(&swapped, sizeof(int), 1, dest_file);
After compiling, running, and performing a hexdump on the file in which the contents have been written to, I observe the following output:
0000000 ca de ca 00
0000004
Basically everything is correct up until the extra "ca 00". I am unsure why that is there and need it removed so that my output is just:
0000000 ca de
0000004
I know the Endianness problem has been addressed extensively on the stack, but after performing a search, I am unclear as to how to classify this problem. How can I approach this problem so that "ca 00" is removed?
Thanks very much.
EDIT:
I've changed both:
//Declare variables
int code = 0xCADE;
//Swap bytes
int swapped = (code>>8) | (code<<8);
to:
//Declare variables
unsigned short int code = 0xCADE;
//Swap bytes
unsigned short int swapped = (code>>8) | (code<<8);
And I observe:
0000000 ca de 00 00
0000004
Which gets me closer to what I need but there's still that extra "00 00". Any help is appreciated!
You are telling fwrite to write sizeof(int) bytes, which on your system evaluates to 4 bytes (the size of int is 4). If you want to write two bytes, just do:
fwrite(&swapped, 2, 1, dest_file);
To reduce confusion, code that reorders bytes should use bytes (uint8 or char) and not multi-byte types like int.
To swap two bytes:
char bytes[2];
char temp;
fread(bytes, 2, 1, file1);
temp = bytes[0];
bytes[0] = bytes[1];
bytes[1] = temp;
fwrite(bytes, 2, 1, file2);
If you use int, you probably deceive yourself assuming that its size is 2 (while it's most likely 4), and assuming anything about how your system writes int to files, which may be incorrect. While if you work with bytes, there cannot be any surprises - your code does exactly what it looks like it does.
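Applied to the original question, working with bytes could look like the sketch below: the 16-bit word is split into two bytes explicitly, so exactly two bytes are written and the byte order in the file does not depend on the machine. The helper name fwrite_u16_be is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes value as exactly two bytes,
   most significant byte first (big-endian).
   Returns 1 on success, 0 on failure. */
int fwrite_u16_be(uint16_t value, FILE *f)
{
    unsigned char bytes[2];
    assert(f != NULL);
    bytes[0] = (unsigned char) (value >> 8);   /* high byte, e.g. 0xCA */
    bytes[1] = (unsigned char) (value & 0xFF); /* low byte, e.g. 0xDE */
    return fwrite(bytes, 1, sizeof bytes, f) == sizeof bytes;
}
```

Calling fwrite_u16_be(0xCADE, dest_file) would put ca de into the file with no trailing bytes, and no manual swap is needed beforehand.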

Encode and combine int to 32bit int in C binary file

Lets say I have 2 variables:
int var1 = 1; //1 byte
int var2 = 2; //1 byte
I want to combine these and encode as a 32bit unsigned integer (uint32_t). By combining them, it would be 2 bytes. I'd then fill the remaining space with 2 bytes of 0 padding. This is to write to a file, hence the need for this specific type of encoding.
So by combining the above example variables, the output I need is:
1200 //4 bytes
There's no need to go the roundabout way of "combining" the values into an uint32_t. Binary files are streams of bytes, so writing single bytes is very possible:
FILE * const out = fopen("myfile.bin", "wb");
const int val1 = 1;
const int val2 = 2;
if(out != NULL)
{
fputc(val1, out);
fputc(val2, out);
// Pad the file to four bytes, as originally requested. Not needed, though.
fputc(0, out);
fputc(0, out);
fclose(out);
}
This uses fputc() to write single bytes to the file. It takes an integer argument for the value to write, but treats it as unsigned char internally, which is essentially "a byte".
Reading back would be just as simple, using e.g. fgetc() to read out the two values, and of course checking for failure. You should check these writes too; I omitted the checks for brevity.
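A possible shape for the read side, with the failure checks included, is sketched here. The function name read_two_values is hypothetical; the only library call it relies on is fgetc().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads two single-byte values from the stream.
   Returns 1 on success, 0 if end-of-file or an error is hit first.
   The outputs are only written when both reads succeed. */
int read_two_values(FILE *in, int *val1, int *val2)
{
    int first, second;
    assert(in != NULL && val1 != NULL && val2 != NULL);
    first = fgetc(in);  /* returns the byte as 0..255, or EOF */
    if (first == EOF)
        return 0;
    second = fgetc(in);
    if (second == EOF)
        return 0;
    *val1 = first;
    *val2 = second;
    return 1;
}
```

Keeping the result of fgetc() in an int rather than a char is what makes the EOF comparison reliable.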

How can I add 2 byte CRC at the end of File

I have one encrypted file named encrypt.
I calculated a CRC-16 for this file and stored the result in an unsigned short, which is 2 bytes (16 bits).
Now I want to append the 2-byte CRC value to the end of this file, then read these last 2 bytes back from the file and compare them against the CRC. How can I achieve this?
I used this code
fseek(readFile, filesize, SEEK_SET);
fprintf(readFile,"%u",result);
Here filesize is the size of my original encrypted file. After seeking there I write result, which is an unsigned short, but 5 bytes are written to the file.
File content after this:
testsgh
30549
The original file data is testsgh, but here the CRC is 30459. I want to store this value in 2 bytes. How can I do that?
You should open the file in binary append mode:
FILE *out = fopen("myfile.bin", "ab");
This will eliminate the need to seek to the end.
Then, you need to use a direct write, not a print which converts the value to a string and writes the string. You want to write the bits of your unsigned short checksum:
const size_t wrote = fwrite(&checksum, sizeof checksum, 1, out);
This succeeded if and only if the value of wrote is 1.
However, please note that this risks introducing endianness errors, since it writes the value using your machine's local byte order. To be on the safe side, it's cleaner to decide on a byte ordering and implement it directly. For big-endian:
const unsigned char check_bytes[2] = { checksum >> 8, checksum & 255 };
const size_t wrote = fwrite(check_bytes, sizeof check_bytes, 1, out);
Again, we expect wrote to be 1 after the call to indicate that both bytes were successfully written.
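For the other half of the question, reading the two bytes back, a sketch under the same big-endian convention could look like this. The name read_trailing_crc_be is made up for the example. It assumes the stream was opened in a binary read mode and that the platform supports seeking relative to SEEK_END on binary streams (true on POSIX systems and Windows, although the C standard does not guarantee it).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the last two bytes of the file as a
   big-endian 16-bit checksum. Returns 1 on success, 0 on failure
   (for example when the file is shorter than two bytes). */
int read_trailing_crc_be(FILE *f, unsigned short *crc)
{
    unsigned char check_bytes[2];
    assert(f != NULL && crc != NULL);
    if (fseek(f, -2L, SEEK_END) != 0) /* position two bytes before the end */
        return 0;
    if (fread(check_bytes, 1, sizeof check_bytes, f) != sizeof check_bytes)
        return 0;
    *crc = (unsigned short) ((check_bytes[0] << 8) | check_bytes[1]);
    return 1;
}
```

To verify the file, compute the CRC over everything except the last two bytes and compare it with the value this returns.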
Use fwrite(), not fprintf. I don't have access to a C compiler atm but fwrite(&result, sizeof(result), 1, readFile); should work.
You could do something like this:
unsigned char c1, c2;
c1 = (unsigned char)(result >> 8);
c2 = (unsigned char)(result & 0xFF);
and then append c1 and c2 at the end of the file. When you read the file back, just do the opposite:
result = ( (unsigned)c1 << 8 ) + (unsigned)c2;
Hope that helps.
You can write single characters with %c formatting, e.g.
fprintf(readFile, "%c%c", result % 256, result / 256);
Note that this writes the low byte first (little-endian). By the way, the name readFile is misleading when you write to it :-)

Reading bitmap header, getting reversed values

I'm trying to read the bitmap header. I have defined the following struct:
typedef struct {
char magic[2];
char size[4];
char reserved[4];
char offset[4];
char dibbytes[4];
char width[4];
char height[4];
char colorplanes[2];
char bpp[2];
char rawsize[4];
char hor_res[4];
char ver_res[4];
char colors[4];
char important[4];
} bmp_t;
And I open a bitmap image with this function:
void open(char * filename) {
long filesize;
char * buffer;
FILE * file = fopen(filename, "rb");
fseek(file, 0, SEEK_END);
filesize = ftell(file);
rewind(file);
buffer = (char *) malloc(sizeof(char) * filesize);
fread(buffer, 1, filesize, file);
bmp_t * bmp = (bmp_t *) buffer;
printf("Size in hex: %02x %02x %02x %02x\n", bmp->size[0], bmp->size[1], bmp->size[2], bmp->size[3]);
fclose(file);
}
To test this, I made a new bitmap, width = 1000 pixels, height = 3 pixels. The filesize is 9054 bytes. However, the output of my program is:
Size in hex: 5e 23 00 00
That's a bit strange because 0x5e23 reversed is 0x235e, which is 9054 in decimal (the correct filesize). So the values are saved in reverse. For example, if I make a 1000x1000 bitmap, I should get a filesize of 3000054 in decimal, but I get 0xf6c62d. And the reverse, 0x2dc6f6, is 3000054 (again, the correct filesize).
So I thought I could just sprintf the string in reverse and then use atoi to convert it to int:
sprintf(bmp->size, "%c%c%c%c", bmp->size[3], bmp->size[2], bmp->size[1], bmp->size[0]);
printf("Size reversed: %02x %02x %02x %02x\n", bmp->size[0] & 0xFF, bmp->size[1] & 0xFF, bmp->size[2] & 0xFF, bmp->size[3] & 0xFF);
int size = atoi(bmp->size);
printf("Size: %d\n", size);
For the image I get the following result:
Size reversed: 00 00 23 5e
Size: 0
So the sprintf is working great, my string is reversed and 0x235e is the correct answer. But the atoi doesn't work on the string, and I don't know why. Besides that, I think this method of printing a reverse string is a very strange way of reading the values.
What am I doing wrong here and what is the correct way of doing this? Thanks in advance.
Update!
Turns out it is an endianness problem. But how should I read it then? I just want to get an integer with the filesize (or width, height, anything).
This happens because little-endian machines (x86 falls in this category) store multi-byte integers with the least significant byte first, so the bytes look reversed when printed in memory order. Check here for details about endianness and here if you want to know why little endian is used.
Welcome to the real world, where different computers have different endianness. Reading binary fields directly is never a good idea, you need to be endian-aware.
There probably is a well-defined byte order for the BMP file format (probably little-endian due to its roots on x86), and you need to make sure you always read any multi-byte fields in the proper way.
UPDATE: Wikipedia states:
All of the integer values are stored in little-endian format (i.e. least-significant byte first).
The storing of things in "reverse" is called the endianness of the machine. X86 machines store their bytes in little-endian order.
You're trying to convert raw binary bytes with a function that parses ASCII decimal text into an int. This is the wrong tool for the job. The reversed buffer starts with a 0x00 byte, so atoi sees an empty string and returns 0.
You're not doing it wrong.
This is called little-endian data.
Read this:
http://en.wikipedia.org/wiki/Endianness
There is more than one way to store data in files :-)

Resources