Reading double from binary file (byte order?) - c

I have a binary file, and I want to read a double from it.
In hex representation, I have these 8 bytes in a file (and then some more after that):
40 28 25 c8 9b 77 27 c9 40 28 98 8a 8b 80 2b d5 40 ...
This should correspond to a double value of around 10 (based on what that entry means).
I have used
#include <stdio.h>
#include <assert.h>

int main(int argc, char **argv) {
    FILE *f = fopen(argv[1], "rb");
    assert(f != NULL);
    double a;
    fread(&a, sizeof(a), 1, f);
    printf("value: %f\n", a);
}
However, that prints
value: -261668255698743527401808385063734961309220864.000000
So clearly, the bytes are not converted into a double correctly. What is going on?
Using ftell, I could confirm that 8 bytes are being read.

Just like integer types, floating point types are subject to platform endianness. When I run this program on a little-endian machine:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

uint64_t byteswap64(uint64_t input)
{
    uint64_t output = input;
    output = (output & 0x00000000FFFFFFFF) << 32 | (output & 0xFFFFFFFF00000000) >> 32;
    output = (output & 0x0000FFFF0000FFFF) << 16 | (output & 0xFFFF0000FFFF0000) >> 16;
    output = (output & 0x00FF00FF00FF00FF) << 8  | (output & 0xFF00FF00FF00FF00) >> 8;
    return output;
}

int main()
{
    uint64_t bytes = 0x402825c89b7727c9;
    double a;
    memcpy(&a, &bytes, sizeof a);   /* reinterpret the bits without violating strict aliasing */
    printf("%f\n", a);

    bytes = byteswap64(bytes);
    memcpy(&a, &bytes, sizeof a);
    printf("%f\n", a);
    return 0;
}
Then the output is
12.073796
-261668255698743530000000000000000000000000000.000000
This shows that your data is stored in the file in big-endian format, but your platform is little-endian. So, you need to perform a byte swap after reading the value. The code above shows how to do that.

Endianness is a convention. Reader and writer should agree on which endianness to use and stick to it.
You should read your number as a 64-bit integer, convert the endianness, and then reinterpret those bits as a double (with memcpy rather than a pointer cast).
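A minimal sketch of that recipe (read_be_double is a name made up here, not something from the question): read the raw bytes, assemble the 64-bit pattern most-significant byte first, then copy the bits into a double. It assumes the file holds big-endian IEEE-754 doubles, as the bytes above suggest, and that the host's double is IEEE-754 binary64 as well:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Read one big-endian IEEE-754 double from f.
   Returns 1 on success, 0 on a short read. */
static int read_be_double(FILE *f, double *out)
{
    unsigned char buf[8];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;

    /* Assemble the 64-bit pattern byte by byte, MSB first, so the result
       is correct regardless of the host's endianness. */
    uint64_t bits = 0;
    for (size_t i = 0; i < sizeof buf; i++)
        bits = (bits << 8) | buf[i];

    /* Reinterpret the bit pattern as a double without a pointer cast. */
    memcpy(out, &bits, sizeof *out);
    return 1;
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL)
        return 1;

    double a;
    if (read_be_double(f, &a))
        printf("value: %f\n", a);   /* 12.073796 for the first 8 bytes above */
    fclose(f);
    return 0;
}
Assembling the integer byte by byte sidesteps the host's endianness entirely, so no explicit swap is needed.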

Related

Making conversion similar to BN_hex2bn + BN_bn2bin without openSSL in C

I currently use OpenSSL to convert values from an encrypted hex string to what I thought was a binary array. I then decrypt this "array" (pass it to EVP_DecryptUpdate). I make the conversion like this:
BIGNUM *bnEncr = BN_new();
if (0 == BN_hex2bn(&bnEncr, encrypted)) { // from hex to big number
    printf("ERROR\n");
}
unsigned int numOfBytesEncr = BN_num_bytes(bnEncr);
unsigned char encrBin[numOfBytesEncr];
if (0 == BN_bn2bin(bnEncr, encrBin)) { // from big number to binary
    printf("ERROR\n");
}
Then I pass encrBin to EVP_DecryptUpdate and decryption works.
I do this in many places in my code and now want to write my own C function for converting hex to a binary array, which I can then pass to EVP_DecryptUpdate. I had a go at this and converted my encrypted hex string to an array of 0s and 1s, but it turns out that EVP_DecryptUpdate won't work with that. From what I could find online, BN_bn2bin "creates a representation that is truly binary (i.e. a sequence of bits). More specifically, it creates a big-endian representation of the number." So this is not just an array of 0s and 1s, right?
Can someone explain how I can make the hex->(truly) binary conversion myself in C, so I would get the format that EVP_DecryptUpdate expects? Is this complicated?
BN_bn2bin "creates a representation that is truly binary (i.e. a
sequence of bits). More specifically, it creates a big-endian
representation of the number." So this is not just an array of 0s and
1s, right?
The sequence of bits mentioned here is represented as an array of bytes. With each of those bytes containing 8 bits, this can be interpreted as an "array of 0s and 1s". It is not an "array of integers that have the value 0 or 1", if that is what you are asking.
Since you are unclear about the workings of BN_bn2bin(), it helps to just analyze the end result of your code snippet. You could do that like this (omitting any error checking):
#include <stdio.h>
#include <openssl/bn.h>

int main(int argc, char **argv)
{
    const char *hexString = argv[1];

    BIGNUM *bnEncr = BN_new();
    BN_hex2bn(&bnEncr, hexString);

    unsigned int numOfBytesEncr = BN_num_bytes(bnEncr);
    unsigned char encrBin[numOfBytesEncr];
    BN_bn2bin(bnEncr, encrBin);

    fwrite(encrBin, 1, numOfBytesEncr, stdout);
}
This writes the contents of encrBin to standard output, which is not normally something you want, but it lets you pipe the result through a tool like hexdump, or redirect it to a file for analysis with a hex editor. It looks like this:
$ ./bntest 74162ac74759e85654e0e7762c2cdd26 | hexdump -C
00000000 74 16 2a c7 47 59 e8 56 54 e0 e7 76 2c 2c dd 26 |t.*.GY.VT..v,,.&|
00000010
Or, if you do want to see those 0s and 1s:
$ ./bntest 74162ac74759e85654e0e7762c2cdd26 | xxd -b -c 4
00000000: 01110100 00010110 00101010 11000111 t.*.
00000004: 01000111 01011001 11101000 01010110 GY.V
00000008: 01010100 11100000 11100111 01110110 T..v
0000000c: 00101100 00101100 11011101 00100110 ,,.&
This shows that your question ("Can someone explain how I can make the hex->(truly) binary conversion myself in C, so I would get the format that EVP_DecryptUpdate expects? Is this complicated?") is essentially the same as the SO question How to turn a hex string into an unsigned char array?, as I commented.
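For reference, a hand-rolled conversion along the lines of that linked question might look like the sketch below (hex_to_bytes and hex_digit are hypothetical helper names; it assumes an even number of hex digits and does only minimal error checking):
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a hex string with an even number of digits into bytes.
   Returns the number of bytes written, or 0 on malformed input. */
static size_t hex_to_bytes(const char *hex, unsigned char *out, size_t outlen)
{
    size_t len = strlen(hex);
    if (len % 2 != 0 || len / 2 > outlen)
        return 0;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return 0;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return len / 2;
}
Called on "74162ac74759e85654e0e7762c2cdd26" it fills the output buffer with the same 16 bytes shown in the hexdump above; unlike the BIGNUM round trip, it also preserves leading zero bytes, which is normally what you want for ciphertext.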
It is unclear why you want this, and it's definitely not advisable to roll your own implementation of the conversion functions (they may stop working with any number of internal changes to OpenSSL), but if you're interested in what it looks like:
static int bn2binpad(const BIGNUM *a, unsigned char *to, int tolen)
{
    int n;
    size_t i, lasti, j, atop, mask;
    BN_ULONG l;

    /*
     * In case |a| is fixed-top, BN_num_bytes can return bogus length,
     * but it's assumed that fixed-top inputs ought to be "nominated"
     * even for padded output, so it works out...
     */
    n = BN_num_bytes(a);
    if (tolen == -1) {
        tolen = n;
    } else if (tolen < n) {     /* uncommon/unlike case */
        BIGNUM temp = *a;

        bn_correct_top(&temp);
        n = BN_num_bytes(&temp);
        if (tolen < n)
            return -1;
    }

    /* Swipe through whole available data and don't give away padded zero. */
    atop = a->dmax * BN_BYTES;
    if (atop == 0) {
        OPENSSL_cleanse(to, tolen);
        return tolen;
    }

    lasti = atop - 1;
    atop = a->top * BN_BYTES;
    for (i = 0, j = 0, to += tolen; j < (size_t)tolen; j++) {
        l = a->d[i / BN_BYTES];
        mask = 0 - ((j - atop) >> (8 * sizeof(i) - 1));
        *--to = (unsigned char)(l >> (8 * (i % BN_BYTES)) & mask);
        i += (i - lasti) >> (8 * sizeof(i) - 1); /* stay on last limb */
    }

    return tolen;
}

fwrite() in c writes bytes in a different order

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *int_pointer = (int *) malloc(sizeof(int));

    // open output file
    FILE *outptr = fopen("test_output", "w");
    if (outptr == NULL)
    {
        fprintf(stderr, "Could not create %s.\n", "test_output");
        return 1;
    }

    *int_pointer = 0xabcdef;

    fwrite(int_pointer, sizeof(int), 1, outptr);

    // clean up
    fclose(outptr);
    free(int_pointer);
    return 0;
}
This is my code, and when I look at the test_output file with xxd it gives the following output.
$ xxd -c 12 -g 3 test_output
0000000: efcdab 00 ....
I'm expecting it to print abcdef instead of efcdab.
Which book are you reading? There are a number of issues in this code, casting the return value of malloc for example... Most importantly, consider the cons of using an integer type which might vary in size and representation from system to system.
An int is guaranteed the ability to store values in the range -32767 to 32767. Your implementation might allow more, but to be portable and friendly to people using ancient compilers such as Turbo C (there are a lot of them), you shouldn't use int to store values larger than 32767 (0x7fff), such as 0xabcdef. When such an out-of-range conversion is performed, the result is implementation-defined; it could involve saturation, wrapping, trap representations or the raising of a signal corresponding to a computational error, the latter two of which could cause undefined behaviour later on.
You need to translate to an agreed-upon field format. When sending data over the wire, or writing data to a file to be transferred to other systems, it's important that the protocol for communication be agreed upon. This includes using the same size and representation for integer fields. Both output and input should go through a translation function (serialisation and deserialisation, respectively).
Your fields are binary, and so your file should be opened in binary mode. For example, use fopen(..., "wb") rather than "w". Otherwise, '\n' characters might be translated to \r\n pairs in some situations; Windows systems are notorious for this. Can you imagine what kind of havoc and confusion this could wreak? I can, because I've answered a question about this problem.
Perhaps uint32_t might be a better choice, but I'd choose unsigned long as uint32_t isn't guaranteed to exist. On that note, for systems which don't have htonl (which returns uint32_t according to POSIX), that function could be implemented like so:
uint32_t htonl(uint32_t x) {
    return (x & 0x000000ff) << 24
         | (x & 0x0000ff00) << 8
         | (x & 0x00ff0000) >> 8
         | (x & 0xff000000) >> 24;
}
As an example inspired by the above htonl function, consider these macros:
typedef unsigned long ulong;

#define serialised_long(x) serialised_ulong((ulong) x)
#define serialised_ulong(x) (x & 0xFF000000) / 0x1000000 \
                          , (x & 0xFF0000) / 0x10000 \
                          , (x & 0xFF00) / 0x100 \
                          , (x & 0xFF)

typedef unsigned char uchar;

/* x[0] is the most significant byte, so it carries the sign bit;
   negative values are recovered from the unsigned result */
#define deserialised_long(x) (x[0] <= 0x7f \
                             ? (long) deserialised_ulong(x) \
                             : -(long) (0xFFFFFFFFUL - deserialised_ulong(x)) - 1L)
#define deserialised_ulong(x) ( x[0] * 0x1000000UL \
                              + x[1] * 0x10000UL \
                              + x[2] * 0x100UL \
                              + x[3] )
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("test_output", "wb+");
    if (f == NULL)
    {
        fprintf(stderr, "Could not create %s.\n", "test_output");
        return 1;
    }

    ulong value = 0xABCDEF;
    unsigned char datagram[] = { serialised_ulong(value) };
    fwrite(datagram, sizeof datagram, 1, f);
    printf("%08lX serialised to %02X%02X%02X%02X\n",
           value, datagram[0], datagram[1], datagram[2], datagram[3]);

    rewind(f);

    fread(datagram, sizeof datagram, 1, f);
    value = deserialised_ulong(datagram);
    printf("%02X%02X%02X%02X deserialised to %08lX\n",
           datagram[0], datagram[1], datagram[2], datagram[3], value);

    fclose(f);
    return 0;
}
Use htonl()
It converts from whatever the host byte order is (the endianness of your machine) to network byte order, so whatever machine you're running on, you get the same byte order in the output. These calls exist so that, regardless of the host you're running on, bytes are sent over the network in the right order, but they work for your purpose too.
See the man pages of htonl and byteorder. There are various conversion functions available, also for different integer sizes, 16-bit, 32-bit, 64-bit ...
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>

int main(void) {
    int *int_pointer = (int *) malloc(sizeof(int));

    // open output file in binary mode
    FILE *outptr = fopen("test_output", "wb");
    if (outptr == NULL) {
        fprintf(stderr, "Could not create %s.\n", "test_output");
        return 1;
    }

    *int_pointer = htonl(0xabcdef); // <====== This ensures correct byte order

    fwrite(int_pointer, sizeof(int), 1, outptr);

    // clean up
    fclose(outptr);
    free(int_pointer);
    return 0;
}

C 40bit byte swap (endian)

I'm reading/writing a little-endian binary file on a big-endian machine, using C and the bswap_{16,32,64} macros from byteswap.h for byte swapping.
All values are read and written correctly, except a bit-field of 40 bits.
The bswap_40 macro doesn't exist and I don't know how do it or if a better solution is possible.
Here is a small program showing the problem:
#include <stdio.h>
#include <inttypes.h>
#include <byteswap.h>

#define bswap_40(x) bswap_64(x)

struct tIndex {
    uint64_t val_64;
    uint64_t val_40:40;
} s1 = { 5294967296, 5294967296 };

int main(void)
{
    // write swapped values
    struct tIndex s2 = { bswap_64(s1.val_64), bswap_40(s1.val_40) };
    FILE *fp = fopen("index.bin", "w");
    fwrite(&s2, sizeof(s2), 1, fp);
    fclose(fp);

    // read swapped values
    struct tIndex s3;
    fp = fopen("index.bin", "r");
    fread(&s3, sizeof(s3), 1, fp);
    fclose(fp);
    s3.val_64 = bswap_64(s3.val_64);
    s3.val_40 = bswap_40(s3.val_40);

    printf("val_64: %" PRIu64 " -> %s\n", s3.val_64, (s1.val_64 == s3.val_64 ? "OK" : "Error"));
    printf("val_40: %" PRIu64 " -> %s\n", s3.val_40, (s1.val_40 == s3.val_40 ? "OK" : "Error"));

    return 0;
}
That code is compiled with:
gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
swap_40.c -o swap_40
How can I define a bswap_40 macro to read and write these 40-bit values with byte swapping?
By defining bswap_40 to be the same as bswap_64, you're swapping 8 bytes instead of 5. So if you start with this:
00 00 00 01 02 03 04 05
You end up with this:
05 04 03 02 01 00 00 00
Instead of this:
00 00 00 05 04 03 02 01
The simplest way to handle this is to take the result of bswap_64 and right shift it by 24:
#define bswap_40(x) (bswap_64(x) >> 24)
EDIT
I got better performance writing this macro (compared with my initial code, this produced fewer assembly instructions):
#define bswap40(s) \
    ((((s) & 0xFF) << 32) | (((s) & 0xFF00) << 16) | ((s) & 0xFF0000) | \
     (((s) & 0xFF000000) >> 16) | (((s) & 0xFF00000000) >> 32))
use:
s3.val_40 = bswap40(s3.val_40);
... but it might be an optimizer issue. I think they should be optimized to the same thing.
Original Post
I like dbush's answer better... I was about to write this:
static inline void bswap40(void *s) {
    uint8_t *bytes = s;
    /* swap the outer byte pairs of the 5-byte value in place:
       0 <-> 4 and 1 <-> 3; the middle byte (2) stays put */
    bytes[0] ^= bytes[4];
    bytes[1] ^= bytes[3];
    bytes[4] ^= bytes[0];
    bytes[3] ^= bytes[1];
    bytes[0] ^= bytes[4];
    bytes[1] ^= bytes[3];
}
It's a destructive inline function for switching the bytes...
I'm reading/writing a binary file in little-endian format from big-endian using C and bswap_{16,32,64} macros from byteswap.h for byte-swapping.
I suggest a different way of approaching this problem: far more often, code needs to read a file in a known endian format and then convert to the code's native endianness. This may involve a byte swap, it may not; the trick is to write code that works under all conditions.
unsigned char file_data[5];

// file data is in big endian
fread(file_data, sizeof file_data, 1, fp);

uint64_t y = 0;
for (size_t i = 0; i < sizeof file_data; i++) {
    y <<= 8;
    y |= file_data[i];
}
printf("val_64: %" PRIu64 "\n", y);
uint64_t val_40:40; is not portable. Bit-fields on types other than int, signed int and unsigned int are not portable and have implementation-defined behavior.
BTW: Open the file in binary mode:
// FILE *fp = fopen("index.bin", "w");
FILE *fp = fopen("index.bin", "wb");

Store C structs for multiple platform use - would this approach work?

Compiler: GNU GCC
Application type: console application
Language: C
Platforms: Win7 and Linux Mint
I wrote a program that I want to run under Win7 and Linux. The program writes C structs to a file and I want to be able to create the file under Win7 and read it back in Linux and vice versa.
By now, I have learned that writing complete structs with fwrite() gives almost 100% assurance that they won't be read back correctly by the other platform. This is due to padding and maybe other causes.
I defined all structs myself and they (now, after my previous question on this forum) all have members of type int32_t, int64_t and char. I am thinking about writing a WriteStructname() function for each struct that will write the individual members as int32_t, int64_t and char to the outputfile. Likewise, a ReadStructname() function to read the individual struct members from the file and copy them to an empty struct again.
Would this approach work? I prefer to have maximum control over my sourcecode, so I'm not looking for libraries or other dependencies to achieve this unless I really have to.
Thanks for reading
Element-wise writing of data to a file is your best approach, since structs will differ due to alignment and packing differences between compilers.
However, even with the approach you're planning on using, there are still potential pitfalls, such as different endianness between systems, or different encoding schemes (i.e. two's complement versus one's complement encoding of signed numbers).
If you're going to do this, you should consider something like a JSON parser to encode and decode your data so you don't corrupt it due to the issues mentioned above.
Good luck!
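To make the element-wise idea concrete, here is a sketch of the kind of WriteStructname()/ReadStructname() pair the question describes. The struct (Record) and the helpers put_be/get_be are made up for illustration; every field is written in a fixed byte order, one byte at a time, and read back the same way:
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int32_t id;
    int64_t timestamp;
    char    tag[16];
} Record;                              /* hypothetical example struct */

/* Write the low n bytes of v, most significant byte first. */
static void put_be(FILE *f, uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--)
        fputc((int)((v >> (8 * i)) & 0xFF), f);
}

/* Read n bytes, most significant byte first. */
static uint64_t get_be(FILE *f, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | (uint64_t)(unsigned char)fgetc(f);
    return v;
}

static void WriteRecord(FILE *f, const Record *r)
{
    put_be(f, (uint32_t)r->id, 4);         /* two's-complement wrap assumed */
    put_be(f, (uint64_t)r->timestamp, 8);
    fwrite(r->tag, 1, sizeof r->tag, f);
}

static void ReadRecord(FILE *f, Record *r)
{
    r->id        = (int32_t)(uint32_t)get_be(f, 4);
    r->timestamp = (int64_t)get_be(f, 8);
    fread(r->tag, 1, sizeof r->tag, f);
}
Error handling is omitted for brevity, and the casts through unsigned types assume two's-complement storage for the signed fields, which is exactly the kind of encoding detail mentioned above.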
If you use GCC or any other compiler that supports "packed" structs, then as long as you restrict yourself to [u]intX_t types in the struct and fix up the endianness of every field wider than 8 bits, you are platform safe :)
Here is example code that is portable between platforms; do not forget to set the endianness (UIP_BYTE_ORDER) manually.
#include <stdint.h>
#include <stdio.h>

/* These macros are set manually, you should use some automated detection methodology */
#define UIP_BIG_ENDIAN     1
#define UIP_LITTLE_ENDIAN  2
#define UIP_BYTE_ORDER     UIP_LITTLE_ENDIAN

/* Borrowed from uIP */
#ifndef UIP_HTONS
#  if UIP_BYTE_ORDER == UIP_BIG_ENDIAN
#    define UIP_HTONS(n) (n)
#    define UIP_HTONL(n) (n)
#    define UIP_HTONLL(n) (n)
#  else /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
#    define UIP_HTONS(n) (uint16_t)((((uint16_t) (n)) << 8) | (((uint16_t) (n)) >> 8))
#    define UIP_HTONL(n) (((uint32_t)UIP_HTONS(n) << 16) | UIP_HTONS((uint32_t)(n) >> 16))
#    define UIP_HTONLL(n) (((uint64_t)UIP_HTONL(n) << 32) | UIP_HTONL((uint64_t)(n) >> 32))
#  endif /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
#else
#error "UIP_HTONS already defined!"
#endif /* UIP_HTONS */

struct __attribute__((__packed__)) s_test
{
    uint32_t a;
    uint8_t  b;
    uint64_t c;
    uint16_t d;
    int8_t   string[13];
};

struct s_test my_data =
{
    .a = 0xABCDEF09,
    .b = 0xFF,
    .c = 0xDEADBEEFDEADBEEF,
    .d = 0x9876,
    .string = "bla bla bla"
};

void save()
{
    FILE *f;

    f = fopen("test.bin", "wb+");

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);

    fwrite(&my_data, sizeof(my_data), 1, f);
    fclose(f);
}

void read()
{
    FILE *f;

    f = fopen("test.bin", "rb");

    fread(&my_data, sizeof(my_data), 1, f);
    fclose(f);

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);
}

int main(int argc, char **argv)
{
    save();
    return 0;
}
That's the saved file dump:
fanl#fanl-ultrabook:~/workspace-tmp/test3$ hexdump -v -C test.bin
00000000 ab cd ef 09 ff de ad be ef de ad be ef 98 76 62 |..............vb|
00000010 6c 61 20 62 6c 61 20 62 6c 61 00 00 |la bla bla..|
0000001c
This is a good approach. If all fields are integer types of a specific size such as int32_t, int64_t, or char, and you read/write the appropriate number of them to/from arrays, you should be fine.
The one thing you need to watch out for is endianness. Any integer type should be written in a known byte order and read back in the proper byte order for the system in question. The simplest way to do this is with the ntohs and htons functions for 16-bit ints and the ntohl and htonl functions for 32-bit ints. There are no corresponding standard functions for 64-bit ints, but they shouldn't be too difficult to write.
Here's a sample of how you could write these functions for 64 bit:
/* requires <stdint.h> and <string.h> */
uint64_t htonll(uint64_t val)
{
    uint8_t v[8];
    uint64_t result;
    int i;

    /* store the bytes most-significant first */
    for (i = 0; i < 8; i++) {
        v[i] = (uint8_t)(val >> ((7 - i) * 8));
    }
    /* copy the bytes back out to avoid a strict-aliasing violation */
    memcpy(&result, v, sizeof result);
    return result;
}

uint64_t ntohll(uint64_t val)
{
    uint8_t *v = (uint8_t *)&val;
    uint64_t result = 0;
    int i;

    for (i = 0; i < 8; i++) {
        result |= (uint64_t)v[i] << ((7 - i) * 8);
    }
    return result;
}

Why does fread mess with my byte order?

I'm trying to parse a BMP file with fread(), and when I begin to parse, it reverses the order of my bytes.
typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;
...
BMPHeader header;
...
The hex data is 42 4D 36 00 03 00 00 00 00 00 36 00 00 00;
I am loading the hex data into the struct by fread(&header,14,1,fileIn);
My problem is that where the magic number should be 0x424d ('BM'), fread() flips the bytes to 0x4d42 ('MB').
Why does fread() do this and how can I fix it?
EDIT: If I wasn't specific enough, I need to read the whole chunk of hex data into the struct not just the magic number. I only picked the magic number as an example.
This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.
Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:
/* CHAR_BIT == 8 assumed */
uint16_t le16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
}

uint16_t be16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
}
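The le32_to_cpu() used below is not shown in this excerpt; a version in the same style (added here, not part of the original answer) would be:
uint32_t le32_to_cpu(const uint8_t *buf)
{
    return ((uint32_t)buf[0])
         | (((uint32_t)buf[1]) << 8)
         | (((uint32_t)buf[2]) << 16)
         | (((uint32_t)buf[3]) << 24);
}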
You do your fread into a uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:
/* note adjustments to type definition */
typedef struct BMPHeader
{
    uint8_t magic_number[2];
    uint32_t file_size;
    uint8_t reserved[4];
    uint32_t data_offset;
} BMPHeader;

/* in general this is _not_ equal to sizeof(BMPHeader) */
#define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)

/* returns 0=success, -1=error */
int read_bmp_header(BMPHeader *hdr, FILE *fp)
{
    uint8_t buf[BMP_WIRE_HDR_LEN];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    hdr->magic_number[0] = buf[0];
    hdr->magic_number[1] = buf[1];

    hdr->file_size = le32_to_cpu(buf+2);

    hdr->reserved[0] = buf[6];
    hdr->reserved[1] = buf[7];
    hdr->reserved[2] = buf[8];
    hdr->reserved[3] = buf[9];

    hdr->data_offset = le32_to_cpu(buf+10);

    return 0;
}
You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.
You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.
Conversion in the opposite direction, btw, looks like this:
void cpu_to_le16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0x00FF);
    buf[1] = (val & 0xFF00) >> 8;
}

void cpu_to_be16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0xFF00) >> 8;
    buf[1] = (val & 0x00FF);
}
Conversion of 32-/64-bit quantities left as an exercise.
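For example, the 32-bit little-endian writer (one way to do that exercise; this sketch is an addition, not the answerer's code) follows the same pattern:
void cpu_to_le32(uint8_t *buf, uint32_t val)
{
    buf[0] = (uint8_t)(val & 0x000000FF);
    buf[1] = (uint8_t)((val & 0x0000FF00) >> 8);
    buf[2] = (uint8_t)((val & 0x00FF0000) >> 16);
    buf[3] = (uint8_t)((val & 0xFF000000) >> 24);
}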
I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value, but your system is little endian (I could have the wrong name), which means it stores the least significant byte of a multi-byte integer at the lowest address, so the first byte read from the file ends up as the low byte of the value rather than the high one.
Demonstrated in this code:
#include <stdio.h>
int main()
{
union {
short sval;
unsigned char bval[2];
} udata;
udata.sval = 1;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
udata.sval = 0x424d;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
udata.sval = 0x4d42;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
return 0;
}
Gives the following output
DEC[ 1] HEX[0001] BYTES[01][00]
DEC[16973] HEX[424d] BYTES[4d][42]
DEC[19778] HEX[4d42] BYTES[42][4d]
So if you want to be portable you will need to detect the endianness of your system and then do a byte shuffle if required. There are plenty of examples around the internet of swapping the bytes around.
Subsequent question:
I ask only because my file size is 3 instead of 196662
This is due to memory alignment issues. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int not to be split across memory words. So intuitively you think your struct is laid out in memory like:
                          Offset
short magic_number;       00 - 01
int file_size;            02 - 05
short reserved_bytes[2];  06 - 09
int data_offset;          0A - 0D
BUT on a 32-bit system that means file_size has 2 bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the way the structure is laid out in memory is actually like:
short magic_number;       00 - 01
<<unused padding>>        02 - 03
int file_size;            04 - 07
short reserved_bytes[2];  08 - 0B
int data_offset;          0C - 0F
So when you read your byte stream in, the 36 00 goes into the padding area, which leaves your file_size picking up the 03 00 00 00. Now, if you had used fwrite to create this data it would have been OK, as the padding bytes would have been written out. But if your input is always going to be in the format you have specified, it is not appropriate to read the whole struct in one go with fread. Instead you will need to read each of the elements individually.
Writing a struct to a file is highly non-portable -- it's safest to just not try to do it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never a sequence of bytes) and b) it's always both written and read on the same (type of) machine. Not only are there "endian" issues with different CPUs (which is what it seems you've run into), there are also "alignment" issues. Different hardware implementations have different rules about placing integers only on even 2-byte or even 4-byte or even 8-byte boundaries. The compiler is fully aware of all this, and inserts hidden padding bytes into your struct so it always works right. But as a result of the hidden padding bytes, it's not at all safe to assume a struct's bytes are laid out in memory like you think they are.

If you're very lucky, you work on a computer whose byte order matches the file format's and which has no alignment restrictions at all, so you can lay structs directly over files and have it work. But you're probably not that lucky -- certainly programs that need to be "portable" to different machines have to avoid trying to lay structs directly over any part of any file.
