Reading SQLite header - c

I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:
struct Header_info {
    char    *filename;
    char    *sql_string;
    uint16_t page_size;
};

int read_header(FILE *db, struct Header_info *header)
{
    assert(db);

    uint8_t sql_buf[100] = {0};

    /* load the header */
    if(fread(sql_buf, 100, 1, db) != 1) {
        return ERR_SIZE;
    }

    /* copy the string */
    header->sql_string = strdup((char *)sql_buf);

    /* verify that we have a proper header */
    if(strcmp(header->sql_string, "SQLite format 3") != 0) {
        return ERR_NOT_HEADER;
    }

    memcpy(&header->page_size, (sql_buf + 16), 2);

    return 0;
}
Here are the relevant bytes of the file I'm testing it on:
0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300 SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e .....@  ........
Following this spec, the code looks correct to me.
Later I print header->page_size with this line:
printf("\tPage size: %"PRIu16"\n", header->page_size);
But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.

It's an endianness problem. x86 is little-endian; that is, in memory the least significant byte is stored first. When you memcpy the bytes 10 00 into a uint16_t on a little-endian machine, the first byte (0x10) becomes the low-order byte, so the value reads back as 0x0010 = 16 instead of 0x1000 = 4096.
Your problem is therefore that memcpy is not an appropriate tool to read the value.
See the following section of the SQLite file format spec:
1.2.2 Page Size
The two-byte value beginning at offset 16 determines the page size of
the database. For SQLite versions 3.7.0.1 and earlier, this value is
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported. The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value
at offset 16 is 0x00 0x01. This value can be interpreted as a
big-endian 1 and thought of as a magic number to represent the
65536 page size. Or one can view the two-byte field as a little endian
number and say that it represents the page size divided by 256. These
two interpretations of the page-size field are equivalent.
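In other words, the field has to be read big-endian, and a real page size of 65536 is encoded as the value 1. A minimal sketch of decoding it that way (my own illustration, not part of the quoted spec; it holds the decoded size in a uint32_t, since 65536 does not fit in the question's uint16_t):

#include <stdint.h>

/* Decode the page size from the 100-byte SQLite header buffer.
   Returns the size in bytes (512..65536), or 0 if the field is invalid. */
static uint32_t decode_page_size(const uint8_t *hdr)
{
    uint16_t raw  = (uint16_t)((hdr[16] << 8) | hdr[17]);   /* big-endian read */
    uint32_t size = (raw == 1) ? 65536u : raw;              /* 1 encodes 65536 */

    /* must be a power of two no smaller than 512 */
    if (size < 512 || (size & (size - 1)) != 0)
        return 0;
    return size;
}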

It seems to be an endianness issue. If you are on a little-endian machine, this line:
memcpy(&header->page_size, (sql_buf + 16), 2);
copies the two bytes 10 00 into a uint16_t, which keeps its low-order byte at the lower address, so you end up with 0x0010 = 16.
You can do this instead:
header->page_size = sql_buf[17] | (sql_buf[16] << 8);
Update
For the record, note that the solution I propose works regardless of the endianness of the machine (see Rob Pike's article on the subject).
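If the parser needs the same pattern in more places (SQLite stores its multi-byte header fields big-endian), the read can be factored into a small helper. A sketch; read_be16 is my own name, not anything from the answers:

#include <stdint.h>

/* Read a 16-bit big-endian value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* ... inside read_header() ... */
header->page_size = read_be16(sql_buf + 16);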

Related

Print unsigned short int in hex and decimal C

I've been trying to print unsigned short int values in C with no luck. As far as I know, it's a 16-bit value, so I've tried several different methods to print these 2 bytes together, but I've only been able to print them correctly when doing it byte by byte.
Note that I want to print these 16 bits in both decimal and hexadecimal format, for example 00 01 as 1; another example would be 00 ff as 255.
I have this struct:
struct arphdr {
    unsigned short int ar_hrd;
    unsigned short int ar_pro;
    unsigned char ar_hln;
    unsigned char ar_pln;
    unsigned short int ar_op;

    // I'm commenting out the following part because I won't need it now
    // for the explanation
    /* unsigned char __ar_sha[ETH_ALEN];
       unsigned char __ar_sip[4];
       unsigned char __ar_tha[ETH_ALEN];
       unsigned char __ar_tip[4]; */
};
And this function:
void print_ARP_msg(struct arphdr arp_hdr) {
    // I need to print the arp_hdr.ar_hrd in both decimal and hex.
    printf("Format HW: %04x\n", arp_hdr.ar_hrd);
    printf("Format Proto: %04x\n", arp_hdr.ar_pro);
    printf("HW Len: %x\n", arp_hdr.ar_hln);
    printf("Proto Len: %x\n", arp_hdr.ar_pln);
    printf("Command: %04x\n", arp_hdr.ar_op);
}
The print_ARP_msg function returns me this:
Format HW: 0100
Format Proto: 0008
HW Len: 6
Proto Len: 4
Command: 0100
The hex values of the struct are "00 01 08 00 06 04 00 01", so I don't know why it's returning me 0100 in the arp_hdr.ar_hrd value.
Also, I made a function which prints the struct in hex, to make sure I'm doing it right, and I was able to check that all the fields were correctly assigned.
PS: before sending this question, I realized that it's printing the correct Hex values but disordered. Could it be related to the little/big endian "difference"?
Could it be related to the little/big endian "difference"?
Yes. If you're dealing with packets that have arrived over a network - and you're printing fields of an ARP packet, so that's exactly what you're doing - you may have to convert from the byte order of the fields as they are when sent over the network to the byte order on the machine on which you're running.
For example:
printf("Format HW: %04x\n", ntohs(arp_hdr.ar_hrd));
In that particular case, you can get away without the ntohs() call on a big-endian machine (SPARC, System/3x0, z/Architecture, PowerPC/Power ISA running AIX, PowerPC/Power ISA running Mac OS X, possibly PowerPC/Power ISA running Linux, etc.), but you can't get away without it on a little-endian machine (anything with an x86 processor, including x86-64 processors, probably most ARM, etc.).
You can use it on both types of processor.
The hex values of the struct are "00 01 08 00 06 04 00 01", so I don't know why it's returning me 0100 in the arp_hdr.ar_hrd value.
Looks like your platform uses the little endian system.
What appears in memory as 00 01 is interpreted (on a little-endian machine) as 0x01 * 2^8 + 0x00. In other words, the hex representation of the number is 0x0100.
ARP packets, like all internet protocol packets, are transmitted in "network order", which is big-endian. The client must use, for example, ntohs (network to host short) to convert from network order to the machine's local byte order, and htons to go the other way.
See man byteorder for details.
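Putting that together, a sketch of print_ARP_msg that prints each 16-bit field in both decimal and hex after converting from network byte order (it assumes the struct arphdr definition from the question; the casts just keep the printf formats strictly matched):

#include <stdio.h>
#include <arpa/inet.h>   /* ntohs() */

/* assumes the struct arphdr definition from the question above */
void print_ARP_msg(struct arphdr arp_hdr) {
    unsigned hrd = ntohs(arp_hdr.ar_hrd);
    unsigned pro = ntohs(arp_hdr.ar_pro);
    unsigned op  = ntohs(arp_hdr.ar_op);

    printf("Format HW: %u (0x%04x)\n", hrd, hrd);
    printf("Format Proto: %u (0x%04x)\n", pro, pro);
    printf("HW Len: %u (0x%x)\n", (unsigned)arp_hdr.ar_hln, (unsigned)arp_hdr.ar_hln);
    printf("Proto Len: %u (0x%x)\n", (unsigned)arp_hdr.ar_pln, (unsigned)arp_hdr.ar_pln);
    printf("Command: %u (0x%04x)\n", op, op);
}

The one-byte fields ar_hln and ar_pln need no conversion, since byte order only matters for multi-byte values.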

Correct way to unpack a 32 bit vector in Perl to read a uint32 written in C

I am parsing a Photoshop raw, 16 bit/channel, RGB file in C and trying to keep a log of exceptional data points. I need a very fast C analysis of up to 36 MPix images with 16 bit quanta or 216 MB Photoshop .RAW files.
<1% of the points have weird skin tones and I want to graph them with PerlMagick or Perl GD to see where they are coming from.
The first 4 bytes of the C data file contain the unsigned image width as a uint32_t. In Perl, I read the whole file in binary mode and extract the first 32 bits:
Xres=1779105792l = 0x6a0b0000
It looks a lot like the C log file:
DA: Color anomalies=14177=0.229%:
DA: II=1) raw PIDX=0x10000b25, XCols=[0]=0x00000b6a
Dec(0x00000b6a) = 2922, the Exact X_Columns_Width of a small test file.
Clearly a case of Intel's 1972 8008 NUXI architecture. How hard could it possibly be to translate 0x6a0b0000 to 0x00000b6a; swap 2 bytes and 2 nibbles and you're done. Slicing the 8 characters and rearranging them could be done, but that is the kind of ugly hack I am trying to avoid.
Grab the same 32 bit vector from file offset zero and unpack it as "VAX" unsigned long.
$xres = vec($bdat, 0, 32); # vec EXPR,OFFSET,BITS
$vul = unpack("V", vec($bdat, 0, 32));
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731
Every single hex character is mangled. Obviously the wrong endianness; it is not "VAX". The other one is network big-endian:
http://perldoc.perl.org/functions/pack.html
N An unsigned long (32-bit) in "network" (big-endian) order.
V An unsigned long (32-bit) in "VAX" (little-endian) order.
$nul = unpack("N", vec($bdat, 0, 32)); # Network Unsigned Long 32b
printf("Xres=0x%08x, NET ulong=%ul=0x%08x\n", $xres, $nul, $nul);
Xres=0x6a0b0000, NET ulong=825702201l=0x31373739
The $XRES still shows the right hex in the wrong order. The "NETWORK" long 32 bit uint extracted from the same bits is unrecognizable. Try Binary
$bits = unpack("b*", vec($bdat, 0, 32));
printf("bits=$bits, len=%d\n", length $bits);
bits=10001100111011001110110010011100100011000000110010101100111011001001110001001100, len=80
I clearly asked for 32 bits and got 80 bits. What gives?
Try for 4, unsigned, 8bit bytes which can NOT be swapped:
for($ii = 0; $ii < 4; $ii++) {
    $bit_off = $ii * 8;                           # Bit offset
    $uc = unpack("C", vec($bdat, $bit_off, 8));   # C An unsigned char
    printf("II $ii, bo $bit_off, d=%d, u=%u, x=0x%x\n",
           $uc, $uc, $uc);
}
II 0, bo 0, d=49, u=49, x=0x31
II 1, bo 8, d=51, u=51, x=0x33
II 2, bo 16, d=49, u=49, x=0x31
II 3, bo 24, d=49, u=49, x=0x31
I am looking for hex 0, 6, a or b. There are no "3"s or "1"s in the right answer. Try pirating from a C file:
http://cpansearch.perl.org/src/MHX/Convert-Binary-C-0.76/tests/include/include/bits/byteswap.h
$x = $xres;
$x= (((($x) & 0xff000000) >> 24) | ((($x) & 0x00ff0000) >> 8) | ((($x) & 0x0000ff00) << 8) | ((($x) & 0x000000ff) << 24));
printf("\$xres=0x%08x -> \$x=0x%08x = %u\n", $xres, $x, $x);
$xres=0x6a0b0000 -> $x=0x00000b6a = 2922
It WORKS! But, this is uglier than converting the original, wrong order hex number to a string to untangle it:
$stupid_str = sprintf("%08x", $xres);
$stupid_num = join('', reverse ($stupid_str =~ m/../g));
printf("Stupid_num '%s'->0x%08x=%d\n", $stupid_num, $dec=hex $stupid_num, $dec);
Stupid_num '00000b6a'->0x00000b6a=2922
It's like judging the Ugliest Dog contest, but I would still rather have to maintain the text version than the even more abominable C version.
I know there are ways to do this in Java/Python/Go/Ruby/.....
I know there are command line utilities that do exactly this.
I must figure out how I am misusing either VEC or Unpack, both of which I have used a zillion times. It is the Brain Teasing aspect which is driving me nuts! EndianNess == EndianMess!!!
TYVM!
=================================================
Borodin,
Thanks for lookin' at this.
My intel processor is little-endian. When I read it back, it was trans-mutilated by vec to the "correct" big-endian, network format.
I just tried reading it VERBATIM from a BINARY file read and it works fine:
($b4 = $bdat) =~ s/^(....).*$/$1/msg; # Give me my 4 bytes back without mutilation!
printf("B4='%s'=>0x%08x=<0x%08x\n", $b4, unpack("L>", $b4), unpack("L<", $b4));
B4='j...' = >0x6a0b0000 = <0x00000b6a <<< THE RIGHT ANSWER!!!
If you try unpack 'V', $bdat then you will find that it works
That was my first attempt:
$vul = unpack("V", vec($bdat, 0, 32)); # UNPACK V!
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731 <<<< TOTALLY WRONG!
I had already verified that the $BDAT info was the right data in the wrong format. It just needed some rearrangement.
I just used vec() to generate 1 bit and 4 bit graphics files and it worked faithfully, returning the exact bits I wrote. It must have mistaken my Intel i7 for my IBM System/370. I7/37??? Easy mistake to make. :)
I read the [confusing] part about "converted to a number as with pack ...". That's why my number was backward. The >>unpack("V", vec($bdat"<< ... was my ill-fated attempt to byte-swap the backward number in $BDAT from the WRONG VEC()-preferred FORMAT to the native format supported by my architecture.
Now I understand why I saw so many examples of people extracting by the byte, to avoid Big Brother's helping hand!
Data::BitStream::Vec "uses a Perl vec to store the data. The vector is accessed in 1-bit units"
Thanks 1E6,
B
You are confusing things by combining vec with unpack
The correct way is simply
unpack 'V', $bdat
which returns a value of 0x00000B6A as you expect
vec($bdat, 0, 32) is equivalent to unpack 'N', $bdat as you can see from the value of $xres in your first code block, and the documentation for vec confirms this with
If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N
The line
$vul = unpack("V", vec($bdat, 0, 32))
is very wrong, because the decimal value of vec($bdat, 0, 32) is 1779105792, so you are then calling unpack on the string "1779105792" which doesn't do anything useful at all

FAT BPB and little endian reversal

My CPU is little-endian, which documentation has told me conforms to the byte order of the FAT specification. Why, then, am I getting a valid address for BS_jmpBoot (bytes 0-3 of the first sector), but not a valid number for BPB_BytesPerSec (bytes 11-12 of the first sector)?
int fd = open(diskpath, O_RDONLY, S_IROTH);

read(fd, BS_jmpBoot, 3);
printf("BS_jmpBoot = 0x%02x%02x%02x\n", BS_jmpBoot[0], BS_jmpBoot[1], BS_jmpBoot[2]);

read(fd, OEMName, 8);
OEMName[8] = '\0';
printf("OEMName = %s\n", OEMName);

read(fd, BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%02x%02x\n", BPB_BytesPerSec[0], BPB_BytesPerSec[1]);
Yields
BS_jmpBoot = 0xeb5890 //valid address, while 0x9058eb would not be
OEMName = MSDOS5.0
BPB_BytesPerSec = 0x0002 //Should be 0x0200
I would like to figure out why BS_jmpBoot and OEMName print valid values but BPB_BytesPerSec does not. If anyone could enlighten me I would be greatly appreciative.
Thanks
EDIT: Thanks for the help everyone, it was my types that were making everything go awry. I got it to work by writing the bytes to an unsigned short, as uesp suggested (kinda), but I would still like to know why this didn't work:
unsigned char BPB_BytesPerSec[2];
...
read(fd, BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%04x\n", *BPB_BytesPerSec);
yielded
BPB_BytesPerSec = 0x0000
I would like to use char arrays to allocate the space because I want to be sure of the space I'm writing to on any machine; or should I not?
Thanks again!
You are reading BPB_BytesPerSec incorrectly. The structure of the Bpb is (from here):
BYTE BS_jmpBoot[3];
BYTE BS_OEMName[8];
WORD BPB_BytesPerSec;
...
The first two fields are bytes so their endianness is irrelevant (I think). BPB_BytesPerSec is a WORD (assuming 2 bytes) so you should define/read it like:
WORD BPB_BytesPerSec; //Assuming WORD is defined on your system
read (fd, &BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%04x\n", BPB_BytesPerSec);
Since reading the bytes directly gives you 00 02, which is 0x0200 when interpreted as little-endian, reading BPB_BytesPerSec this way gives you the correct value.
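If you would rather keep the unsigned char array from the question, the same value can be assembled from the two bytes explicitly, which works on any host (a sketch, not from the answer):

unsigned char BPB_BytesPerSec[2];

if (read(fd, BPB_BytesPerSec, 2) == 2) {
    /* FAT fields are little-endian: low byte first, high byte second */
    unsigned bytes_per_sec = BPB_BytesPerSec[0] | (BPB_BytesPerSec[1] << 8);
    printf("BPB_BytesPerSec = %u (0x%04x)\n", bytes_per_sec, bytes_per_sec);  /* 512 (0x0200) */
}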
First of all, this line:
printf("BPB_BytesPerSec = 0x%02x%02x\n",BPB_BytesPerSec[0], BPB_BytesPerSec[1]);
is printing the value out in big endian format. If it prints 0x0002 here, the actual value would be 0x0200 in little endian.
As for the BS_jmpBoot value, according to this site:
The first three bytes EB 3C and 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code.
In other words, the first 3 bytes are opcodes which are three separate bytes, not one little endian value.

htonl printing garbage value

The variable 'value' is uint32_t
value = htonl(value);
printf("after htonl is %ld\n\n",value);
This prints -201261056
value = htons(value);
printf("after htons is %ld\n\n",value);
This prints 62465
Please suggest what could be the reason?
I guess your input is 500, isn't it?
500 is 2**8 + 2**7 + 2**6 + 2**5 + 2**4 + 2**2, or 0x00 0x00 0x01 0xF4 in little endian order.
TCP/IP uses big endian. So after the htonl, the sequence is 0xF4 0x01 0x00 0x00.
If you print it as a signed integer, then since the most significant bit is 1, the number is negative. Negative numbers are stored as two's complement, and the value is -(2**27 + 2**25 + 2**24 + 2**23 + 2**22 + 2**21 + 2**20 + 2**19 + 2**18 + 2**17 + 2**16) == -201261056.
Host order is the byte order in which your machine interprets data correctly (assuming your machine is little-endian). Network order is big-endian, which your system does not interpret properly as-is. This is the reason for your so-called garbage values.
So, basically, there is nothing wrong with the code. :)
Google "Endianness" to get all the details about Big Endian and Little Endian.
Providing some more info: in big endian, the first byte (the lowest address) holds the most significant byte, whereas in little endian that same place holds the least significant byte. So when you use htonl, your first byte will now contain the most significant byte, but your system will treat it as the least significant byte.
Considering the Wikipedia example of decimal 1000 (hex 3E8): in big endian it will be 03 E8, and in little endian it will be E8 03. Now, if you pass 03 E8 to a little-endian machine, it will interpret it as decimal 59395.
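To make that concrete, a small sketch that decodes the two bytes 03 E8 both ways (my own illustration of the Wikipedia example above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* decimal 1000 written big-endian, as it would arrive off the network */
    uint8_t wire[2] = { 0x03, 0xE8 };

    /* interpreting the bytes little-endian gives 0xE803 = 59395 */
    unsigned as_little = wire[0] | (wire[1] << 8);

    /* interpreting them in network (big-endian) order recovers 1000 */
    unsigned as_big = (wire[0] << 8) | wire[1];

    printf("little-endian reading: %u, big-endian reading: %u\n", as_little, as_big);
    return 0;
}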
htonl() and htons() are functions used to convert data from the host's endianness to the network's endianness.
The network uses big-endian, so if your system is x86, then it is little-endian.
Host to network byte order (long data) is htonl(), i.e. it converts a 32-bit value to network byte order.
Host to network byte order (short data) is htons(), i.e. it converts a 16-bit value to network byte order.
Here is a sample program to show how htonl() works, as well as the effect of passing a 32-bit value to htons():
#include <stdio.h>
#include <arpa/inet.h>

int main()
{
    long data = 0x12345678;
    printf("\n After htonl():0x%x , 0x%x\n", htonl(data), htons(data));
    return 0;
}
It will print After htonl():0x78563412 , 0x7856 on X86_64.
Reference:
http://en.wikipedia.org/wiki/Endianess
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738557%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738556%28v=vs.85%29.aspx
@halfelf: I just want to add my findings. I tried the program below with the same value, 500. I think you have mistakenly labelled the LE output as BE and vice versa. The actual output I got is 0xf4 0x01 0x00 0x00 in little-endian format. My machine is LE.
#include <stdio.h>
#include <netinet/in.h>

/* function to show bytes in memory, from location start to start+n */
void show_mem_rep(char *start, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf(" %.2x-%p", start[i], start + i);
    printf("\n");
}

/* Main function to call above function for 0x01234567 */
int main()
{
    int i = 500; // 0x01234567;
    int y = htonl(i);                                  /* (1) */
    printf("i--%d , y---%d,ntohl(y):%d\n", i, y, ntohl(ntohl(y)));

    printf("_------LITTLE ENDIAN-------\n");
    show_mem_rep((char *)&i, sizeof(i));

    printf("-----BIG ENDIAN-----\n");  /* I just used int y = htonl(i) at (1) for reversing
                                          500, so that I can print as if I am using a BE machine. */
    show_mem_rep((char *)&y, sizeof(i));

    getchar();
    return 0;
}
output is
i--500 , y----201261056,ntohl(y):-201261056
_------LITTLE ENDIAN-------
f4-0xbfda8f9c 01-0xbfda8f9d 00-0xbfda8f9e 00-0xbfda8f9f
-----BIG ENDIAN-----
00-0xbfda8f98 00-0xbfda8f99 01-0xbfda8f9a f4-0xbfda8f9b

Why does fread mess with my byte order?

I'm trying to parse a BMP file with fread(), and when I begin to parse, it reverses the order of my bytes.
typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;
...
BMPHeader header;
...
The hex data is 42 4D 36 00 03 00 00 00 00 00 36 00 00 00;
I am loading the hex data into the struct by fread(&header,14,1,fileIn);
My problem is that where the magic number should be 0x424d ('BM'), fread() flips the bytes to be 0x4d42 ('MB').
Why does fread() do this and how can I fix it?
EDIT: If I wasn't specific enough, I need to read the whole chunk of hex data into the struct not just the magic number. I only picked the magic number as an example.
This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.
Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:
/* CHAR_BIT == 8 assumed */
uint16_t le16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
}

uint16_t be16_to_cpu(const uint8_t *buf)
{
    return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
}
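The read_bmp_header example below also calls le32_to_cpu, which is not spelled out in the answer (the 32-bit conversions are left as an exercise at the end). A sketch following the same pattern as the 16-bit helpers above:

uint32_t le32_to_cpu(const uint8_t *buf)
{
    return  ((uint32_t)buf[0])
          | (((uint32_t)buf[1]) << 8)
          | (((uint32_t)buf[2]) << 16)
          | (((uint32_t)buf[3]) << 24);
}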
You do your fread into a uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:
/* note adjustments to type definition */
typedef struct BMPHeader
{
    uint8_t  magic_number[2];
    uint32_t file_size;
    uint8_t  reserved[4];
    uint32_t data_offset;
} BMPHeader;

/* in general this is _not_ equal to sizeof(BMPHeader) */
#define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)

/* returns 0=success, -1=error */
int read_bmp_header(BMPHeader *hdr, FILE *fp)
{
    uint8_t buf[BMP_WIRE_HDR_LEN];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;

    hdr->magic_number[0] = buf[0];
    hdr->magic_number[1] = buf[1];

    hdr->file_size = le32_to_cpu(buf + 2);

    hdr->reserved[0] = buf[6];
    hdr->reserved[1] = buf[7];
    hdr->reserved[2] = buf[8];
    hdr->reserved[3] = buf[9];

    hdr->data_offset = le32_to_cpu(buf + 10);

    return 0;
}
You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.
You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.
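A quick usage sketch of read_bmp_header with that magic-number test (the file name and error handling here are illustrative only, not from the answer):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    BMPHeader hdr;
    FILE *fp = fopen("image.bmp", "rb");   /* hypothetical file name */

    if (fp == NULL)
        return 1;

    if (read_bmp_header(&hdr, fp) == 0 &&
        hdr.magic_number[0] == 'B' && hdr.magic_number[1] == 'M') {
        printf("file size: %u, data offset: %u\n",
               (unsigned)hdr.file_size, (unsigned)hdr.data_offset);
    }

    fclose(fp);
    return 0;
}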
Conversion in the opposite direction, btw, looks like this:
void cpu_to_le16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0x00FF);
    buf[1] = (val & 0xFF00) >> 8;
}

void cpu_to_be16(uint8_t *buf, uint16_t val)
{
    buf[0] = (val & 0xFF00) >> 8;
    buf[1] = (val & 0x00FF);
}
Conversion of 32-/64-bit quantities left as an exercise.
I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value, but your system is little endian (I could have the wrong name), which stores the least significant byte of a multi-byte integer first rather than last.
Demonstrated in this code:
#include <stdio.h>

int main()
{
    union {
        short sval;
        unsigned char bval[2];
    } udata;

    udata.sval = 1;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    udata.sval = 0x424d;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    udata.sval = 0x4d42;
    printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
          , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );

    return 0;
}
Gives the following output
DEC[ 1] HEX[0001] BYTES[01][00]
DEC[16973] HEX[424d] BYTES[4d][42]
DEC[19778] HEX[4d42] BYTES[42][4d]
So if you want to be portable you will need to detect the endian-ness of your system and then do a byte shuffle if required. There will be plenty of examples round the internet of swapping the bytes around.
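For what it's worth, one common way to detect the host's endianness at run time and swap a 16-bit value only when needed looks something like this (a sketch, not from the answer):

#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* e.g. if the file stores values big-endian:
   if (host_is_little_endian()) val = swap16(val); */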
Subsequent question:
I ask only because my file size is 3 instead of 196662
This is due to memory alignment issues. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int etc. to not be split over multiple memory words. So intuitively you think your struct is laid out in memory like:
Offset
short magic_number; 00 - 01
int file_size; 02 - 05
short reserved_bytes[2]; 06 - 09
int data_offset; 0A - 0D
BUT on a 32 bit system that means file_size has 2 bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the way the structure is laid out in memory is actually like:
short magic_number; 00 - 01
<<unused padding>> 02 - 03
int file_size; 04 - 07
short reserved_bytes[2]; 08 - 0B
int data_offset; 0C - 0F
So when you read your byte stream in, the 36 00 goes into the padding area, which leaves your file_size getting the 03 00 00 00. Now if you used fwrite to create this data it should have been OK, as the padding bytes would have been written out. But if your input is always going to be in the format you have specified, it is not appropriate to read the whole struct as one with fread. Instead you will need to read each of the elements individually.
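You can see the padding directly with sizeof and offsetof; a sketch using the struct from the question (the figures in the comments assume a typical system with 4-byte int alignment):

#include <stdio.h>
#include <stddef.h>   /* offsetof */

typedef struct {
    short magic_number;
    int file_size;
    short reserved_bytes[2];
    int data_offset;
} BMPHeader;

int main(void)
{
    printf("sizeof(BMPHeader) = %zu\n", sizeof(BMPHeader));                         /* typically 16, not 14 */
    printf("file_size at offset %zu\n", offsetof(BMPHeader, file_size));            /* typically 4, not 2  */
    printf("reserved_bytes at offset %zu\n", offsetof(BMPHeader, reserved_bytes));  /* typically 8         */
    printf("data_offset at offset %zu\n", offsetof(BMPHeader, data_offset));        /* typically 12        */
    return 0;
}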
Writing a struct to a file is highly non-portable -- it's safest to just not try to do it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never a sequence of bytes) and b) it's always both written and read on the same (type of) machine.
Not only are there "endian" issues with different CPUs (which is what it seems you've run into), there are also "alignment" issues. Different hardware implementations have different rules about placing integers only on even 2-byte or even 4-byte or even 8-byte boundaries. The compiler is fully aware of all this, and inserts hidden padding bytes into your struct so it always works right. But as a result of the hidden padding bytes, it's not at all safe to assume a struct's bytes are laid out in memory like you think they are.
If you're very lucky, you work on a computer that uses big-endian byte order and has no alignment restrictions at all, so you can lay structs directly over files and have it work. But you're probably not that lucky -- certainly programs that need to be "portable" to different machines have to avoid trying to lay structs directly over any part of any file.
