FAT BPB and little endian reversal

My CPU is little endian, which documentation has told me conforms to the byte order of the FAT specification. Why, then, am I getting a valid value for BS_jmpBoot (bytes 0-2 of the first sector) but not a valid number for BPB_BytesPerSec (bytes 11-12 of the first sector)?
int fd = open (diskpath, O_RDONLY, S_IROTH);

read (fd, BS_jmpBoot, 3);
printf("BS_jmpBoot = 0x%02x%02x%02x\n", BS_jmpBoot[0], BS_jmpBoot[1], BS_jmpBoot[2]);

read (fd, OEMName, 8);
OEMName[8] = '\0';
printf("OEMName = %s\n", OEMName);

read (fd, BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%02x%02x\n",BPB_BytesPerSec[0], BPB_BytesPerSec[1]);
Yields
BS_jmpBoot = 0xeb5890 //valid address, while 0x9058eb would not be
OEMName = MSDOS5.0
BPB_BytesPerSec = 0x0002 //Should be 0x0200
I would like to figure out why BS_jmpBoot and OEMName print valid values but BPB_BytesPerSec does not. If anyone could enlighten me I would be greatly appreciative.
Thanks
EDIT: Thanks for the help everyone, it was my types that were making everything go awry. I got it to work by writing the bytes to an unsigned short, as uesp suggested(kinda), but I would still like to know why this didn't work:
unsigned char BPB_BytesPerSec[2];
...
read (fd, BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%04x\n", *BPB_BytesPerSec);
yielded
BPB_BytesPerSec = 0x0000
I would like to use char arrays to allocate the space because I want to be sure of the space I'm writing to on any machine; or should I not?
Thanks again!

You are reading BPB_BytesPerSec incorrectly. The structure of the Bpb is (from here):
BYTE BS_jmpBoot[3];
BYTE BS_OEMName[8];
WORD BPB_BytesPerSec;
...
The first two fields are bytes so their endianness is irrelevant (I think). BPB_BytesPerSec is a WORD (assuming 2 bytes) so you should define/read it like:
WORD BPB_BytesPerSec; //Assuming WORD is defined on your system
read (fd, &BPB_BytesPerSec, 2);
printf("BPB_BytesPerSec = 0x%04x\n", BPB_BytesPerSec);
Reading the bytes directly gives 00 02, which is 0x0200 in little endian, so reading them into a WORD like this yields the correct value on a little-endian machine.

First of all, this line:
printf("BPB_BytesPerSec = 0x%02x%02x\n",BPB_BytesPerSec[0], BPB_BytesPerSec[1]);
is printing the value out in big endian format. If it prints 0x0002 here, the actual value would be 0x0200 in little endian.
As for the BS_jmpBoot value, according to this site:
The first three bytes EB 3C and 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code.
In other words, the first 3 bytes are opcodes which are three separate bytes, not one little endian value.
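To make the two answers above concrete, here is a minimal sketch of decoding the field from a byte array without depending on the host's byte order. The helper name read_le16 is my own, not something from the FAT specification. It also hints at why the EDIT in the question printed 0x0000: *BPB_BytesPerSec on an unsigned char array is just the first byte (0x00), not a 16-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 16-bit little-endian field from raw bytes.
 * The result is the same on little-endian and big-endian hosts because it
 * only uses shifts, never a reinterpretation of memory. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: what dereferencing the char array actually yields,
 * i.e. only the first byte, widened to an integer. */
static unsigned first_byte_only(const unsigned char *p)
{
    return (unsigned)*p;
}
```

With the two bytes from the question, 00 02, read_le16 gives 0x0200 while first_byte_only gives 0x00. Keeping the on-disk data in an unsigned char array and decoding it this way keeps the sizes fixed on any machine.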

Related

Reading bytes from progmem

I'm trying to write a simple program (as a precursor to a more complicated one) that stores an array of bytes to progmem, and then reads and prints the array. I've looked through a million blog/forum posts online and think I'm doing everything fine, but I'm still getting utter gibberish as output.
Here is my code, any help would be much appreciated!
void setup() {
byte hello[10] PROGMEM = {1,2,3,4,5,6,7,8,9,10};
byte buffer[10];
Serial.begin(9600);
memcpy_P(buffer, (char*)pgm_read_byte(&hello), 10);
for(int i=0;i<10;i++){
//buffer[i] = pgm_read_byte(&(hello[i])); //output is wrong even if i use this
Serial.println(buffer[i]);
}
}
void loop() {
}
If I use memcpy, I get the output:
148
93
0
12
148
93
0
12
148
93
And if I use the buffer = .... statement in the for loop (instead of memcpy):
49
5
9
240
108
192
138
173
155
173

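You're making this about two orders of magnitude too complicated.
memcpy_P wants a destination pointer, a source pointer and a byte count. And the PROGMEM pointer is simply the array. So, your memcpy_P line should look like
memcpy_P (buffer, hello, 10);
that's it.
memcpy (without the "_P") cannot reach program memory and will copy from data RAM instead. That is not what you want. Also note that PROGMEM only takes effect on variables with static storage (globals or static locals), so hello should be declared at file scope or as static const rather than as a plain local in setup().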

Is .xz file format description telling it all?

I've been reading the description of the xz file format ( http://tukaani.org/xz/xz-file-format.txt ). But when I look into an xz file with a binary editor, it doesn't seem to follow the structure defined in the description. What am I missing?
I compressed the description file (xz-file-format.txt) with xz cli utility in linux (xz version 4.999.9beta) and these are the first 32 bytes I get:
FD 37 7A 58 5A 00 00 04 E6 D6 B4 46 02 00 21 01 16 00 00 00 74 2F E5 A3 E0 A9 28 2A 99 5D 00 05
Overall structure of the file should be: stream - stream padding - stream - and so on. And in this case I think there should be only one stream since there is only one file compressed in the file. Structure of the stream is: stream header - block - block - ... - block - index - stream footer. And structure of the stream header is: header magic bytes - stream flags - crc code.
I can find the stream header from my file, but after the first sixteen bytes it doesn't seem to follow the description anymore.
First six bytes above are clearly the magic bytes. Next two bytes are the stream flags. Stream flags indicate that CRC64 is being used, so the CRC code takes next eight bytes. Seventeenth byte (I count from one) should then be the first byte of the first block.
Structure of a block is: block header - compressed data - block padding - check. Structure of block header should be: block header size - block flags - compressed size - uncompressed size - list of filter flags - header padding - CRC. So the seventeenth byte should then be block header size (0x16 in my file). That's possible, but the eighteenth byte seems a bit weird. It should be the block flags bit field. In my file it's null - so no flags set. Not even the number of filters, which according to description should be 1-4.
Since bits 6 and 7 of the block flags are also zeros, compressed and uncompressed sizes should not be present in the file and the next bytes should be the list of filter flags. Structure of the list is: filter ID - size of properties - filter properties. Nineteenth byte should then be filter ID. This is null in my file which is not any of officially defined filter IDs. If it would be a custom ID it would take nine bytes, but as I understand the encoding of sizes described in section 1.2 of the description it can't be, since according to the description: "All but the last byte of the multibyte representation have the highest (eighth) bit set.", but in my file the twentieth byte is also null.
So is there something I don't understand or is the file not following the description?

I asked the question a bit hastily and came up with a solution myself. Just in case someone would be interested, I answer my own question.
I had misunderstood the meaning of the stream flags in stream header. They don't affect the CRC code in the header (which is always CRC32), just CRCs in the stream itself (as the name stream flags implies). This means that the CRC in the header is only four bytes long and thus bytes 13-24 form a valid block header.
In the block header, the block flags field is again a null byte, which I saw as a problem before. According to the description, number of filters should be between 1 and 4. So I expected a decimal value of at least one. Since number of filters is expressed with two bits the maximum decimal value is 3, but number of possible values (zero included) is of course four and thus zero means one filter.
Since the last two bits of the block flags are also zero, no compressed size or uncompressed size fields are present in the block header. This means that bytes 15-17 are the filter flags for the first (and only) filter. Filter ID 0x21 is the ID of the LZMA2 filter. The size of properties, 0x01, means one byte of properties. And the dictionary size byte 0x16 (decimal 22) decodes to 2 << (22/2 + 11) bytes, which is 8 MiB.
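For anyone who wants to check these fields by hand, here is a small sketch of the three decodings used above. The function names are my own. The dictionary-size formula is the one given in the LZMA2 part of the format description as I understand it, and the sketch ignores the reserved upper two bits of the property byte.

```c
#include <stdint.h>

/* Real block header size in bytes from the encoded size byte. */
static unsigned xz_block_header_size(uint8_t size_byte)
{
    return ((unsigned)size_byte + 1u) * 4u;
}

/* Number of filters: the low two bits of the block flags store (count - 1). */
static unsigned xz_block_filter_count(uint8_t block_flags)
{
    return (block_flags & 0x03u) + 1u;
}

/* LZMA2 dictionary size in bytes from the one-byte filter property.
 * Returns 0 for values the format does not allow (bits > 40). */
static uint32_t lzma2_dict_size(uint8_t props)
{
    const uint8_t bits = props & 0x3F;
    if (bits > 40)
        return 0;
    if (bits == 40)
        return UINT32_MAX;
    uint32_t size = 2u | (bits & 1u);
    return size << (bits / 2 + 11);
}
```

Applied to the bytes in the question: size byte 0x02 gives a 12-byte block header (bytes 13-24), block flags 0x00 give one filter, and property byte 0x16 gives 8388608 bytes.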

How can I deal with the given situation related to a hardware change

I am maintaining production code for an FPGA device. Previously the registers on the FPGA were 32 bits wide, and reads and writes to them worked fine. The hardware has since changed, and with it the FPGA device, and with the latest version we have trouble reading and writing FPGA registers. After some investigation we learned that the FPGA registers are no longer 32-bit; they are now 31-bit registers, which the FPGA vendor has confirmed.
So a small code change is needed as well. Previously we checked whether register addresses were 4-byte aligned (because the registers were 32 bits wide); now we have to check that addresses are valid 31-bit addresses. For that we are going to check
if the most significant bit of the address is set (which means it is not a valid 31-bit address).
I guess we are ok here.
Now the second scenario is a bit tricky for me.
If a read/write of multiple registers would go past the 0x7fff-fffc boundary (the maximum address in the 31-bit scheme), the request has to be handled carefully.
Multi-register reads and writes take a length argument, which is simply the number of registers to read or write.
For example, if the read starts at 0x7fff-fff8 and the length of the read is 5, then we can actually only read 2 registers (0x7fff-fff8 and 0x7fff-fffc).
Could somebody suggest some pseudocode to handle this scenario?
Something like the below:
while(length>1)
{
if(!(address<<(length*31) <= 0x7fff-fffc))
{
length--;
}
}
I know it is not good enough, but I am looking for something along the same lines that I can use.
EDIT
I have come up with a piece of code which may fulfill my requirement
int count = 0;
Index_addr = addr;
while (Index_addr <= 0x7ffffffc)
{
/* I want to move the register address to the next register address; each register is 31 bits wide and they are at consecutive locations, like 0x0, 0x4, 0x8, etc. */
Index_addr = addr << 1; // Guess I am doing wrong here, would anyone correct it.
count++;
}
length = count;

The root problem seems to be that the program is not properly treating the FPGA registers.
Data encapsulation would help, and, instead of treating the 31-bit FPGA registers as memory locations, they should be abstracted.
The FPGA should be treated as a vector (a one-dimensional array) of registers.
The vector of N FPGA registers should be addressable by a register index in the range of 0 through N-1.
The FPGA registers are memory mapped at base addr.
So the memory address = 4 * FPGA register index + base addr.
Access to the FPGA registers should be encapsulated by read and write procedures:
int read_fpga_reg(int reg_index, uint32_t *reg_valp)
{
if (reg_index < 0 || reg_index >= MAX_REG_INDEX)
return -1; /* error return */
*reg_valp = *(uint32_t *)((reg_index << 2) + fpga_base_addr); /* parentheses needed: + binds tighter than << */
return 0;
}
As long as MAX_REG_INDEX and fpga_base_addr are properly defined, then this code will never generate an invalid memory access.

I'm not absolutely sure I'm interpreting the given scenario correctly. But here's a shot at it:
// Assuming "address" starts 4-byte aligned and is just defined as an integer
uint32_t address; // 32-bit unsigned
while ( length > 0 ) // length is the number of registers
{
// READ 4-byte value at "address"
// Mask the read value with 0x7FFFFFFF since there are 31 valid bits
// 32 bits (4 bytes) have been read
--length;
if ( address >= 0x7ffffffc )
{
break; // last valid register reached; stop even if length > 0
}
address += 4;
}
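Another way to look at it is to clamp the length once, before any access happens, instead of checking inside the loop. A minimal sketch under the question's assumptions (registers 4 bytes apart, 0x7ffffffc as the last valid address); the names clamp_reg_count and FPGA_LAST_REG_ADDR are hypothetical:

```c
#include <stdint.h>

/* Highest valid register address in the 31-bit scheme, taken from the question. */
#define FPGA_LAST_REG_ADDR 0x7ffffffcu

/* Return how many of the `length` requested registers can actually be
 * accessed starting at `addr`. Returns 0 if `addr` has its most significant
 * bit set or is not 4-byte aligned. */
static uint32_t clamp_reg_count(uint32_t addr, uint32_t length)
{
    if ((addr & 0x80000000u) != 0 || (addr & 0x3u) != 0)
        return 0;

    /* Registers available from addr up to and including the last one. */
    uint32_t available = (FPGA_LAST_REG_ADDR - addr) / 4u + 1u;

    return length < available ? length : available;
}
```

For the example in the question, a start address of 0x7ffffff8 with a length of 5 is clamped to 2, so the existing multi-register read/write loop can run unchanged with the clamped length.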

Reading SQLite header

I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:
struct Header_info {
char *filename;
char *sql_string;
uint16_t page_size;
};
int read_header(FILE *db, struct Header_info *header)
{
assert(db);
uint8_t sql_buf[100] = {0};
/* load the header */
if(fread(sql_buf, 100, 1, db) != 1) {
return ERR_SIZE;
}
/* copy the string */
header->sql_string = strdup((char *)sql_buf);
/* verify that we have a proper header */
if(strcmp(header->sql_string, "SQLite format 3") != 0) {
return ERR_NOT_HEADER;
}
memcpy(&header->page_size, (sql_buf + 16), 2);
return 0;
}
Here are the relevant bytes of the file I'm testing it on:
0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300 SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e  .....@  ........
Following this spec, the code looks correct to me.
Later I print header->page_size with this line:
printf("\tPage size: %"PRIu16"\n", header->page_size);
But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.

It's an endianness problem. x86 is little-endian, that is, in memory, the least significant byte is stored first. When the bytes 10 00 are copied into a uint16_t on a little-endian architecture, the first byte becomes the low-order byte, so the value is 0x0010, which is 16 instead of 4096.
Your problem is therefore that memcpy is not an appropriate tool to read the value, because the file stores this field as a big-endian integer.
See the following section of the SQLite file format spec :
1.2.2 Page Size
The two-byte value beginning at offset 16 determines the page size of
the database. For SQLite versions 3.7.0.1 and earlier, this value is
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported. The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value is
at offset 16 is 0x00 0x01. This value can be interpreted as a
big-endian 1 and thought of is as a magic number to represent the
65536 page size. Or one can view the two-byte field as a little endian
number and say that it represents the page size divided by 256. These
two interpretations of the page-size field are equivalent.

It seems to be an endianness issue. If you are on a little-endian machine, this line:
memcpy(&header->page_size, (sql_buf + 16), 2);
copies the two bytes 10 00 into a uint16_t, which treats the byte at the lower address as the low-order byte.
You can do this instead:
header->page_size = sql_buf[17] | (sql_buf[16] << 8);
Update
For the record, note that the solution I propose will work regardless of the endianness of the machine (see Rob Pike's article on byte order).
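Combining the shift-based read with the special case from the quoted spec section, a minimal sketch could look like this. The function name sqlite_page_size is my own, and it returns a 32-bit value because 65536 does not fit in the uint16_t field used in the question:

```c
#include <stdint.h>

/* Hypothetical helper: decode the page size from the first 100 header bytes.
 * The field at offset 16 is a two-byte big-endian integer, and the stored
 * value 1 stands for a page size of 65536. */
static uint32_t sqlite_page_size(const uint8_t *header)
{
    uint32_t raw = ((uint32_t)header[16] << 8) | header[17];
    return raw == 1 ? 65536u : raw;
}
```

For the file in the question (bytes 10 00 at offset 16) this yields 4096 on any host, little-endian or big-endian.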

Why does fread mess with my byte order?

I'm trying to parse a bmp file with fread() and when I begin to parse, it reverses the order of my bytes.
typedef struct{
short magic_number;
int file_size;
short reserved_bytes[2];
int data_offset;
}BMPHeader;
...
BMPHeader header;
...
The hex data is 42 4D 36 00 03 00 00 00 00 00 36 00 00 00.
I am loading the hex data into the struct with fread(&header,14,1,fileIn);
My problem is that the magic number should be 0x424d ('BM'), but fread() flips the bytes so that it is 0x4d42 ('MB').
Why does fread() do this, and how can I fix it?
EDIT: If I wasn't specific enough, I need to read the whole chunk of hex data into the struct not just the magic number. I only picked the magic number as an example.

This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.
Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:
/* CHAR_BIT == 8 assumed */
uint16_t le16_to_cpu(const uint8_t *buf)
{
return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
}
uint16_t be16_to_cpu(const uint8_t *buf)
{
return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
}
You do your fread into a uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:
/* note adjustments to type definition */
typedef struct BMPHeader
{
uint8_t magic_number[2];
uint32_t file_size;
uint8_t reserved[4];
uint32_t data_offset;
} BMPHeader;
/* in general this is _not_ equal to sizeof(BMPHeader) */
#define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)
/* returns 0=success, -1=error */
int read_bmp_header(BMPHeader *hdr, FILE *fp)
{
uint8_t buf[BMP_WIRE_HDR_LEN];
if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
return -1;
hdr->magic_number[0] = buf[0];
hdr->magic_number[1] = buf[1];
hdr->file_size = le32_to_cpu(buf+2);
hdr->reserved[0] = buf[6];
hdr->reserved[1] = buf[7];
hdr->reserved[2] = buf[8];
hdr->reserved[3] = buf[9];
hdr->data_offset = le32_to_cpu(buf+10);
return 0;
}
You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.
You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.
Conversion in the opposite direction, btw, looks like this:
void cpu_to_le16(uint8_t *buf, uint16_t val)
{
buf[0] = (val & 0x00FF);
buf[1] = (val & 0xFF00) >> 8;
}
void cpu_to_be16(uint8_t *buf, uint16_t val)
{
buf[0] = (val & 0xFF00) >> 8;
buf[1] = (val & 0x00FF);
}
Conversion of 32-/64-bit quantities left as an exercise.
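For completeness, here is one way the 32-bit versions could look, following the same pattern as the 16-bit functions above. This is a sketch, not the only possible implementation; le32_to_cpu is the function read_bmp_header calls, and the other two names are chosen by analogy.

```c
#include <stdint.h>

/* CHAR_BIT == 8 assumed, as in the 16-bit versions */
static uint32_t le32_to_cpu(const uint8_t *buf)
{
    return  (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

static uint32_t be32_to_cpu(const uint8_t *buf)
{
    return  (uint32_t)buf[3]
         | ((uint32_t)buf[2] << 8)
         | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[0] << 24);
}

/* Opposite direction: store a native value as four little-endian bytes. */
static void cpu_to_le32(uint8_t *buf, uint32_t val)
{
    buf[0] = (uint8_t)(val & 0xFF);
    buf[1] = (uint8_t)((val >> 8) & 0xFF);
    buf[2] = (uint8_t)((val >> 16) & 0xFF);
    buf[3] = (uint8_t)((val >> 24) & 0xFF);
}
```

With the file_size bytes from the question, 36 00 03 00, le32_to_cpu returns 196662.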

I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value, but your system is little endian, which means the first byte in memory of a multi-byte integer is its least significant byte rather than its most significant.
Demonstrated in this code:
#include <stdio.h>
int main()
{
union {
short sval;
unsigned char bval[2];
} udata;
udata.sval = 1;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
udata.sval = 0x424d;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
udata.sval = 0x4d42;
printf( "DEC[%5hu] HEX[%04hx] BYTES[%02hhx][%02hhx]\n"
, udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
return 0;
}
Gives the following output
DEC[    1] HEX[0001] BYTES[01][00]
DEC[16973] HEX[424d] BYTES[4d][42]
DEC[19778] HEX[4d42] BYTES[42][4d]
So if you want to be portable you will need to detect the endian-ness of your system and then do a byte shuffle if required. There will be plenty of examples round the internet of swapping the bytes around.
Subsequent question:
I ask only because my file size is 3 instead of 196662
This is due to memory alignment issues. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int not to be split over multiple memory words. So intuitively you think your struct is laid out in memory like:
Offset
short magic_number; 00 - 01
int file_size; 02 - 05
short reserved_bytes[2]; 06 - 09
int data_offset; 0A - 0D
BUT on a 32 bit system that means file_size has 2 bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the way the structure is laid out in memory is actually like:
short magic_number; 00 - 01
<<unused padding>> 02 - 03
int file_size; 04 - 07
short reserved_bytes[2]; 08 - 0B
int data_offset; 0C - 0F
So when you read your byte stream in the 36 00 is going into your padding area which leaves your file_size as getting the 03 00 00 00. Now if you used fwrite to create this data it should have been OK as the padding bytes would have been written out. But if your input is always going to be in the format you have specified it is not appropriate to read the whole struct as one with fread. Instead you will need to read each of the elements individually.
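If you want to see the padding on your own compiler rather than take the table above on faith, offsetof and sizeof will show it. A small sketch; the type name BMPHeaderNaive and the helper are mine, and the exact numbers depend on your platform's ABI:

```c
#include <stddef.h>

/* The struct exactly as declared in the question. */
typedef struct {
    short magic_number;
    int   file_size;
    short reserved_bytes[2];
    int   data_offset;
} BMPHeaderNaive;

/* Size of the header as stored in the file: 2 + 4 + 4 + 4 bytes. */
enum { BMP_WIRE_HDR_LEN = 2 + 4 + 4 + 4 };

/* Number of padding bytes the compiler inserted between magic_number
 * and file_size (0 only if the struct happens to be packed). */
static size_t padding_before_file_size(void)
{
    return offsetof(BMPHeaderNaive, file_size) - sizeof(short);
}
```

On a typical platform with 2-byte short and 4-byte int I would expect padding_before_file_size() to be 2 and sizeof(BMPHeaderNaive) to be 16 rather than 14, which matches the second layout above.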

Writing a struct to a file is highly non-portable; it's safest to just not try to do it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never as a sequence of bytes) and b) it's always both written and read on the same (type of) machine. Not only are there "endian" issues with different CPUs (which is what it seems you've run into), there are also "alignment" issues. Different hardware implementations have different rules about placing integers only on even 2-byte or even 4-byte or even 8-byte boundaries. The compiler is fully aware of all this, and inserts hidden padding bytes into your struct so it always works right. But as a result of the hidden padding bytes, it's not at all safe to assume a struct's bytes are laid out in memory like you think they are. If you're very lucky, you work on a computer that uses the same byte order as the file format and has no alignment restrictions at all, so you can lay structs directly over files and have it work. But you're probably not that lucky, and programs that need to be "portable" to different machines certainly have to avoid laying structs directly over any part of any file.