Bit/Byte addressing - Little/Big-endian - C

Consider the 16-byte data packet below, which is sent through the network in network byte order, i.e. big-endian:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (Byte num)
34 67 89 45 90 AB FF 23 65 37 56 C6 56 B7 00 00 (Value)
Let's say 8945 is a 16-bit value. All others are 8-bit data bytes.
On my system, which is little endian, how would the data be received and stored?
Let's say we are configured to receive 8 bytes at a time. RxBuff is the Rx buffer where data will be received.
Buff is the storage buffer where data would be stored.
Please point out which case is correct for data storage after reading 8 bytes at a time:
1) Buff[] = {0x34, 0x67, 0x45, 0x89, 0x90, 0xAB....... 0x00};
2) Buff[] = {0x00, 0x00, .......0x67, 0x89, 0x45, 0x34};
Would the whole 16 bytes of data be reversed, or only the 2-byte value contained in this packet?

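Only the 2-byte value contained in the packet needs to be reversed, and only when it is read as a 16-bit integer; the 8-bit data bytes keep the order in which they arrived.
Endianness concerns byte order, not bit order.
This is explained on Wikipedia.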


PIC Embedded C printf Corrupted Output - Very odd

I have a 69 element array MBRespon[i] of hex data which I'm sending out of two USARTs on a PIC18F46K40. The first USART loops through the data and transmits everything fine, and when that's done the code goes through a 2nd loop where it prints the formatted data out of the 2nd UART using:
printf(" Byte %02i : 0x%02x \r\n", i, MBRespon[i]);
At first it looks like the data is being printed out fine; however, upon closer inspection, around the 57th byte it sends the wrong thing. At first I thought this might have been an EUSART2_TX_BUFFER_SIZE issue, and indeed changing the buffer sizes does have an impact; more gets corrupted when it's anything other than 32.
#define EUSART2_TX_BUFFER_SIZE 32 // 32 Works (Sort of)
#define EUSART2_RX_BUFFER_SIZE 32 // 32 Works (Sort of)
If I reduce the number of elements in the array to 58 or fewer it's fine; anything more and it's corrupted.
My code:
void PrintModRespon(){
    int i=0;
    printf("Modbus Response Count %i:\r\n",MBResCnt);
    while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
    for(i=0; i< MBResCnt ; i++ ){
        while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
        printf(" Byte %02i : 0x%02x \r\n", i, MBRespon[i]);
        while(!EUSART2_is_tx_done()); // Hold until done.
    }
    while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
    printf("\r\n\n");
    while(!EUSART2_is_tx_done());
}
I added in while(!EUSART2_is_tx_ready()); and while(!EUSART2_is_tx_done()); as I thought it may be the UART wasn't ready/busy, but they didn't make any difference.
UART1 Output (The good one):
06 03 40 00 01 00 02 00 03 ....etc.... 00 1a 00 1b 00 1c 00 1d 00 1e 00 1f 00 20 5c 30
UART2 Output (The bad one):
Modbus Response Count 69:
Byte 00 : 0x06
Byte 01 : 0x03
Byte 02 : 0x40
Byte 03 : 0x00
Byte 04 : 0x01
Byte 05 : 0x00
Byte 06 : 0x02
Byte 07 : 0x00
Byte 08 : 0x03
..etc...
Byte 53 : 0x00
Byte 54 : 0x1a
Byte 55 : 0x00
Byte 56 : 0x1b
Byte 57 : 0x14 // <-- This is NOT in the array!
Byte 58 : 0x1c
Byte 59 : 0x00
Byte 60 : 0x1d
Byte 61 : 0x00
Byte 62 : 0x1e
Byte 63 : 0x00
Byte 64 : 0x1f
Byte 65 : 0x00
Byte 66 : 0x20
Byte 67 : 0x5c
Byte 68 : 0x30
Oddly, as another test, I changed the printf function to:
printf(" Byte %02i : 0x%02x \r\n", i, 0x00 +i);
I.e. not using the array of data, and it output 69 incremented values fine! Which suggests it's not a UART buffer/timing issue?
Modbus Response Count 69:
Byte 00 : 0x00
Byte 01 : 0x01
Byte 02 : 0x02
...etc.
Byte 51 : 0x33
Byte 52 : 0x34
Byte 53 : 0x35
Byte 54 : 0x36
Byte 55 : 0x37
Byte 56 : 0x38
Byte 57 : 0x39
Byte 58 : 0x3a
Byte 59 : 0x3b
Byte 60 : 0x3c
Byte 61 : 0x3d
Byte 62 : 0x3e
Byte 63 : 0x3f
Byte 64 : 0x40
Byte 65 : 0x41
Byte 66 : 0x42
Byte 67 : 0x43
Byte 68 : 0x44
Any suggestions greatly appreciated. I've been stuck on this for days now!
This is very likely a padding issue since the first element of your structure has an odd number of bytes.
When defining structures, the compiler will create one-byte padding between elements if they do not fall on the default alignment of the processor. A 16-bit processor will want most things aligned on word boundaries (i.e. even addresses), including byte arrays (e.g. arrays of char). The exception is elements that only occupy one byte.
On a 16-bit Microchip MPU, if you try to access word or dword data at an odd address, it will cause a memory fault. The compiler is trying to keep this from happening.
If your structure only contains byte-sized elements (char, uint8_t, etc.), or arrays of the same, you can force byte alignment by adding the qualifier __attribute__((packed)) to the structure declaration. This can be dangerous because you may end up with elements having odd addresses. This is OK as long as the odd-addressed elements are only accessed as bytes (e.g. a char array), but proceed with caution.

Array in Hexadecimal in Assembly x86 MASM

If:
(I believe the registers are adjacent to one another...)
A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
B WORD 0d30, 0d40, 0d70, 0hB
D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
Then:
What is [A+2]? The answer is 0d20 or 0x15
What is [B+2]? The answer is 40 or 0x28
What is [D+4]? Not sure
What is [D-10]? Not sure
I think those are the answers but I am not sure. Since a WORD is 1 BYTE, and a DWORD is 2 WORDs, then as a result when you are counting the array of [B+2] for example, you should be starting at 0d30, then 0d40 (count two WORDs). And [A+2] is 0d20 because you are counting two bytes. What am I doing wrong? Please help. Thank you
EDIT
So is it because: Taking into account that the first value of A,B, and D are offsets x86 is little endian...
A = 0d10, count 2 more from there
B...bytes (in decimal) = 30,0,40,0,70,0,11,0
B is 0d40, count 2 more bytes from that
D...bytes (in hex) = 0x200, 0,0,0,...0,2,0,0,...0x10,3,0,0,...0,4,0,0,...0,5,0,0,...0,6,0,0
D is 0x200. Count 4 bytes from there.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C? Thank you
Also if I did [B-3], would it be 0d13? I was told it actually is between 0d10 and 0d13 such that it will be 0A0D and due to little endian will be 0D0A. Is that correct? Thank you!!
EDIT
A WORD is 2 BYTEs. A DWORD is two WORDs ("D" stands for "double"). A QWORD is 4 WORDs ("Q" stands for "quad").
Memory is addressed in bytes, i.e. the content of memory can be viewed as (for three bytes with values 0xB, 20, 10):
address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A
A WORD then occupies two bytes in memory; on x86 the least significant byte goes at the lower address and the most significant byte at the higher address.
So WORD 0x1234 is stored in memory at address 0xDEAD as:
address | value
----------------
DEAD | 34
DEAE | 12
Registers on x86 are a special tiny bit of memory located directly on the CPU itself, which is not addressable by numerical addresses like above, but only by the instruction opcode containing the number of the register (in source they are named ax, bx, ...).
That means you have no registers in your question, and it makes no sense to talk about registers in it.
In a normal assembler [B+2] would be BYTE 40 (bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it's trying to work with "variables" considering also their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know; MASM has too many quirks to be used logically, and you have to learn them. (Just create a short test with B WORD 0, 1, 2, 3, 4 followed by MOV ax,[B+2] and check the disassembly in a debugger.)
[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.
The rest of the answers can easily be figured out if you draw the bytes on paper (for example, DWORD 0x310 compiles to the hex bytes 10 03 00 00).
I wonder where you got 0x15 in first possible answer, as I don't see any value 21 in A.
edit due to new comments ... I will "compile" it for you, make sure you either understand every byte, or ask under answer which one is not clear.
; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B: ;▼ ▼ ▼ ▼
1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D: ;▼ ▼ ▼ ▼ ▼ ▼
B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
Notice how A, B and D are just labels marking some address in memory, that's how most Assemblers work with symbols. In MASM it's more tricky, as it tries to be "clever" and keeps not only the address around, but also it knows the D was defined as DWORD and not BYTE. That's not the case with different assemblers.
Now [D+4] in MASM is tricky, it will probably use the size knowledge to default to DWORD size of that expression (in other assemblers you should specify, like "DWORD PTR [D+4]", or it is deduced from target register size automatically, when possible). So [D+4] will fetch bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't recalculate also the +4 offset as +4th dword, ie +16 in bytes).
Now to your comments. I will tear them apart into tiny bits with mistakes, because while it's often easy to understand what you meant, in Assembly, once you start writing code, good intentions are not enough: you must be exact and accurate. The CPU will not fill any gap; it will do exactly what you wrote.
Can you explain how did you get 0d13 of A and through to 0d30 of B @Jester?
Go to my "compiled" bytes; D-1 (when offsets are in bytes) means one byte back from the D: address, i.e. that 00 at the end of the B line. Now for D-10, count 10 bytes back from D: ... That will land on 0D in the A line, as 8 bytes are in the B array and the remaining two are at the end of the A array.
Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)
each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?
It's other way around, two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD. Just as "D" data of 6 DWORD can be treated also as 12 WORD values, or 24 BYTE values. For example DWORD PTR [A+2] is 1E0C0D0A.
first value of A,B, and D are offsets x86 is little endian
"Value of A" is actually some memory address, so in such a case I automatically say "address", "pointer" or "label" rather than "value" (although "value of symbol A" is a valid English phrase, and can be resolved after symbols have addresses assigned).
OFFSET A has a particular special meaning in MASM: it takes the byte offset of address A from the start of its segment (in 32-bit mode this is usually the "address" for a human, as segments start from 0 and memory is flat-mapped; in real mode the segment part of the address was important, as the offset was only 16 bits, so only 64k of memory was addressable through the offset alone).
In your case I would say "value at A", as in "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.
B is 0d40
[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".
Although in MASM it's a bit different: it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's a specific quirk of MASM. Avoid using that syntax, as it hides memory usage from an unfocused reader of the source; use mov eax,[D] even in MASM (or get rid of MASM ideally).
D...bytes (in hex) = 0x200, 0,0,0,...
0x200 is not a byte. Hex formatting has the neat feature that each pair of digits forms a single byte, so hex 200 is 3 digits => one and a half bytes.
Consider how those DWORD values were created from bytes.. in decimal formatting you would have to recalculate the whole value, so bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hexa 28 1E 14 0A you just reassemble them in correct order 0A141E28.
D is 0x200.
No, D is an address. And even [D] is 0xB0.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?
B0 is at D+0 address. You don't count it into those 10 bytes in [D-10], that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count bytes there to get comfortable with offsets.

Dissecting a binary file in C

I'm working on an assignment in which I need to dissect a binary file and retrieve the source address from the header data. I was able to get hex data from the file to write out as we were instructed, but I can't make heads or tails of what I am looking at. Here's the printout code I used.
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1","rb");
if (!ptr_myfile)
{
    printf("Unable to open file!");
    return 1;
}
size_t rb;
do {
    rb = fread(buf, 1, 8, ptr_myfile);
    if( rb ) {
        size_t i;
        for(i = 0; i < rb; ++i) {
            printf("%02x", (unsigned int)buf[i]);
        }
        printf("\n");
    }
} while( rb );
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
So we are using this diagram to aid in the assignment
I'm really having difficulty translating information from the binary file into something useful that I can manage, and searching the website hasn't yielded much. I just need some help pointing me in the right direction.
Ok, it looks like you actually are reversing parts of an IP packet based on the diagram. This diagram is based on 32-bit words, with each bit being shown as the small 'ticks' along the horizontal ruler looking thing at the top. Bytes are shown as the big 'ticks' on the top ruler.
So, if you were to read the first byte of the file, the high-order nibble (the high-order four bits) contains the version, and the low-order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from your diagram, you can see that the source address is in the fourth word, so to read it you can advance the file pointer to that point and read in four bytes. Something along these lines should work:
FILE *fp = fopen("the file name", "rb");
unsigned char buf[4];
fseek(fp, 12, SEEK_SET); // advance the file pointer 12 bytes
fread(buf, 1, 4, fp); // read in four bytes from the file.
Now you should have the source address in buf.
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..@.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the Ethernet II header: the first six bytes (00 15 ff 2e 93 78) are the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) are the source MAC address, and the next two bytes (08 00) denote that the next header is of type IP.
The next twenty bytes are the IP header (which you show in your figure); these bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..@.........
0010 5e 1f 1d 9a ^...
So to interpret this, let's look at 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x02: 0x40 is 0100 0000 in binary, and taking the first three bits (010) gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: header checksum 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first; I recommend doing the parsing by hand (like I did above) a few times to get the hang of it.
One final thing: in this line of code, printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in your buf array represents a single byte read from the file. With a plain char buffer, bytes of 0x80 and above can be sign-extended when cast to unsigned int, which is where the ffffff runs in your output come from.
Hope this helps,
T.

Endian representation of 64-bit values

Suppose I have unsigned long long x = 0x0123456789ABCDEF.
Which of the following is correct? (I can verify only the first one):
On a 32-bit little-endian processor, it will appear in memory as 67 45 23 01 EF CD AB 89.
On a 64-bit little-endian processor, it will appear in memory as EF CD AB 89 67 45 23 01.
On a 32-bit big-endian processor, it will appear in memory as 01 23 45 67 89 AB CD EF.
On a 64-bit big-endian processor, it will appear in memory as 01 23 45 67 89 AB CD EF.
The first one is wrong. On ia32 at least the layout is EF CD AB 89 67 45 23 01.
The others are correct.
Little endian means the least-significant bits are in the first byte, and big endian means the least-significant bits are in the last byte:
0x0123456789ABCDEF big endian is 0x01, 0x23, 0x45 ...
0x0123456789ABCDEF little endian is 0xEF, 0xCD, 0xAB ...
The native word size of the processor is inconsequential; the appearance in memory is dictated by the endianness.
I'd say the 32-bit solution is very much up to the compiler. It can choose to represent this type that it lacks native support for in any way it pleases, as long as the size is the expected one.
The 64-bit ones I'd agree with as being correct.

Store 300*1024*1024 in a 64-bit variable as low and high bytes

I am trying to understand how the value 300*1024*1024 will be stored in a 64-bit variable on a big-endian machine, and how we would evaluate the high and low bytes.
Build a union of a 64-bit integer and an array of 8 unsigned chars and see for yourself. You can view the unsigned chars in hex if you want.
Big-endian hardware stores the most significant byte first in memory. Little-endian hardware stores the least significant byte first. In hex 300*1024*1024 is 0x12C00000.
So, for your big-endian hardware it will be stored like so:
byte number 1 2 3 4 5 6 7 8
value 00 00 00 00 12 C0 00 00
On LE hardware the bytes will be stored in reverse order:
byte number 1 2 3 4 5 6 7 8
value 00 00 C0 12 00 00 00 00
