Endian representation of 64-bit values - c

Suppose I have unsigned long long x = 0x0123456789ABCDEF.
Which of the following is correct? (I can verify only the first one):
On a 32-bit little-endian processor, it will appear in memory as 67 45 23 01 EF CD AB 89.
On a 64-bit little-endian processor, it will appear in memory as EF CD AB 89 67 45 23 01.
On a 32-bit big-endian processor, it will appear in memory as 01 23 45 67 89 AB CD EF.
On a 64-bit big-endian processor, it will appear in memory as 01 23 45 67 89 AB CD EF.

The first one is wrong. On ia32 at least the layout is EF CD AB 89 67 45 23 01.
The others are correct.

Little endian means the least-significant byte comes first in memory, and big endian means the most-significant byte comes first:
0x0123456789ABCDEF big endian is 0x01, 0x23, 0x45, ...
0x0123456789ABCDEF little endian is 0xEF, 0xCD, 0xAB, ...
The native word endianness and size of the processor are inconsequential; the appearance in memory is dictated by the byte order alone.
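You can check this on any machine by printing the bytes of the value directly; a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned long long x = 0x0123456789ABCDEF;
    unsigned char *p = (unsigned char *)&x;

    /* Prints EF CD AB 89 67 45 23 01 on a little-endian machine
       and 01 23 45 67 89 AB CD EF on a big-endian one. */
    for (size_t i = 0; i < sizeof x; i++)
        printf("%02X ", p[i]);
    printf("\n");
    return 0;
}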

I'd say the 32-bit solution is very much up to the compiler. It can choose to represent this type that it lacks native support for in any way it pleases, as long as the size is the expected one.
The 64-bit ones I'd agree with as being correct.

Related

128 bit floating point binary representation error

Let's say we have some 128-bit floating point number, for example x = 2.6 (1.3 × 2^1 in IEEE-754).
I put it in a union like this:
union flt {
    long double flt;
    int64_t byte8[OCTALC]; /* OCTALC is defined elsewhere; presumably 2 */
} d;
d.flt = x;
Then I run this to get its hexadecimal representation in memory:
void print_bytes(void *ptr, int size)
{
    unsigned char *p = ptr;
    int i;
    for (i = 0; i < size; i++) {
        printf("%02hhX ", p[i]);
    }
    printf("\n");
}
// somewhere in the code
print_bytes(&d.byte8[0], 16);
And I get something like
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
So by assumption I expected to see one of the leading bits (the left ones) be 1 (because the exponent of 2.6 is 1), but in fact I see the bits on the right set, as if the value were treated as big-endian. If I flip the sign, the output changes to:
66 66 66 66 66 66 66 A6 00 C0 00 00 00 00 00 00
So it seems the sign bit is further right than I thought. And if you count the bytes, it seems only 10 bytes are used; the remaining 6 look truncated or something.
I'm trying to find out why this happens. Any help?
You have a number of misconceptions.
First of all, you don't have a 128-bit floating point number. long double is probably a float in the x86 extended-precision format on x86-64. This is an 80-bit (10-byte) value, which is padded to 16 bytes. (I suspect this is for alignment purposes.)
And of course, it's going to be in little-endian byte order (since this is an x86/x86-64). This doesn't refer to the order of the bits in each byte, it refers to the order of the bytes in the whole.
And finally, the exponent is biased. An exponent of 1 isn't stored as 1. It's stored as 1+0x3FFF. This allows for negative exponents.
So we get the following:
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
Demo on Compiler Explorer
If we remove the padding and reverse the bytes to better match the image in the Wikipedia page, we get
4000A666666666666666
This translates to
+0x1.4CCCCCCCCCCCCCCC × 2^(0x4000-0x3FFF)
(0xA66...6 = 0b1010 0110 0110...0110 ⇒ 0b1.0100 1100 1100...110[0] = 0x1.4CC...C)
or
+1.29999999999999999995663191310057982263970188796520233154296875 × 2^1
Decimal conversion obtained using
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "4CCCCCCCCCCCCCCC" ) /
Math::BigFloat->from_hex( "10000000000000000" )
'
or
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "A666666666666666" ) /
Math::BigFloat->from_hex( "8000000000000000" )
'
You've been bamboozled by some very strange aspects of the way extended-precision floating-point is typically implemented in C on Intel architectures. So don't feel too bad. :-)
What you're seeing is that although sizeof(long double) may be 16 (== 128 bits), deep down inside what you're really getting is the 80-bit Intel extended format. It's being padded out with 6 bytes, which in your case happen to be 0. So, yes, "the sign bit is further right than you thought".
I see the same thing on my machine, and it's something I've always wondered about. It seems like a real waste, doesn't it? I used to think it was for some kind of compatibility with machines which actually do have 128-bit long doubles. But that can't be it, because this 0-padded 16-byte format is not binary-compatible with true IEEE 128-bit floating point, among other things because the padding is on the wrong end.
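If you want to pull the fields apart programmatically, here is a minimal sketch, assuming the 80-bit x86 extended format described above (on platforms where long double is something else, the offsets won't apply):

#include <stdio.h>
#include <string.h>

int main(void)
{
    long double x = 2.6L;
    unsigned char b[sizeof x];
    memcpy(b, &x, sizeof x);

    /* In the little-endian x86 extended format, b[9] holds the sign bit
       and the top 7 exponent bits, b[8] the low 8 exponent bits, and
       b[0]..b[7] the 64-bit significand (with an explicit leading 1 bit). */
    int sign = b[9] >> 7;
    int exponent = (((b[9] & 0x7F) << 8) | b[8]) - 0x3FFF; /* remove the bias */

    printf("sign = %d, unbiased exponent = %d\n", sign, exponent);
    return 0;
}

For x = 2.6 this prints sign = 0 and exponent = 1, matching the 00 40 pair in the dump above.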

Array in Hexadecimal in Assembly x86 MASM

If:
(I believe the registers are adjacent to one another...)
A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
B WORD 0d30, 0d40, 0d70, 0hB
D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
Then:
What is [A+2]? The answer is 0d20 or 0x15
What is [B+2]? The answer is 40 or 0x28
What is [D+4]? Not sure
What is [D-10]? Not sure
I think those are the answers but I am not sure. Since a WORD is 1 BYTE, and a DWORD is 2 WORDs, then as a result, when you are counting the array for [B+2], for example, you should be starting at 0d30, then 0d40 (count two WORDs). And [A+2] is 0d20 because you are counting two bytes. What am I doing wrong? Please help. Thank you
EDIT
So is it because: Taking into account that the first value of A,B, and D are offsets x86 is little endian...
A = 0d10, count 2 more from there.
B...bytes (in decimal) = 30,0,40,0,70,0,11,0. B is 0d40, count 2 more bytes from that.
D...bytes (in hex) = 0x200, 0,0,0,...0,2,0,0,...0x10,3,0,0,...0,4,0,0,...0,5,0,0,...0,6,0,0. D is 0x200. Count 4 bytes from there.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C? Thank you
Also, if I did [B-3], would it be 0d13? I was told it actually falls between 0d10 and 0d13, such that it will be 0A0D and, due to little endian, will be 0D0A. Is that correct? Thank you!!
A WORD is 2 BYTEs. A DWORD is two WORDs ("D" stands for "double"). A QWORD is 4 WORDs ("Q" for "quad").
Memory is addressed in bytes, i.e. the content of memory can be viewed as (for three bytes with values 0xB, 20, 10):
address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A
WORD then occupies two bytes in memory, on x86 the least significant byte goes at lower address, most significant is at higher address.
So WORD 0x1234 is stored in memory at address 0xDEAD as:
address | value
----------------
DEAD | 34
DEAE | 12
Registers on x86 are a special tiny bit of memory located directly on the CPU itself, which is not addressable by numerical addresses like the above, but only by the instruction opcode containing the number of the register (in source they are named ax, bx, ...).
That means you have no registers in your question, and it makes no sense to talk about registers in it.
In a normal assembler [B+2] would be BYTE 40 (the bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it tries to work with "variables", considering also their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know; MASM has too many quirks to be used logically, and you have to learn them. (Just create short code with B WORD 0, 1, 2, 3, 4 then MOV ax,[B+2], and check the disassembly in a debugger.)
[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.
The rest of the answers can be easily figured out if you draw the bytes on paper (for example, DWORD 0x310 compiles to the hex bytes 10 03 00 00).
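If you'd rather let the machine draw it for you, a short C sketch (assuming little-endian hardware) shows the same layout:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t d = 0x310;          /* the same value as DWORD 0x310 */
    unsigned char b[4];
    memcpy(b, &d, sizeof d);     /* copy out the raw bytes */

    /* Prints 10 03 00 00 on x86. */
    for (int i = 0; i < 4; i++)
        printf("%02X ", b[i]);
    printf("\n");
    return 0;
}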
I wonder where you got 0x15 in first possible answer, as I don't see any value 21 in A.
edit due to new comments ... I will "compile" it for you; make sure you either understand every byte, or ask under the answer which one is not clear.
; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B: ;▼ ▼ ▼ ▼
1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D: ;▼ ▼ ▼ ▼ ▼ ▼
B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
Notice how A, B and D are just labels marking some address in memory; that's how most assemblers work with symbols. In MASM it's more tricky, as it tries to be "clever" and keeps not only the address around, but also remembers that D was defined as DWORD and not BYTE. That's not the case with different assemblers.
Now [D+4] in MASM is tricky: it will probably use the size knowledge to default to DWORD size for that expression (in other assemblers you should specify, like DWORD PTR [D+4], or it is deduced from the target register size automatically, when possible). So [D+4] will fetch the bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't also recalculate the +4 offset as a +4th dword, i.e. +16 in bytes.)
Now to your comments. I will tear them apart into tiny bits with the mistakes, as while it's often easy to understand what you meant, in Assembly, once you start writing code, it's not enough to have good intentions; you must be exact and accurate. The CPU will not fill any gap, and will do exactly what you wrote.
Can you explain how did you get 0d13 of A and through to 0d30 of B #Jester?
Go to my "compiled" bytes, and D-1 (when offset are in bytes) means one byte back from D: address, ie. that 00 at the end of B line. Now for D-10 count 10 bytes back from D: ... That will go to 0D in A line, as 8 bytes are in B array, and remaining two are at end of A array.
Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)
each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?
It's the other way around: two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD, just as the "D" data of 6 DWORDs can be treated also as 12 WORD values, or 24 BYTE values. For example, DWORD PTR [A+2] is 1E0C0D0A.
first value of A,B, and D are offsets x86 is little endian
"value of A" is actually some memory address, I think I automatically don't mention "value" in such case, but "address", "pointer" or "label" (although "value of symbol A" is valid English sentence, and can be resolved after symbols have addresses assigned).
OFFSET A has particular special meaning in MASM, taking the byte offset of address A since the start of it's segment (in 32b mode this is usually the "address" for human, as segments start from 0 and memory is flat-mapped. In real mode segment part of address was important, as offset was only 16 bit (only 64k of memory addressable through offset only)).
In your case I would say "value at A", as "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.
B is 0d40
[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".
Although in MASM it's a bit different: it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's a specific quirk of MASM. Avoid using that syntax; it hides the memory access from an unfocused reader of the source. Use mov eax,[D] even in MASM (or get rid of MASM, ideally).
D...bytes (in hex) = 0x200, 0,0,0,...
0x200 is not a byte. Hex formatting has the neat feature that each pair of digits forms a single byte, so hex 200 is 3 digits => one and a half bytes.
Consider how those DWORD values were created from bytes. In decimal formatting you would have to recalculate the whole value, so the bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hex 28 1E 14 0A you just reassemble them in the correct order: 0A141E28.
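A tiny C check of that arithmetic, if you want to verify it (the bytes are the same 40,30,20,10 as above, written in hex):

#include <stdio.h>

int main(void)
{
    /* bytes from low address to high: 0x28, 0x1E, 0x14, 0x0A */
    unsigned char b[4] = { 0x28, 0x1E, 0x14, 0x0A };
    unsigned long v = b[0] + b[1] * 256UL + b[2] * 65536UL + b[3] * 16777216UL;

    printf("0x%08lX = %lu\n", v, v);   /* prints 0x0A141E28 = 169090600 */
    return 0;
}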
D is 0x200.
No, D is address. And even [D] is 0xB0.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?
B0 is at the D+0 address. You don't count it into those 10 bytes of [D-10]; that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count the bytes there to get comfortable with offsets.

Dissecting a binary file in C

I'm working on an assignment in which I need to dissect a binary file and retrieve the source address from the header data. I was able to get hex data from the file to write out, as we were instructed, but I can't make heads or tails of what I am looking at. Here's the printout code I used.
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1", "rb");
if (!ptr_myfile)
{
    printf("Unable to open file!");
    return 1;
}
size_t rb;
do {
    rb = fread(buf, 1, 8, ptr_myfile);
    if (rb) {
        size_t i;
        for (i = 0; i < rb; ++i) {
            printf("%02x", (unsigned int)buf[i]);
        }
        printf("\n");
    }
} while (rb);
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
So we are using this diagram to aid in the assignment.
I'm really having difficulty translating the information from the binary file into something useful that I can manage, and searching the website hasn't yielded me much. I just need some help putting me in the right direction.
Ok, it looks like you actually are reversing parts of an IP packet based on the diagram. This diagram is based on 32-bit words, with each bit being shown as the small 'ticks' along the horizontal ruler-looking thing at the top. Bytes are shown as the big 'ticks' on the top ruler.
So, if you were to read the first byte of the file, the high-order nibble (the top four bits) contains the version, and the low-order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from your diagram, you can see that the source address is in the fourth word. To read it, you can advance the file pointer to that point and read in four bytes. So in pseudo-code you should be able to do this:
fp = fopen("the file name")
fseek(fp, 12) // advance the file pointer 12 bytes
fread(buf, 1, 4, fp) // read in four bytes from the file.
Now you should have the source address in buf.
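Here is that pseudo-code as a minimal real-C sketch. It assumes the IP header starts at byte 0 of the file; judging by your dump there may be a record header in front of each packet, in which case you need to add its size to the offsets:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("packets.1", "rb");
    if (!fp) {
        printf("Unable to open file!");
        return 1;
    }

    int first = fgetc(fp);                    /* version/IHL byte */
    printf("version = %d, header words = %d\n", first >> 4, first & 0x0F);

    unsigned char addr[4];
    fseek(fp, 12, SEEK_SET);                  /* source address: bytes 12..15 */
    if (fread(addr, 1, 4, fp) == 4)
        printf("source = %u.%u.%u.%u\n", addr[0], addr[1], addr[2], addr[3]);

    fclose(fp);
    return 0;
}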
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..@.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the Ethernet II header, with the first six bytes (00 15 ff 2e 93 78) being the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) being the source MAC address, and the next two bytes (08 00) denoting that the next header is of type IP.
The next twenty bytes are the IP header (which you show in your figure); these bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..@.........
0010 5e 1f 1d 9a ^...
So to interpret this, let's look at it in 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x40 is 0100 0000 in binary; taking the first three bits, 010, gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first; I recommend doing the parsing by hand (like I did above) a few times to get the hang of it.
One final thing: in this line of code, printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in your buf array represents a single byte read from the file.
Hope this helps,
T.

store 300*1024*1024 in a 64-bit variable as low and high bytes

I am trying to understand how the value 300*1024*1024 will be stored in a 64-bit variable on a big-endian machine, and how we evaluate the high and low bytes.
Build a union with a long integer and an array of 8 unsigned chars and see for yourself. You can view the unsigned chars in hex if you want.
Big-endian hardware stores the most significant byte first in memory. Little-endian hardware stores the least significant byte first. In hex 300*1024*1024 is 0x12C00000.
So, for your big-endian hardware it will be stored like so:
byte number 1 2 3 4 5 6 7 8
value 00 00 00 00 12 C0 00 00
On LE hardware the bytes will be stored in reverse order:
byte number 1 2 3 4 5 6 7 8
value 00 00 C0 12 00 00 00 00
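A minimal version of the union experiment suggested above:

#include <stdio.h>
#include <stdint.h>

union view {
    uint64_t value;
    unsigned char bytes[8];
};

int main(void)
{
    union view v;
    v.value = 300ULL * 1024 * 1024;    /* 0x12C00000 */

    /* Prints 00 00 C0 12 00 00 00 00 on little-endian hardware,
       00 00 00 00 12 C0 00 00 on big-endian hardware. */
    for (int i = 0; i < 8; i++)
        printf("%02X ", v.bytes[i]);
    printf("\n");
    return 0;
}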

Seeing a value being stored in memory to check endianness?

If I debug this C code:
unsigned int i = 0xFF;
Will the debugger show me in memory 0x00FF if it's a little endian system and 0xFF00 if it's a big endian system?
If:
Your system has 8-bit chars.
Your system uses 16-bit unsigned ints
And your debugger displays memory as simply bytes of hex
You would see this at &i if it's little-endian:
ff 00 ?? ?? ?? ?? ?? ??
and this if it's big-endian:
00 ff ?? ?? ?? ?? ?? ??
The question marks simply represent the bytes following i in memory.
Note that:
A little-endian machine will store the least significant byte at the lowest address.
A big-endian machine will store the most significant byte at the lowest address.
C doesn't specify how many bits are in an unsigned int. It's up to the implementation.
You can find out using CHAR_BIT * sizeof (unsigned int).
If instead your machine uses 32-bit unsigned ints, you would see:
ff 00 00 00 ?? ?? ?? ??
in the little-endian case, and on a big-endian machine you would see:
00 00 00 ff ?? ?? ?? ??
If you view the raw contents of memory around the address &i then yes, you will be able to tell the difference. But it's going to be 00 00 00 ff for big endian and ff 00 00 00 for little endian, assuming sizeof(unsigned int) is 4 and CHAR_BIT (number of bits per byte) is 8 -- in the question you have them swapped.
You can also detect endianness programmatically (i.e. you can make a program that prints out the endianness of the architecture it runs on) through one of several tricks; for example, see Detecting endianness programmatically in a C++ program (solutions apply to C too).
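One common version of the trick, as a minimal sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int i = 0xFF;
    unsigned char first;
    memcpy(&first, &i, 1);   /* inspect the byte at the lowest address */

    /* The least significant byte comes first on a little-endian machine. */
    printf("%s-endian\n", first == 0xFF ? "little" : "big");
    return 0;
}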
