If:
(I believe the registers are adjacent to one another...)
A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
B WORD 0d30, 0d40, 0d70, 0hB
D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
Then:
What is [A+2]? The answer is 0d20 or 0x15
What is [B+2]? The answer is 40 or 0x28
What is [D+4]? Not sure
What is [D-10]? Not sure
I think those are the answers but I am not sure. Since a WORD is 1 BYTE, AND DWORD is 2 WORDS, then as a result when you are counting the array of [B+2] for example, you should be starting at 0d30, then 0d40 (count two WORD). And [A+2] is 0d20 because you are counting two bytes. What am I doing wrong? Please help. Thank you
EDIT
So is it because: Taking into account that the first value of A,B, and D are offsets x86 is little endian... A = 0d10, count 2 more from there B...bytes (in decimal) = 30,0,40,0,70,0,11,0 B is 0d40, count 2 more bytes from that D...bytes (in hex) = 0x200, 0,0,0,...0,2,0,0,...0x10,3,0,0,...0,4,0,0,...0,5,0,0,...0,6,0,0 D is 0x200. Count 4 bytes from there. Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C? Thank you
Also if I did [B-3], would it be 0d13? I was told it actually is between 0d10 and 0d13 such that it will be 0A0D and due to little endian will be 0D0A. Is that correct? Thank you!!
EDIT
WORD are 2 BYTEs. DWORD are two WORDs ("D" stands for "double"). QWORD is 4*WORD (Quad).
Memory is addressed in bytes, ie. content of memory can be viewed as (for three bytes with values: 0xB, 20, 10):
address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A
WORD then occupies two bytes in memory, on x86 the least significant byte goes at lower address, most significant is at higher address.
So WORD 0x1234 is stored in memory at address 0xDEAD as:
address | value
----------------
DEAD | 34
DEAE | 12
Registers on x86 are special tiny bit of memory located directly on CPU itself, which is not addressable by the numerical addresses like above, but only by the instruction opcode containing the number of register (in source their are named ax, bx, ...).
That means you have no registers in your question, and it makes no sense to talk about registers in it.
In normal assembler [B+2] would be BYTE 40, (bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it's trying to work with "variables" considering also their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know, MASM has too many quirks to be used logically, and you have to learn them. (just create short code with B WORD 0, 1, 2, 3, 4 MOV ax,[B+2] and check the disassembly in debugger).
[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.
Rest of answers can be easily figured out, if you draw the bytes on the paper (for example DWORD 0x310 compiles to 10 03 00 00 hexa bytes).
I wonder where you got 0x15 in first possible answer, as I don't see any value 21 in A.
edit due to new comments ... I will "compile" it for you, make sure you either understand every byte, or ask under answer which one is not clear.
; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B: ;▼ ▼ ▼ ▼
1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D: ;▼ ▼ ▼ ▼ ▼ ▼
B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
Notice how A, B and D are just labels marking some address in memory, that's how most Assemblers work with symbols. In MASM it's more tricky, as it tries to be "clever" and keeps not only the address around, but also it knows the D was defined as DWORD and not BYTE. That's not the case with different assemblers.
Now [D+4] in MASM is tricky, it will probably use the size knowledge to default to DWORD size of that expression (in other assemblers you should specify, like "DWORD PTR [D+4]", or it is deduced from target register size automatically, when possible). So [D+4] will fetch bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't recalculate also the +4 offset as +4th dword, ie +16 in bytes).
Now to your comments, I will torn them apart into tiny bits with mistakes, as while often it's easy to understand what you did mean, in Assembly once you start writing code, it's not enough to have good intention, you must be exact and accurate, CPU will not fill any gap, and do exactly what you wrote.
Can you explain how did you get 0d13 of A and through to 0d30 of B #Jester?
Go to my "compiled" bytes, and D-1 (when offset are in bytes) means one byte back from D: address, ie. that 00 at the end of B line. Now for D-10 count 10 bytes back from D: ... That will go to 0D in A line, as 8 bytes are in B array, and remaining two are at end of A array.
Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)
each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?
It's other way around, two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD. Just as "D" data of 6 DWORD can be treated also as 12 WORD values, or 24 BYTE values. For example DWORD PTR [A+2] is 1E0C0D0A.
first value of A,B, and D are offsets x86 is little endian
"value of A" is actually some memory address, I think I automatically don't mention "value" in such case, but "address", "pointer" or "label" (although "value of symbol A" is valid English sentence, and can be resolved after symbols have addresses assigned).
OFFSET A has particular special meaning in MASM, taking the byte offset of address A since the start of it's segment (in 32b mode this is usually the "address" for human, as segments start from 0 and memory is flat-mapped. In real mode segment part of address was important, as offset was only 16 bit (only 64k of memory addressable through offset only)).
In your case I would say "value at A", as "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.
B is 0d40
[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".
Although in MASM it's a bit different, it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's specific quirk of MASM. Avoid using that syntax, it hides memory usage from unfocused reader of source, use mov eax,[D] even in MASM (or get rid of MASM ideally).
D...bytes (in hex) = 0x200, 0,0,0,...
0x200 is not byte, hexa formatting has that neat feature, that two digits pair form single byte. So hexa 200 is 3 digits => one and half of byte.
Consider how those DWORD values were created from bytes.. in decimal formatting you would have to recalculate the whole value, so bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hexa 28 1E 14 0A you just reassemble them in correct order 0A141E28.
D is 0x200.
No, D is address. And even [D] is 0xB0.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?
B0 is at D+0 address. You don't count it into those 10 bytes in [D-10], that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count bytes there to get comfortable with offsets.
Related
I have been searching high and low for an easy way to manually align any x64 PE to 0x10000 section alignment and all i could find was pe file aligment or realignment.
As a test case i used the source code of notepad++ and compiled it with vs 2022 community.
The first time with a section alignment of 0x1000 the pagesize default aligment and the second time with 0x10000 section aligment and i was comparing the differences.
The vs linker didn't change the file(raw address, raw size) in both files
The only thing changed was the size of image,section aligment,base of code and the virtual address of the sections(in first file the .text section was starting at 0x1000 and in second at 0x10000).
Making a copy of the first file with 0x1000 alignment and manually changed all the fields to try to section align the file to 0x10000 resulted in an invalid pe.
When vs linker does it it does not.
How to compute the correct virtual addresses based on the 0x10000 section aligment
for the PE sections?
I have assembled and linked two simple PE files: t1.asm with standard section alignment 4 KiB
EUROASM CPU=X64, TimeStamp=0
t1 PROGRAM Format=PE, Width=64, Entry=Start, IconFile=, \
FileAlign=512, SectionAlign=0x1000
[.text]
Start: RET
[.data]
DB "PE"
ENDPROGRAM
and t2.asm with alignment increased to 64 KiB:
EUROASM CPU=X64, TimeStamp=0
t2 PROGRAM Format=PE, Width=64, Entry=Start, IconFile=, \
FileAlign=512, SectionAlign=0x10000
[.text]
Start: RET
[.data]
DB "PE"
ENDPROGRAM
assembled and linked them with EuroAssembler and compared the executables:
R:\>euroasm t1.asm, t2.asm, nowarn=0..999
R:\>dir t?.exe
Directory of R:\
07.07.2022 09:48 1 026 t1.exe
07.07.2022 09:48 1 026 t2.exe
R:\>fc t1.exe t2.exe
Comparing files t1.exe and T2.EXE
000000AD: 10 00
000000AE: 00 01
000000B1: 10 00
000000B2: 00 01
000000B9: 10 00
000000BA: 00 01
000000BD: 10 00
000000BE: 00 01
000000C9: 10 00
000000CA: 00 01
000000E1: 30 00
000000E2: 00 03
000001A1: 10 00
000001A2: 00 01
000001A5: 10 00
000001A6: 00 01
000001BE: D0 50
000001C9: 10 00
000001CA: 00 01
000001CD: 20 00
000001CE: 00 02
000001E6: D0 50
As expected, in t1.exe sections .text and .data start at RVA 0x1000 and 0x2000, while in t2.exe they start at 0x10000 and 0x20000.
According to MS PE32+ specification the differences at file offsets 000000AD..000000B2 is SizeOfCode and SizeOfInitializedData in virtual address space rounded up to section alignment.
Differences at B8..BF are AddressOfEntryPoint and BaseOfCode (RVA of the first byte in .text).
Difference at C8..CB coresponds with SectionAlignment defined in optional header.
Difference at E0..E3 coresponds with SizeOfImage rounded up to section alignment.
Following differences concern section headers.
1A0..1A7 describe VirtualSize and VirtualAddress (RVA) increased from 0x1000 to 0x10000. Difference at 1BE concerns section-header Characteristics, where the value 0x00D00000 specifies 4 KiB section alignment and 0x00500000 specifies 16 B alignment (64 KiB section alignment isn't available in MS specification, so €ASM reverts to default 16 B, nevertheless this characteristic is relevant in object files only).
The remaining five differeces are analogous for section .data.
As you can see, there are many places where you will need to manually change the contents of PE file to keep it valid.
Let's say we have some 128bit floating point number, for example x = 2.6 (1.3 * 2^1 ieee-754).
I put in in union like this:
union flt {
long double flt;
int64_t byte8[OCTALC];
} d;
d = x;
Then i run this to get it hexadecimal representation in memory:
void print_bytes(void *ptr, int size)
{
unsigned char *p = ptr;
int i;
for (i=0; i<size; i++) {
printf("%02hhX ", p[i]);
}
printf("\n");
}
// some where in the code
print_bytes(&d.byte8[0], 16);
And i get something like
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
So by assumption i expect to see one of the leading bits(the left ones) to be 1(because exponent of 2.6 is 1) but in fact i see right bits to be 1(like it treating value big-endian). If i flip sign the output changes to:
66 66 66 66 66 66 66 A6 00 C0 00 00 00 00 00 00
So it seems like sign bit is righter than i thought. And if you count the bytes it seems like there is only 10 bytes used remaining 6 is like truncated or something.
I trying to find out why this happens any help?
You have a number of misconceptions.
First of all, you don't have a 128-bit floating point number. long double is probably a float in the x86 extended precision format on an x86-64. This is an 80 bit (10 byte) value, which is padded to 16 bytes. (I suspect this is for alignment purposes.)
And of course, it's going to be in little-endian byte order (since this is an x86/x86-64). This doesn't refer to the order of the bits in each byte, it refers to the order of the bytes in the whole.
And finally, the exponent is biased. An exponent of 1 isn't stored as 1. It's stored as 1+0x3FFF. This allows for negative exponents.
So we get the following:
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
Demo on Compiler Explorer
If we remove the padding and reverse the bytes to better match the image in the Wikipedia page, we get
4000A666666666666666
This translates to
+0x1.4CCCCCCCCCCCCCCC × 2^(0x4000-0x3FFF)
(0xA66...6 = 0b1010 0110 0110...0110 ⇒ 0b1.0100 1100 1100...110[0] = 0x1.4CC...C)
or
+1.29999999999999999995663191310057982263970188796520233154296875 × 2^1
Decimal conversion obtained using
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "4CCCCCCCCCCCCCCC" ) /
Math::BigFloat->from_hex( "10000000000000000" )
'
or
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "A666666666666666" ) /
Math::BigFloat->from_hex( "8000000000000000" )
'
You've been bamboozled by some very strange aspects of the way extended-precision floating-point is typically implemented in C on Intel architectures. So don't feel too bad. :-)
What you're seeing is that although sizeof(long double) may be 16 (== 128 bits), deep down inside what you're really getting is the 80-bit Intel extended format. It's being padded out with 6 bytes, which in your case happen to be 0. So, yes, "the sign bit is righter than you thought".
I see the same thing on my machine, and it's something I've always wondered about. It seems like a real waste, doesn't it? I used to think it was for some kind of compatibility with machines which actually do have 128-bit long doubles. But that can't be it, because this 0-padded 16-byte format is not binary-compatible with true IEEE 128-bit floating point, among other things because the padding is on the wrong end.
I'm working on assignment in which I need to dissect a binary file retrieve the source address from the header data. I was able to get hex data from the file to write out as we were instructed but I can't make heads or tails of what I am looking at. Here's the print out code I used.
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1","rb");
if (!ptr_myfile)
{
printf("Unable to open file!");
return 1;
}
size_t rb;
do {
rb = fread(buf, 1, 8, ptr_myfile);
if( rb ) {
size_t i;
for(i = 0; i < rb; ++i) {
printf("%02x", (unsigned int)buf[i]);
}
printf("\n");
}
} while( rb );
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
So we are using this diagram to aid in the assignment
I'm really having difficulty translating information from the binary file to some thing useful that I can manage, and searching the website hasn't yielded me much. I just need some help putting me in the right direction.
Ok, it looks like you actually are reversing parts of an IP packet based on the diagram. This diagram is based on 32-bit words, with each bit being shown as the small 'ticks' along the horizontal ruler looking thing at the top. Bytes are shown as the big 'ticks' on the top ruler.
So, if you were to read the first byte of the file, the low-order nibble (the low-order four bytes) contains the version, and the high order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from you diagram, you can see that the source address is in the fourth word so to read this, you can advance the file point to this point and read in four bytes. So in pseudo-code you should be able to do this:
fp = fopen("the file name")
fseek(fp, 12) // advance the file pointer 12 bytes
fread(buf, 1, 4, fp) // read in four bytes from the file.
Now you should have the source address in buf.
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..#.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the EthernetII header, with the first six bytes (00 15 ff 2e 93 78) being the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) is the source MAC address and the new two bytes (08 00) denote that the next header is of type IP.
The next twenty bytes is the IP header (which you show in your figure), these bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..#.........
0010 5e 1f 1d 9a ^...
So to interpret this lets look at 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x02 0x40 is 0100 0000 in binary, and taking the first three bits 010 gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first, I recommend doing a paring by hand (like I did above) a few times to get the hang of it.
One final thing, in this line of code printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in you buf array represents a single byte read from the file.
Hope this helps,
T.
I am trying to understand how 300*1024*1024 value will be stored in a 64bit variable on a big endian machine and how will we evaluate the high and low bytes?
Build a union with long integer and an array of 8 unsigned chars and see for yourself. You can view the unsigned chars in hex if you want.
Big-endian hardware stores the most significant byte first in memory. Little-endian hardware stores the least significant byte first. In hex 300*1024*1024 is 0x12C00000.
So, for your big-endian hardware it will be stored like so:
byte number 1 2 3 4 5 6 7 8
value 00 00 00 00 12 C0 00 00
On LE hardware the bytes will be stored in reverse order:
byte number 1 2 3 4 5 6 7 8
value 00 00 C0 12 00 00 00 00
If I debug this C code:
unsigned int i = 0xFF;
Will the debugger show me in memory 0x00FF if it's a little endian system and 0xFF00 if it's a big endian system?
If:
Your system has 8-bit chars.
Your system uses 16-bit unsigned ints
And your debugger displays memory as simply bytes of hex
You would see this at &i if it's little-endian:
ff 00 ?? ?? ?? ?? ?? ??
and this if it's big-endian:
00 ff ?? ?? ?? ?? ?? ??
The question marks simply represent the bytes following i in memory.
Note that:
A little-endian machine will store the byte with the lowest value at the lowest address.
A big-endian machine will store the byte with the largest value at the lowest address.
C doesn't specify how many bits are in an unsigned int. It's up to the implementation.
You can find out using CHAR_BIT * sizeof (unsigned int).
If instead your machine uses 32-bit unsigned char, you would see:
ff 00 00 00 ?? ?? ?? ??
in the little-endian case, and on a big-endian machine you would see:
00 00 00 ff ?? ?? ?? ??
If you view the raw contents of memory around the address &i then yes, you will be able to tell the difference. But it's going to be 00 00 00 ff for big endian and ff 00 00 00 for little endian, assuming sizeof(unsigned int) is 4 and CHAR_BIT (number of bits per byte) is 8 -- in the question you have them swapped.
You can also detect endianness programmatically (i.e. you can make a program that prints out the endianness of the architecture it runs on) through one of several tricks; for example, see Detecting endianness programmatically in a C++ program (solutions apply to C too).