If:
(I believe the registers are adjacent to one another...)
A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
B WORD 0d30, 0d40, 0d70, 0hB
D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
Then:
What is [A+2]? The answer is 0d20 or 0x15
What is [B+2]? The answer is 40 or 0x28
What is [D+4]? Not sure
What is [D-10]? Not sure
I think those are the answers but I am not sure. Since a WORD is 1 BYTE, AND DWORD is 2 WORDS, then as a result when you are counting the array of [B+2] for example, you should be starting at 0d30, then 0d40 (count two WORD). And [A+2] is 0d20 because you are counting two bytes. What am I doing wrong? Please help. Thank you
EDIT
So is it because: Taking into account that the first value of A,B, and D are offsets x86 is little endian... A = 0d10, count 2 more from there B...bytes (in decimal) = 30,0,40,0,70,0,11,0 B is 0d40, count 2 more bytes from that D...bytes (in hex) = 0x200, 0,0,0,...0,2,0,0,...0x10,3,0,0,...0,4,0,0,...0,5,0,0,...0,6,0,0 D is 0x200. Count 4 bytes from there. Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C? Thank you
Also if I did [B-3], would it be 0d13? I was told it actually is between 0d10 and 0d13 such that it will be 0A0D and due to little endian will be 0D0A. Is that correct? Thank you!!
EDIT
WORD are 2 BYTEs. DWORD are two WORDs ("D" stands for "double"). QWORD is 4*WORD (Quad).
Memory is addressed in bytes, ie. content of memory can be viewed as (for three bytes with values: 0xB, 20, 10):
address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A
WORD then occupies two bytes in memory, on x86 the least significant byte goes at lower address, most significant is at higher address.
So WORD 0x1234 is stored in memory at address 0xDEAD as:
address | value
----------------
DEAD | 34
DEAE | 12
Registers on x86 are special tiny bit of memory located directly on CPU itself, which is not addressable by the numerical addresses like above, but only by the instruction opcode containing the number of register (in source their are named ax, bx, ...).
That means you have no registers in your question, and it makes no sense to talk about registers in it.
In normal assembler [B+2] would be BYTE 40, (bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it's trying to work with "variables" considering also their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know, MASM has too many quirks to be used logically, and you have to learn them. (just create short code with B WORD 0, 1, 2, 3, 4 MOV ax,[B+2] and check the disassembly in debugger).
[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.
Rest of answers can be easily figured out, if you draw the bytes on the paper (for example DWORD 0x310 compiles to 10 03 00 00 hexa bytes).
I wonder where you got 0x15 in first possible answer, as I don't see any value 21 in A.
edit due to new comments ... I will "compile" it for you, make sure you either understand every byte, or ask under answer which one is not clear.
; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B: ;▼ ▼ ▼ ▼
1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D: ;▼ ▼ ▼ ▼ ▼ ▼
B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
Notice how A, B and D are just labels marking some address in memory, that's how most Assemblers work with symbols. In MASM it's more tricky, as it tries to be "clever" and keeps not only the address around, but also it knows the D was defined as DWORD and not BYTE. That's not the case with different assemblers.
Now [D+4] in MASM is tricky, it will probably use the size knowledge to default to DWORD size of that expression (in other assemblers you should specify, like "DWORD PTR [D+4]", or it is deduced from target register size automatically, when possible). So [D+4] will fetch bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't recalculate also the +4 offset as +4th dword, ie +16 in bytes).
Now to your comments, I will torn them apart into tiny bits with mistakes, as while often it's easy to understand what you did mean, in Assembly once you start writing code, it's not enough to have good intention, you must be exact and accurate, CPU will not fill any gap, and do exactly what you wrote.
Can you explain how did you get 0d13 of A and through to 0d30 of B #Jester?
Go to my "compiled" bytes, and D-1 (when offset are in bytes) means one byte back from D: address, ie. that 00 at the end of B line. Now for D-10 count 10 bytes back from D: ... That will go to 0D in A line, as 8 bytes are in B array, and remaining two are at end of A array.
Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)
each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?
It's other way around, two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD. Just as "D" data of 6 DWORD can be treated also as 12 WORD values, or 24 BYTE values. For example DWORD PTR [A+2] is 1E0C0D0A.
first value of A,B, and D are offsets x86 is little endian
"value of A" is actually some memory address, I think I automatically don't mention "value" in such case, but "address", "pointer" or "label" (although "value of symbol A" is valid English sentence, and can be resolved after symbols have addresses assigned).
OFFSET A has particular special meaning in MASM, taking the byte offset of address A since the start of it's segment (in 32b mode this is usually the "address" for human, as segments start from 0 and memory is flat-mapped. In real mode segment part of address was important, as offset was only 16 bit (only 64k of memory addressable through offset only)).
In your case I would say "value at A", as "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.
B is 0d40
[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".
Although in MASM it's a bit different, it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's specific quirk of MASM. Avoid using that syntax, it hides memory usage from unfocused reader of source, use mov eax,[D] even in MASM (or get rid of MASM ideally).
D...bytes (in hex) = 0x200, 0,0,0,...
0x200 is not byte, hexa formatting has that neat feature, that two digits pair form single byte. So hexa 200 is 3 digits => one and half of byte.
Consider how those DWORD values were created from bytes.. in decimal formatting you would have to recalculate the whole value, so bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hexa 28 1E 14 0A you just reassemble them in correct order 0A141E28.
D is 0x200.
No, D is address. And even [D] is 0xB0.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?
B0 is at D+0 address. You don't count it into those 10 bytes in [D-10], that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count bytes there to get comfortable with offsets.
I'm working on assignment in which I need to dissect a binary file retrieve the source address from the header data. I was able to get hex data from the file to write out as we were instructed but I can't make heads or tails of what I am looking at. Here's the print out code I used.
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1","rb");
if (!ptr_myfile)
{
printf("Unable to open file!");
return 1;
}
size_t rb;
do {
rb = fread(buf, 1, 8, ptr_myfile);
if( rb ) {
size_t i;
for(i = 0; i < rb; ++i) {
printf("%02x", (unsigned int)buf[i]);
}
printf("\n");
}
} while( rb );
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
So we are using this diagram to aid in the assignment
I'm really having difficulty translating information from the binary file to some thing useful that I can manage, and searching the website hasn't yielded me much. I just need some help putting me in the right direction.
Ok, it looks like you actually are reversing parts of an IP packet based on the diagram. This diagram is based on 32-bit words, with each bit being shown as the small 'ticks' along the horizontal ruler looking thing at the top. Bytes are shown as the big 'ticks' on the top ruler.
So, if you were to read the first byte of the file, the low-order nibble (the low-order four bytes) contains the version, and the high order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from you diagram, you can see that the source address is in the fourth word so to read this, you can advance the file point to this point and read in four bytes. So in pseudo-code you should be able to do this:
fp = fopen("the file name")
fseek(fp, 12) // advance the file pointer 12 bytes
fread(buf, 1, 4, fp) // read in four bytes from the file.
Now you should have the source address in buf.
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..#.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the EthernetII header, with the first six bytes (00 15 ff 2e 93 78) being the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) is the source MAC address and the new two bytes (08 00) denote that the next header is of type IP.
The next twenty bytes is the IP header (which you show in your figure), these bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..#.........
0010 5e 1f 1d 9a ^...
So to interpret this lets look at 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x02 0x40 is 0100 0000 in binary, and taking the first three bits 010 gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first, I recommend doing a paring by hand (like I did above) a few times to get the hang of it.
One final thing, in this line of code printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in you buf array represents a single byte read from the file.
Hope this helps,
T.
So, I have a program to sniff packets, and so far it has been working very well. I am now trying to add additional functionality, but I keep running into problems when I try to decode the ethertype of a captured packet's Ethernet header. To isolate the location in which the error is occurring, I wrote a modified program to dump only the Ethernet header in hexadecimal, and then, using custom headers, decode the the source MAC address, destination MAC address, and ethertype, but this is the output:
Sniffing on device wlan0
308 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 0000
328 BYTE PACKET
33 33 00 00 00 fb 04 e5 36 4a 5d 3a 86 dd | 33......6J]:..
SRC 33:33:00:00:00:fb DST 04:e5:36:4a:5d:3a TYPE ff11
194 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 0000
Captured 3 packets
Apparently, the program has no problem decoding the MAC addresses, which are the first twelve bytes of the fourteen byte header, but, for some reason, when it reaches the final two bytes comprising the ethertype, the program fails to display them properly. The first and third packets' ethertypes are 0x0800, or IP, and the second packet's ethertype is 0x86dd, or IPv6. I have tried displaying the ethertype in numerous formats, but none have yielded proper results.
These are the lines of code responsible for displaying the MAC addresses and ethertype:
ethernet_header = (const struct sniff_ethernet_hdr *)packet;
printf ("SRC %02x", ethernet_header->ether_shost[0]);
for (i = 1; i < ETH_ALEN; i++)
printf (":%02x", ethernet_header->ether_shost[i]);
printf (" DST %02x", ethernet_header->ether_dhost[0]);
for (i = 1; i < ETH_ALEN; i++)
printf (":%02x", ethernet_header->ether_dhost[i]);
printf (" TYPE %.4x", ethernet_header->ether_type);
printf ("\n");
Does anyone have any suggestions or notice any problems with the code?
EDIT: I continued to play with this program, and I discovered something strange. I set a pointer called eth_ptr to point to the thirteenth byte of the packet, where the ethertype begins. When I dereferenced this pointer and printed the result, it was, indeed, the first byte of the ethertype. So, I added a line to the program that prints the addresses of eth_ptr and the ethertype of the Ethernet header struct. These are the results:
Sniffing on device wlan0
104 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 8
eth_ptr # 0x7ffafd0b1944 ether_type # 0x7ffafd0b194c
124 BYTE PACKET
33 33 00 00 00 fb 04 e5 36 4a 5d 3a 86 dd | 33......6J]:..
SRC 33:33:00:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 86
eth_ptr # 0x7ffafd0b1944 ether_type # 0x7ffafd0b194c
319 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 8
eth_ptr # 0x7ffafd0b1944 ether_type # 0x7ffafd0b194c
Captured 3 packets
$ gdb -q
(gdb) p /x 0x7ffafd0b194c - 0x7ffafd0b1944
$1 = 0x8
The type that is being printed is really the dereferenced pointer, which points to the first byte of the ethertype. This pointer is located eight bytes before ethernet_header->ether_type in memory; therefore, the problem is that the ethertype element of the struct is located eight bytes ahead of where it is supposed to be. I do not know why this is, or how to fix this. Can anyone offer an explanation?
EDIT AGAIN: Well, I am a fool. I just took a close look at the Ethernet header structure:
struct sniff_ethernet_hdr
{
uint8_t ether_shost[ETH_ALEN];
uint8_t ether_dhost[ETH_HLEN];
uint16_t ether_type;
} __attribute__ ((__packed__));
The macros:
#define ETH_ALEN 6
#define ETH_HLEN 14
After I fixed this careless mistake, the program ran flawlessly.
I am trying to understand how 300*1024*1024 value will be stored in a 64bit variable on a big endian machine and how will we evaluate the high and low bytes?
Build a union with long integer and an array of 8 unsigned chars and see for yourself. You can view the unsigned chars in hex if you want.
Big-endian hardware stores the most significant byte first in memory. Little-endian hardware stores the least significant byte first. In hex 300*1024*1024 is 0x12C00000.
So, for your big-endian hardware it will be stored like so:
byte number 1 2 3 4 5 6 7 8
value 00 00 00 00 12 C0 00 00
On LE hardware the bytes will be stored in reverse order:
byte number 1 2 3 4 5 6 7 8
value 00 00 C0 12 00 00 00 00
Consider the 16-Bit data packet below, which is sent through the network in network byte order ie Big Endian:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (Byte num)
34 67 89 45 90 AB FF 23 65 37 56 C6 56 B7 00 00 (Value)
Lets say 8945 is a 16 bit value. All others are 8 bit data bytes.
On my system, which is little endian, how would the data be received and stored?
Lets say, we are configured to receive 8 bytes at a time. RxBuff is the Rx buffer where data will be received.
Buff is the storage buffer where data would be stored.
Please point out which case is correct for data storage after reading 8 bytes at a time:
1) Buff[] = {0x34, 0x67, 0x45, 0x89, 0x90, 0xAB....... 0x00};
2) Buff[] = {0x00, 0x00, .......0x67, 0x89, 0x45, 0x34};
Would the whole 16 bytes data be reversed or only the 2 bytes value contained in this packet?
Only the 2 bytes value contained in the packet will be reversed.
Endianess concern byte order, and not bit order.
This is explain on wikipedia