Could you please help me understand the purpose of the two assembly instructions in below ? (for more context, assembly + C code at the end). Thanks !
movzx edx,BYTE PTR [edx+0xa]
mov BYTE PTR [eax+0xa],dl
===================================
Assembly code below:
push ebp
mov ebp,esp
and esp,0xfffffff0
sub esp,0x70
mov eax,gs:0x14
mov DWORD PTR [esp+0x6c],eax
xor eax,eax
mov edx,0x8048520
lea eax,[esp+0x8]
mov ecx,DWORD PTR [edx]
mov DWORD PTR [eax],ecx
mov ecx,DWORD PTR [edx+0x4]
mov DWORD PTR [eax+0x4],ecx
movzx ecx,WORD PTR [edx+0x8]
mov WORD PTR [eax+0x8],cx
movzx edx,BYTE PTR [edx+0xa] ; instruction 1
mov BYTE PTR [eax+0xa],dl ; instruction 2
mov edx,DWORD PTR [esp+0x6c]
xor edx,DWORD PTR gs:0x14
je 804844d <main+0x49>
call 8048320 <__stack_chk_fail#plt>
leave
ret
===================================
C source code below (without libraries inclusion):
int main() {
char str_a[100];
strcpy(str_a, "eeeeefffff");
}
It inlined the strcpy() call, the code generator can tell that 11 bytes need to be copied. The string literal "eeeeefffff" has 10 characters, one extra for the zero terminator.
The code optimizer unrolled the copy loop to 4 moves, moving 4 + 4 + 2 + 1 bytes. It needs to be done this way because there is no processor instruction that moves 3 bytes. The instructions you are asking about copy the 11th byte. Using movzx is a bit overkill but it is probably faster than loading the DL register.
Observe the changes in the generated code when you alter the string. Adding an extra letter should unroll to 3 moves, 4 + 4 + 4. When the string gets too long you ought to see it fall back to something like memmove.
Related
First of all, I am a student, I do not yet have extensive knowledge about C, C ++ and assembler, so I am making a extreme effort to understand it.
I have this piece of assembly code from an Intel x86-32 bit processor.
My goal is to transform it to source code.
0x80483dc <main>: push ebp
0x80483dd <main+1>: mov ebp,esp
0x80483df <main+3>: sub esp,0x10
0x80483e2 <main+6>: mov DWORD PTR [ebp-0x8],0x80484d0
0x80483e9 <main+13>: lea eax,[ebp-0x8]
0x80483ec <main+16>: mov DWORD PTR [ebp-0x4],eax
0x80483ef <main+19>: mov eax,DWORD PTR [ebp-0x4]
0x80483f2 <main+22>: mov edx,DWORD PTR [eax+0xc]
0x80483f5 <main+25>: mov eax,DWORD PTR [ebp-0x4]
0x80483f8 <main+28>: movzx eax,WORD PTR [eax+0x10]
0x80483fc <main+32>: cwde
0x80483fd <main+33>: add edx, eax
0x80483ff <main+35>: mov eax,DWORD PTR [ebp-0x4]
0x8048402 <main+38>: mov DWORD PTR [eax+0xc],edx
0x8048405 <main+41>: mov eax,DWORD PTR [ebp-0x4]
0x8048408 <main+44>: movzx eax,BYTE PTR [eax]
0x804840b <main+47>: cmp al,0x4f
0x804840d <main+49>: jne 0x8048419 <main+61>
0x804840f <main+51>: mov eax,DWORD PTR [ebp-0x4]
0x8048412 <main+54>: movzx eax,BYTE PTR [eax]
0x8048415 <main+57>: cmp al,0x4b
0x8048417 <main+59>: je 0x804842d <main+81>
0x8048419 <main+61>: mov eax,DWORD PTR [ebp-0x4]
0x804841c <main+64>: mov eax,DWORD PTR [eax+0xc]
0x804841f <main+67>: mov edx, eax
0x8048421 <main+69>: and edx,0xf0f0f0f
0x8048427 <main+75>: mov eax,DWORD PTR [ebp-0x4]
0x804842a <main+78>: mov DWORD PTR [eax+0x4],edx
0x804842d <main+81>: mov eax,0x0
0x8048432 <main+86>: leave
0x8048433 <main+87>: ret
This is what I understand from the code:
There are 4 variables:
a = [ebp-0x8] ebp
b = [ebp-0x4] eax
c = [eax + 0xc] edx
d = [eax + 0x10] eax
Values:
0x4 = 4
0x8 = 8
0xc = 12
0x10 = 16
0x4b = 75
0x4f = 79
Types:
char (8 bits) = 1 BYTE
short (16 bits) = WORD
int (32 bit) = DWORD
long (32 bits) = DWORD
long long (32 bit) = DWORD
This is what I was able to create:
#include <stdio.h>
int main (void)
{
int a = 0x80484d0;
int b
short c;
int d;
c + b?
if (79 <= al) {
instructions
} else {
instructions
}
return 0
}
But I'm stuck. Nor can I understand what the sentence "cmp al .." compares to, what is "al"?
How do these instructions work?
EDIT1:
That said, as you comment the assembly seems to be wrong or as someone comments say, it is insane!
The code and the exercise are from the following book called: "Reversing, Reverse Engineering" on page 140 (3.8 Proposed Exercises). It would never have occurred to me that it was wrong, if so, this clearly makes it difficult for me to learn ...
So it is not possible to do a reversing to get the source code because it is not a good assembly? Maybe I am not oppressed? Is it possible to optimize it?
EDIT2:
Hi!
I did ask and finally she says this should be the c code:
inf foo(void){
char *string;//ebp-0x8
unsigned int *pointerstring//[ebp-0x4]
unsigned int *position;
*position = *(pointerstring+0xc);
unsigned char character;
character=(unsigned char) string[*position];
if ((character != 0x4)||(character != 0x4b))
{
*(position+0x4)=(unsigned int)(*position & 0x0f0f0f0f);
}
return(0);
}
Does it have any sense at all for you?, could someone please explain this to me?
Does anyone really program like this?
Thanks very much!
Your assembly is completely insane. This is roughly equivalent C:
int main() {
int i = 0x80484d0; // in ebp-8
int *p = &i; // in ebp-4
p[3] += (short)p[4]; // add argc to the return address(!)
if((char)*p != 0x4f || (char)*p != 0x4b) // always true because of || instead of &&
p[1] = p[3] & 0xf0f0f0f; // note that p[1] is p
return 0;
}
It should be immediately obvious that this is horrifically bad code that almost certainly won't do what the programmer intended.
The x86 assembly language follows a long legacy and has mostly kept compatibility. We need to go back to the 8086/8088 chip where that story starts. These were 16 bit processors, which means that their register had a word size of 16 bits. The general purpose registers were named AX, BX, CX and DX. The 8086 had instructions to manipulate the upper and lower 8-bit parts of these registers that were then named AH, AL, BH, BL, CH, CL, DH and DL. This Wikipedia page describes this, please take a look.
The 32 bit versions of these registers have an E in front: EAX, EBX, ECX, etc.
The particular instruction you mention, e.g, cmp al,0x4f is comparing the lower byte of the AX register with 0x4f. The comparison is effectively the same as a subtraction, but does not save the result, only sets the flags.
For the 8086 instruction set, there is a nice reference here. Your program is 32 bit code, so you will need at least a 80386 instruction reference.
You have analyzed variables, and that's a good place to start. You should try to add type annotations to them, size, as you started, and, when used as pointers (like b), pointers to what kind/size.
I might update your variable chart as follows, knowing that [ebp-4] is b:
c = [b + 0xc]
d = [b + 0x10]
e = [b + 0], size = byte
Another thing to analyze is the control flow. For most instructions control flow is sequential, but certain instructions purposefully alter it. Broadly speaking, when the pc is moved forward, it skips some code and when the pc is moved backward it repeats some code it already ran. Skipping code is used to construct if-then, if-then-else, and statements that break out of loops. Jumping back is used to continue looping.
Some instructions, called conditional branches, on some dynamic condition being true: skip forward (or backwards) and on being false do the simple sequential advancement to the next instruction (sometimes called conditional branch fall through).
The control sequences here:
...
0x8048405 <main+41>: mov eax,DWORD PTR [ebp-0x4] b
0x8048408 <main+44>: movzx eax,BYTE PTR [eax] b->e
0x804840b <main+47>: cmp al,0x4f b->e <=> 'O'
0x804840d <main+49>: jne 0x8048419 <main+61> b->e != 'O' skip to 61
** we know that the letter, a->e, must be 'O' here
0x804840f <main+51>: mov eax,DWORD PTR [ebp-0x4] b
0x8048412 <main+54>: movzx eax,BYTE PTR [eax] b->e
0x8048415 <main+57>: cmp al,0x4b b->e <=> 'K'
0x8048417 <main+59>: je 0x804842d <main+81> b->e == 'K' skip to 81
** we know that the letter, a->e must not be 'K' here if we fall thru the above je
** this line can be reached by taken branch jne or by fall thru je
0x8048419 <main+61>: mov eax,DWORD PTR [ebp-0x4] ******
...
The flow of control reaches this last line tagged we know that either the letter is either not 'O' or it is not 'K'.
The construct where the jne instruction is used to skip another test is a short-circuit || operator. Thus the control construct is:
if ( a->e != 'O' || a->e != 'K' ) {
then-part
}
As that these two conditional branches are the only flow control modifications in the function, there is no else part of the if, and there are no loops or other if's.
This code appears to have a slight problem.
If the value is not 'O', the then-part will fire from the first test. However, if we reach the 2nd test, we already know the letter is 'O', so testing it for 'K' is silly and will be true ('O' is not 'K').
Thus, this if-then will always fire.
It is either very inefficient, or, there is a bug that perhaps it is the next letter along in the (presumably) string should be tested for 'K' not the same exact letter.
Studying I found the use of the (i+1)mod(SIZE) to perform a cycle in an array of elements.
So I wondered if this method was more efficient than an if-statement...
For example:
#define SIZE 15
int main(int argc, char *argv[]) {
int items[SIZE];
for(int i = 0; items[0] < 5; i = (i + 1) % SIZE) items[i] += 1;
return 0;
}
It is more efficient than(?):
#define SIZE 15
int main(int argc, char *argv[]) {
int items[SIZE];
for(int i = 0; items[0] < 5; i++) {
if(i == SIZE) i = 0;
items[i] += 1;
}
return 0;
}
Thanks for the answers and your time.
You can check the assembly online (i. e. here). The result depends on the architecture and the optimization, but without optimization and for x64 with GCC, you get this code (as a simple example).
Example 1:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-68], edi
mov QWORD PTR [rbp-80], rsi
mov DWORD PTR [rbp-4], 0
.L3:
mov eax, DWORD PTR [rbp-64]
cmp eax, 4
jg .L2
mov eax, DWORD PTR [rbp-4]
cdqe
mov eax, DWORD PTR [rbp-64+rax*4]
lea edx, [rax+1]
mov eax, DWORD PTR [rbp-4]
cdqe
mov DWORD PTR [rbp-64+rax*4], edx
mov eax, DWORD PTR [rbp-4]
add eax, 1
movsx rdx, eax
imul rdx, rdx, -2004318071
shr rdx, 32
add edx, eax
mov ecx, edx
sar ecx, 3
cdq
sub ecx, edx
mov edx, ecx
mov DWORD PTR [rbp-4], edx
mov ecx, DWORD PTR [rbp-4]
mov edx, ecx
sal edx, 4
sub edx, ecx
sub eax, edx
mov DWORD PTR [rbp-4], eax
jmp .L3
.L2:
mov eax, 0
pop rbp
ret
Example 2:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-68], edi
mov QWORD PTR [rbp-80], rsi
mov DWORD PTR [rbp-4], 0
.L4:
mov eax, DWORD PTR [rbp-64]
cmp eax, 4
jg .L2
cmp DWORD PTR [rbp-4], 15
jne .L3
mov DWORD PTR [rbp-4], 0
.L3:
mov eax, DWORD PTR [rbp-4]
cdqe
mov eax, DWORD PTR [rbp-64+rax*4]
lea edx, [rax+1]
mov eax, DWORD PTR [rbp-4]
cdqe
mov DWORD PTR [rbp-64+rax*4], edx
add DWORD PTR [rbp-4], 1
jmp .L4
.L2:
mov eax, 0
pop rbp
ret
You see, that for the specific case with x86, the solution without modulo is much shorter.
Although you are only asking about mod vs branch, there are probably more like five cases depending on the actual implementation of the mod and branch:
Modulus-based
Power-of-two
If the value of SIZE is known to the compiler and is a power of 2, the mod will compile into a single and like this and will be very efficient in performance and code size. The and is still part of the loop increment dependency chain though, putting a speed limit on the performance of 2 cycles per iteration unless the compiler is clever enough to unroll it and keep the and out of the carried chain (gcc and clang weren't).
Known, not power-of-two
On the other hand, if the value of SIZE is known but not a power of two, then you are likely to get a multiplication-based implementation of the fixed modulus value, like this. This generally takes something like 4-6 instructions, which end up part of the dependency chain. So this will limit your performance to something like 1 iteration every 5-8 cycles, depending exactly on the latency of the dependency chain.
Unknown
In your example SIZE is a known constant, but in the more general case where it is not known at compile time you will get an division instruction on platforms that support it. Something like this.
That is good for code size, since it's a single instruction, but probably disastrous for performance because now you have a slow division instruction as part of the carried dependency for the loop. Depending on your hardware and the type of the SIZE variable, you are looking at 20-100 cycles per iteration.
Branch-based
You put a branch in your code, but jump compiler made decide to implement that as a conditional jump or as a conditional move. At -O2, gcc decides on a jump and clang on a conditional move.
Conditional Jump
This is the direct interpretation of your code: use a conditional branch to implement the i == SIZE condition.
It has the advantage of making the condition a control dependency, not a data dependency, so your loop will mostly run at full speed when the branch is not taken.
However, performance could be seriously impacted if the branch mispredicts often. That depends heavily on the value of SIZE and on your hardware. Modern Intel should be able to predict nested loops like this up to 20-something iterations, but beyond that it will mispredict once every time the inner loop is exited. Of course, is SIZE is very large then the single mispredict won't matter much anyways, so the worst case is SIZE just large enough to mispredict.
Conditional Move
clang uses a conditional move to update i. This is a reasonable option, but it does mean a carried data flow dependency of 3-4 cycles.
1 Either actually a constant like your example or effectively a constant due to inlining and constant propagation.
I would like to know what's really happening calling & and * in C.
Is that it costs a lot of resources? Should I call & each time I wanna get an adress of a same given variable or keep it in memory i.e in a cache variable. Same for * i.e when I wanna get a pointer value ?
Example
void bar(char *str)
{
check_one(*str)
check_two(*str)
//... Could be replaced by
char c = *str;
check_one(c);
check_two(c);
}
I would like to know what's really happening calling & and * in C.
There's no such thing as "calling" & or *. They are the address operator, or the dereference operator, and instruct the compiler to work with the address of an object, or with the object that a pointer points to, respectively.
And C is not C++, so there's no references; I think you just misused that word in your question's title.
In most cases, that's basically two ways to look at the same thing.
Usually, you'll use & when you actually want the address of an object. Since the compiler needs to handle objects in memory with their address anyway, there's no overhead.
For the specific implications of using the operators, you'll have to look at the assembler your compiler generates.
Example: consider this trivial code, disassembled via godbolt.org:
#include <stdio.h>
#include <stdlib.h>
void check_one(char c)
{
if(c == 'x')
exit(0);
}
void check_two(char c)
{
if(c == 'X')
exit(1);
}
void foo(char *str)
{
check_one(*str);
check_two(*str);
}
void bar(char *str)
{
char c = *str;
check_one(c);
check_two(c);
}
int main()
{
char msg[] = "something";
foo(msg);
bar(msg);
}
The compiler output can far wildly depending on the vendor and optimization settings.
clang 3.8 using -O2
check_one(char): # #check_one(char)
movzx eax, dil
cmp eax, 120
je .LBB0_2
ret
.LBB0_2:
push rax
xor edi, edi
call exit
check_two(char): # #check_two(char)
movzx eax, dil
cmp eax, 88
je .LBB1_2
ret
.LBB1_2:
push rax
mov edi, 1
call exit
foo(char*): # #foo(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB2_3
movzx eax, al
cmp eax, 120
je .LBB2_2
pop rax
ret
.LBB2_3:
mov edi, 1
call exit
.LBB2_2:
xor edi, edi
call exit
bar(char*): # #bar(char*)
push rax
movzx eax, byte ptr [rdi]
cmp eax, 88
je .LBB3_3
movzx eax, al
cmp eax, 120
je .LBB3_2
pop rax
ret
.LBB3_3:
mov edi, 1
call exit
.LBB3_2:
xor edi, edi
call exit
main: # #main
xor eax, eax
ret
Notice that foo and bar are identical. Do other compilers do something similar? Well...
gcc x64 5.4 using -O2
check_one(char):
cmp dil, 120
je .L6
rep ret
.L6:
push rax
xor edi, edi
call exit
check_two(char):
cmp dil, 88
je .L11
rep ret
.L11:
push rax
mov edi, 1
call exit
bar(char*):
sub rsp, 8
movzx eax, BYTE PTR [rdi]
cmp al, 120
je .L16
cmp al, 88
je .L17
add rsp, 8
ret
.L16:
xor edi, edi
call exit
.L17:
mov edi, 1
call exit
foo(char*):
jmp bar(char*)
main:
sub rsp, 24
movabs rax, 7956005065853857651
mov QWORD PTR [rsp], rax
mov rdi, rsp
mov eax, 103
mov WORD PTR [rsp+8], ax
call bar(char*)
mov rdi, rsp
call bar(char*)
xor eax, eax
add rsp, 24
ret
Well, if there were any doubt foo and bar are equivalent, a least by the compiler, I think this:
foo(char*):
jmp bar(char*)
is a strong argument they indeed are.
In C, there's no runtime cost associated with either the unary & or * operators; both are evaluated at compile time. So there's no difference in runtime between
check_one(*str)
check_two(*str)
and
char c = *str;
check_one( c );
check_two( c );
ignoring the overhead of the assignment.
That's not necessarily true in C++, since you can overload those operators.
tldr;
If you are programming in C, then the & operator is used to obtain the address of a variable and * is used to get the value of that variable, given it's address.
This is also the reason why in C, when you pass a string to a function, you must state the length of the string otherwise, if someone unfamiliar with your logic sees the function signature, they could not tell if the function is called as bar(&some_char) or bar(some_cstr).
To conclude, if you have a variable x of type someType, then &x will result in someType* addressOfX and *addressOfX will result in giving the value of x. Functions in C only take pointers as parameters, i.e. you cannot create a function where the parameter type is &x or &&x
Also your examples can be rewritten as:
check_one(str[0])
check_two(str[0])
AFAIK, in x86 and x64 your variables are stored in memory (if not stated with register keyword) and accessed by pointers.
const int foo = 5 equal to foo dd 5 and check_one(*foo) equal to push dword [foo]; call check_one.
If you create additional variable c, then it looks like:
c resd 1
...
mov eax, [foo]
mov dword [c], eax ; Variable foo just copied to c
push dword [c]
call check_one
And nothing changed, except additional copying and memory allocation.
I think that compiler's optimizer deals with it and makes both cases as fast as it is possible. So you can use more readable variant.
Size contains the number 86.
var_10= dword ptr -10h
var_C= dword ptr -0Ch
size= dword ptr 8
push ebp
mov ebp, esp
sub esp, 28h
mov eax, [ebp+size]
mov [esp], eax ; size
call _malloc
mov ds:x, eax
mov [ebp+var_C], 0
jmp short loc_804889E
loc_804889E: ~~~~~~~~~~~~~~~~~~~~~
mov eax, [ebp+size]
sub eax, 1
cmp eax, [ebp+var_C]
jg short loc_8048887
loc_8048887: ~~~~~~~~~~~~~~~~~~~~~
mov edx, ds:x
mov eax, [ebp+var_C]
add edx, eax
mov eax, [ebp+var_C]
add eax, 16h
mov [edx], al
add [ebp+var_C], 1
I am having difficulties reversing this portion of a project I am working on. There's a portion of the code where ds:x is moved into edx and is added with var_c and I am unsure where to go with that.
To me the program looks like it calls malloc and then moves that into ds:x and then moves 0 to var_c.
After that it simply subtracts 1 from the size of my pointer array and compares that number to 0, then jumps to a portion where it adds ds:x into edx so it can add eax to edx.
Am I dealing with some sort of array here? What is the first value that's going to go into edx in loc_8048887? Another way this could help would be to see a C equivalent of it... But that would be what I am trying to accomplish and would rather learn the solution through a different means.
Thank you!
In x86 assembly there's no strict distinction between a variable stored in memory and an array in memory. It only depends on how you access the memory region. All you have is code and data. Anyway, I'd say that ds:x is an array as because of this code here:
mov edx, ds:x ; edx = [x]
mov eax, [ebp+var_C] ; eax = something
add edx, eax ; edx = [x] + something
mov eax, [ebp+var_C] ; eax = something
add eax, 16h ; eax = something + 0x16
mov [edx], al ; [[x] + something ] = al . Yes, ds:x is an array!
What is the value of edx in loc_8048887? To find it out you only need some very basic debugging skills. I assume you have gdb at hand, if not, get it ASAP. Then compile the code with debug symbols and link it, then run gdb with the executable, set a code breakpoint at loc_8048887, run the program with r, and finally check the value of edx.
These are the commands you need:
gdb myexecutable
(gdb) b loc_8048887
(gdb) r
(gdb) info registers edx
Assembly info: Using Visual Studio 2010 to write inline assembly embedded into C
Hello,
I am trying to write into an array of chars in C and trying to mimic the action of this C code:
resNum[posNum3] = currNum3 + '0';
currently this is what i have:
mov ebx, posNum3;
mov resNum[ebx], edx; //edx is currNum3
add resNum[ebx], 48; // add 48 because thats the value of the char '0'
I also tried doing this:
mov ebx, posNum3;
mov eax, resNum[ebx] ;// eax points to the beggining of the string
mov eax, edx; // = currnum3
add eax, 48; // + '0'
No luck with any of this, help is more than appreciated!
The problem is the instruction
mov resNum[ebx], edx
moves 4 bytes (an entire dword) into the destination, not a single byte. You probably want
mov byte ptr resNum[ebx], dl
instead. While the assembler will allow you to leave off the 'size ptr' prefix on the address, you probably don't want to, as getting it wrong leads to hard to see bugs.
My X86 asm is rusty, but...
If you're using characters (8 bits) you need to first, before you start a loop, zero out EAX and then move the char into AH or AL, something like:
; Before you start your loop
xor EAX, EAX ; If you're sure you won't overflow an 8 bit number by adding 48, this can go outside the loop
; ... code here to setup loop
mov EBX, posNum3
mov AL, resNum[EBX]
add AL, 48
; ... rest of loop
Note that the compiler will do a better job of this than you will... Unless you're Mike Abrash or someone like him :)
Avoid using expressions like
mov resNum[ebx], edx;
because you never know what is resNum. It could be an expression like esp + 4, and there is no opcode for mov [esp + ebx + 4], edx, so use small steps instead.
Also, ebx is a register that have to be preserved across calls. See http://msdn.microsoft.com/en-us/library/k1a8ss06%28v=VS.71%29.aspx for details and learn about calling conventions.
Most of inline assemblers allows using name instead of size ptr [name], so you can just write
mov al, currNum3
add al, 0x30 //'0'
mov edx, posNum3
mov ecx, resNum
mov byte ptr [edx+ecx], al
if resNum is a global array, not an function argument or local variable, you can write shorter code:
mov al, currNum3
add al, 0x30 //'0'
mov edx, posNum3
mov byte ptr [resNum+ecx], al