Analyzing the assembly code generated to manipulate command line arguments

#include <stdio.h>
int main(int argc, char * argv[])
{
argv[1][2] = 'A';
return 0;
}
Here is the corresponding assembly code from GCC for a 32-bit Intel architecture. I can't totally understand what is going on.
main:
leal 4(%esp), %ecx - Add 4 to esp and store the address in ecx
andl $-16, %esp - Store first 28 bits from esp's address into esp??
pushl -4(%ecx) - Push the old esp on stack
pushl %ebp - Preamble
movl %esp, %ebp
pushl %ecx - push old esp + 4 on stack
movl 4(%ecx), %eax - move ecx + 4 to eax. this is the address of argv. argc stored at (%ecx).
addl $4, %eax - argv[1]
movl (%eax), %eax - argv[1][0]
addl $2, %eax - argv[1][2]
movb $65, (%eax) - move 'A'
movl $0, %eax - move return value (0)
popl %ecx - get old value of ecx
leave
leal -4(%ecx), %esp - restore esp
ret
What is going on at the beginning of the code, before the preamble? Where is argv stored according to this code? On the stack?

The funny code at the start (the first two instructions) aligns the stack to 16 bytes: -16 is the same as ~15, and x & ~15 rounds x down to a multiple of 16.
argv is stored at ESP + 8 on entry to the function. leal 4(%esp), %ecx creates a pointer to a pseudo-struct containing argc and argv, and the code then accesses them through it: movl 4(%ecx), %eax loads argv from this pseudo-struct.

argv is a parameter to "main()", so in many ABIs, it will indeed be passed on the stack.

Related

Calling C function from the x86 assembler

I am trying to write a function that converts decimal numbers into binary in assembler. Since printing is so troublesome there, I have decided to make a separate function in C that just prints the numbers. But when I run the code, it always prints '0110101110110100'.
Here's the C function (both print and conversion):
void printBin(int x) {
printf("%d", x);
}
void DecToBin(int n)
{
// Size of an integer is assumed to be 16 bits
for (int i = 15; i >= 0; i--) {
int k = n >> i;
printBin(k & 1);
}
}
Here's the code in asm:
.globl _DecToBin
.extern _printBin
_DecToBin:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp),%eax
movl $15, %ebx
cmpl $0, %ebx
jl end
start:
movl %ebx, %ecx
movl %eax, %edx
shrl %cl, %eax
andl $1, %eax
pushl %eax
call _printBin
movl %edx, %eax
dec %ebx
cmpl $0, %ebx
jge start
end:
movl %ebp, %esp
popl %ebp
ret
I can't figure out where the mistake is. Any help would be appreciated.
disassembled code using online program

Your principal problem is that %edx is very unlikely to be preserved across the call to printBin.
Also:
%ebx is not a volatile (caller-saved) register in most (any?) C calling conventions. You need to check your compiler's documentation and conform to it.
If you are going to use %ebx, you need to save and restore it.
The stack pointer needs to be kept aligned to 16 bytes. On my machine (macOS), it SEGVs under printBin if you don't.

Writing assembly Language in C [closed]

I am writing assembly language in C. I was given the following assembly code:
fn:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
movl %eax, -4(%ebp)
subl $1, 8(%ebp)
.f2:
cmpl $1, 8(%ebp)
jg .f3
movl -4(%ebp), %eax
leave
ret
I have written my code in C below:
fn (int x)
{
int y = 1;
if(x > 1):
int z = y *= x
x – 1;
return z;
}
Can someone tell me if I am on the right track with my C code? If I am off, can you point me in the right direction?
Thank you in advance.

fn:
pushl %ebp
movl %esp, %ebp
This is the usual function prologue under the __cdecl calling convention: save the existing %ebp on the stack, then copy %esp (the current stack address) into %ebp. From now on %ebp is the frame reference the assembly code uses to access both parameters and local variables. At the end of the function the opposite is done, and the return value is placed in %eax.
At this point you can assume %ebp points to the stack slot where the previous %ebp is stored, %ebp + 4 holds the function's return address, and everything from %ebp + 8 up is the function's parameters. Negative offsets access otherwise unused stack space, where the function can store its local variables.
subl $16, %esp
This reserves 16 bytes on the stack for local variables. It could be 16 variables of 1 byte or 1 variable of 16 bytes; there is no way to know. Some of the bytes may be unused padding for stack alignment.
It's important to change the value of %esp at this point so that any push, pop or call won't overwrite this function's local variables.
movl $1, -4(%ebp)
Here we can assume that the first local variable of this function is a 32-bit integer at address %ebp - 4. Its value was just set to 1. Let's call this variable i.
jmp .f2
.f3:
movl -4(%ebp), %eax
imull 8(%ebp), %eax
Based on what this imull does, we can assume the function takes at least one parameter at address %ebp + 8, and it seems to be a 32-bit integer being multiplied by i. Let's call this parameter x.
movl %eax, -4(%ebp)
So far we can assume i = x * i.
subl $1, 8(%ebp)
And now x = x - 1.
.f2:
cmpl $1, 8(%ebp)
jg .f3
Here we check whether x is greater than 1. If so, we jump to .f3, which is a backward jump, so it seems we have a loop.
movl -4(%ebp), %eax
leave
ret
Here, outside the loop, %eax is set to i and the function terminates; we can interpret this as return i.
Putting the analysis together, we can assume the original function looked somewhat like this:
int fn(int x)
{
int i = 1;
while (x > 1)
{
i = i * x; // or just i *= x, same thing
x = x - 1; // or just x--, same thing
}
return i;
}

Segmentation Fault when getting the max of three numbers in Assembly x86

I am trying to get the max of three numbers using C to call a routine written in 32-bit AT&T assembly. When the program runs, I get a segmentation fault (core dumped) error and cannot figure out why. My input has been a mix of positive/negative numbers and 1,2,3, both with the same error as a result.
Assembly
# %eax - first parameter
# %ecx - second parameter
# %edx - third parameter
.code32
.file "maxofthree.S"
.text
.global maxofthree
.type maxofthree #function
maxofthree:
pushl %ebp # old ebp
movl %esp, %ebp # skip over
movl 8(%ebp), %eax # grab first value
movl 12(%ebp), %ecx # grab second value
movl 16(%ebp), %edx # grab third value
#test for first
cmpl %ecx, %eax # compare first and second
jl firstsmaller # first smaller than second, exit if
cmpl %edx, %eax # compare first and third
jl firstsmaller # first smaller than third, exit if
leave # reset the stack pointer and pop the old base pointer
ret # return since first > second and first > third
firstsmaller: # first smaller than second or third, resume comparisons
#test for second and third against each other
cmpl %edx, %ecx # compare second and third
jg secondgreatest # second is greatest, so jump to end
movl %eax, %edx # third is greatest, move third to eax
leave # reset the stack pointer and pop the old base pointer
ret # return third
secondgreatest: # second > third
movl %ecx, %eax #move second to eax
leave # reset the stack pointer and pop the old base pointer
ret # return second
C code
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
long int maxofthree(long int, long int, long int);
int main(int argc, char *argv[]) {
if (argc != 4) {
printf("Missing command line arguments. Instructions to"
" execute this program:- ./a.out <num1> <num2> <num3>");
return 0;
}
long int x = atoi(argv[1]);
long int y = atoi(argv[2]);
long int z = atoi(argv[3]);
printf("%ld\n", maxofthree(x, y, z)); // TODO change back to (x, y, z)
}

The code causes a segmentation fault because it tries to jump to an invalid return address when the ret instruction executes. This happens at all three ret instructions.
The reason is that you don't pop the old base pointer before returning. A small change to the code removes the fault. Change each ret instruction to:
leave
ret
The leave instruction will do the following:
movl %ebp, %esp
popl %ebp
This resets the stack pointer and pops the old base pointer that you saved.
Also, your comparisons are not doing what they are specified to do in the comments. When you do:
cmp %eax, %edx
jl firstsmaller
The jump will happen when %edx is smaller than %eax. So you want the code to be
cmpl %edx, %eax
jl firstsmaller
which will jump when %eax is smaller than %edx, as specified in the comment.
See this page for details on the cmp instruction in AT&T/GAS syntax.

You forgot to pop ebp before returning from the function.
Also, cmpl %eax, %ecx compares ecx to eax, not the other way around. So the code
cmpl %eax, %ecx
jl firstsmaller
will jump if ecx is smaller than eax.

X86 assembly - Handling array-type parameters

I am currently writing a simple C compiler, that takes a .c file as input and generates assembly code (X86, AT&T syntax). I am having a hard time passing array-type parameters and generating the correct assembly code for it. Here's my input:
int getIndexOne(int tab[]){
return tab[1];
}
int main_test(void){
int t[3];
t[0] = 0;
t[1] = 1;
t[2] = 2;
return getIndexOne(t);
}
A fairly simple test. Here is my output:
getIndexOne:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 16
movl %esp, %ebp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl %edi, -24(%ebp)
movl $1, %eax
movl -24(%ebp, %eax, 8), %eax #trouble over here
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size getIndexOne, .-getIndexOne
falsemain:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 16
movl %esp, %ebp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushl %ebx
subl $120, %esp
movl $2, -32(%ebp)
movl $0, %eax
movl $0, -24(%ebp, %eax, 8)
movl $1, %eax
movl $1, -24(%ebp, %eax, 8)
movl $2, %eax
movl $2, -24(%ebp, %eax, 8)
leal -24(%ebp, %eax, 8), %eax
movl %eax, %edi
call getIndexOne
addl $120, %esp
popl %ebx
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main_test, .-main_test
I'm unable to access the content of the passed address (leal instruction). Any help would be much appreciated.
PS: Don't worry about the size of my ints, they are set to 8 instead of 4 bytes for other reasons.

There are two problems, the first being when you setup the stack for the call to getIndexOne:
movl $2, %eax
movl $2, -24(%ebp, %eax, 8)
leal -24(%ebp, %eax, 8), %eax ##<== EAX still holds value of 2!
movl %eax, %edi ##<== wrong address for start of array
You're not clearing the EAX register after the MOV, so the address you put into EDI for the function call does not point to the start of the array but to its last element (i.e., index 2, which holds the value 2).
The second problem comes here in your getIndexOne function:
movl %edi, -24(%ebp)
movl $1, %eax
movl -24(%ebp, %eax, 8), %eax
You've stored the address on the stack. That's fine, but it also means that when you retrieve the value from the stack you get a pointer back, which must then be dereferenced. Right now you are just reading a value at an offset from the frame pointer; that is not the value in the array, since you never dereference the pointer you stored. In other words, change it to the following if you must store the pointer on the stack (I don't think this is the most efficient way, since the value is already in EDI, but whatever):
movl %edi, -24(%ebp) ##<== store pointer to the array on the stack
movl $1, %eax
movl -24(%ebp), %ecx ##<== get the pointer back from the stack
movl (%ecx, %eax, 8), %eax ##<== dereference the pointer
As a side note, while I'm not sure how your compiler works, I do think it's a little scary that you are using the value you load into each array element to also index into the array. If the values being loaded don't match the array indices, that's going to create quite a bit of havoc. I'm guessing this is some kind of optimization you're attempting when the two values match?

previous stack variables

I have this problem: I am recursively calling a function in C, and C is lexically scoped, so I can only access the current stack frame. I want to extract the arguments and the local variables from the previous stack frame, which was created by the previous function call, while I'm in the current stack frame.
I know that the values from the previous recursive call are still on the stack, but I can't access them because they're "buried" under the active stack frame?
I want to extract the arguments and local variables from the previous stack frame and copy them to copy_of_buried_arg and copy_of_buried_loc.
It is a requirement to use inline assembly with GAS to extract the variables. This is what I have so far. I tried all day and can't seem to figure it out: I drew the stack on paper and did the calculations, but nothing is working. I also tried deleting calls to printf so the stack would be cleaner, but I can't figure out the right arithmetic. Here is the code so far; my function halts on the second iteration.
#include <stdio.h>
char glo = 97; // just for fun 97 is ascii lowercase 'a'
int copy_of_buried_arg;
char copy_of_buried_loc;
void rec(int arg) {
char loc;
loc = glo + arg * 2; // just for fun, some char arithmetic
printf("inside rec() arg=%d loc='%c'\n", arg, loc);
if (arg != 0) {
// after this assembly code runs, the copy_of_buried_arg and
// copy_of_buried_loc variables will have arg, loc values from
// the frame of the previous call to rec().
__asm__("\n\
movl 28(%esp), %eax #moving stack pointer to old ebp (pointing it to old ebp)\n\
addl $8, %eax #now eax points to the first argument for the old ebp \n\
movl (%eax), %ecx #copy the value inside eax to ecx\n\
movl %ecx, copy_of_buried_arg # copies the old argument\n\
\n\
");
printf("copy_of_buried_arg=%u copy_of_buried_loc='%c'\n",
copy_of_buried_arg, copy_of_buried_loc);
} else {
printf("there is no buried stack frame\n");// runs if argument = 0 so only the first time
}
if (arg < 10) {
rec(arg + 1);
}
}
int main (int argc, char **argv) {
rec(0);
return 0;
}

I can try to help, but don't have Linux or assembly in GAS. But the calculations should be similar:
Here's the stack after a couple of calls. A typical stack frame setup creates a linked list of stack frames, where EBP is the current stack frame and points to its old value for the previous stack frame.
        +-------+
ESP->   |loc='c'|     <- ESP currently points here.
        +-------+
EBP->   |oldEBP |--+  <- rec(0)'s call frame
        +-------+  |
        |retaddr|  |  <- return address of rec(1)
        +-------+  |
        |arg=1  |  |  <- pushed argument of rec(1)
        +-------+  |
        |loc='a'|  |  <- local variable of rec(0)
        +-------+  |
     +--|oldEBP |<-+  <- main's call frame
     |  +-------+
     |  |retaddr|     <- return address of rec(0)
     |  +-------+
     |  |arg=0  |     <- pushed argument of rec(0)
     |  +-------+
    \|/
to main's call frame
This is created by the following sequence:
Push the arguments, last argument first.
Call the function, pushing a return address.
Push soon-to-be old EBP, preserving previous stack frame.
Move ESP (top of stack, containing oldEBP) into EBP, creating new stack frame.
Subtract space for local variables.
This has the effect on a 32-bit stack that EBP+8 will always be the first parameter of the call, EBP+12 the 2nd parameter, etc. EBP-n is always an offset to a local variable.
The code to get the previous loc and arg is then (in MASM):
mov ecx,[ebp] // get previous stack frame
mov edx,[ecx]+8 // get first argument
mov copy_of_buried_arg,edx // save it
mov dl,[ecx]-1 // get first char-sized local variable.
mov copy_of_buried_loc,dl // save it
or my best guess in GAS (I don't know it well, but I know the operand order is reversed relative to MASM):
movl (%ebp), %ecx
movl 8(%ecx), %edx
movl %edx, copy_of_buried_arg
movb -1(%ecx), %dl
movb %dl, copy_of_buried_loc
Output of your code with my MASM using VS2010 on Windows:
inside rec() arg=0 loc='a'
there is no buried stack frame
inside rec() arg=1 loc='c'
copy_of_buried_arg=0 copy_of_buried_loc='a'
inside rec() arg=2 loc='e'
copy_of_buried_arg=1 copy_of_buried_loc='c'
inside rec() arg=3 loc='g'
copy_of_buried_arg=2 copy_of_buried_loc='e'
inside rec() arg=4 loc='i'
copy_of_buried_arg=3 copy_of_buried_loc='g'
inside rec() arg=5 loc='k'
copy_of_buried_arg=4 copy_of_buried_loc='i'
inside rec() arg=6 loc='m'
copy_of_buried_arg=5 copy_of_buried_loc='k'
inside rec() arg=7 loc='o'
copy_of_buried_arg=6 copy_of_buried_loc='m'
inside rec() arg=8 loc='q'
copy_of_buried_arg=7 copy_of_buried_loc='o'
inside rec() arg=9 loc='s'
copy_of_buried_arg=8 copy_of_buried_loc='q'
inside rec() arg=10 loc='u'
copy_of_buried_arg=9 copy_of_buried_loc='s'

With my compiler (gcc 3.3.4) I ended up with this:
#include <stdio.h>
char glo = 97; // just for fun 97 is ascii lowercase 'a'
int copy_of_buried_arg;
char copy_of_buried_loc;
void rec(int arg) {
char loc;
loc = glo + arg * 2; // just for fun, some char arithmetic
printf("inside rec() arg=%d loc='%c'\n", arg, loc);
if (arg != 0) {
// after this assembly code runs, the copy_of_buried_arg and
// copy_of_buried_loc variables will have arg, loc values from
// the frame of the previous call to rec().
__asm__ __volatile__ (
"movl 40(%%ebp), %%eax #\n"
"movl %%eax, %0 #\n"
"movb 31(%%ebp), %%al #\n"
"movb %%al, %1 #\n"
: "=m" (copy_of_buried_arg), "=m" (copy_of_buried_loc)
:
: "eax"
);
printf("copy_of_buried_arg=%u copy_of_buried_loc='%c'\n",
copy_of_buried_arg, copy_of_buried_loc);
} else {
printf("there is no buried stack frame\n");// runs if argument = 0 so only the first time
}
if (arg < 10) {
rec(arg + 1);
}
}
int main (int argc, char **argv) {
rec(0);
return 0;
}
Here's the disassembly of the relevant part (get it with gcc file.c -S -o file.s):
_rec:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
addl %eax, %eax
addb _glo, %al
movb %al, -1(%ebp)
subl $4, %esp
movsbl -1(%ebp),%eax
pushl %eax
pushl 8(%ebp)
pushl $LC0
call _printf
addl $16, %esp
cmpl $0, 8(%ebp)
je L2
/APP
movl 40(%ebp), %eax #
movl %eax, _copy_of_buried_arg #
movb 31(%ebp), %al #
movb %al, _copy_of_buried_loc #
/NO_APP
subl $4, %esp
movsbl _copy_of_buried_loc,%eax
pushl %eax
pushl _copy_of_buried_arg
pushl $LC1
call _printf
addl $16, %esp
jmp L3
L2:
subl $12, %esp
pushl $LC2
call _printf
addl $16, %esp
L3:
cmpl $9, 8(%ebp)
jg L1
subl $12, %esp
movl 8(%ebp), %eax
incl %eax
pushl %eax
call _rec
addl $16, %esp
L1:
leave
ret
Those offsets from ebp (40 and 31) initially were set to an arbitrary guess value (e.g. 0) and then refined through observation of the disassembly and some simple calculations.
Note that the function uses an extra 12+4=16 bytes of stack for the alignment padding and the parameter when it calls itself recursively:
subl $12, %esp
movl 8(%ebp), %eax
incl %eax
pushl %eax
call _rec
addl $16, %esp
There are also 4 bytes of the return address.
And then the function uses 4+8=12 bytes for the old ebp and its local variables:
_rec:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
So, in total the stack grows by 16+4+12=32 bytes with each recursive call.
Now, we know how to get our local arg and loc through ebp:
movl 8(%ebp), %eax ; <- arg
addl %eax, %eax
addb _glo, %al
movb %al, -1(%ebp) ; <- loc
So, we just add 32 to those offsets 8 and -1 and arrive at 40 and 31.
Do the same and you'll get your "buried" variables.
