I have a C function that allocates memory for an array, which is then filled with natural numbers up to a certain N.
Let's say,
N = 10;
array = calloc(N, sizeof(int));
I then call an assembly function that I have written, but I can't seem to access the array elements. I can find the value of N, which is located at 8(%ebp), and I have checked with GDB that it really equals the N set in the C code.
However, when I try to access the first element of the array and move it to, for example, %esi, the value is not zero as it should be.
I do that using the following code:
movl 12(%ebp), %esi
EDIT: I do of course fill the array with natural numbers before calling the assembly function. I just did not want to type the for loop here.
As I understand it, the parentheses dereference the first element of the array and copy it to esi. However, esi contains only a huge negative number when I run info registers at a breakpoint set after this code in GDB.
So, how do I access an array that was allocated with calloc beforehand and passed into an assembly function? Is it not possible to dereference and copy that single element?
Here is the C function that calls upon the assembly function
int main(int argc, char *argv[]){
    int n = 10;
    int *array = calloc(n, sizeof(int));
    int i, j;
    // Populate array up to N
    for(i = 0; i < n; i++){
        array[i] = 2 + i;
    }
    printf("Array: %d \n", sizeof(array));
    // Run sievs
    sievs_assembly(n, array);
    // print prime
    print_prime(array, n);
    // Free mem
    free(array);
    return EXIT_SUCCESS;
}
I don't want to post the whole assembly file, since it is a school project and I'm not asking for help solving the assignment, only this specific problem: how to dereference an array element.
The function prototype is
extern void sievs_assembly(int n, int *a);
I thought that since the pointer a points to an int array, and the first argument N is located at 8(%ebp), the first array element would be at 12(%ebp).
How do I actually get the value, if it's not enough to just do movl 12(%ebp), %esi?
When you do:
movl 12(%ebp), %esi
you have moved the memory address of your array into %esi. The value of the first element is what this address points to. To get it, you can use:
movl (%esi), %eax
This moves the first element into %eax. The parentheses basically mean "what %esi is pointing to". The size of an int is probably 4 bytes for you (you can check with sizeof(int)). So to access the next element you can use:
movl 4(%esi), %eax
which moves the next element into %eax.
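For comparison, here is a minimal C sketch of the same kind of load. The helper name load_at_byte_offset is hypothetical and not part of the question's code. It treats its argument as the address that the caller passed (the value found at 12(%ebp)) and reads one int located a given number of bytes past it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 'a' is the address the caller passed,
   i.e. the value found at 12(%ebp) in the assembly function. */
int load_at_byte_offset(const int *a, size_t byte_offset)
{
    int value;
    /* Same idea as movl byte_offset(%esi), %eax: read one int
       located byte_offset bytes past the start of the array. */
    memcpy(&value, (const unsigned char *)a + byte_offset, sizeof value);
    return value;
}
```

Called with offset 0 it returns the first element, and with offset sizeof(int) it returns the second, which corresponds to (%esi) and 4(%esi) when int is 4 bytes.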
I've also made an example program which prints 2 values from an array. Note: I made it for Windows.
.macro print str #macro name is 'print'. 1 argument: 'str'
pushl \str #argument names are escaped
call _printf
addl $4, %esp #pop the arguments
.endm
.macro printf str fs #can't overload macro names, so let's call this one 'printf'
pushl \str
pushl \fs #printing numbers requires a format string
call _printf
addl $8, %esp
.endm
.text
.global _main
_main: #actual program, '_main' becomes 'WinMain@16'
pushl %ebp #push frame
movl %esp, %ebp
movl $array, %esi #Move array pointer to %esi.
#print what %esi is pointing to
printf (%esi), $fs
#print a newline
print $nl
#print what %esi+4 is pointing to. Since a long is 4 bytes
#The next element of the array is 4 bytes further than the first
printf 4(%esi), $fs
movl $0, %eax #move 0 to return register
leave #pop frame
ret #return
.data
fs: .string "%i" #Format string
nl: .string "\n" #New Line
array: #Some array
.long 1,2
This program prints the output:
1
2
Edit:
Since this got some attention, I thought I'd update the answer with some macros.
I should also explain the _ prefixes on C library calls and main: I'm compiling on Windows with MinGW, which requires those prefixes to avoid undefined reference errors. On Linux they're not needed.
For further documentation about macros and GAS, see the GNU assembler manual, "Using as".
Here is a slightly modified version. Compile it with gcc -m32 -g -o foo foo.s
.data
fs: .string "%i" #Format string
nl: .string "\n" #New Line
array: .long 1,2
.text
.global main
main:
pushl %ebp #push frame
movl %esp, %ebp
movl $array, %esi #Move array pointer to %esi.
pushl (%esi) #Push what %esi is pointing to
pushl $fs #push the format string
call printf #call printf
addl $8, %esp #pop the arguments
pushl $nl #print a new line
call printf
addl $4, %esp
pushl 4(%esi) #push what %esi+4 is pointing to. Since a long is 4 bytes
pushl $fs #The next element is 4 bytes further than the first
call printf
addl $8, %esp
pushl $nl #print a new line
call printf
addl $4, %esp
pushl $0
call exit
Related
I'm trying to learn some assembly.
My goal is to create an external assembly function that can read an array of char, convert it to int, and then perform various operations, just to learn something.
I've made many attempts, but I think I'm missing the point.
Code:
#include <stdio.h>
#define SIZE 5
extern int foo(char array[]);
int main(void){
char array[SIZE]={'0','1','1','0','1'};
printf("GAS said: %c\n", foo(array));
return 0;
}
assembly:
.data
.text
.global foo
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%esp), %eax #saving in eax the pointer of the array
movl (%eax), %eax #saving in eax the first char of the array
popl %ebp
ret
The strange thing for me is this:
when I use, as in this case,
printf("GAS said: %c\n", foo(array));
the output is, as expected, GAS said: 0
Based on this, I also expected that changing it to
printf("GAS said: %i\n", foo(array));
would output GAS said: 48, but instead I get what looks like a random address.
Also, in the assembly file, I can't explain why, if I try
cmpl $48, %eax
je LABEL
the jump never happens.
The only thing I can think of is that there is a problem with the size, since an int takes 4 bytes and a char only 1 byte, but I'm not sure.
So, how can I do the comparison and return an int to main in this case?
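The size suspicion above can be checked from C. The following is a minimal sketch (the helper names load_dword and load_byte are hypothetical) that contrasts what a 4-byte load such as movl (%eax), %eax reads from the char array with what a 1-byte, zero-extended load such as movzbl (%eax), %eax would read.

```c
#include <stdint.h>
#include <string.h>

/* Reads four consecutive bytes, like movl (%eax), %eax does. */
uint32_t load_dword(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Reads a single byte and zero-extends it, like movzbl (%eax), %eax. */
uint32_t load_byte(const char *p)
{
    return (unsigned char)*p;
}
```

With the array {'0','1','1','0','1'}, load_byte returns 48, while load_dword returns a value built from the first four characters, so it never compares equal to 48. Printing that larger value with %c shows only its low byte, which on little-endian x86 is the first character, '0'.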
I am trying to get the max of three numbers using C to call a function written in 32-bit AT&T assembly. When the program runs, I get a segmentation fault (core dumped) error and cannot figure out why. My inputs have included a mix of positive and negative numbers as well as 1, 2, 3, all with the same error.
Assembly
# %eax - first parameter
# %ecx - second parameter
# %edx - third parameter
.code32
.file "maxofthree.S"
.text
.global maxofthree
.type maxofthree, @function
maxofthree:
pushl %ebp # old ebp
movl %esp, %ebp # skip over
movl 8(%ebp), %eax # grab first value
movl 12(%ebp), %ecx # grab second value
movl 16(%ebp), %edx # grab third value
#test for first
cmpl %ecx, %eax # compare first and second
jl firstsmaller # first smaller than second, exit if
cmpl %edx, %eax # compare first and third
jl firstsmaller # first smaller than third, exit if
leave # reset the stack pointer and pop the old base pointer
ret # return since first > second and first > third
firstsmaller: # first smaller than second or third, resume comparisons
#test for second and third against each other
cmpl %edx, %ecx # compare second and third
jg secondgreatest # second is greatest, so jump to end
movl %edx, %eax # third is greatest, move third to eax
leave # reset the stack pointer and pop the old base pointer
ret # return third
secondgreatest: # second > third
movl %ecx, %eax #move second to eax
leave # reset the stack pointer and pop the old base pointer
ret # return second
C code
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
long int maxofthree(long int, long int, long int);
int main(int argc, char *argv[]) {
    if (argc != 4) {
        printf("Missing command line arguments. Instructions to"
               " execute this program:- ./a.out <num1> <num2> <num3>");
        return 0;
    }
    long int x = atoi(argv[1]);
    long int y = atoi(argv[2]);
    long int z = atoi(argv[3]);
    printf("%ld\n", maxofthree(x, y, z)); // TODO change back to (x, y, z)
}
The code causes a segmentation fault because it tries to return to an invalid address when the ret instruction executes. This happens at all three ret instructions.
It happens because you don't pop the old base pointer before returning, so ret pops the saved %ebp value off the stack and jumps to it instead of to the real return address. A small change to the code removes the fault. Change each ret instruction to:
leave
ret
The leave instruction is equivalent to:
movl %ebp, %esp
popl %ebp
which resets the stack pointer and pops the old base pointer that you saved.
Also, your comparisons do not do what the comments say they do. When you write:
cmp %eax, %edx
jl firstsmaller
the jump happens when %edx is smaller than %eax. So you want the code to be
cmpl %edx, %eax
jl firstsmaller
which jumps when %eax is smaller than %edx, as specified in the comment.
See this page for details on the cmp instruction in AT&T/GAS syntax.
You forgot to pop ebp before returning from the function.
Also, cmpl %eax, %ecx compares ecx to eax, not the other way around. So the code
cmpl %eax, %ecx
jl firstsmaller
will jump if ecx is smaller than eax.
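As a reference for what the routine is meant to compute, here is a minimal C sketch that mirrors the intended branch structure. The name max_of_three_ref is an assumption for illustration; this is not the asker's code.

```c
/* Hypothetical reference implementation mirroring the assembly's branches. */
long max_of_three_ref(long first, long second, long third)
{
    /* cmpl %ecx, %eax ; jl  -> taken when first < second
       cmpl %edx, %eax ; jl  -> taken when first < third  */
    if (first < second || first < third) {
        /* firstsmaller: the answer is the larger of second and third */
        return (second > third) ? second : third;
    }
    return first; /* first >= second and first >= third */
}
```

Comparing the assembly routine's return value against this function for a handful of inputs, including negative numbers and ties, is a quick way to confirm that the operand order of each cmpl is right.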
I am learning assembly and I have this function that contains some lines I just don't understand:
.globl factR
.text
factR:
cmpl $0, 4(%esp)
jne cont
movl $1, %eax
ret
cont:
movl 4(%esp), %eax
decl %eax
pushl %eax # (1)
call factR # (2)
addl $4, %esp # (3)
imull 4(%esp), %eax
ret
and the C code corresponding to it is:
int factR(int n) {
    if (n == 0)
        return 1;
    else
        return n * factR(n - 1);
}
I am not sure about the lines marked with numbers.
(1) pushl %eax: does it mean we put the contents of %eax in %esp?
(2) So we call factR(). Will the result of that be in %esp when we come back here to the next instructions?
(3) addl $4,%esp: I'm not sure about this one. Are we adding 4 to the number stored in %esp, or do we add 4 to the pointer to get the next number, or something similar?
It appears that factR() follows the C calling convention (cdecl): the caller pushes the arguments onto the stack, and the caller also cleans up the stack (undoes the changes made for the call) when the function returns.
The push (1) places the contents of %eax on the stack as the argument for the call that follows. It does not put the value in %esp itself; it decrements %esp by 4 and stores the value at the new top of the stack. Then the call is made (2); under cdecl the result comes back in %eax, not in %esp. Finally the stack is cleaned up (3) by resetting %esp to where it was before the argument was pushed in step (1). One 32-bit value was pushed, so the pointer must be adjusted by 4 bytes.
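The effect of steps (1) and (3) on the stack pointer can be modeled in C. This is only a sketch of the idea under simplifying assumptions: SimStack, sim_push and sim_drop_one are hypothetical names, and the "stack pointer" is an array index counting 32-bit words rather than a real address.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a descending stack of 32-bit words. */
typedef struct {
    uint32_t words[STACK_WORDS];
    int sp; /* index of the current top; STACK_WORDS means empty */
} SimStack;

void sim_init(SimStack *s) { s->sp = STACK_WORDS; }

/* pushl: move the stack pointer down one word, then store the value. */
void sim_push(SimStack *s, uint32_t value)
{
    s->sp -= 1;
    s->words[s->sp] = value;
}

/* addl $4, %esp: discard one word by moving the pointer back up.
   Nothing in the stack memory itself is read or modified. */
void sim_drop_one(SimStack *s) { s->sp += 1; }
```

After one sim_push followed by one sim_drop_one, the pointer is back where it started, which is all that the cleanup in step (3) does.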
I can't understand how memory is allocated in the following code:
#include<stdio.h>
#include<string.h>
int main()
{
char a[]={"text"};
char b[]={'t','e','x','t'};
printf(":%s: sizeof(a)=%d, strlen(a)=%d\n",a, sizeof(a), strlen(a));
printf(":%s: sizeof(b)=%d, strlen(b)=%d\n",b, sizeof(b), strlen(b));
return 0;
}
The output is
:text: sizeof(a)=5, strlen(a)=4
:texttext: sizeof(b)=4, strlen(b)=8
By looking into memory addresses and the output code it seems that variable b is placed before variable a, and that's why strlen(b), by looking for \0, returns 8.
Why does this happen? I expected variable a to be declared first.
The language makes no guarantees about what is placed where, so your experiment makes very little sense. It might work, it might not. The behavior is undefined: your b is not a string, and it is UB to call strlen on something that is not a string.
From a purely practical point of view, though, local variables are usually allocated on the stack, and the stack on many modern platforms (like x86) grows backwards, i.e. from higher addresses to lower addresses. So, if you are using one of these platforms, it is possible that your compiler decided to allocate variables in the order of their declaration (a first and b second), but because the stack grows backwards b ended up at a lower address in memory than a. I.e. b ended up before a in memory.
Note, though, that a typical implementation does not normally allocate stack space for local variables one by one. Instead, the entire block of memory for all local variables (the stack frame) is allocated at once, meaning that the logic I described above does not necessarily apply. It is nevertheless possible that the compiler follows the "reverse" approach to local variable layout anyway, i.e. variables declared earlier are placed later in the local memory frame, "as if" they were allocated one by one in the order of their declaration.
Your "b" character array is not null-terminated. To understand why, consider that the char a[] declaration is equivalent to:
char a[] = { 't', 'e', 'x', 't', '\0' };
In other words, strlen(b) is undefined; it just scans through whatever memory follows b, looking for a null character (a 0 byte).
I do not get the same output; see my ideone snippet here: http://ideone.com/zHhHc
:text: sizeof(a)=5, strlen(a)=4
:text
When I use codepad, I see different output than you: http://codepad.org/MXJWY136
:text: sizeof(a)=5, strlen(a)=4
:text: sizeof(b)=4, strlen(b)=4
Also, when I compile it with a C++ compiler, I get the same output: http://ideone.com/aLNjv
:text: sizeof(a)=5, strlen(a)=4
:text: sizeof(b)=4, strlen(b)=4
So the difference comes from your platform and/or compiler. It is most likely undefined behavior (UB), since your char array does not have a null terminator (\0). At any rate...
While a and b may look the same, they are not, because of how you have defined the character arrays.
char a[] = "text";
What this array looks like in memory is the following:
----------------------
| t | e | x | t | \0 |
----------------------
The double quotes mean "text string" and add the \0 automatically (that's why the size is 5). In b, you would have to add it manually; as written there is none, so the size is 4. On your implementation, strlen(b) keeps searching past the end of the array until it happens to find a 0 byte, so the result can include garbage characters. Char arrays that are not null-terminated are a common source of security problems.
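One way to measure such an array without running past its end is to bound the search by the array size. A minimal sketch, assuming a hypothetical helper named bounded_len built on the standard memchr:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, never reading
   more than cap bytes. Returns cap if no terminator is found. */
size_t bounded_len(const char *buf, size_t cap)
{
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

Called as bounded_len(a, sizeof a) and bounded_len(b, sizeof b), it gives 4 for both arrays from the question, because the scan of b stops at the array boundary instead of continuing into neighbouring memory.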
I compiled your code on Linux/x86 with GCC using the -S flag to see the assembly output. It shows that, for me, b[] is allocated at a higher memory address than a[], so I didn't get strlen(b)=8.
.file "str.c"
.section .rodata
.align 4
.LC0:
.string ":%s: sizeof(a)=%d, strlen(a)=%d\n"
.align 4
.LC1:
.string ":%s: sizeof(b)=%d, strlen(b)=%d\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl %gs:20, %eax
movl %eax, 28(%esp)
xorl %eax, %eax
movl $1954047348, 19(%esp)
movb $0, 23(%esp)
movb $116, 24(%esp)
movb $101, 25(%esp)
movb $120, 26(%esp)
movb $116, 27(%esp)
leal 19(%esp), %eax
movl %eax, (%esp)
call strlen
movl %eax, %edx
movl $.LC0, %eax
movl %edx, 12(%esp)
movl $5, 8(%esp)
leal 19(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
leal 24(%esp), %eax
movl %eax, (%esp)
call strlen
movl $.LC1, %edx
movl %eax, 12(%esp)
movl $4, 8(%esp)
leal 24(%esp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call printf
movl $0, %eax
movl 28(%esp), %edx
xorl %gs:20, %edx
je .L2
call __stack_chk_fail
.L2:
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",@progbits
In the code above, $1954047348 (0x74786574, the bytes of "text" in little-endian order) followed by $0 is a[] with its null terminator. The 4 bytes after that are b[]. This means b[] was placed above a[] in the stack frame, i.e. as if it had been pushed before a[], since the stack grows down on this platform.
If you compile your own build with -S (or the equivalent), you should see b[] at a lower address than a[], which is why you get strlen(b)=8.
So let's say I have this code:
int my_static_int = 4;
func(&my_static_int);
I passed the function a pointer to my_static_int, obviously. But what happens when the code is compiled? One avenue I've considered:
1) When you declare a non-pointer variable, C automatically creates a pointer to it and internally does something like a typedef that makes my_static_int mean *(internal_reference)
Anyway, I hope that my question is descriptive enough.
Pointers are just a term to help us humans understand what's going on.
The & operator when used with a variable simply means address of. No "pointer" is created at runtime, you are simply passing in the address of the variable into the function.
If you have:
int x = 3;
int* p = &x;
Then p is a variable which holds a memory address. Inside that memory address is an int.
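A small sketch of this point, using a hypothetical helper named received_address: passing &x directly and passing a pointer variable p that holds &x both hand the callee the same address value.

```c
#include <stdint.h>

/* Hypothetical callee: reports the address it was handed,
   converted to an integer so it can be compared or printed. */
uintptr_t received_address(const int *x)
{
    return (uintptr_t)x;
}
```

For int x = 3; int *p = &x;, the calls received_address(&x) and received_address(p) return equal values. The variable p is only needed if you want to store the address somewhere; the expression &x alone is enough to pass it.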
If you really want to know how the code looks under the covers, you have to get the compiler to generate the assembler code (gcc can do this with the -S option).
When you truly grok C and pointers at their deepest level, you'll realise that it's just the address of the variable being passed in rather than the value of the variable. There's no need for creating extra memory to hold a pointer since the pointer is moved directly from the code to the stack (the address will probably have been set either at link time or load time, not run time).
There's also no need for internal type creation since the compiled code already knows the type and how to manipulate it.
Keeping in mind that this is implementation-specific, consider the following code:
int my_static_int = 4;
static void func (int *x) {
*x = *x + 7;
}
int main (void) {
func(&my_static_int);
return 0;
}
which, when compiled with gcc -S to get the assembler, produces:
.file "qq.c"
.globl _my_static_int
.data
.align 4
_my_static_int:
.long 4
.text
.def _func; .scl 3; .type 32; .endef
_func:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl 8(%ebp), %edx
movl (%edx), %edx
addl $7, %edx
movl %edx, (%eax)
popl %ebp
ret
.def ___main; .scl 2; .type 32; .endef
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $_my_static_int, (%esp)
call _func
movl $0, %eax
leave
ret
The important bits are these sections:
movl $_my_static_int, (%esp) ; load address of variable onto stack.
call _func ; call the function.
:
movl 8(%ebp), %eax ; get passed parameter (the address of the var) into eax
movl 8(%ebp), %edx ; and also into edx.
movl (%edx), %edx ; get the value from the address (dereference).
addl $7, %edx ; add 7 to it.
movl %edx, (%eax) ; and put it back into the same address.
Hence the address is passed, and used to get at the variable.
When the code is compiled, function func receives the address of your my_static_int variable as parameter. Nothing else.
There is no need to create any implicit pointers when you declare a non-pointer variable. It is not clear from your question how you came to this weird idea.
Why not look at the assembly output? You can do this with gcc using the -S option, or (if your system uses the GNU toolchain) using the objdump -d command on the resulting object file or executable file.
The simple answer is that the object code generates a reference to the symbol where my_static_int is allocated (which is typically in the static data segment of your object module).
So the address of the variable is resolved at load time (when it is assigned its actual address), and the loader fixes up the reference to the variable, filling it in with that address.