Segmentation fault - what is the reason - c

I'm learning 32-bit assembly and I need help with my code. I'm trying to store 4 in a table at index 3; the table is passed as an argument to the assembly code.
.code32
.equ KERNEL, 0x80 # Linux system functions entry
.equ WRITE, 0x04 # write data to file function
.equ EXIT, 0x01 # exit program function
.equ STDOUT, 1
.equ argTab, 8
.equ argLicz, 12
.equ argN, 16
.equ argZakres, 20
.text
.globl przelicz
.type przelicz, @function
przelicz:
pushl %ebp
movl %esp, %ebp
movl $2, %ecx
movl $4, %ebx
movl argTab(%ebp), %edx
movl %ebx, (%edx,%ecx,4)
movl %ebp, %esp
popl %ebp
ret
I execute it with C code:
#include <stdio.h>
int main(){
const static int n = 5;
int tab[n];
int a;
for(a = 0; a < n; ++a){
tab[a] = a;
}
int licz[n];
przelicz(tab, licz, 50, 50);
for(a = 0; a < n; ++a){
//printf("%d ", licz[a]);
}
}
When I run it I get the error: Segmentation fault (core dumped). I've read that this means I'm trying to access memory that doesn't exist. How can I solve this?

As I commented above, the problem is that the process is being compiled as a 64-bit process. This is a problem for two reasons:
x64-linux uses a different system call table than x86-linux. Since you aren't making a system call directly, this probably isn't the mistake, but it's something to be aware of.
For example, write isn't 0x04 in x64-linux, it is 0x01. (See this table for x64-linux system call numbers.)
Obviously, x64-linux has larger pointer sizes. So when a 32-bit address is loaded, there is a random 32-bit upper half of that address that may point anywhere. This also affects values on a function's stack (they can contain 8-byte offsets instead of 4-byte ones). This is most likely what was causing the problem in this code.
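One quick way to check this diagnosis is to look at the pointer width the C side was actually built with. The following is only a minimal sketch; the helper name pointer_width_bits is made up for illustration and is not part of the original code.

```c
#include <limits.h>

/* Hypothetical helper: width of a data pointer, in bits, for this build.
   A 32-bit build (for example one made with gcc -m32) should report 32;
   a default build on x64-linux should report 64. */
unsigned pointer_width_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

If this reports 64 while the assembly file is written with .code32 and 4-byte stack offsets, the two halves of the program disagree about pointer size and calling convention, which fits the explanation above. Building both files as 32-bit code should remove that mismatch.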

Related

Segmentation fault when calling x86 Assembly function from C program

I am writing a C program that calls an x86 Assembly function which adds two numbers. Below are the contents of my C program (CallAssemblyFromC.c):
#include <stdio.h>
#include <stdlib.h>
int addition(int a, int b);
int main(void) {
int sum = addition(3, 4);
printf("%d", sum);
return EXIT_SUCCESS;
}
Below is the code of the Assembly function (my idea is to code from scratch the stack frame prologue and epilogue, I have added comments to explain the logic of my code) (addition.s):
.text
# Here, we define a function addition
.global addition
addition:
# Prologue:
# Push the current EBP (base pointer) to the stack, so that we
# can reset the EBP to its original state after the function's
# execution
push %ebp
# Move the EBP (base pointer) to the current position of the ESP
# register
movl %esp, %ebp
# Read in the parameters of the addition function
# addition(a, b)
#
# Since we are pushing to the stack, we need to obtain the parameters
# in reverse order:
# EBP (return address) | EBP + 4 (return value) | EBP + 8 (b) | EBP + 4 (a)
#
# Utilize advanced indexing in order to obtain the parameters, and
# store them in the CPU's registers
movzbl 8(%ebp), %ebx
movzbl 12(%ebp), %ecx
# Clear the EAX register to store the sum
xorl %eax, %eax
# Add the values into the section of memory storing the return value
addl %ebx, %eax
addl %ecx, %eax
I am getting a segmentation fault, which seems strange because I think I am laying out memory in accordance with the x86 calling conventions (e.g. using the correct memory locations for the function's parameters). Furthermore, if any of you have a solution, it would be greatly appreciated if you could provide some advice on how to debug an Assembly function called from C (I have been using the GDB debugger, but it simply points to the line of the C program where the segmentation fault happens instead of the line in the Assembly program).
Your function has no epilogue. You need to restore %ebp and pop the stack back to where it was, and then ret. If that's really missing from your code, then that explains your segfault: the CPU will go on executing whatever garbage happens to be after the end of your code in memory.
You clobber (i.e. overwrite) the %ebx register which is supposed to be callee-saved. (You mention following the x86 calling conventions, but you seem to have missed that detail.) That would be the cause of your next segfault, after you fixed the first one. If you use %ebx, you need to save and restore it, e.g. with push %ebx after your prologue and pop %ebx before your epilogue. But in this case it is better to rewrite your code so as not to use it at all; see below.
movzbl loads an 8-bit value from memory and zero-extends it into a 32-bit register. Here the parameters are int so they are already 32 bits, so plain movl is correct. As it stands your function would give incorrect results for any arguments which are negative or larger than 255.
You're using an unnecessary number of registers. You could move the first operand for the addition directly into %eax rather than putting it into %ebx and adding it to zero. And on x86 it is not necessary to get both operands into registers before adding; arithmetic instructions have a mem, reg form where one operand can be loaded directly from memory. With this approach we don't need any registers other than %eax itself, and in particular we don't have to worry about %ebx anymore.
I would write:
.text
# Here, we define a function addition
.global addition
addition:
# Prologue:
push %ebp
movl %esp, %ebp
# load first argument
movl 8(%ebp), %eax
# add second argument
addl 12(%ebp), %eax
# epilogue
movl %ebp, %esp # redundant since we haven't touched esp, but will be needed in more complex functions
pop %ebp
ret
In fact, you don't need a stack frame for this function at all, though I understand if you want to include it for educational value. But if you omit it, the function can be reduced to
.text
.global addition
addition:
movl 4(%esp), %eax
addl 8(%esp), %eax
ret
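To see why movzbl gives wrong results for some inputs, the two kinds of load can be modelled in C. This is only a sketch: the function names add_with_movzbl and add_with_movl are invented for illustration, and the model assumes a little-endian machine such as x86, where the lowest byte of an int sits at the argument's address.

```c
/* Model of the original code: movzbl keeps only the low 8 bits of each
   argument and zero-extends them to 32 bits (hypothetical name). */
int add_with_movzbl(int a, int b) {
    return (int)(unsigned char)a + (int)(unsigned char)b;
}

/* Model of the corrected code: movl loads the full 32-bit value. */
int add_with_movl(int a, int b) {
    return a + b;
}
```

For small non-negative inputs such as addition(3, 4) both models agree. They diverge once an argument is negative or larger than 255: add_with_movzbl(300, 1) yields 45 rather than 301, because only the low byte of 300 (which is 44) survives the load.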
You are corrupting the stack here:
movb %al, 4(%ebp)
To return the value, simply put it in eax. Also, why do you need to clear eax? That's inefficient, as you can load the first value directly into eax and then add to it.
Also EBX must be saved if you intend to use it, but you don't really need it anyway.

passing string to an external assembly function

I'm trying to learn some assembly.
My goal is to create an external assembly function that can read an array of char, convert it to int and then perform various operations, just to learn something.
I've made many attempts, but I think I'm missing the point.
code:
#include <stdio.h>
#define SIZE 5
extern int foo(char array[]);
int main(void){
char array[SIZE]={'0','1','1','0','1'};
printf("GAS said: %c\n", foo(array));
return 0;
}
assembly:
.data
.text
.global foo
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%esp), %eax #saving in eax the pointer of the array
movl (%eax), %eax #saving in eax the first char of the array
popl %ebp
ret
The strange thing for me is this:
when I use, as in this case,
printf("GAS said: %c\n", foo(array));
The output is, as expected, GAS said: 0
Based on this, I was also expecting that changing it to:
printf("GAS said: %i\n", foo(array));
would output GAS said: 48, but instead I get what looks like a random address.
Also, in the assembly file, I can't explain why, if I try
cmpl $48, %eax
je LABEL
the jump never happens.
The only thing I can think of is that there is a problem with the size, since int takes 4 bytes and char only 1 byte, but I'm not sure.
So, how can I use compare and return an int to main in this case?
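The size suspicion can be checked with a small C model of the two possible loads. This is a sketch, not a definitive answer: the function names load_dword and load_byte_zero_extended are made up here, and the model simply mirrors what a 4-byte movl and a 1-byte movzbl would read from the array.

```c
#include <stdint.h>
#include <string.h>

/* What `movl (%eax), %eax` reads: four bytes starting at p.
   The caller must make sure at least four bytes are readable. */
uint32_t load_dword(const char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* What `movzbl (%eax), %eax` would read: one byte, zero-extended to 32 bits. */
uint32_t load_byte_zero_extended(const char *p) {
    return (uint32_t)(unsigned char)*p;
}
```

With the array from the question, the one-byte load gives 48, while the four-byte load gives 0x30313130 (808530224), i.e. the first four characters packed into one register. That would explain all three observations: %c prints only the low byte, so '0' looks right; %i prints the whole 32-bit value; and cmpl $48, %eax never matches. Under that reading, loading a single byte with movzbl (%eax), %eax is the likely fix.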

Need help in understanding basic Assembly code which was generated from C code

I am learning Assembly and I am trying to understand how Assembly is generated from C code.
I created following dummy C code:
#include <stdio.h>
int add(int x, int y){
int result = x + y;
return result;
}
int main(int argc, char *argv[]){
int x = 1 * 10;
int y = 2 * 5;
int firstArg = x + y;
int secondArg = firstArg / 2;
int value;
value = add(firstArg, secondArg);
return value;
}
And got following Assembly code
.file "first.c"
.text
.globl add
.type add, @function
add:
.LFB39:
.cfi_startproc
movl 8(%esp), %eax
addl 4(%esp), %eax
ret
.cfi_endproc
.LFE39:
.size add, .-add
.globl main
.type main, @function
main:
.LFB40:
.cfi_startproc
movl $30, %eax
ret
.cfi_endproc
.LFE40:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
I was very surprised: where did all those arithmetic operations in main go?
I do not understand where $30 comes from. I suppose it is the return value of the add function, but I do not see any instruction that fetches this value. I also see that the return value was already placed in eax in the add function, so why would we need to move $30 into eax again?
Where are all of main's local variables declared? I deliberately created 5 of them to see how each one is pushed onto the stack.
Can you also help me understand what .LFE39, .LFB40 and .LFB39 mean?
I am reading a book, but it does not clarify this case for me.
The book actually says that every function must begin with stack initialization:
pushl %ebp
movl %esp, %ebp
Likewise, when the function ends, it needs to finish with a pop instruction.
That is not the case in the code above. I do not see any stack initialization.
Thank you!
You are compiling with optimizations enabled. GCC was smart enough to perform all of those calculations at compile-time, and replace all of that useless code with a simple constant.
First of all, x and y will get replaced with their constant expressions:
int x = 10;
int y = 10;
Then, the places where those variables are used will instead get their constant values:
int firstArg = 20;
int secondArg = 10;
Next, your add function is small and trivial, so it will certainly get inlined:
value = firstArg + secondArg;
Now those are constants too, so the whole thing will be replaced with:
int main(int argc, char *argv[]) {
return 30;
}
While most functions will have a prologue like you've shown, your program does nothing but return 30. More specifically, it no longer uses any local variables, and calls no other functions. Because of this main does not need a frame or reserved space on the call stack. So there's no need for the compiler to emit a prologue/epilogue.
main:
movl $30, %eax
ret
These are the only two instructions your program will run (other than the C-runtime startup code).
Further note, that because your add function was not marked static, the compiler had to assume that someone externally might call it. For that reason, we still see add in the generated assembly, even though no one is calling it:
add:
movl 8(%esp), %eax
addl 4(%esp), %eax
ret
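The point about static can be tried out with a variant of the question's program. This is a sketch under assumptions: compute_value is a hypothetical helper introduced here so the arithmetic can be called and checked, and the exact assembly produced will depend on the compiler version and flags.

```c
/* `static` gives add internal linkage, so the compiler can see every caller. */
static int add(int x, int y) {
    int result = x + y;
    return result;
}

/* Same arithmetic as main in the question, moved into a hypothetical
   helper. After constant folding and inlining this is just `return 30`. */
int compute_value(void) {
    int x = 1 * 10;
    int y = 2 * 5;
    int firstArg = x + y;
    int secondArg = firstArg / 2;
    return add(firstArg, secondArg);
}
```

Compiling this with optimizations (for example gcc -m32 -O2 -S) would be expected to drop the standalone add entirely, since nothing outside the file can call it. Compiling with -O0 instead should show the unoptimized version, including the pushl %ebp / movl %esp, %ebp prologue and the stack slots for the local variables that the book describes.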

Accessing calloced array in GAS assembly

I have a C function which allocates some memory for an array that is going to be filled with natural numbers up to a certain N.
Let's say,
N = 10;
array = calloc(N, sizeof(int));
I then call upon an assembly function that I have written, however I don't seem to be able to access the array fields. I manage to find the value of N which is located at 8(%ebp), and I have checked with GDB that it really equals to the N set in the C code.
However, when I try to access the first element in the array, and move it to, for example %esi, the value is not zero as it should be.
That I do by using the following code
movl 12(%ebp), %esi
EDIT; I do of course fill the array with natural numbers before calling the assembly function. I just did not want to type in the for loop here.
As I understand it, the parentheses dereference the first element of the array and copy it to esi. However, esi contains only a huge negative number when I use info registers at a breakpoint set after this code in GDB.
So, how do I access an array that was calloced beforehand and passed into an assembly function? Is it not possible to dereference and copy that single element?
Here is the C function that calls upon the assembly function
int main(int argc, char *argv[]){
int n = 10;
int *array = calloc(n, sizeof(int));
int i, j;
// Populate array up to N
for(i = 0; i < n; i++){
array[i] = 2 + i;
}
printf("Array: %d \n", sizeof(array));
// Run sievs
sievs_assembly(n, array);
// print prime
print_prime(array, n);
// Free mem
free(array);
return EXIT_SUCCESS;
}
I don't want to post the assembly file as a whole, since it is a school project, and I'm not asking for help solving the assignment, only with this specific problem: how to dereference an array item.
The function prototype is
extern void sievs_assembly(int n, int *a);
I thought that since the pointer *a is an int array, and the first argument N is located at 8(%ebp), that the first array element would be 12(%ebp).
How do I actually get to the value if its not enough to just do movl 12(%ebp), %esi
When you do:
movl 12(%ebp), %esi
you have moved the memory address of your array into %esi. The value of the first element is what this address is pointing to. To get that you can use:
movl (%esi), %eax
This moves the first element into %eax. The parentheses basically mean "what %esi is pointing to". The size of an int is probably 4 bytes for you (you could check with 'sizeof(int)'). So to access the next element you could use:
movl 4(%esi), %eax
Which moves the next element into %eax.
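The same distinction between the pointer and the element it points to can be written out in C. This is only an illustrative sketch: first_element and element_at are hypothetical names, and the parameter order follows the prototype sievs_assembly(int n, int *a) from the question.

```c
#include <stddef.h>

/* movl 12(%ebp), %esi  loads the pointer a itself;
   movl (%esi), %eax    then loads *a, the first element. */
int first_element(int n, const int *a) {
    (void)n; /* n sits at 8(%ebp) and is not needed here */
    return *a;
}

/* Element i lives i * sizeof(int) bytes past the start of the array,
   which is what 4(%esi) or a scaled form like (%esi,%ecx,4) computes. */
int element_at(const int *a, size_t i) {
    const unsigned char *bytes = (const unsigned char *)a;
    return *(const int *)(bytes + i * sizeof(int));
}
```

Called on the array from the question (filled with 2 + i), first_element would return 2 and element_at(array, 1) would return 3. The huge number seen in %esi in GDB is consistent with it holding the address of the array rather than one of its elements.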
I've also made an example program which prints 2 values from an array. Note: I made it for windows.
.macro print str #macro name is 'print'. 1 argument: 'str'
pushl \str #argument names are escaped
call _printf
addl $4, %esp #pop the arguments
.endm
.macro printf str fs #can't overload macro names, so let's call this one 'printf'
pushl \str
pushl \fs #printing numbers requires a format string
call _printf
addl $8, %esp
.endm
.text
.global _main
_main: #actual program, '_main' becomes 'WinMain@16'
pushl %ebp #push frame
movl %esp, %ebp
movl $array, %esi #Move array pointer to %esi.
#print what %esi is pointing to
printf (%esi), $fs
#print a newline
print $nl
#print what %esi+4 is pointing to. Since a long is 4 bytes
#The next element of the array is 4 bytes further than the first
printf 4(%esi), $fs
movl $0, %eax #move 0 to return register
leave #pop frame
ret #return
.data
fs: .string "%i" #Format string
nl: .string "\n" #New Line
array: #Some array
.long 1,2
This program prints the output:
1
2
Edit:
Since this got some attention, I thought I'd update the answer with some macros.
And to explain the _ prefixes on C library calls and main: I'm compiling on Windows with MinGW, which requires those prefixes to avoid undefined reference errors. On Linux they're not needed.
For further documentation about macros and GAS see: using as
A slightly modified version. Compile it with gcc -m32 -g -o foo foo.s
.data
fs: .string "%i" #Format string
nl: .string "\n" #New Line
array: .long 1,2
.text
.global main
main:
pushl %ebp #push frame
movl %esp, %ebp
movl $array, %esi #Move array pointer to %esi.
pushl (%esi) #Push what %esi is pointing to
pushl $fs #push the format string
call printf #call printf
addl $8, %esp #pop the arguments
pushl $nl #print a new line
call printf
addl $4, %esp
pushl 4(%esi) #push what %esi+4 is pointing to. Since a long is 4 bytes
pushl $fs #The next element is 4 bytes further than the first
call printf
addl $8, %esp
pushl $nl #print a new line
call printf
addl $4, %esp
pushl $0
call exit

Returning an int* from ASM

I'm writing a function in ASM which is supposed to copy the (constant) value 2 into every index of an array declared in .data. My code compiles, but I don't get any output through my C program. Here's the code:
.globl my_func
.globl _my_func
my_func:
_my_func:
movl %esp,%ebp
pushl %ebp
movl $0,%ecx
leal array,%eax
jmp continue
continue:
_continue:
movl $2,array(%ecx,4)
cmpl $1024,%ecx
jne incr
je finish
incr:
_incr:
addl $4,%ecx
jmp continue
finish:
_finish:
popl %ebp
ret
.data
.align 4
array: .fill 1024
It is called from here:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
extern int* my_func();
int main(int argc, const char * argv[])
{
int i = 0;
int* a = my_func();
for(i = 0; i < 1024/4; i++){
printf("%d\n", a[i]);
}
return 0;
}
As mentioned, the program does compile and run, but the main function does not output anything to the terminal. And yes, I know the code isn't optimal -- I'm currently following an introductory course in computer architecture and ASM, and I'm just checking out instructions and data.
I am assembling the code for IA32 on an Intel Mac with OSX10.9, using LLVM5.1
Thanks in advance.
The function prologue where you save the previous frame pointer and set it up for the new stack frame should be:
pushl %ebp
movl %esp,%ebp
Yours is in the opposite order, so when your function returns the caller's frame pointer will be incorrect.
Return values are normally in eax, so in finish you need to set eax to the start address of the memory you want to return.
FYI: you shouldn't need to declare your labels twice; the leading underscore is only needed for public functions you want to access from C.
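As a reference for what the assembly routine is meant to do, here is a C sketch of the same behaviour. The name my_func_reference and the constant ARRAY_INTS are invented for illustration; the size assumes 4-byte ints, matching the 1024/4 loop bound in the question's main.

```c
#define ARRAY_INTS 256 /* 1024 bytes of storage / 4-byte ints (assumption) */

static int array[ARRAY_INTS];

/* Hypothetical C reference for my_func: store 2 in every element and
   return the array's address, which is the value the assembly version
   has to leave in %eax before ret. */
int *my_func_reference(void) {
    for (int i = 0; i < ARRAY_INTS; i++) {
        array[i] = 2;
    }
    return array;
}
```

Swapping this in for my_func in the C caller should print 2 on each of the 256 lines, which gives a known-good baseline to compare the assembly version against.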

Resources