Calling assembly functions from C

I'm trying to use a function written in assembly, invoked from a C project. This function is supposed to call a libc function, say printf(), but I keep getting a segmentation fault.
In the .c file I have the declaration of the function, say
int do_shit_in_asm();
In the .asm file I have
.extern printf
.section .data
printtext:
.ascii "test"
.section .text
.global do_shit_in_asm
.type do_shit_in_asm, #function
do_shit_in_asm:
pushl %ebp
movl %esp, %ebp
push printtext
call printf
movl %ebp, %esp
pop %ebp
ret
Any pointers or comments would be appreciated.
as func.asm -o func.o
gcc prog.c func.o -o prog

Change push printtext to push $printtext.
As it is, you're loading a value from the address printtext and pushing that, rather than pushing the address. Thus, you're passing 'test' as a 32-bit number, rather than a pointer, and printf is trying to interpret that as an address and crashing.
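To make the bogus argument concrete, here is a minimal sketch in C of what `push printtext` ends up loading: the first four bytes of the string, combined in little-endian order as a 32-bit x86 load would do. The helper names `load_le32` and `show_pushed_value` are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: combine four bytes the way a 32-bit x86 load does
   (little-endian: the lowest address holds the least significant byte). */
uint32_t load_le32(const char *p)
{
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Print the value that printf would receive in place of a pointer. */
void show_pushed_value(void)
{
    printf("pushed value: 0x%08x\n", (unsigned)load_le32("test"));
}
```

Calling show_pushed_value() prints 0x74736574, the ASCII codes of "test" with the first character in the low byte. printf then treats that number as the address of its format string, which is very unlikely to be valid.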

One of the best ways to get started with assembly language functions is to write a similar function in C, and then build it with the compiler switch that generates an assembly listing (-S on gcc). Then you can study the output of what the compiler did, and modify as needed.
This is particularly useful if you're calling functions such as printf which use a different calling convention (because of the variable number of arguments). Calling those functions may be quite different from calling non-varargs functions.
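As a sketch of that approach, a C counterpart of the routine from the question might look like the following. The function name `print_test` and the file name used below are assumptions, not something from the original post.

```c
#include <stdio.h>

/* Hypothetical C counterpart of the assembly routine in the question. */
int print_test(void)
{
    /* printf returns the number of characters written, 4 for "test". */
    return printf("test");
}
```

Saving this as print_test.c and running gcc -m32 -S print_test.c should produce a print_test.s listing in which the address of the string, not its contents, is placed on the stack before the call.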

The issue was that I was using
pushl printtext
rather than
pushl $printtext
Thanks everybody for your help and sorry for wasting your time :P

After this:
push printtext
call printf
You want:
addl $4, %esp
Further explanation:
Because you're using x86 Linux, I assume the calling convention (cdecl) requires the caller to clean up the parameters. Because you pushed a pointer before calling printf, your stack is off by 4 after that function's ret instruction has executed.
Update:
Yeah, OK, I was used to Intel syntax so I was getting the order of the arguments backward in my head. Actually the lack of the addl back to esp doesn't matter, because you're restoring esp correctly near your ret. My next guess is that the string you're passing to printf is lacking a null terminator... Let me see what gas does...
Update 2:
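OK, strictly speaking gas does not null-terminate .ascii strings (.asciz and .string do), so the output only stops cleanly if a zero byte happens to follow the string in memory. Either way, that was not the cause of this crash. It looks like you found the issue, so the point is moot.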

Related

Linking and calling printf from gas assembly

There are a few related questions to this which I've come across, such as Printf with gas assembly and Calling C printf from assembly but I'm hoping this is a bit different.
I have the following program:
.section .data
format:
.asciz "%d\n" # .asciz appends the NUL terminator that printf needs
.section .text
.globl _start
_start:
// print "55"
mov $format, %rdi
mov $55, %rsi
mov $0, %eax
call printf # how to link?
// exit
mov $60, %eax
mov $0, %rdi
syscall
Two questions related to this:
Is it possible to use only as (gas) and ld to link this to the printf function, using _start as the entry point? If so, how could that be done?
If not, other than changing _start to main, what would be the gcc invocation to run things properly?
It is possible to use ld, but not recommended: if you use libc functions, you need to initialise the C runtime. That is done automatically if you let the C compiler provide _start and start your program as main. If you use the libc but not the C runtime initialisation code, it may seem to work, but it can also lead to strange, spurious failures.
If you start your program from main (your second case) instead, it's as simple as doing gcc -o program program.s where program.s is your source file. On some Linux distributions you may also need to supply -no-pie as your program is not written in PIC style (don't worry about this for now).
Note also that I recommend not mixing libc calls with raw system calls. Instead of doing a raw exit system call, call the C library function exit. This lets the C runtime deinitialise itself correctly, including flushing any IO streams.
Now if you assemble and link your program as I said in the first paragraph, you'll notice that it might crash. This is because the stack needs to be aligned to a multiple of 16 bytes on calls to functions. You can ensure this alignment by pushing a qword of data on the stack at the beginning of each of your functions (remember to pop it back off at the end).
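The alignment arithmetic behind that advice can be checked with a small model. This is only a sketch: the function names are invented, and it assumes the x86-64 System V rule that %rsp is a multiple of 16 at the point of each call instruction.

```c
#include <stdint.h>

/* True if addr is a multiple of 16. */
int is_aligned16(uint64_t addr)
{
    return (addr & 0xF) == 0;
}

/* Model of pushing 8 bytes. A call instruction does the same thing
   when it pushes the return address. */
uint64_t push_qword(uint64_t rsp)
{
    return rsp - 8;
}
```

Starting from an aligned value such as 0x7ffd1000, one push_qword (the call into your function) leaves the stack pointer misaligned by 8, and a second one (the extra push at the top of the function) makes it a multiple of 16 again before the next call.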

int 80 doesn't appear in assembly code

Problem
Let's consider:
#include <unistd.h>
int main(){
write(1, "hello", 5);
return 0;
}
I am reading a book that suggests the assembly output for the above code should be:
main:
mov $4, %eax
mov $1, %ebx
mov $string, %ecx
mov $len, %edx
int $0x80
(The above code is for a 32-bit architecture. The arguments are passed in registers not because of the 64-bit calling convention, but because we are making a system call.)
And the output on my 64 bit Ubuntu machine with: gcc -S main.c -m32
is:
pushl $4
pushl $string
pushl $1
call write
My doubts
So it confused me. Why did gcc compile it as a "normal" call, not as a syscall?
In this situation, what is the way to make the processor use a kernel function (like write)?
I am reading a book that suggests the assembly output for the above code should be ...
You shouldn't believe everything you read :-)
There is no requirement that C code be turned into specific assembly code, the only requirement that the C standard mandates is that the resulting code behave in a certain manner.
Whether that's done by directly calling the OS system call with int $0x80 (or sysenter), or whether it's done by calling a library routine write() which eventually calls the OS in a similar fashion, is largely irrelevant.
If you were to locate and disassemble the write() code, you may well find it simply reads those values off the stack into registers and then calls the OS in much the same way as the code you've shown containing int $0x80.
As an aside, what if you wanted to port gcc to a totally different architecture that uses call 5 to do OS-level system calls? If gcc is injecting specific int $0x80 calls into the assembly stream, that's not going to work too well.
But, if it's injecting a call to a write() function, all you have to do is make sure you link it with the correct library containing a modified write() function (one that does call 5 rather than int $0x80).
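For comparison, here is a sketch of both routes from C on Linux. It assumes glibc's syscall() wrapper and the SYS_write constant from <sys/syscall.h>; the two function names are made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* makes glibc declare syscall() */
#endif
#include <string.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* write, syscall */

/* Usual route: call the libc wrapper, which enters the kernel for us. */
long write_via_libc(int fd, const char *s)
{
    return (long)write(fd, s, strlen(s));
}

/* More direct route: pass the system call number ourselves. */
long write_via_syscall(int fd, const char *s)
{
    return syscall(SYS_write, fd, s, strlen(s));
}
```

Both functions return the number of bytes written. Even the second one goes through a small library routine; how that routine traps into the kernel (int $0x80, sysenter, or syscall) is left to the library, which is the point made above.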

How to call printf in machine language

Let's say we have the assembly code that prints Z to the screen.
pushl $'Z'
call putchar
add $4, %esp
How can we write this in machine language code? I've checked both online resources and this code in gdb, but the former disagrees with the latter and the latter changes each time I run the code. Thank you for your help.
I'm using linux x86. Again, I'd like to say that I want to know how to write this in machine language code.
Use putchar instead of printf. printf needs a format string, and you have provided it a char. Also, remember that you have to restore the stack just after calling your function, as both printf and putchar use the cdecl calling convention.
pushl $'Z'
call putchar
add $4, %esp
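As for the bytes themselves, the following is a sketch under stated assumptions: 32-bit x86, the short imm8 forms of push and add, and a near relative call (opcode E8). The function name and both address parameters are hypothetical.

```c
#include <stdint.h>

/* Fill out[10] with the bytes for:
       pushl $'Z' ; call target ; add $4, %esp
   code_addr is the assumed address of the first byte, target the
   assumed address of putchar. */
void encode_putchar_call(uint32_t code_addr, uint32_t target, uint8_t out[10])
{
    /* The call displacement is measured from the end of the call
       instruction: 2 bytes of push + 5 bytes of call = 7 bytes in. */
    uint32_t rel = target - (code_addr + 7);

    out[0] = 0x6A;                  /* push imm8 */
    out[1] = 'Z';                   /* 0x5A */
    out[2] = 0xE8;                  /* call rel32 */
    out[3] = (uint8_t)rel;          /* displacement, little-endian */
    out[4] = (uint8_t)(rel >> 8);
    out[5] = (uint8_t)(rel >> 16);
    out[6] = (uint8_t)(rel >> 24);
    out[7] = 0x83;                  /* add r/m32, imm8 */
    out[8] = 0xC4;                  /* ModRM byte selecting %esp */
    out[9] = 0x04;                  /* the immediate 4 */
}
```

Only the four displacement bytes depend on where the code and putchar end up in memory, which may be why the bytes differ between sources and between runs. With code at 0x1000 and putchar at 0x2000, for example, they come out as F9 0F 00 00.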

What is the use of _start() in C?

I learned from my colleague that one can write and execute a C program without writing a main() function. It can be done like this:
my_main.c
/* Compile this with gcc -nostartfiles */
#include <stdio.h>
#include <stdlib.h>
int my_main(void); /* declared before its use in _start */
void _start() {
int ret = my_main();
exit(ret);
}
int my_main(void) {
puts("This is a program without a main() function!");
return 0;
}
Compile it with this command:
gcc -o my_main my_main.c -nostartfiles
Run it with this command:
./my_main
When would one need to do this kind of thing? Is there any real world scenario where this would be useful?
The symbol _start is the entry point of your program. That is, the address of that symbol is the address jumped to on program start. Normally, the function with the name _start is supplied by a file called crt0.o which contains the startup code for the C runtime environment. It sets up some stuff, populates the argument array argv, counts how many arguments there are, and then calls main. After main returns, exit is called.
If a program does not want to use the C runtime environment, it needs to supply its own code for _start. For instance, the reference implementation of the Go programming language does so because they need a non-standard threading model which requires some magic with the stack. It's also useful to supply your own _start when you want to write really tiny programs or programs that do unconventional things.
While main is the entry point for your program from a programmer's perspective, _start is the usual entry point from the OS perspective (the first instruction that is executed after your program was started from the OS).
In a typical C and especially C++ program, a lot of work has been done before the execution enters main. Especially stuff like initialization of global variables. Here you can find a good explanation of everything that's going on between _start() and main() and also after main has exited again (see comment below).
The necessary code for that is usually provided by the compiler writers in a startup file, but with the flag -nostartfiles you essentially tell the compiler: "Don't bother giving me the standard startup file, give me full control over what is happening right from the start".
This is sometimes necessary and often used on embedded systems. E.g. if you don't have an OS and you have to manually enable certain parts of your memory system (e.g. caches) before the initialization of your global objects.
Here is a good overview of what happens during program startup before main. In particular, it shows that _start is the actual entry point to your program from the OS's viewpoint.
It is the very first address from which the instruction pointer will start counting in your program.
The code there invokes some C runtime library routines just to do some housekeeping, then calls your main, and then tears things down and calls exit with whatever exit code main returned.
P.S: this answer is transplanted from another question which SO has helpfully closed as duplicate of this one.
When would one need to do this kind of thing?
When you want your own startup code for your program.
main is not the first entry for a C program, _start is the first entry behind the curtain.
Example in Linux:
_start: # _start is the entry point known to the linker
xor %ebp, %ebp # effectively RBP := 0, mark the end of stack frames
mov (%rsp), %edi # get argc from the stack (implicitly zero-extended to 64-bit)
lea 8(%rsp), %rsi # take the address of argv from the stack
lea 16(%rsp,%rdi,8), %rdx # take the address of envp from the stack
xor %eax, %eax # per ABI and compatibility with icc
call main # %edi, %rsi, %rdx are the three args (of which first two are C standard) to main
mov %eax, %edi # transfer the return of main to the first argument of _exit
xor %eax, %eax # per ABI and compatibility with icc
call _exit # terminate the program
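The envp computation in that listing (16(%rsp,%rdi,8)) is plain pointer arithmetic. Here is a sketch of the same step in C, assuming the usual initial stack layout of argc, the argv pointers, a NULL, the envp pointers, and another NULL. The function name is invented for the example.

```c
#include <stddef.h>

/* Given argc and argv as found on the initial stack, return envp.
   Assumed layout: argv[0..argc-1], NULL, envp[0..], NULL. */
char **envp_from_argv(int argc, char **argv)
{
    /* Skip the argc argument pointers and their NULL terminator. */
    return argv + argc + 1;
}
```

In the assembly, argv sits at 8(%rsp), so skipping argc pointers plus one terminator adds 8*argc + 8 bytes, giving 16 + 8*argc from %rsp.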
Is there any real world scenario where this would be useful?
If you mean, implement our own _start:
Yes, in most of the commercial embedded software I have worked with, we need to implement our own _start according to our specific memory and performance requirements.
If you mean, drop the main function and change it to something else:
No, I don't see any benefit doing that.

Function Prologue and Epilogue removed by GCC Optimization

Taking an empty program
//demo.c
int main(void)
{
}
Compiling the program at default optimization.
gcc -S demo.c -o dasm.asm
I get the assembly output as
//Removed labels and directive which are not relevant
main:
pushl %ebp // prologue of main
movl %esp, %ebp // prologue of main
popl %ebp // epilogue of main
ret
Now Compiling the program at -O2 optimization.
gcc -O2 -S demo.c -o dasm.asm
I get the optimized assembly
main:
rep
ret
In my initial search, I found that the optimization flag -fomit-frame-pointer was responsible for removing the prologue and epilogue.
I found more information about the flag in the gcc compiler manual, but could not understand the reason given there for removing the prologue and epilogue:
Don't keep the frame pointer in a register for functions that don't
need one.
Is there any other way of putting the above reason?
What is the reason for the "rep" instruction appearing at -O2 optimization?
Why does the main function not require stack frame initialization?
If the setting up of the frame pointer is not done from within the main function, then who does this job?
Is it done by the OS, or is it a functionality of the hardware?
Compilers are getting smart: it knew you didn't need a frame pointer stored in a register because whatever you put into your main() function didn't use the stack.
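A small function makes this easy to observe for yourself. The one below is a hypothetical example, not from the question: it has no locals that need stack storage and calls nothing, so there is nothing for a frame pointer to address.

```c
/* Hypothetical leaf function: everything fits in registers, so the
   compiler has no need to set up a frame pointer for it at -O2. */
int add_one(int x)
{
    return x + 1;
}
```

Comparing the output of gcc -O2 -S with that of gcc -O2 -fno-omit-frame-pointer -S on this file typically shows the push/mov/pop of the frame pointer register appearing only in the second listing.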
As for rep ret:
Here's the principle. The processor tries to fetch the next few
instructions to be executed, so that it can start the process of
decoding and executing them. It even does this with jump and return
instructions, guessing where the program will head next.
What AMD says here is that, if a ret instruction immediately follows a
conditional jump instruction, their predictor cannot figure out where
the ret instruction is going. The pre-fetching has to stop until the
ret actually executes, and only then will it be able to start looking
ahead again.
The "rep ret" trick apparently works around the problem, and lets the
predictor do its job. The "rep" has no effect on the instruction.
Source: Some forum, google a sentence to find it.
One thing to note is that just because there is no prologue, it doesn't mean there is no stack. You can still push and pop with ease; it's just that complex stack manipulation will be difficult.
Functions that don't have a prologue/epilogue are usually dubbed naked. Hackers like to use them a lot because they don't contaminate the stack when you jmp to them. I must confess I know of no other use for them outside optimization. In Visual Studio it's done via:
__declspec(naked)

Resources