Mixing C and Assembly. `Hello World` on 64-bit Linux

Based on this tutorial, I am trying to write Hello World to the console on 64 bit Linux. Compilation raises no errors, but I get no text on console either. I don't know what is wrong.
write.s:
.data
SYSREAD = 0
SYSWRITE = 1
SYSEXIT = 60
STDOUT = 1
STDIN = 0
EXIT_SUCCESS = 0
message: .ascii "Hello, world!\n"
message_len = .-message
.text
.globl _write
_write:
pushq %rbp
movq %rsp, %rbp
movq $SYSWRITE, %rax
movq $STDOUT, %rdi
movq $message, %rsi
movq $message_len, %rdx
syscall
popq %rbp
ret
main.c:
extern void write(void);
int main (int argc, char **argv)
{
write();
return 0;
}
Compiling:
as write.s -o write.o
gcc main.c -c -o main.o
gcc main.o write.o -o program
./program

Okay, so my code had two mistakes:
1) I named my assembly function 'write', which is already a common C library name, so I needed to rename it.
2) I shouldn't have put an underscore in front of the function name.
Proper code:
writehello.s
.data
SYSREAD = 0
SYSWRITE = 1
SYSEXIT = 60
STDOUT = 1
STDIN = 0
EXIT_SUCCESS = 0
message: .ascii "Hello, world!\n"
message_len = .-message
.text
#.global main
#main:
#call write
#movq $SYSEXIT, %rax
#movq $EXIT_SUCCESS, %rdi
#syscall
#********
.global writehello
writehello:
pushq %rbp
movq %rsp, %rbp
movq $SYSWRITE, %rax
movq $STDOUT, %rdi
movq $message, %rsi
movq $message_len, %rdx
syscall
popq %rbp
ret
main.c
extern void writehello(void);
int main (int argc, char **argv)
{
writehello();
return 0;
}
Compilation stays as is :) Thanks to everyone that helped!

The tutorial you're reading is not quite right. There have been two differing conventions for global symbols in ELF (Executable and Linkable Format) executables. One convention says that all global C symbols should be prefixed with _; the other does not prefix the C symbols. In GNU/Linux, and in the x86-64 ABI in particular, global symbols are not prefixed with _. However, the tutorial that you linked might be right for some other compiler for Linux/ELF that didn't use the GNU libc.
Now, what happens in your original code is that your assembler function would be visible as _write in C code, not write. Instead, the write symbol is found in the libc (the wrapper for write(2) system call):
ssize_t write(int fd, const void *buf, size_t count);
Now you declared this write as a function void write(void);, which leads to undefined behaviour as such when you call it. You can use strace ./program to find out what system calls it makes:
% strace ./program
...
write(1, "\246^P\313\374\177\0\0\0\0\0\0\0\0"..., 140723719521144) = -1 EFAULT (Bad address)
...
So it called the write system call not with your intended arguments, but with whatever garbage there was in the registers provided to glibc write wrapper. (actually the "garbage" is known here - first argument is the argc, and the second argument is the value of argv and the 3rd argument is the value of char **environ). And as the kernel noticed that a buffer starting at (void*)argv and 140723719521144 bytes long wasn't completely contained within the mapped address space, it returned EFAULT from that system call. Result: no crash, no message.
write is not a reserved word as such in C. It is a function and possibly a macro in POSIX. You could override it; the linking order matters: if your program defines write, other code would be linked against this definition instead of the one found in glibc. However, this would mean that other code calling write would end up calling your incompatible function instead.
Thus the solution is to not use a name that is a function in the GNU libc or in any other libraries that you've linked against. Thus in assembler you can use:
.global writehello
writehello:
and then
extern void writehello(void);
as you yourself have found out.
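For comparison, here is a minimal C sketch of the same output going through the libc wrapper with its real prototype from <unistd.h>. The helper name write_hello is hypothetical (it does not appear in the question) and was chosen only so that it does not collide with any libc symbol:

```c
#include <unistd.h>   /* declares ssize_t write(int fd, const void *buf, size_t count) */

/* Hypothetical helper: prints the greeting through the libc write() wrapper.
   Returns the number of bytes written, or -1 on error. */
ssize_t write_hello(void)
{
    static const char message[] = "Hello, world!\n";

    /* sizeof counts the terminating NUL byte, so subtract 1 to get the
       14 visible bytes, the same length as message_len in the assembly. */
    return write(STDOUT_FILENO, message, sizeof message - 1);
}
```

Because the prototype comes from the header, the compiler sets up all three arguments, which is exactly what the mismatched void write(void) declaration prevented.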

Related

Why are frame pointers saved in the beginning of the main function

Assume this C code:
int main(){
return 0;
}
Would look like this in assembly:
main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
popq %rbp
ret
I know that the frame pointer needs to be saved at the start of functions with pushq %rbp, since it needs to be restored when returning to the caller function.
My question is: why do so in main? What's the parent caller of main? Isn't fp pointing to a virtual address, meaning that when main terminates the address doesn't mean anything anymore to the next program?
Are fp (or even sp) values persistent between different programs and their address space?
what's the parent caller of main?
In Linux, main is called by __libc_start_main, which in turn is called by _start. In Windows I'm not so sure, but there is also a _start.
In fact a neat trick is to start a C program without main:
#include <stdio.h>
#include <stdlib.h>
void _start()
{
printf("No main function!\n");
exit(0);
}
Compile with the following on Windows (10, gcc 8.1.0) and Ubuntu (18.04, gcc 9.2.0):
gcc main.c -nostartfiles
and with the following on macOS (10.14.6, Xcode 11.3):
clang -Wl,-e,-Wl,__start main.c
Here is an article that talks about Linux x86 Program Start Up
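As a rough illustration that every function is entered with a return address pointing back into its caller, the following sketch uses the GCC/Clang builtin __builtin_return_address. The helper name is hypothetical; placed directly in main, the same builtin would be expected to yield an address inside the C runtime's startup code rather than inside your own program:

```c
#include <stddef.h>

/* Hypothetical helper. noinline makes sure the function is really entered
   through a call instruction, so there is a return address to look at. */
__attribute__((noinline))
void *return_address_of_this_call(void)
{
    /* Level 0 means "the address this function will return to". */
    return __builtin_return_address(0);
}
```

This only shows the mechanism; it does not identify which function the address belongs to.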

What parts of this HelloWorld assembly code are essential if I were to write the program in assembly?

I have this short hello world program:
#include <stdio.h>
static const char* msg = "Hello world";
int main(){
printf("%s\n", msg);
return 0;
}
I compiled it into the following assembly code with gcc:
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello world"
.data
.align 4
.type msg, @object
.size msg, 4
msg:
.long .LC0
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
movl msg, %eax
movl %eax, (%esp)
call puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don't understand. For instance, I don't know what .cfi* is, and I'm wondering if I would need to include this to write this program in assembly.
The absolute bare minimum that will work on the platform that this appears to be, is
.globl main
main:
pushl $.LC0
call puts
addl $4, %esp
xorl %eax, %eax
ret
.LC0:
.string "Hello world"
But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is
.globl main
.type main, @function
main:
subl $24, %esp
pushl $.LC0
call puts
xorl %eax, %eax
addl $28, %esp
ret
.size main, .-main
.section .rodata
.LC0:
.string "Hello world"
Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.
The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.
The line
.section .note.GNU-stack,"",@progbits
is also important to know about if you are writing assembly language by hand; it is another optional annotation, but a valuable one, because what it means is "nothing in this object file requires the stack to be executable." If all the object files in a program have this annotation, the kernel won't make the stack executable, which improves security a little bit.
(To indicate that you do need the stack to be executable, you put "x" instead of "". GCC may do this if you use its "nested function" extension. (Don't do that.))
It is probably worth mentioning that in the "AT&T" assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines: A line with a single token on it, ending in a colon, is a label. (I don't remember the rules for what characters can appear in labels.) A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler. Anything else is an assembly instruction.
related: How to remove "noise" from GCC/clang assembly output? The .cfi directives are not directly useful to you, and the program would work without them. (It's stack-unwind info needed for exception handling and backtraces, so -fomit-frame-pointer can be enabled by default. And yes, gcc emits this even for C.)
As far as the number of asm source lines needed to produce a valid Hello World program, obviously we want to use libc functions to do more work for us.
@Zwol's answer has the shortest implementation of your original C code.
Here's what you could do by hand, if you don't care about the exit status of your program, just that it prints your string.
# Hand-optimized asm, not compiler output
.globl main # necessary for the linker to see this symbol
main:
# main gets two args: argv and argc, so we know we can modify 8 bytes above our return address.
movl $.LC0, 4(%esp) # replace our first arg with the string
jmp puts # tail-call puts.
# you would normally put the string in .rodata, not leave it in .text where the linker will mix it with other functions.
.section .rodata
.LC0:
.asciz "Hello world" # asciz zero-terminates
The equivalent C (you just asked for the shortest Hello World, not one that had identical semantics):
int main(int argc, char **argv) {
return puts("Hello world");
}
Its exit status is implementation-defined but it definitely prints. puts(3) returns "a non-negative number", which could be outside the 0..255 range, so we can't say anything about the program's exit status being 0 / non-zero in Linux (where the process's exit status is the low 8 bits of the integer passed to the exit_group() system call, in this case by the CRT startup code that called main()).
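The truncation to the low 8 bits can be sketched as plain arithmetic. The function name below is hypothetical and only models what the parent process would observe on Linux; it does not call any system interface:

```c
/* Hypothetical model: the status visible to the parent is only the
   low 8 bits of the value returned from main (or passed to exit). */
int visible_exit_status(int main_return_value)
{
    return main_return_value & 0xFF;
}
```

Under this model a return value of 256 from puts() would be reported as exit status 0, while 257 would be reported as 1, which is why nothing can be said about success or failure here.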
Using JMP to implement the tail-call is a standard practice, and commonly used when a function doesn't need to do anything after another function returns. puts() will eventually return to the function that called main(), just like if puts() had returned to main() and then main() had returned. main()'s caller still has to deal with the args it put on the stack for main(), because they're still there (but modified, and we're allowed to do that).
gcc and clang don't generate code that modifies arg-passing space on the stack. It is perfectly safe and ABI-compliant, though: functions "own" their args on the stack, even if they were const. If you call a function, you can't assume that the args you put on the stack are still there. To make another call with the same or similar args, you need to store them all again.
Also note that this calls puts() with the same stack alignment that we had on entry to main(), so again we're ABI-compliant in preserving the 16B alignment required by modern versions of the x86-32 aka i386 System V ABI (used by Linux).
.string zero-terminates strings, same as .asciz, but I had to look it up to check. I'd recommend just using .ascii or .asciz to make sure you're clear on whether your data has a terminating byte or not. (You don't need one if you use it with explicit-length functions like write())
In the x86-64 System V ABI (and Windows), args are passed in registers. This makes tail-call optimization a lot easier, because you can rearrange args or pass more args (as long as you don't run out of registers). This makes compilers willing to do it in practice. (Because as I said, they currently don't like to generate code that modifies the incoming arg space on the stack, even though the ABI is clear that they're allowed to, and compiler generated functions do assume that callees clobber their stack args.)
clang or gcc -O3 will do this optimization for x86-64, as you can see on the Godbolt compiler explorer:
#include <stdio.h>
int main() { return puts("Hello World"); }
# clang -O3 output
main: # @main
movl $.L.str, %edi
jmp puts # TAILCALL
# Godbolt strips out comment-only lines and directives; there's actually a .section .rodata before this
.L.str:
.asciz "Hello World"
In a non-PIE executable, static data addresses always fit in the low 31 bits of the address space, so a 32-bit mov immediate is enough. If position-independent code were required, the mov would instead be lea .LC0(%rip), %rdi. (You'll get this from gcc if it was configured with --enable-default-pie to make position-independent executables.)
How to load address of function or label into register in GNU Assembler
Hello World using 32-bit x86 Linux int 0x80 system calls directly, no libc
See Hello, world in assembly language with Linux system calls? My answer there was originally written for SO Docs, then moved here as a place to put it when SO Docs closed down. It didn't really belong here so I moved it to another question.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.

C passes value instead of address to assembly function (x64)

I need to pass address instead of value of my field from C to assembly function, and I have no idea why I end up with value instead of address.
C code:
long n = 1,ret = 0;
fun(&n, &ret);
//the rest is omitted
Assembly code:
.globl fun
fun:
pushq %rbp
movq %rsp, %rbp
movq 16(%rbp), %rax #my n address
movq 24(%rbp), %rbx #my ret address
cmpq $0, %rax
//the rest is omitted
When I peek values of %rax and %rbx with gdb I can see that I have values in my registers:
Breakpoint 1, fun () at cw.s:6
6 movq 16(%rbp), %rax #my n address
(gdb) s
7 movq 24(%rbp), %rbx #my ret address
(gdb) s
9 cmpq $0, %rax
(gdb) p $rax
$1 = 1
(gdb) p $rbx
$2 = 0
I don't really see what's wrong with my code. I'm sure that &n makes C pass the address instead of the value. I am following the solution provided here, but with no luck.
Calling a C function in assembly
Update:
I'm running LXLE (it's a fork of Ubuntu) on AMD x86_64. The compiler used is gcc (Ubuntu 4.8.2-19ubuntu1) and GNU assembler (GNU Binutils for Ubuntu) 2.24. My makefile:
cw: cw.c cw.o
gcc cw.o cw.c -o cw
cw.o: cw.s
as -gstabs -o cw.o cw.s
What architecture are you on? What compiler generated the code for fun? Did you write it yourself?
The code uses the r* registers and your question mentions "x64", so I assume it's an amd64/x86-64/x64 architecture. You're reading things from the stack (which you've commented as "my n/ret address"), so I assume you expect the function arguments to be there, but I'm not aware of any ABI on that CPU family that passes the first arguments to a function on the stack.
If you wrote it yourself, you need to read up on the calling conventions of the ABI your operating system/compiler uses, because unless you're on a very obscure operating system it will not pass (the first few) function arguments on the stack. Most likely you're just reading values from the stack that happen to be where your compiler put the variables in the calling function.
If you're on Linux or most other Unix-like systems that use the SysV ABI, the first two arguments to a function will be in the rdi and rsi registers. If you're on Windows, they will be in rcx and rdx. This assumes that your arguments are ints, longs or pointers. If the arguments are structs, floating point or such, other rules apply.
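One way to see the convention without writing assembly is to let the compiler generate it. The sketch below is a hypothetical C stand-in for fun (the question omits what the real function does, so the body here is invented purely for illustration); compiling it with gcc -S on x86-64 Linux should show the two pointers being read from %rdi and %rsi rather than from the stack:

```c
/* Hypothetical stand-in for the assembly routine: the real body is not
   shown in the question. Under the x86-64 System V ABI, n is expected
   in %rdi and ret in %rsi on entry. */
void fun_sketch(long *n, long *ret)
{
    /* Dereference both pointers: load through n, store through ret. */
    *ret = *n;
}
```

In hand-written assembly the equivalent would be to use %rdi and %rsi directly as the addresses, instead of loading from 16(%rbp) and 24(%rbp).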

Where is the "2+2" in this Assembly code (translated by gcc from C)

I've written this simple C code
int main()
{
int calc = 2+2;
return 0;
}
And I want to see how that looks in assembly, so I compiled it using gcc
$ gcc -S -o asm.s test.c
And the result was ~65 lines (Mac OS X 10.8.3) and I only found these to be related:
Where do I look for my 2+2 in this code?
Edit:
One part of the question hasn't been addressed.
If %rbp, %rsp, %eax are variables, what values do they attain in this case?
Almost all of the code you got is just useless stack manipulation. With optimization on (gcc -S -O2 test.c) you will get something like
main:
.LFB0:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE0:
Ignore every line that starts with a dot or ends with a colon: there are only two assembly instructions:
xorl %eax, %eax
ret
and they encode return 0;. (XORing a register with itself sets it to all-bits-zero. Function return values go in register %eax per the x86 ABI.) Everything to do with your int calc = 2+2; has been discarded as unused.
If you changed your code to
int main(void) { return 2+2; }
you would instead get
movl $4, %eax
ret
where the 4 comes from the compiler doing the addition itself rather than making the generated program do it (this is called constant folding).
Perhaps more interesting is if you change the code to
int main(int argc, char **argv) { return argc + 2; }
then you get
leal 2(%rdi), %eax
ret
which is doing some real work at runtime! In the 64-bit ELF ABI, %rdi holds the first argument to the function, argc in this case. leal 2(%rdi), %eax is x86 assembly language for "%eax = %edi + 2" and it's being done this way mainly because the more familiar add instruction takes only two arguments, so you can't use it to add 2 to %rdi and put the result in %eax all in one instruction. (Ignore the difference between %rdi and %edi for now.)
The compiler determined that 2+2 = 4 and inlined it. The constant is stored in line 10 (the $4). To verify this, change the math to 2+3 and you will see $5
EDIT: as for the registers themselves, %rsp is the stack pointer, %rbp is the frame pointer, and %eax is a general register
Here is an explanation of the assembly code:
pushq %rbp
This saves a copy of the frame pointer on the stack. The function itself does not need this; it is there so that debuggers or exception handlers can find frames on the stack.
movq %rsp, %rbp
This starts a new frame by setting the frame pointer to point to the current top-of-stack. Again, the function does not need this; it is housekeeping to maintain a proper stack.
mov $4, -12(%rbp)
Here the compiler initializes calc to 4. Several things have happened here. First, the compiler evaluated 2+2 by itself and used the result, 4, in the assembly code. The arithmetic is not performed in the executing program; it was completed in the compiler. Second, calc has been assigned the location 12 bytes below the frame pointer. (This is interesting because it is also below the stack pointer. The OS X ABI for this architecture includes a “red zone” below the stack pointer that programs are permitted to use, which is unusual.) Third, the program was clearly compiled without optimization. We know that because the optimizer would recognize that this code has no effect and is useless, so it would remove it.
movl $0, -8(%rbp)
This code stores 0 in the place the compiler has set aside to prepare the return value of main.
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
This copies data from the place where the return value is prepared to a temporary handling location. This is even more useless than the previous code, reinforcing the conclusion that optimization was not used. This looks like code I would expect at a negative optimization level.
movl -4(%rbp), %eax
This moves the return value from the temporary handling location to the register in which it is returned to the caller.
popq %rbp
This restores the frame pointer, thus removing the previously-pushed frame from the stack.
ret
This puts the program out of its misery.
Your program has no observable behavior, which means that in general case the compiler might not generate any machine code for it at all, besides some minimal startup-wrapup instructions intended to ensure that zero is returned to the calling environment. At least declare your variable as volatile. Or print its value after evaluating it. Or return it from main.
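A minimal sketch of the volatile suggestion, with a hypothetical function name. Accesses to a volatile object count as observable behavior, so the compiler is expected to keep the store and the load even with optimization enabled, although the addition itself is still folded to 4:

```c
/* Hypothetical example: volatile forces the accesses to calc to be emitted. */
int calc_volatile(void)
{
    volatile int calc = 2 + 2;  /* the store of the constant 4 must be emitted */
    return calc;                /* the read-back must be emitted as well */
}
```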
Also note that in C language 2 + 2 qualifies as integral constant expression. This means that compiler is not just allowed, but actually required to know the result of that expression at compile time. Taking this into account, it would be strange to expect the compiler to evaluate 2 + 2 at run time when the final value is known at compile time (even if you completely disable optimizations).
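The following sketch shows a few places where C only accepts an integer constant expression, so the compiler has to know the value of 2 + 2 without running anything. The names FOUR, table and table_length are hypothetical, and _Static_assert assumes a C11 or later compiler:

```c
/* Each of these contexts requires an integer constant expression. */
enum { FOUR = 2 + 2 };                 /* enumerator value */
static int table[2 + 2];               /* array size at file scope */
_Static_assert(2 + 2 == 4, "2 + 2 must be known at compile time");

/* Hypothetical helper: reports how many elements the array was given. */
int table_length(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```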
The compiler optimized it away: it pre-computed the answer and just set the result. If you want to see the compiler do the add, then you cannot let it "see" the constants you are feeding it.
If you compile this code all by itself as an object (gcc -O2 -c test_add.c -o test_add.o)
then you will force the compiler to generate the add code. But the operands will be registers or on the stack.
int test_add ( int a, int b )
{
return(a+b);
}
Then if you call it from code in a separate source (gcc -O2 -c test.c -o test.o) then you will see the two operands be forced into the function.
extern int test_add ( int, int );
int test ( void )
{
return(test_add(2,2));
}
and you can disassemble both of those objects (objdump -D test.o, objdump -D test_add.o)
When you do something that simple in one file
int main ( void )
{
int a,b,c;
a=2;
b=2;
c=a+b;
return(0);
}
The compiler can optimize your code into one of a few equivalents. My example here does nothing: the math and results have no purpose and are not used, so they can simply be removed as dead code. Your compiler did this
int main ( void )
{
int c;
c=4;
return(0);
}
But this is also a perfectly valid optimization of the above code
int main ( void )
{
return(0);
}
EDIT:
Where is the calc=2+2?
I believe the
movl $4,-12(%rbp)
Is the 2+2 (the answer is computed and simply placed in calc, which is on the stack).
movl $0,-8(%rbp)
I assume is the 0 in your return(0);
The actual math of adding two numbers was optimized out.
I guess it is line 10; the compiler optimized it since all the operands are constants.

Get the Stack Pointer in C on Mac OS X Lion

I've run into some strange behaviour when trying to obtain the current stack pointer in C (using inline ASM). The code looks like:
#include <stdio.h>
class os {
public:
static void* current_stack_pointer();
};
void* os::current_stack_pointer() {
register void *esp __asm__ ("rsp");
return esp;
}
int main() {
printf("%p\n", os::current_stack_pointer());
}
If I compile the code using the standard gcc options:
$ g++ test.cc -o test
It generates the following assembly:
__ZN2os21current_stack_pointerEv:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rdi,0xf8(%rbp)
0000000000000008 movq 0xe0(%rbp),%rax
000000000000000c movq %rax,%rsp
000000000000000f movq %rsp,%rax
0000000000000012 movq %rax,0xe8(%rbp)
0000000000000016 movq 0xe8(%rbp),%rax
000000000000001a movq %rax,0xf0(%rbp)
000000000000001e movq 0xf0(%rbp),%rax
0000000000000022 popq %rbp
If I run the resulting binary it crashes with a SIGILL (Illegal Instruction). However if I add a little optimisation to the compile:
$ g++ -O1 test.cc -o test
The generated assembly is much simpler:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movq %rsp,%rax
0000000000000007 popq %rbp
0000000000000008 ret
And the code runs fine. So to the question: is there a more stable way to get hold of the stack pointer from C code on Mac OS X? The same code has no problems on Linux.
The problem with attempting to fetch the stack pointer through a function call is that the stack pointer inside the called function points to a location that will no longer be the top of the stack once the function returns, so you're capturing an address that will be invalid after the call.
You're also assuming that the compiler added no function prologue on that platform. Both of your functions currently have a prologue in which the compiler sets up the activation record on the stack, and that changes the value of RSP you are attempting to capture.
Even if there is no prologue, you will need to add the size of a pointer on the platform you're using (the stack grows downward on x86) in order to get the "true" address the stack will be pointing to after the return from the function call. This is because the call instruction pushes the return address onto the stack, and ret in the callee pops that value off again. Thus inside the callee the stack pointer will at the very least be pointing at a return address, and that location won't be valid after the function call.
Finally, on certain platforms (unfortunately not x86), you can use the __attribute__((naked)) tag to create a function with no prologue in gcc. Using the inline keyword to avoid a prologue is not completely reliable, since it does not force the compiler to inline the function: at low optimization levels inlining will not occur, you'll end up with a prologue again, and the stack pointer will not be pointing to the correct location if you decide to take its address in those cases.
If you must have the value of the stack pointer, then the only reliable method will be to use assembly, follow the rules of your platform's ABI, compile to an object file using an assembler, and then link that object file with the rest of the object files in your executable. You can then expose the assembler function to the rest of your code by including a function declaration in a header file. So your code could look like (assuming you're using gcc to compile your assembly):
//get_stack_pointer.h
extern "C" void* get_stack_ptr();
//get_stack_pointer.S
.section .text
.global get_stack_ptr
get_stack_ptr:
movq %rsp, %rax
addq $8, %rax
ret
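The effect of the call on the stack can be observed from C as well, using the GCC/Clang builtin __builtin_frame_address instead of reading the stack pointer directly. This is only a sketch with hypothetical function names, and it assumes a platform where the stack grows toward lower addresses (as on x86 and x86-64):

```c
#include <stdint.h>

/* noinline keeps this a real call, so it gets its own frame. */
__attribute__((noinline))
static uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame sits below the caller's frame,
   which is what a downward-growing stack should produce. */
int callee_frame_is_lower(void)
{
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t theirs = callee_frame();
    return theirs < mine;
}
```

The address captured inside the callee describes a frame that no longer exists once the callee has returned, which is the same caveat that applies to a captured stack pointer.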
Rather than using a register variable with a constraint, you should just write some explicit inline assembler to fetch the stack pointer (%rsp on x86-64):
static void *getsp(void)
{
void *sp;
__asm__ __volatile__ ("movq %%rsp,%0"
: "=r" (sp)
: /* No input */);
return sp;
}
You can also convert this to a macro using gcc statement expressions (shown here in its 32-bit x86 form; on x86-64 use movq %%rsp as in the function above):
#define GETSP() ({void *sp;__asm__ __volatile__("movl %%esp,%0":"=r"(sp):);sp;})
A multi arch version was what I needed recently:
/**
* helps to check the architecture macros:
* `echo | gcc -E -dM - | less`
*
* this is arm, x64 and i386 (linux | apple) compatible
* @return address where the stack starts
*/
void *get_sp(void) {
void *sp;
__asm__ __volatile__(
#ifdef __x86_64__
"movq %%rsp,%0"
#elif __i386__
"movl %%esp,%0"
#elif __arm__
// sp is an alias for r13; ARM syntax puts the destination operand first
"mov %0, sp"
#endif
: "=r" (sp)
: /* no input */
);
return sp;
}
I do not have a reference for that, but GCC is known to occasionally (often) misbehave in the presence of inline assembly if compilation is not optimized at all. So you should always add the -O1 flag.
As a side-note, what you are trying to do is not very robust in the presence of an optimizing compiler, because the compiler may inline the call to current_stack_pointer() and the returned value may thus be an approximation of the current stack pointer value (not even a lower bound).

Resources