So I have this c code:
#include <stdio.h>
int main(void)
{
int a;
int b;
int c;
a=b=c=5;
printf("Hi%d%d%dHi",a,b,c);
}
I compiled it on ubuntu with:
gcc program.c -o program -ggdb -m32 -O2
And then disassembled it with:
objdump -M intel program -d
And in main printf() gets called like this:
mov DWORD PTR [esp+0x10],0x5
mov DWORD PTR [esp+0xc],0x5
mov DWORD PTR [esp+0x8],0x5
mov DWORD PTR [esp+0x4],0x8048500
mov DWORD PTR [esp],0x1
call 8048330 <__printf_chk#plt>
What I am wondering right now is what this means:
mov DWORD PTR [esp],0x1
I know what the first 4 mov instructions are for, but I just can't figure out why a '1' gets pushed onto the stack. Also this mov only occurs when optimization is turned on. Any ideas?
The GNU C library (glibc) will use __printf_chk instead of printf if you (or the the compiler) defines _FORTIFY_SOURCE and optimization is enabled. The _chk version of the function behaves just like the function it replaces except it's supposed to check for stack overflow and maybe validate the arguments. The extra first argument indicates how much checking and validation should occur.
Looking at the actual glibc implmenation it appears that doesn't do any additional stack checking over what the compiler automatic provides (and so shouldn't be necessary) and the validation of arguments is very minimal. It will check that %n only appears on read-only format strings, and checks that if the special %m$ argument specifiers are used that they're used for all arguments without any gaps.
Related
I'm pretty new to C, and I know that static functions can only be used within the same object file.
Something that still confuses me though, is how if I hover over a call to printf in my IDE it tells me that printf is a static function, when I can perfectly use printf in multiple object files without any problems?
Why is that?
Edit: I'm using Visual Studio Code the library is stdio.h and compiling using GCC
#include <stdio.h>
int main()
{
printf("Hello, world!");
return 0;
}
Hovering over printf would give that hint
Edit 2: if static inline functions are different from inline functions how so? I don't see how making a function inline would change the fact that it's only accessible from the same translation unit
Edit 3: per request, here's the definition of printf in the stdio.h header
__mingw_ovr
__attribute__((__format__ (gnu_printf, 1, 2))) __MINGW_ATTRIB_NONNULL(1)
int printf (const char *__format, ...)
{
int __retval;
__builtin_va_list __local_argv; __builtin_va_start( __local_argv, __format );
__retval = __mingw_vfprintf( stdout, __format, __local_argv );
__builtin_va_end( __local_argv );
return __retval;
}
screenshot of the printf definition in my IDE
Thanks for all the updates. Your IDE is showing you implementation details of your C library that you're not supposed to have to worry about. That is arguably a bug in your IDE but there's nothing wrong with the C library.
You are correct to think that a function declared static is visible only within the translation unit where it is defined. And that is true whether or not the function is also marked inline. However, this particular static inline function is defined inside stdio.h, so every translation unit that includes stdio.h can see it. (There is a separate copy of the inline in each TU. This is technically a conformance violation since, as chux surmises, it means that &printf in one translation unit will not compare equal to &printf in another. However, this is very unlikely to cause problems for a program that isn't an ISO C conformance tester.)
As "Adrian Mole" surmised in the comments on the question, the purpose of this inline function is, more or less, to rewrite
printf("%s %d %p", string, integer, pointer);
into
fprintf(stdout, "%s %d %p", string, integer, pointer);
(All the stuff with __builtin_va_start and __mingw_vfprintf is because of limitations in how variadic functions work in C. The effect is the same as what I showed above, but the generated assembly language will not be nearly as tidy.)
Update 2022-12-01: It's worse than "the generated assembly language will not be nearly as tidy". I experimented with all of the x86 compilers supported by godbolt.org and none of them will inline a function that takes a variable number of arguments, even if you try to force it. Most silently ignore the force-inlining directive; GCC gets one bonus point for actually saying it refuses to do this:
test.c:4:50: error: function ‘xprintf’ can never be inlined
because it uses variable argument lists
In consequence, every program compiled against this version of MinGW libc, in which printf is called from more than one .c file, will have multiple copies of the following glob of assembly embedded in its binary. This is bad just because of cache pollution; it would be better to have a single copy in the C library proper -- which is exactly what printf normally is.
printf:
mov QWORD PTR [rsp+8], rcx
mov QWORD PTR [rsp+16], rdx
mov QWORD PTR [rsp+24], r8
mov QWORD PTR [rsp+32], r9
push rbx
push rsi
push rdi
sub rsp, 48
mov rdi, rcx
lea rsi, QWORD PTR f$[rsp+8]
mov ecx, 1
call __acrt_iob_func
mov rbx, rax
call __local_stdio_printf_options
xor r9d, r9d
mov QWORD PTR [rsp+32], rsi
mov r8, rdi
mov rdx, rbx
mov rcx, QWORD PTR [rax]
call __stdio_common_vfprintf
add rsp, 48
pop rdi
pop rsi
pop rbx
ret 0
(Probably not exactly this assembly, this is what godbolt's "x64 MSVC 19.latest" produces with optimization option /O2, godbolt doesn't seem to have any MinGW compilers. But you get the idea.)
__mingw_ovr is normally defined as
#define __mingw_ovr static __attribute__ ((__unused__)) __inline__ __cdecl
It appears that this definition of printf is a violation of the C standard. The standard says
7.1.2 Standard headers
6 Any declaration of a library function shall have external linkage
I have written a minimal function to test whether I can call/link C and x86_64 assembly code.
Here is my main.c
#include <stdio.h>
extern int test(int);
int main(int argc, char* argv[])
{
int a = 10;
int b = test(a);
printf("b=%d\n", b);
return 0;
}
Here is my test.asm
section .text
global test
test:
mov ebx,2
add eax,ebx
ret
I built an executable using this script
#!/usr/bin/env bash
nasm -f elf64 test.asm -o test.o
gcc -c main.c -o main.o
gcc main.o test.o -o a.out
I wrote test.asm without having any real clue what I was doing. I then went away and did some reading, and now I don't understand how my code appears to be working, as I have convinced myself that it shouldn't be.
Here's a list of reasons why I think this shouldn't work:
I don't save or restore the base pointer (setup the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.
The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.
ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.
There may even be other reasons which I don't know about.
Is it a complete fluke that what I have written produces the correct output?
I am completely new to this. While I have heard of some x86 concepts in passing this is the first time I have actually attempted to write some. Got to start somewhere?
Edit: For future reference here is a corrected code
test:
; save old base pointer
push rbp ; sub rsp, 8; mov [rsp] rbp
mov rbp, rsp ; mov rbp, rsp ;; rbp = rsp
; initializes new stack frame
add rdi, 2 ; add 2 to the first argument passed to this function
mov rax, rdi ; return value passed via rax
; did not allocate any local variables, nothing to add to
; stack pointer
; the stack pointer is unchanged
pop rbp ; restore old base pointer
ret ; pop the return address off the stack and jump
; call and ret modify or save the rip instruction pointer
I don't save or restore the base pointer (setup the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.
That's not needed. Try compiling some C code with -O3 and you'll see it doesn't happen then.
The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.
That part is only working because of a fluke. The assembly happens to put 10 in eax too the way you compiled, but there's no guarantee this will always happen. Again, compile with -O3 and it won't anymore.
ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.
This part is fine. The return address gets supplied by the caller. It'll always be at the top of the stack when your function gets entered.
There may even be other reasons which I don't know about.
Yes, there's one more: ebx is call-saved but you're clobbering it. If the calling function (or anything above it in the stack) used it, then this would break it.
Is it a complete fluke that what I have written produces the correct output?
Yes, because of the second and fourth points above.
For reference, here's a comparison of the assembly that gcc produces from your C code at the -O0 (the default optimization level) and -O3: https://godbolt.org/z/7P13fbb1a
I have the following code snippet:
#include <inttypes.h>
#include <stdio.h>
uint64_t
esp_func(void)
{
__asm__("movl %esp, %eax");
}
int
main()
{
uint32_t esp = 0;
__asm__("\t movl %%esp,%0" : "=r"(esp));
printf("esp: 0x%08x\n", esp);
printf("esp: 0x%08lx\n", esp_func());
return 0;
}
Which prints the following upon multiple executions:
❯ clang -g esp.c && ./a.out
esp: 0xbd3b7670
esp: 0x7f8c1c2c5140
❯ clang -g esp.c && ./a.out
esp: 0x403c9040
esp: 0x7f9ee8bd8140
❯ clang -g esp.c && ./a.out
esp: 0xb59b70f0
esp: 0x7fe301f8c140
❯ clang -g esp.c && ./a.out
esp: 0x6efa4110
esp: 0x7fd95941f140
❯ clang -g esp.c && ./a.out
esp: 0x144e72b0
esp: 0x7f246d4ef140
esp_func shows that ASLR is active with 28 bits of entropy, which makes sense on my modern Linux kernel.
What doesn't make sense is the first value: why is it drastically different?
I took a look at the assembly and it looks weird...
// From main
0x00001150 55 push rbp
0x00001151 4889e5 mov rbp, rsp
0x00001154 4883ec10 sub rsp, 0x10
0x00001158 c745fc000000. mov dword [rbp-0x4], 0
0x0000115f c745f8000000. mov dword [rbp-0x8], 0
0x00001166 89e0 mov eax, esp ; Move esp to eax
0x00001168 8945f8 mov dword [rbp-0x8], eax ; Assign eax to my variable `esp`
0x0000116b 8b75f8 mov esi, dword [rbp-0x8]
0x0000116e 488d3d8f0e00. lea rdi, [0x00002004]
0x00001175 b000 mov al, 0
0x00001177 e8b4feffff call sym.imp.printf ; For whatever reason, the value in [rbp-0x8]
; is assigned here. Why?
// From esp_func
0x00001140 55 push rbp
0x00001141 4889e5 mov rbp, rsp
0x00001144 89e0 mov eax, esp ; Move esp to eax (same instruction as above)
0x00001146 488b45f8 mov rax, qword [rbp-0x8] ; This changes everything. What is this?
0x0000114a 5d pop rbp
0x0000114b c3 ret
0x0000114c 0f1f4000 nop dword [rax]
So my question is, what is in [rbp-0x8], how did it get there, and why are the two values different?
No, stack ASLR happens once at program startup. Relative adjustments to RSP between functions are fixed at compile time, and are just the small constants to make space for a function's local vars. (C99 variable-length arrays and alloca do runtime-variable adjustments to RSP, but not random.)
Your program contains Undefined Behaviour and isn't actually printing RSP; instead some stack address left in a register by the previous printf call (which appears to be a stack address, so its high bits do vary with ASLR). It tells you nothing about stack-pointer differences between functions, just how not to use GNU C inline asm.
The first value is printing the current ESP correctly, but that's only the low 32 bits of the 64-bit RSP.
Falling off the end of a non-void function is not safe, and using the return value is Undefined Behaviour. Any caller that uses the return value of esp_func() necessarily would trigger UB, so the compiler is free to leave whatever it wants in RAX.
If you want to write mov %rsp, %rax / ret, then write that function in pure asm, or mov to an "=r"(tmp) local variable. Using GNU C inline asm to modify RAX without telling the compiler about it doesn't change anything; the compiler still sees this as a function with no return value.
MSVC inline asm is different: it is apparently supported to use _asm{ mov eax, 123 } or something and then fall off the end of a non-void function, and MSVC will respect that as the function return value even when inlining. GNU C inline asm doesn't need silly hacks like that: if you want your asm to interact with C values, use Extended asm with an output constraint like you're doing in main. Remember that GNU C inline asm is not parsed by the compiler, just emit the template string as part of the compiler's asm output to be assembled.
I don't know exactly why clang is reloading a return value from the stack, but that's just an artifact of clang internals and how it does code-gen with optimization disabled. But it's allowed to do this because of the undefined behaviour. It is a non-void function, so it needs to have a return value. The simplest thing would be to just emit a ret, and is what some compilers happen to do with optimization enabled, but even that doesn't fix the problem because of inter-procedural optimization.
It's actually Undefined Behaviour in C to use the return value of a function that didn't return one. This applies at the C level; using inline asm that modifies a register without telling the compiler about it doesn't change anything as far as the compiler is concerned. Therefore your program as a whole contains UB, because it passes the result to printf. That's why the compiler is allowed to compile this way: your code was already broken. In practice it's just returning some garbage from stack memory.
TL:DR: this is not a valid way to emit mov %rsp, %rax / ret as the asm definition for a function.
(C++ strengthens this to it being UB to fall off the end in the first place, but in C it's legal as long as the caller doesn't use the return value. If you compile the same source as C++ with optimization, g++ doesn't even emit a ret instruction after your inline asm template. Probably this is to support C's default-int return type if you declare a function without a return type.)
This UB is also why your modified version from comments (with the printf format strings fixed), compiled with optimization enabled (https://godbolt.org/z/sE7e84) prints "surprisingly" different "RSP" values: the 2nd one isn't using RSP at all.
#include <inttypes.h>
#include <stdio.h>
uint64_t __attribute__((noinline)) rsp_func(void)
{
__asm__("movq %rsp, %rax");
} // UB if return value used
int main()
{
uint64_t rsp = 0;
__asm__("\t movq %%rsp,%0" : "=r"(rsp));
printf("rsp: 0x%08lx\n", rsp);
printf("rsp: 0x%08lx\n", rsp_func()); // UB here
return 0;
}
Output example:
Compiler stderr
<source>:7:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
Program returned: 0
Program stdout
rsp: 0x7fff5c472f30
rsp: 0x7f4b811b7170
clang -O3 asm output shows that the compiler-visible UB was a problem. Even though you used noinline, the compiler can still see the function body and try to do inter-procedural optimization. In this case, the UB led it to just give up and not emit a mov %rsp, %rsi between call rsp_func and call printf, so it's printing whatever value the previous printf happened to leave in RSI
# from the Godbolt link
rsp_func: # #rsp_func
mov rax, rsp
ret
main: # #main
push rax
mov rsi, rsp
mov edi, offset .L.str
xor eax, eax
call printf
call rsp_func # return value ignored because of UB.
mov edi, offset .L.str
xor eax, eax
call printf # printf("0x%08lx\n", garbage in RSI left from last printf)
xor eax, eax
pop rcx
ret
.L.str:
.asciz "rsp: 0x%08lx\n"
GNU C Basic asm (without constraints) is not useful for anything (except the body of a __attribute__((naked)) function).
Don't assume the compiler will do what you expect when there is UB visible to it at compile time. (When UB isn't visible at compile time, the compiler has to make code that would work for some callers or callees, and you get the asm you expected. But compile-time-visible UB means all bets are off.)
I've made a function to calculate the length of a C string (I'm trying to beat clang's optimizer using -O3). I'm running macOS.
_string_length1:
push rbp
mov rbp, rsp
xor rax, rax
.body:
cmp byte [rdi], 0
je .exit
inc rdi
inc rax
jmp .body
.exit:
pop rbp
ret
This is the C function I'm trying to beat:
size_t string_length2(const char *str) {
size_t ret = 0;
while (str[ret]) {
ret++;
}
return ret;
}
And it disassembles to this:
string_length2:
push rbp
mov rbp, rsp
mov rax, -1
LBB0_1:
cmp byte ptr [rdi + rax + 1], 0
lea rax, [rax + 1]
jne LBB0_1
pop rbp
ret
Every C function sets up a stack frame using push rbp and mov rbp, rsp, and breaks it using pop rbp. But I'm not using the stack in any way here, I'm only using processor registers. It worked without using a stack frame (when I tested on x86-64), but is it necessary?
No, the stack frame is, at least in theory, not always required. An optimizing compiler might in some cases avoid using the call stack. Notably when it is able to inline a called function (in some specific call site), or when the compiler successfully detects a tail call (which reuses the caller's frame).
Read the ABI of your platform to understand requirements related to the stack.
You might try to compile your program with link time optimization (e.g. compile and link with gcc -flto -O2) to get more optimizations.
In principle, one could imagine a compiler clever enough to (for some programs) avoid using any call stack.
BTW, I just compiled a naive recursive long fact(int n) factorial function with GCC 7.1 (on Debian/Sid/x86-64) at -O3 (i.e. gcc -fverbose-asm -S -O3 fact.c). The resulting assembler code fact.s contains no call machine instruction.
Every C function sets up a stack frame using...
This is true for your compiler, not in general. It is possible to compile a C program without using the stack at all—see, for example, the method CPS, continuation passing style. Probably no C compiler on the market does so, but it is important to know that there are other ways to execute programs, in addition to stack-evaluation.
The ISO 9899 standard says nothing about the stack. It leaves compiler implementations free to choose whichever method of evaluation they consider to be the best.
I have this little C code
void decode(int *xp,int *yp,int *zp)
{
int a,b,c;
a=*yp;
b=*zp;
c=*xp;
*yp=c;
*zp=a;
*xp=b;
}
Then I compiled it to object file using gcc -c -O1 decode.c, and then dumped the object with objdump -M intel -d decode.o and the equivalent assembly code for this is
mov ecx,DWORD PTR [rsi]
mov eax,DWORD PTR [rdx]
mov r8d,DWORD PTR [rdi]
mov DWORD PTR [rsi],r8d
mov DWORD PTR [rdx],ecx
mov DWORD PTR [rdi],eax
ret
And I noticed that it doesnt use stack at all.But firstly values still need to be loaded to the registers. So my question is how do the arguments get loaded into the registers? does the compiler automatically loads the arguments to the registers behind the scenes? or something else happens? because there is no instructions that would load the arguments into the registers.
And a little off topic. When you increase optimization for compiling the relationship between original source code and machine code decreases,imposing dificulties to relate the machine code back to the source code. By default if you dont specify the optimization flag to the GCC it doesnt optimize the code. So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable. But when I applied Level 1 optimization the code appeared understandable and related to the source. But why?
The arguments are loaded to registers in the caller. Example:
int a;
int b;
int f(int, int);
int g(void) {
return f(a, b);
}
Look at the code generated for g:
$ gcc -O1 -S t.c
$ cat t.s
…
movl b(%rip), %esi
movl a(%rip), %edi
call f
Second question:
So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable.
This happens because unoptimized code is stupid. It is a straight translation of an intermediate representation in which each variable is stored in the stack even if it doesn't need to, each conversion is represented by an explicit operation even if one isn't needed, and so on. -O1 is the best level of optimization for reading the generated assembly. It is also possible to disable the frame pointer, which keeps the overhead to the minimum for simple functions.