Can/do C compilers optimize out address-of in inline functions? - c

Let's say I have the following code:
int f() {
    int foo = 0;
    int bar = 0;
    // many more repeated operations in actual code
    return foo+bar;
}
Abstracting the repeated code into a separate function, we get
static void change_locals(int *foo_p, int *bar_p) {
    *foo_p++;
    *bar_p++;
}
int f() {
    int foo = 0;
    int bar = 0;
    change_locals(&foo, &bar);
    change_locals(&foo, &bar);
    return foo+bar;
}
I'd expect the compiler to inline the change_locals function, and optimize things like *(&foo)++ in the resulting code to foo++.
If I remember correctly, taking address of a local variable usually prevents some optimizations (e.g. it can't be stored in registers), but does this apply when no pointer arithmetic is done on the address and it doesn't escape from the function? With a larger change_locals, would it make a difference if it was declared inline (__inline in MSVC)?
I am particularly interested in behavior of GCC and MSVC compilers.

inline (and all its cousins _inline, __inline...) is treated by gcc as little more than a hint. It may inline anything it decides is an advantage, except at lower optimization levels.
The code produced by gcc -O3 for x86 is:
.p2align 4,,15
.globl f
.type f, @function
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
popl %ebp
.ident "GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
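It returns zero because *ptr++ doesn't do what you think: it parses as *(ptr++), which advances the pointer and leaves the pointed-to value alone. Correcting the increments to (*foo_p)++ and (*bar_p)++
results in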
.p2align 4,,15
.globl f
.type f, @function
pushl %ebp
movl $4, %eax
movl %esp, %ebp
popl %ebp
So it directly returns 4. Not only did it inline them, but it optimized the calculations away.
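The precedence point can be checked in isolation. Here is a minimal sketch (the helper names bump_wrong, bump_right and f_fixed are made up for illustration, they are not from the question): postfix ++ binds tighter than unary *, so *p++ only moves the local copy of the pointer, while (*p)++ increments the int it points to.

```c
/* Hypothetical helpers, not part of the original post. */

/* *p++ parses as *(p++): the local pointer copy is advanced and the
   value read through it is discarded, so the caller's int is unchanged. */
static void bump_wrong(int *p) {
    (void)*p++;
}

/* (*p)++ increments the int that p points to. */
static void bump_right(int *p) {
    (*p)++;
}

/* Mirrors f() from the question, using the corrected increment. */
static int f_fixed(void) {
    int foo = 0;
    int bar = 0;
    bump_right(&foo); bump_right(&bar);
    bump_right(&foo); bump_right(&bar);
    return foo + bar;   /* 2 + 2 */
}
```

Calling f_fixed() gives 4, which is the constant the optimized assembly above loads into %eax.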
VC++ from VS 2005 produces similar code, but it also emitted an out-of-line copy of change_locals() that is never called. I used the command line
/O2 /FD /EHsc /MD /FA /c /TP

If I remember correctly, taking address of a local variable usually prevents some optimizations (e.g. it can't be stored in registers), but does this apply when no pointer arithmetic is done on the address and it doesn't escape from the function?
The general answer is that if the compiler can ensure that no one else will change a value behind its back, it can safely be placed in a register.
Think of this as though the compiler first performs inlining, then transforms all those *&foo (which result from the inlining) to simply foo before deciding whether they should be placed in registers or in memory on the stack.
With a larger change_locals, would it make a difference if it was declared inline (__inline in MSVC)?
Again, generally speaking, whether or not a compiler decides to inline something is determined by heuristics. If you explicitly specify that you want something to be inlined, the compiler will probably weigh this in its decision process.

I've tested gcc 4.5, MSC and IntelC using this:
#include <stdio.h>
void change_locals(int *foo_p, int *bar_p) {
    (*foo_p)++;
    (*bar_p)++;
}
int main() {
    int foo = printf("");
    int bar = printf("");
    change_locals(&foo, &bar);
    change_locals(&foo, &bar);
    printf( "%i\n", foo+bar );
}
And all of them did inline/optimize the foo+bar value, but also did
generate the code for change_locals() (but didn't use it).
Unfortunately, there's still no guarantee that they'd do the same for
any kind of such a "local function".
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
incl (%edx)
incl (%eax)
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
pushl %ebx
subl $28, %esp
call ___main
movl $LC0, (%esp)
call _printf
movl %eax, %ebx
movl $LC0, (%esp)
call _printf
leal 4(%ebx,%eax), %eax
movl %eax, 4(%esp)
movl $LC1, (%esp)
call _printf
xorl %eax, %eax
addl $28, %esp
popl %ebx


How do I get rid of call __x86.get_pc_thunk.ax

I tried to compile and convert a very simple C program to assembly language.
I am using Ubuntu and the OS type is 64 bit.
This is the C Program.
void add();
int main() {
    add();
    return 0;
}
If I use gcc -S -m32 -fno-asynchronous-unwind-tables -o simple.S simple.c, my assembly source file should look like this:
.file "main1.c"
.globl main
.type main, @function
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call add
movl $0, %eax
movl %ebp, %esp
popl %ebp
.size main, .-main
.ident "GCC: (Debian 4.4.5-8) 4.4.5" // this part should say Ubuntu instead of Debian
.section .note.GNU-stack,"",@progbits
but instead it looks like this:
.file "main0.c"
.globl main
.type main, @function
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
movl %eax, %ebx
call add@PLT
movl $0, %eax
popl %ecx
popl %ebx
popl %ebp
leal -4(%ecx), %esp
.size main, .-main
.type __x86.get_pc_thunk.ax, @function
movl (%esp), %eax
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",@progbits
At my university they told me to use the flag -m32 if I am using a 64-bit Linux version. Can somebody tell me what I am doing wrong?
Am I even using the correct flag?
Edit: after adding -fno-pie:
.file "main0.c"
.globl main
.type main, @function
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
call add
movl $0, %eax
addl $4, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
.size main, .-main
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",@progbits
It looks better, but it's not exactly the same.
For example, what does leal mean?
As a general rule, you cannot expect two different compilers to generate the same assembly code for the same input, even if they have the same version number; they could have any number of extra "patches" to their code generation. As long as the observable behavior is the same, anything goes.
You should also know that GCC, in its default -O0 mode, generates intentionally bad code. It's tuned for ease of debugging and speed of compilation, not for either clarity or efficiency of the generated code. It is often easier to understand the code generated by gcc -O1 than the code generated by gcc -O0.
You should also know that the main function often needs to do extra setup and teardown that other functions do not need to do. The instruction leal 4(%esp),%ecx is part of that extra setup. If you only want to understand the machine code corresponding to the code you wrote, and not the nitty details of the ABI, name your test function something other than main.
(As pointed out in the comments, that setup code is not as tightly tuned as it could be, but it doesn't normally matter, because it's only executed once in the lifetime of the program.)
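To try the "name it something other than main" suggestion, here is a minimal sketch. The function name call_add and the counter add_calls are made up for illustration, and add() is stubbed in the same file only so that the file builds on its own. It can be compiled with the flags from the question plus an optimization level, e.g. gcc -S -m32 -O1.

```c
/* Hypothetical test file; add() is stubbed so the file builds on its own. */
static int add_calls;            /* illustrative counter, not in the question */

void add(void) {
    add_calls++;
}

/* Same body as the question's main(), under a different name, so the
   main-specific setup and teardown should not appear in the output. */
int call_add(void) {
    add();
    return 0;
}
```

Because the stub lives in the same file, an optimizing compiler may inline it. Keeping only the declaration void add(); and defining add in another file, as in the question, keeps the call instruction visible.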
Now, to answer the question that was literally asked, the reason for the appearance of
call __x86.get_pc_thunk.ax
is that your compiler defaults to generating "position-independent" executables. Position-independent means the operating system can load the program's machine code at any address in (virtual) memory and it'll still work. This allows things like address space layout randomization, but to make it work, you have to take special steps to set up a "global pointer" at the beginning of every function that accesses global variables or calls another function (with some exceptions). It's actually easier to explain the code that's generated if you turn optimization on:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
This is all just setting up main's stack frame and saving registers that need to be saved. You can ignore it.
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
The special function __x86.get_pc_thunk.bx loads its return address -- which is the address of the addl instruction that immediately follows -- into the EBX register. Then we add to that address the value of the magic constant _GLOBAL_OFFSET_TABLE_, which, in position-independent code, is the difference between the address of the instruction that uses _GLOBAL_OFFSET_TABLE_ and the address of the global offset table. Thus, EBX now points to the global offset table.
call add@PLT
Now we call add@PLT, which means call add, but jump through the "procedure linkage table" to do it. The PLT takes care of the possibility that add is defined in a shared library rather than the main executable. The code in the PLT uses the global offset table and assumes that you have already set EBX to point to it before calling an @PLT symbol. That's why main has to set up EBX even though nothing appears to use it. If you had instead written something like
extern int number;
int main(void) { return number; }
then you would see a direct use of the GOT, something like
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
movl number@GOT(%ebx), %eax
movl (%eax), %eax
We load up EBX with the address of the GOT, then we can load the address of the global variable number from the GOT, and then we actually dereference the address to get the value of number.
If you compile 64-bit code instead, you'll see something different and much simpler:
movl number(%rip), %eax
Instead of all this mucking around with the GOT, we can just load number from a fixed offset from the program counter. PC-relative addressing was added along with the 64-bit extensions to the x86 architecture. Similarly, your original program, in 64-bit position-independent mode, will just say
call add@PLT
without setting up EBX first. The call still has to go through the PLT, but the PLT uses PC-relative addressing itself and doesn't need any help from its caller.
The only difference between __x86.get_pc_thunk.bx and __x86.get_pc_thunk.ax is which register they store their return address in: EBX for .bx, EAX for .ax. I have also seen GCC generate .cx and .dx variants. It's just a matter of which register it wants to use for the global pointer -- it must be EBX if there are going to be calls through the PLT, but if there aren't any then it can use any register, so it tries to pick one that isn't needed for anything else.
Why does it call a function to get the return address? Older compilers would do this instead:
call 1f
1: pop %ebx
but that screws up return-address prediction, so nowadays the compiler goes to a little extra trouble to make sure every call is paired with a ret.
The extra junk you're seeing is due to your version of GCC special-casing main to compensate for possibly-broken entry point code starting it with a misaligned stack. I'm not sure how to disable this or if it's even possible, but renaming the function to something other than main will suppress it for the sake of your reading.
After renaming to xmain I get:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
call add
movl $0, %eax

Calling a function from a struct (C/Assembly)

I'm trying to port some code from C to assembly, but I'm running into some trouble. The C function is passed a struct, and inside this struct two function pointers are stored, like this:
typedef struct sort sort_t;
struct sort {
    void *data;
    cmpfunc_t cmpfunc;
    cpyfunc_t cpyfunc;
};
In the C code these functions are called like this (m being a pointer to the struct):
m->cpyfunc(m->data, j, k);
Now I'm trying to do this in assembly. I've realized that structs are saved sequentially in memory. So if m was stored in %ebx then cmpfunc would be found in 4(%ebx). But I can't figure out how to call this function from assembly. I've tried both running directly from 4(%ebx) by doing:
call *4(%ebx)
That wouldn't work so I tried:
movl 4(%ebx),%edx
call *%edx
But to no avail. I can't seem to find any way to do this, and any searching I've tried has turned up nothing. How would I do this in assembly?
I made a little test program:
typedef void (*cpyfunc_t)(void *, int, int);
typedef void (*cmpfunc_t)(void);
struct sort {
    void *data;
    cmpfunc_t cmpfunc;
    cpyfunc_t cpyfunc;
};
int main()
{
    struct sort *m;
    int k,j;
    m->cpyfunc(m->data, j, k);
}
and compiled with the ELLCC demo. I got this:
.file "/tmp/webcompile/_31142_0.c"
.globl main
.align 16, 0x90
.type main,@function
main: # @main
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
subl $32, %esp
movl $0, %eax
movl -12(%ebp), %ecx
movl 8(%ecx), %ecx
movl -12(%ebp), %edx
movl (%edx), %edx
movl -20(%ebp), %esi
movl -16(%ebp), %edi
movl %edx, (%esp)
movl %esi, 4(%esp)
movl %edi, 8(%esp)
movl %eax, -24(%ebp) # 4-byte Spill
calll *%ecx
movl -24(%ebp), %eax # 4-byte Reload
addl $32, %esp
popl %esi
popl %edi
popl %ebp
.size main, .Ltmp0-main
.section ".note.GNU-stack","",@progbits
Note that the cpyfunc element is at offset 8 in the struct.
Edit: I did have to turn off optimizations because the ELLCC compiler (which is clang based) optimized the function to nothing with optimization turned on.
One way I've found to find out how to write something from C in Assembly is to see how the compiler does it. A C compiler will first internally convert your code into Assembly and then run the Assembler to create the object file.
You can usually pass a flag to your compiler to make it emit the intermediate assembly code (for gcc and clang this is -S). Do this, then take a look at how the C compiler calls the function pointers.
You can also run objdump --disassemble-all (or -d for just the code sections) on the object file created by the compiler. This produces similar results.
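As a starting point for that experiment, here is a minimal sketch in C that reuses the typedefs from the test program above. The callback record_copy and the helpers call_cpy and cpyfunc_offset are made up for illustration. It shows the call being ported and the byte offset at which the cpyfunc pointer sits, which is the value to load before the indirect call.

```c
#include <stddef.h>

typedef void (*cpyfunc_t)(void *, int, int);
typedef void (*cmpfunc_t)(void);

struct sort {
    void *data;
    cmpfunc_t cmpfunc;
    cpyfunc_t cpyfunc;
};

/* Hypothetical copy callback: records its arguments so the call is observable. */
static void *last_data;
static int last_j, last_k;

static void record_copy(void *data, int j, int k) {
    last_data = data;
    last_j = j;
    last_k = k;
}

/* The C-level call the question wants to reproduce in assembly. */
static void call_cpy(struct sort *m, int j, int k) {
    m->cpyfunc(m->data, j, k);
}

/* Byte offset of the pointer to load before the indirect call.
   With 4-byte pointers this is 8; offset 4 would select cmpfunc instead. */
static size_t cpyfunc_offset(void) {
    return offsetof(struct sort, cpyfunc);
}
```

With m in %ebx on 32-bit x86, that offset suggests the attempt in the question was loading cmpfunc (offset 4) rather than cpyfunc (offset 8), and the three arguments still have to be placed on the stack before the call.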

What exactly happens when you dereference a static variable in C?

So let's say I have this code:
int my_static_int = 4;
I passed the function a pointer to my_static_int, obviously. But what happens when the code is compiled? An avenue I've considered:
1) When you declare a non-pointer variable, C automatically creates a pointer to it and internally does something like defining my_static_int to be *(internal_reference)
Anyway, I hope that my question is descriptive enough.
Pointers are just a term to help us humans understand what's going on.
The & operator when used with a variable simply means address of. No "pointer" is created at runtime, you are simply passing in the address of the variable into the function.
If you have:
int x = 3;
int* p = &x;
Then p is a variable which holds a memory address. Inside that memory address is an int.
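A minimal sketch of that idea (the names add_seven and demo are made up for illustration): passing &x hands the callee the same address that p holds, and no separate pointer object has to exist for x itself.

```c
/* Hypothetical helper: receives an address and writes through it. */
static void add_seven(int *p) {
    *p = *p + 7;
}

static int demo(void) {
    int x = 3;
    int *p = &x;                    /* p holds the address of x */
    add_seven(&x);                  /* the same address is passed to the callee */
    return (p == &x) ? *p : -1;     /* reads x back through p */
}
```

Here demo() returns 10, since the write made through the parameter lands in x.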
If you really want to know how the code looks under the covers, you have to get the compiler to generate the assembler code (gcc can do this with the -S option).
When you truly grok C and pointers at their deepest level, you'll realise that it's just the address of the variable being passed in rather than the value of the variable. There's no need for creating extra memory to hold a pointer since the pointer is moved directly from the code to the stack (the address will probably have been set either at link time or load time, not run time).
There's also no need for internal type creation since the compiled code already knows the type and how to manipulate it.
Keeping in mind that this is implementation-specific, consider the following code:
int my_static_int = 4;
static void func (int *x) {
    *x = *x + 7;
}
int main (void) {
    func(&my_static_int);
    return 0;
}
which, when compiled with gcc -S to get the assembler, produces:
.file "qq.c"
.globl _my_static_int
.align 4
.long 4
.def _func; .scl 3; .type 32; .endef
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl 8(%ebp), %edx
movl (%edx), %edx
addl $7, %edx
movl %edx, (%eax)
popl %ebp
.def ___main; .scl 2; .type 32; .endef
.globl _main
.def _main; .scl 2; .type 32; .endef
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
call __alloca
call ___main
movl $_my_static_int, (%esp)
call _func
movl $0, %eax
The important bit is these sections:
movl $_my_static_int, (%esp) ; load address of variable onto stack.
call _func ; call the function.
movl 8(%ebp), %eax ; get passed parameter (the address of the var) into eax
movl 8(%ebp), %edx ; and also into edx.
movl (%edx), %edx ; get the value from the address (dereference).
addl $7, %edx ; add 7 to it.
movl %edx, (%eax) ; and put it back into the same address.
Hence the address is passed, and used to get at the variable.
When the code is compiled, function func receives the address of your my_static_int variable as parameter. Nothing else.
There is no need to create any implicit pointers when you declare a non-pointer variable. It is not clear from your question how you came to this weird idea.
Why not look at the assembly output? You can do this with gcc using the -S option, or (if your system uses the GNU toolchain) using the objdump -d command on the resulting object file or executable file.
The simple answer is that the object code generates a reference to the symbol where my_static_int is allocated (which is typically in the static data segment of your object module).
So the address of the variable is resolved at load time (when it is assigned its final virtual address), and the loader fixes up the reference to the variable, filling it in with its address.

C - what is the return value of a semicolon?

I'm just curious about the following example:
#include <stdio.h>
int test();
int test(){
    // int a = 5;
    // int b = a+1;
    return ;
}
int main(){
    printf("%u\n", test());
    return 0;
}
I compiled it with 'gcc -Wall -o semicolon semicolon.c' to create an executable
and 'gcc -Wall -S semicolon.c' to get the assembler code, which is:
.file "semicolon.c"
.globl test
.type test, @function
pushl %ebp
movl %esp, %ebp
subl $4, %esp
.size test, .-test
.section .rodata
.string "%u\n"
.globl main
.type main, @function
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
call test
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
.size main, .-main
.ident "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
.section .note.GNU-stack,"",@progbits
Since I'm not much of an assembler pro, I only know that printf prints what is in eax,
but I don't fully understand what 'movl %eax, 4(%esp)' means, which I assume fills eax before calling test.
But what is the value then? What does 4(%esp) mean, and what does the value of esp mean?
If I uncomment the lines in test(), printf prints 6, which is written in eax ^^
Your assembly language annotated:
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
subl $4, %esp # Allocate some local space on the stack.
leave # Restore the old frame pointer/stack
Note that nothing in test touches eax.
.size test, .-test
.section .rodata
.string "%u\n"
.globl main
.type main, @function
leal 4(%esp), %ecx # Point past the return address.
andl $-16, %esp # Align the stack.
pushl -4(%ecx) # Push the return address.
pushl %ebp # Save the frame pointer
movl %esp, %ebp # Get the new frame pointer.
pushl %ecx # save the old top of stack.
subl $20, %esp # Allocate some local space (for printf parameters and ?).
call test # Call test.
Note that at this point, nothing has modified eax. Whatever came into main is still here.
movl %eax, 4(%esp) # Save eax as a printf argument.
movl $.LC0, (%esp) # Send the format string.
call printf # Duh.
movl $0, %eax # Return zero from main.
addl $20, %esp # Deallocate local space.
popl %ecx # Restore the old top of stack.
popl %ebp # And the old frame pointer.
leal -4(%ecx), %esp # Fix the stack pointer,
So, what gets printed out is whatever came in to main. As others have pointed out it is undefined: It depends on what the startup code (or the OS) has done to eax previously.
The semicolon has no return value, what you have there is an "empty return", like the one used to return from void functions - so the function doesn't return anything.
This actually causes a warning when compiling:
warning: `return' with no value, in function returning non-void
And I don't see anything placed in eax before calling test.
About 4(%esp): this is the memory location at address esp + 4, i.e. the second word from the top of the stack.
The return value of an int function is passed in the EAX register. The test function does not set the EAX register because no return value is given. The result is therefore undefined.
A semicolon indeed has no value.
I think the correct answer is that a return <nothing> for an int function is an error, or at least has undefined behavior. That's why compiling this with -Wall yields
semi.c: In function ‘test’:
semi.c:6: warning: ‘return’ with no value, in function returning non-void
As for what 4(%esp) holds... it's a location on the stack where nothing was (intentionally) stored, so it will likely return whatever junk is found at that location. This could be the last expression evaluated to variables in the function (as in your example) or something completely different. This is what "undefined" is all about. :)
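For comparison, here is a minimal sketch of the function with an explicit return value. The name test_fixed is made up so it does not clash with the question's test(). With a value after return, the result no longer depends on whatever happened to be left in a register.

```c
/* Sketch of test() with the commented-out lines enabled and an
   explicit return value; test_fixed is a hypothetical name. */
int test_fixed(void) {
    int a = 5;
    int b = a + 1;
    return b;       /* the caller receives this value */
}
```

The value returned here is 6. The question's version printing 6 as well is only an accident of where the compiler happened to keep b.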

How do C compilers implement functions that return large structures?

The return value of a function is usually stored on the stack or in a register. But for a large structure, it has to be on the stack. How much copying has to happen in a real compiler for this code? Or is it optimized away?
For example:
struct Data {
    unsigned values[256];
};
Data createData()
{
    Data data;
    // initialize data values...
    return data;
}
(Assuming the function cannot be inlined..)
None; no copies are done.
The address of the caller's Data return value is actually passed as a hidden argument to the function, and the createData function simply writes into the caller's stack frame.
This is known as the named return value optimisation. Also see the c++ faq on this topic.
commercial-grade C++ compilers implement return-by-value in a way that lets them eliminate the overhead, at least in simple cases
When yourCode() calls rbv(), the compiler secretly passes a pointer to the location where rbv() is supposed to construct the "returned" object.
You can demonstrate that this has been done by adding a destructor with a printf to your struct. The destructor should only be called once if this return-by-value optimisation is in operation, otherwise twice.
Also you can check the assembly to see that this happens:
Data createData()
{
    Data data;
    // initialize data values...
    data.values[5] = 6;
    return data;
}
here's the assembly:
pushl %ebp
movl %esp, %ebp
subl $1032, %esp
movl 8(%ebp), %eax
movl $6, 20(%eax)
ret $4
Curiously, it allocated enough space on the stack for the data item (subl $1032, %esp), but note that it takes the first argument on the stack, 8(%ebp), as the base address of the object, and then initialises the sixth element (values[5], at byte offset 20) of that item. Since we didn't specify any arguments to createData, this is curious until you realise this is the secret hidden pointer to the parent's version of Data.
But for a large structure, it has to be on the stack (originally written as "heap").
Indeed so! A large structure declared as a local variable is allocated on the stack. Glad to have that cleared up.
As for avoiding copying, as others have noted:
Most calling conventions deal with "function returning struct" by passing an additional parameter that points to the location in the caller's stack frame in which the struct should be placed. This is definitely a matter for the calling convention and not the language.
With this calling convention, it becomes possible for even a relatively simple compiler to notice when a code path is definitely going to return a struct, and for it to fix assignments to that struct's members so that they go directly into the caller's frame and don't have to be copied. The key is for the compiler to notice that all terminating code paths through the function return the same struct variable. If that's the case, the compiler can safely use the space in the caller's frame, eliminating the need for a copy at the point of return.
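The hidden-parameter convention described above can be written out by hand. This is only a sketch of the idea, not what any particular compiler emits. The function createData_into is a made-up name for the explicit out-parameter form, and the struct is the one from the question, spelled as C.

```c
struct Data {
    unsigned values[256];
};

/* What the programmer writes: return by value. */
struct Data createData(void) {
    struct Data data = {{0}};
    data.values[5] = 6;
    return data;
}

/* Hypothetical hand-written equivalent of the calling convention:
   the caller supplies the storage and the callee fills it in place. */
void createData_into(struct Data *out) {
    unsigned i;
    for (i = 0; i < 256; i++)
        out->values[i] = 0;
    out->values[5] = 6;
}
```

A caller would write struct Data d; createData_into(&d); which corresponds to what the compiler arranges behind struct Data d = createData(); when it passes the address of d as the extra argument.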
There are many examples given, but basically this question does not have any definite answer; it will depend on the compiler.
C does not specify how large structs are returned from a function.
Here are some tests for one particular compiler, gcc 4.1.2 on x86 RHEL 5.4.
gcc trivial case, no copying
[00:05:21 1 ~] $ gcc -O2 -S -c t.c
[00:05:23 1 ~] $ cat t.s
.file "t.c"
.p2align 4,,15
.globl createData
.type createData, @function
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
movl $1, 24(%eax)
popl %ebp
ret $4
.size createData, .-createData
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
.section .note.GNU-stack,"",@progbits
gcc more realistic case, allocate on stack, memcpy to caller
#include <stdlib.h>
struct Data {
    unsigned values[256];
};
struct Data createData()
{
    struct Data data;
    int i;
    for(i = 0; i < 256 ; i++)
        data.values[i] = rand();
    return data;
}
[00:06:08 1 ~] $ gcc -O2 -S -c t.c
[00:06:10 1 ~] $ cat t.s
.file "t.c"
.p2align 4,,15
.globl createData
.type createData, @function
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
movl $1, %ebx
subl $1036, %esp
movl 8(%ebp), %edi
leal -1036(%ebp), %esi
.p2align 4,,7
call rand
movl %eax, -4(%esi,%ebx,4)
addl $1, %ebx
cmpl $257, %ebx
jne .L2
movl %esi, 4(%esp)
movl %edi, (%esp)
movl $1024, 8(%esp)
call memcpy
addl $1036, %esp
movl %edi, %eax
popl %ebx
popl %esi
popl %edi
popl %ebp
ret $4
.size createData, .-createData
.ident "GCC: (GNU) 4.1.2 20080704 (Red Hat 4.1.2-46)"
.section .note.GNU-stack,"",@progbits
gcc 4.4.2 has grown a lot, and does not copy for the above non-trivial case.
In addition, VS2008 (compiling the above as C) will reserve struct Data on the stack of createData() and use a rep movsd loop to copy it back to the caller in Debug mode; in Release mode it will move the return value of rand() (%eax) directly back to the caller.
typedef struct {
    unsigned value[256];
} Data;
Data createData(void) {
    Data r;
    return r;
}
Data d = createData();
msvc(6,8,9) and gcc mingw(3.4.5,4.4.0) will generate code like the following pseudocode
void createData(Data* r) {
    /* fill in *r directly */
}
Data d;
createData(&d);
gcc on linux will issue a memcpy() to copy the struct back on the stack of the caller. If the function has internal linkage, more optimizations become available though.
