Where are garbage values stored, and for what purpose?
C chooses to not initialize variables to some automatic value for efficiency reasons. In order to initialize this data, instructions must be added. Here's an example:
int main(int argc, const char *argv[])
{
int x;
return x;
}
generates:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl -4(%ebp), %eax
leave
ret
While this code:
int main(int argc, const char *argv[])
{
int x=1;
return x;
}
generates:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
movl -4(%ebp), %eax
leave
ret
As you can see, a full extra instruction is used to move 1 into x. This used to matter, and still does on embedded systems.
Garbage values are not really stored anywhere. In fact, garbage values do not really exist, as far as the abstract language is concerned.
You see, in order to generate the most efficient code it is not sufficient for the compiler to operate in terms of lifetimes of objects (variables). In order to generate the most efficient code, the compiler must operate at much finer level: it must "think" in terms of lifetimes of values. This is absolutely necessary in order to perform efficient scheduling of the CPU registers, for one example.
The abstract language has no such concept as "lifetime of value". However, the language authors recognize the importance of that concept to optimizing compilers. To give compilers enough freedom, the language is intentionally specified so that it doesn't interfere with important optimizations. This is where "garbage values" come into the picture. The language does not state that garbage values are stored anywhere, and it does not guarantee that they are stable (i.e. repeated attempts to read the same uninitialized variable might easily result in different "garbage values"). This is done specifically to allow optimizing compilers to implement the vital concept of "lifetime of value" and thus perform more efficient variable manipulation than would be dictated by the language concept of "object lifetime".
IIRC, Thompson or Ritchie did an interview some years ago where they said the language definition purposely left things vague in some places so the implementers on specific platforms had leeway to do things that made sense (cycles, memory, etc.) on that platform. Sorry I don't have a reference to link to.
Why does the C standard leave use of indeterminate variables undefined?
It does not :-) :
for local variables, it says undefined behavior, which means that anything (e.g. segfault, erasing your hard disk) is legal: (Why) is using an uninitialized variable undefined behavior?
for global variables, it zeros them: What happens to a declared, uninitialized variable in C? Does it have a value?
C was designed to be a relatively low-level language so that it could be used for writing, well, low-level stuff like operating systems. (in fact, it was designed so that UNIX could be written in C) You can simply think of it as assembly code with readable syntax and higher-level constructs. For this reason, C (minus optimizations) does exactly what you ask it to do, nothing more, nothing less.
When you write int x;, the compiler simply allocates memory for the integer. You never asked it to store anything there, so whatever was in that location when your program started stays as such. Most often, it turns out that the pre-existing value is "garbage".
Sometimes an external program (e.g. a device driver) may write into some of your variables, so it is unnecessary to add another instruction to initialize such variables.
I wrote a simple program on a 64-bit machine
#include <stdio.h>
int main() {
printf("%d", 2.443);
}
So, this is how the compiler will behave. It will identify the second argument to be a double hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4 byte integer value, hence it prints some garbage value.
What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same garbage value every time, not a different one on each run.
It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.
My setup is different from yours,
Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)
with gcc-4.6.2. But it's similar enough that it's reasonable to suspect the same mechanisms.
Looking at the generated assembly (-O3, out of habit), the relevant part (main) is
.cfi_startproc
subq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 16
movl $.LC1, %edi # move format string to edi
movl $1, %eax # move 1 to eax, seems to be the number of double arguments
movsd .LC0(%rip), %xmm0 # move the double to the floating point register
call printf
xorl %eax, %eax # clear eax (return 0)
addq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 8
ret # return
If I pass an int instead of the double, not much changes, but what does change is significant:
movl $47, %esi # move int to esi
movl $.LC0, %edi # format string
xorl %eax, %eax # clear eax
call printf
I have looked at the generated code for many variations of types and count of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer arguments (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then the stack.
So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments to it.
As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.
Address space layout randomization (ASLR) is one: The operating system deliberately rearranges some memory randomly to prevent malware from knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.
Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.
There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.
So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.
Here is an experiment: After the initial printf, insert another printf with this code:
printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);
The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.
If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to a different part of your address space yields a better result.
Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.
I would like to divide a stack into stack frames by looking at the raw data on the stack. I thought I could do so by finding a "linked list" of saved EBP pointers.
Can I assume that a (standard and commonly used) C compiler (e.g. gcc) will always update and save EBP on a function call in the function prologue?
pushl %ebp
movl %esp, %ebp
Or are there cases where some compilers might skip that part for functions that don't get any parameters and don't have local variables?
The x86 calling conventions and the Wiki article on function prologue don't help much with that.
Is there any better method to divide a stack into stack frames just by looking at its raw data?
Thanks!
Some versions of gcc have a -fomit-frame-pointer optimization option. If memory serves, it can be used even with parameters/local variables (they index directly off of ESP instead of using EBP). Unless I'm badly mistaken, MS VC++ can do roughly the same.
Offhand, I'm not sure of a way that's anywhere close to universally applicable. If you have code with debug info, it's usually pretty easy -- otherwise though...
Even with the framepointer optimized out, stackframes are often distinguishable by looking through stack memory for saved return addresses instead. Remember that a function call sequence in x86 always consists of:
call someFunc ; pushes return address (instr. following `call`)
...
someFunc:
push EBP ; if framepointer is used
mov EBP, ESP ; if framepointer is used
push <nonvolatile regs>
...
so your stack will always - even if the framepointers are missing - have return addresses in there.
How do you recognize a return address?
To start with, on x86, instructions have different lengths. That means return addresses - unlike other pointers (!) - tend to be misaligned values. Statistically, 3/4 of them do not end at a multiple of four.
Any misaligned pointer is a good candidate for a return address.
Then, remember that call instructions on x86 have specific opcode formats; read a few bytes before the return address and check if you find a call opcode there (most of the time, it's five bytes back for a direct call, and two or three bytes back for a call through a register). If so, you've found a return address.
This is also a way to distinguish C++ vtables from return addresses by the way - vtable entrypoints you'll find on the stack, but looking "back" from those addresses you don't find call instructions.
With that method, you can get candidates for the call sequence out of the stack even without having symbols, framesize debugging information or anything.
The details of how to piece the actual call sequence together from those candidates are less straightforward though, you need a disassembler and some heuristics to trace potential call flows from the lowest-found return address all the way up to the last known program location. Maybe one day I'll blog about it ;-) though at this point I'd rather say that the margin of a stackoverflow posting is too small to contain this ...
I want to know how passing arguments to functions in C works. Where are the values being stored and how are they retrieved? How does variadic argument passing work? Also, since it's related: what about return values?
I have a basic understanding of CPU registers and assembler, but not enough that I thoroughly understand the ASM that GCC spits back at me. Some simple annotated examples would be much appreciated.
Considering this code:
int foo (int a, int b) {
return a + b;
}
int main (void) {
foo(3, 5);
return 0;
}
Compiling it with gcc foo.c -S gives the assembly output:
foo:
pushl %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
movl 8(%ebp), %edx
leal (%edx,%eax), %eax
popl %ebp
ret
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl $5, 4(%esp)
movl $3, (%esp)
call foo
movl $0, %eax
leave
ret
So basically the caller (in this case main) first allocates 8 bytes on the stack to accommodate the two arguments, then puts the two arguments on the stack at the corresponding offsets (4 and 0), and then issues the call instruction, which transfers control to the foo routine. The foo routine reads its arguments from the corresponding offsets on the stack (8 and 12 from ebp), puts its return value in the eax register so it's available to the caller, and restores the saved ebp before returning.
That is platform specific and part of the "ABI". In fact, some compilers even allow you to choose between different conventions.
Microsoft's Visual Studio, for example, offers the __fastcall calling convention, which uses registers. Other platforms or calling conventions use the stack exclusively.
Variadic arguments work in a very similar way - they are passed via registers or stack. In case of registers, they are usually in ascending order, based on type. If you have something like (int a, int b, float c, int d), a PowerPC ABI might put a in r3, b in r4, d in r5, and c in fp1 (I forgot where float registers start, but you get the idea).
Return values, again, work the same way.
Unfortunately, I don't have many examples, most of my assembly is in PowerPC, and all you see in the assembly is the code going straight for r3, r4, r5, and placing the return value in r3 as well.
Your questions are more than anybody could reasonably try to answer in a SO post, not to mention that it's implementation defined as well.
However, if you're interested in the x86 answer might I suggest you watch this Stanford CS107 Lecture titled Programming Paradigms where all the answers to the questions you posed will be explained in great detail (and quite eloquently) in the first 6-8 lectures.
It depends on your compiler, the target architecture and OS you’re compiling for, and whether your compiler supports non-standard extensions that change the calling convention. But there are some commonalities.
The C calling convention is usually established by the vendor of the operating system, because they need to decide what convention the system libraries use.
More recent CPUs (such as ARM or PowerPC) tend to have their calling conventions defined by the CPU vendor and compatible across different operating systems. x86 is an exception to this: different systems use different calling conventions. There used to be a lot more calling conventions for the 16-bit 8086 and 32-bit 80386 than there are for x86_64 (although even that is not down to one). 32-bit x86 Windows programs sometimes use multiple calling conventions within the same program.
Some observations:
An example of an operating system that supports several different ABIs with different calling conventions simultaneously, some of which follow the same conventions as other OSes for the same architecture, is Linux for x86_64. This can host three different major ABIs (i386, x32 and x86_64), two of which are the same as other operating systems for the same CPU, and several variants.
An exception to the rule that there's one system calling convention used for everything is 16- and 32-bit versions of MS Windows, which inherited some of the proliferation of calling conventions from MS-DOS. The Windows C API uses a different calling convention (STDCALL, originally FAR PASCAL) than the “C” calling convention for the same platform, and also supports FORTRAN and FASTCALL conventions. All four come in NEAR and FAR variants on 16-bit OSes. Nearly all Windows programs therefore use at least two different conventions in the same program.
Architectures with a lot of registers, including classic RISC and nearly all modern ISAs, use several of those registers to pass and return function arguments.
Architectures with few or no general-purpose registers often pass arguments on the stack, pointed to by a stack pointer. CISC architectures often have instructions to call and return which store the return address on the stack. (RISC architectures typically store the return address in a "link register", which the callee can save/restore manually if it's not a leaf function.)
A common variant is for tail calls, functions whose return value is also the return value of the caller, to jump to the next function (so it returns to our parent function) instead of calling it and then returning after it returns. Placing args in the right places has to account for the return address already being on the stack, where a call instruction would place it.
This is especially true of tail-recursive calls, which have exactly the same stack frame on each invocation. A tail-recursive call is typically equivalent to a loop: update a few registers that changed, then jump back to the entry point. They do not need to create a new stack frame, or have their own return address: you can simply update the caller’s stack frame and use its return address as the tail call’s. i.e. tail-recursion easily optimizes into a loop.
Some architectures with only a few registers nevertheless defined an alternative calling convention that could pass one or two arguments in registers. This was FASTCALL on MS-DOS and Windows.
A few older ISAs, such as SPARC, had a special bank of “windowed” registers, so that every function has its own bank of input and output registers, and when it made a function call, the caller’s outputs became the callee’s inputs, and the reverse when it came time to return a value. Modern superscalar designs consider this more trouble than it’s worth.
A few very old architectures used self-modifying code in their calling convention, and the first edition of The Art of Computer Programming followed this model for its abstract language. It no longer works on most modern CPUs, which have instruction caches.
A few other very old architectures had no stack and generally could not call the same function again, re-entering it, until it returned.
A function with a lot of arguments almost always puts most of them onto the stack.
C functions that put arguments on the stack almost have to push them in reverse order and have the caller clean up the stack. The called function might not even know exactly how many arguments are on the stack! That is, if you call printf("%d\n", x); the compiler will push x, then the format string, and the call instruction then pushes the return address. This guarantees that the first argument is at a known offset from the stack pointer and <stdarg.h> (historically <varargs.h>) has the information it needs to work.
Most other languages, and therefore some operating systems that C compilers support, do it the other way around: arguments are pushed from left to right. The function being called usually cleans up its own stack frame. This used to be called the PASCAL convention on MS-DOS, and survives as the STDCALL convention on Windows. It cannot support variadic functions. (https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions)
Fortran and a few other languages historically passed all arguments by reference, which translates to C as pointer arguments. Compilers that might need to interface with these other languages often support these foreign calling conventions.
Because a major source of bugs was “smashing the stack,” many compilers now have a way to add canary values (which, like a canary in a coal mine, warn you that something dangerous is going on if anything happens to them) and other means of detecting when code tampers with the stack frame.
Another form of variation across different platforms is whether the stack frame will contain all the information it needs for a debugger or exception-handler to backtrace, or whether that info will be in separate metadata (or not present at all) allowing simplification of function prologue/epilogue (-fomit-frame-pointer).
You can get cross-compilers to emit code using different calling conventions, and compare them, with switches such as -S -target (on clang).
Basically, C passes arguments by pushing them on the stack. For pointer types, the pointer is pushed on the stack.
One thing about C is that the caller restores the stack rather than the function being called. This way, the number of arguments can vary and the called function doesn't need to know ahead of time how many arguments will be passed.
Return values are returned in the AX register, or variations thereof.
Consider this code:
#include <stdio.h>
void a(int a, int b, int c)
{
char buffer1[5];
char buffer2[10];
}
int main()
{
a(1,2,3);
}
Then run:
gcc -S a.c
That command emits the assembly for our source code.
Now we can see that the main function never uses the push instruction to put the arguments of the function a on the stack; it uses movl instead:
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call a
leave
Why does this happen?
What is the difference between the two?
Here is what the gcc manual has to say about it:
-mpush-args
-mno-push-args
Use PUSH operations to store outgoing parameters. This method is shorter and usually
equally fast as method using SUB/MOV operations and is enabled by default.
In some cases disabling it may improve performance because of improved scheduling
and reduced dependencies.
-maccumulate-outgoing-args
If enabled, the maximum amount of space required for outgoing arguments will be
computed in the function prologue. This is faster on most modern CPUs because of
reduced dependencies, improved scheduling and reduced stack usage when preferred
stack boundary is not equal to 2. The drawback is a notable increase in code size.
This switch implies -mno-push-args.
Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method, here.
2019 update: modern CPUs have had efficient push/pop since about Pentium M.
-mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.
That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.
"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
gcc does all sorts of optimizations, including selecting instructions based upon execution speed of the particular CPU being optimized for. You will notice that things like x *= n are often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant, while MUL is only used when the average runtime (and cache/etc. footprint) of the SHL-ADD-SUB combination would exceed that of MUL, or when n is not a constant (so that looping over shl-add-sub would be costlier).
In the case of function arguments, MOV can be parallelized by hardware, while PUSH cannot: the second PUSH has to wait for the first PUSH to finish because each one updates the esp register.
Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.
I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607