I have a small program running on x64 that calls the system function with a parameter long enough that, as I understand it, it will be passed to the function on the stack.
#include <stdlib.h>
int main(void) {
    char command[] = "/bin/sh -c whoami";
    system(command);
    return EXIT_SUCCESS;
}
When I check in GDB what is happening, I can confirm that my parameter is on the stack, occupying 2 words.
I wonder how the CPU knows that it needs to read 2 words and not continue past them. What delimits the function parameter from the rest?
I am asking this question because I am working on a buffer overflow and, although I have the same situation on the stack, the CPU picks up only one word (/bin/sh ) instead of the 2 words I would like, outputting: sh: line 1: $'Ћ\310\367\377\177': command not found
How does processor know how much to read from the stack for function parameters (x64)
The CPU does not know. By that, I mean it does not receive an instruction that says "retrieve the next argument from the stack, whatever the appropriate size may be." It receives instructions to retrieve data of a specific size from a specific place, and to operate on that data, or put it in a register, or store it in some other place. Those instructions are generated by the compiler, based on the program source code, and they are part of the program binary.
I wonder how the CPU knows that it needs to read 2 words and not continue past them. What delimits the function parameter from the rest?
Nothing delimits one function parameter from the next -- neither on the stack nor generally. Programs do not (generally) figure out such things on the fly by introspecting the data. Instead, functions require parameters to be set up in a particular way, which is governed by a set of conventions called an "Application Binary Interface" (ABI), and they operate on the assumption that the data indeed are set up that way. If those assumptions turn out to be invalid then more or less anything can happen.
I am asking this question because I am working on a buffer overflow and, although I have the same situation on the stack, the CPU picks up only one word (/bin/sh ) instead of the 2 words I would like.
The number of words the function will consume from the stack and the significance it will attribute to them is characteristic of the function, not (generally) of the data on the stack.
Processors are very, very dumb. All of them. This is like asking how you steer a train: you do not. It just follows the tracks. The processor just follows the bits in front of it; if they are wrong or do something bad, then the processor will crash, just as a train will derail if the tracks are bad.
The size of a variable is not determined by the processor type (x86, ARM, etc.). Nor, for C, is it determined by the language: the size of an int on x86 cannot be assumed to be one particular size. Assumptions like that are bad. The compiler author chooses the sizes for that compiler and that target, and there is no reason to assume that any two C compilers for the same target processor use the same sizes.
Likewise the compiler author ultimately decides the calling convention, what goes in registers what goes in stack, what order they are in the stack, what registers, etc.
The compiler author chooses also the alignment or not of the stack.
The compiler author chooses whether to use a stack frame, or lets the user choose; either way, the compiler author still decides how the stack and stack pointer are used.
Using their calling convention, their choices for the sizes of variables, and so on, the compiler author then decides, as part of the compilation process, which instructions to use. The instructions should be chosen based on the choices above. So a two-byte variable is placed on the stack at a location relative to the stack pointer or frame pointer, according to the compiler's choices and possibly user options.
The processor does not know; it simply sucks in bits and does what they say. If the compiler, assembler, and linker have done their job (ultimately the programmer's responsibility), then the processor will do what it is told, including reading the proper number of bytes for a given item.
As beaten to death on this site, examining the stack for main() tends to be confusing because mysterious padding is added; ideally you want to compile the code under some other function name and examine that. Compiler options, such as optimization levels, may also determine how the code is built, which instructions are used, and how much stack, if any, is used. There is no reason to assume that any two compilers will generate the same code from the same C source, and likewise no reason to assume that one compiler will produce the same code under different compiler options.
So where on the stack, how many bytes on the stack, etc is determined by many layers of you the programmer plus compiler, assembler, and linker.
It depends on the calling convention implemented for the function. By specifying none, you let the compiler decide, and it can get creative, sometimes even eliminating the explicit call altogether (for example, by inlining the function) for the sake of optimizations such as better branch prediction. Otherwise, you can learn precisely what to expect from the numerous sources of documentation that specify how those calling conventions are supposed to work.
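If I run a program like this:
#include <stdio.h>
int main(int argc, char *argv[], char *env[]) {
    /* %p expects a void * argument, so each address is cast explicitly */
    printf("My references are at %p, %p, %p\n", (void *)&argc, (void *)&argv, (void *)&env);
    return 0;
}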
We can see that those regions are actually in the stack.
But what else is there? If we ran a loop through all the values in Linux 3.5.3 (for example, until segfault) we can see some weird numbers, and kind of two regions, separated by a bunch of zeros, maybe to try to prevent overwriting the environment variables accidentally.
Anyway, in the first region there must be a lot of numbers, such as all the frames for each function call.
How could we distinguish the end of each frame, where the parameters are, where the canary is (if the compiler added one), the return address, the CPU status, and so on?
Without some knowledge of the overlay, you only see bits, or numbers. While some of the regions are subject to machine specifics, a large number of the details are pretty standard.
If you didn't move too far outside of a nested routine, you are probably looking at the call stack portion of memory. With some generally considered "unsafe" C, you can write up fun functions that access function variables a few "calls" above, even if those variables were not "passed" to the function as written in the source code.
The call stack is a good place to start, as 3rd party libraries must be callable by programs that aren't even written yet. As such, it is fairly standardized.
Stepping outside of your process memory boundaries will give you the dreaded segmentation violation, because memory protection detects an attempt by the process to access memory it is not authorized to use. Malloc does a little more than "just" return a pointer: on systems with memory protection features, the memory it hands out has also been marked as accessible to that process, and every memory access is checked against what the process has been assigned.
If you keep following this path, sooner or later, you'll get an interest in either the kernel or the object format. It's much easier to investigate one way of how things are done with Linux, where the source code is available. Having the source code allows you to not reverse-engineer the data structures by looking at their binaries. When starting out, the hard part will be learning how to find the right headers. Later it will be learning how to poke around and possibly change stuff that under non-tinkering conditions you probably shouldn't be changing.
PS. You might consider this memory "the stack" but after a while, you'll see that really it's just a large slab of accessible memory, with one portion of it being considered the stack...
The contents of the stack are basically:
Whatever the OS passes to the program.
Call frames (also called stack frames, activation areas, ...)
What does the OS pass to the program? A typical *nix will pass the environment, arguments to the program, possibly some auxiliary information, and pointers to them to be passed to main().
In Linux, you'll see:
a NULL
the filename for the program.
environment strings
argument strings (including argv[0])
padding full of zeros
the auxv array, used to pass information from the kernel to the program
pointers to environment strings, ended by a NULL pointer
pointers to argument strings, ended by a NULL pointer
argc
Then, below that are stack frames, which contain:
arguments
the return address
possibly the old value of the frame pointer
possibly a canary
local variables
some padding, for alignment purposes
How do you know which is which in each stack frame? The compiler knows, so it simply treats each location in the stack frame appropriately. Debuggers can use per-function annotations in the form of debug info, if available. Otherwise, if there is a frame pointer, you can identify things relative to it: local variables are below the frame pointer, arguments are above it. Failing that, you must use heuristics: things that look like code addresses are probably code addresses, but sometimes this results in incorrect and annoying stack traces.
The content of the stack will vary depending on the architecture ABI, the compiler, and probably various compiler settings and options.
A good place to start is the published ABI for your target architecture, then check that your particular compiler conforms to that standard. Ultimately you could analyse the assembler output of the compiler or observe the instruction level operation in your debugger.
Remember also that a compiler need not initialise the stack, and will certainly not "clear it down" when it has finished with it, so when stack memory is allocated to a process or thread it might contain any value. Even at power-on, SDRAM for example will not contain any specific or predictable value; and if the physical RAM has previously been used by another process since power-on, or by an earlier called function in the same process, it will contain whatever was left in it. So just looking at the raw stack does not tell you much.
Commonly a generic stack frame may contain the address that control will jump to when the function returns, the values of all the parameters passed, and the values of all auto local variables in the function. However, the ARM ABI, for example, passes the first four arguments to a function in registers R0 to R3, and holds the return address in the LR register (a leaf function need never store it on the stack), so it is not as simple in all cases as the "typical" implementation I have suggested.
The details are very dependent on your environment. The operating system generally defines an ABI, but that's in fact only enforced for syscalls.
Each language (and each compiler even if they compile the same language) in fact may do some things differently.
However there is some sort of system-wide convention, at least in the sense of interfacing with dynamically loaded libraries.
Yet, details vary a lot.
A very simple "primer" could be http://kernelnewbies.org/ABI
A very detailed and complete specification you could look at to get an idea of the level of complexity and details that are involved in defining an ABI is "System V Application Binary Interface AMD64 Architecture Processor Supplement" http://www.x86-64.org/documentation/abi.pdf
A very quick question for you. When I store some automatic variable in C, the asm output is like this: MOV ESP+4,#25h, and I just want to know why the compiler can't calculate that ESP+4 address itself.
I have thought this through, and I really can't find a reason for this. I mean, isn't the compiler aware of the esp value? It should be. And when using another object file, this should not be a problem either, since variables could just be represented by an address and linked later, when all automatic variables are known, and therefore a proper address could be assigned. Thanks.
No, it cannot know the value of esp in advance.
Take for example a recursive function, ie. a function that calls itself. Assume such a function has several parameters that are passed in via the stack. This means that each argument takes some space on the stack, thereby changing the value of the esp register.
Now, when the function is entered, the exact value of esp will depend on how many times the function has called itself previously, and there is no way the compiler could know this at compile time. If you doubt this, take a function such as this:
#include <stdlib.h>

void foobar(int n)
{
    if (rand() % n != 17)   /* the outcome is unknown until run time */
        foobar(n + 1);
}
There's no way the compiler would be smart enough in advance to figure out if the function will call itself once more.
If the compiler wanted to determine esp in advance, it would effectively have to create a version of the function for each possible value for esp.
The above explanation only takes into account one function. In a real-world scenario, a program has many functions which depend on one another, which results in fairly complex "call graphs". This, together with (among other things) unpredictable program logic, means the compiler would have to create a huge array of versions of each function just to optimise on esp, which clearly doesn't make sense.
P.S.: Now something else. You don't actually need to optimise [esp+N] at all, because it should not take any more CPU time than the simpler [esp]... at least not on Intel Pentium CPUs. You could say that they already contain optimizations for exactly this and even more complicated scenarios. If you're interested in the Intel CPUs, I suggest you look up the documentation for something called the MOD R/M and the SIB byte of a machine instruction, e.g. here for the SIB byte or here or, of course, in Intel's official CPU developer documentation.
No, the compiler is not aware of the value of ESP at runtime - it's the stack pointer. It is potentially different every time the function is called. Perhaps the simplest example to think about is a recursive function - every time it calls itself, the stack gets a little bit deeper to accommodate the local variables for the new call. Every stack frame has its own local variable, every stack frame is at a different position on the stack, and therefore has its own address (in ESP, normally).
The Stack Pointer cannot be calculated at compile time. For a simple example why this is not possible, just think of a recursive function: The same variable has a different address for each call, but it's always the same code that is run.
No, the compiler doesn't know the value ahead of time. In a few extremely basic programs (where there's only one possible "route" from main to any other particular function being called) it could, but I don't know of a compiler that attempts to compute this. If you have any recursion, or a function is called from more than one place, the stack pointer will have different values depending on where it was called from.
There's not much point to doing so in any case -- since the stack pointer is so heavily used, most CPUs are designed to make indirect addressing from the stack pointer extremely efficient. In fact, it's often more efficient than supplying an absolute address would be.
This is really rather fundamental to the way the stack works. To reason it out for yourself, imagine how you'd implement a recursive function.
For a long time I have been studying the assembler output of a C compiler, as well as CPU architecture. I know this may seem silly to you, but it seems to me that something is very inefficient. Please don't be angry if I am wrong and there is some reason I do not see behind all these principles. I will be very glad if you tell me why it is designed this way. I actually truly believe I am wrong; I know the brilliant people who put PCs together had a reason for doing so. What exactly am I asking about? I'll tell you right away, using C as an example:
1: Stack local scope memory allocation:
So, typical local memory allocation uses the stack: just copy esp to ebp and then address all the local memory via ebp. OK, I would understand this if you explicitly needed to allocate RAM at the default stack addresses, but if I understand correctly, modern OSes use paging as a translation layer between the application and physical RAM, so the address you ask for is translated before it reaches an actual RAM byte. So why not just say 0x00000000 is int a, 0x00000004 is int b, and so on, and access them with just mov 0x00000000,#10? You won't actually access memory blocks 0x00000000 and 0x00000004, but whichever ones your OS has set the paging tables to. In fact, since memory allocation via ebp and esp uses indirect addressing, "my" way would be even faster.
2: Variable allocation duplicity:
When you run an application, the loader loads its code into RAM. When you create a variable or a string, the compiler generates code that pushes these values onto the top of the stack when they are created in main. So there is an actual instruction to do so, and the actual number in memory. That makes 2 copies of the same value in RAM: one in the form of an instruction, the second in the form of actual bytes in RAM. But why? Why not just work out, when the variable is declared, which memory block it will be in, and then insert that memory location wherever it is used?
How would you implement recursive functions? What you are describing is equivalent to using global variables everywhere.
That's just one problem. How can you link to a precompiled object file and be sure it won't corrupt the memory of your procedures?
Because C (and most other languages) support recursion, so a function can call itself, and each call of the function needs separate copies of any local variables. Also, on most current processors, your way would actually be slower -- indirect addressing is so common that processors are optimized for it.
You seem to want the behavior of C (or at least that C allows) for string literals. There are good and bad points to this, such as the fact that even though you've defined a "variable", you can't actually modify its contents (without affecting other variables that are pointing at the same location).
The answers to your questions are mostly wrapped up in the different semantics of different storage classes
Google "data segment"
Think about the difference in behavior between global and local variables.
Think about how constant and non-constant variables have different requirements when functions are called repeatedly (or as Mehrdad says, recursively)
Think about the difference between static and non static automatic variables again in the context of multiple or recursive calls.
Since you are comparing assembler and c (which are very close together from an architectural standpoint), I'm inclined to say that you're describing micro-optimization, which is meaningless unless you profile the code to see if it performs better.
In general, programming languages are evolving towards a more declarative style (i.e. telling the computer what you want done, rather than how you want it done). When you program in an imperative language (like assembly or c), you specify in extreme detail how you want the problem solved. This gives the compiler little room to make optimization decisions on your behalf.
However, as the languages become more declarative, the compilers are getting smarter, because we are giving them the room they need to make more intelligent performance optimizations.
If every function put its first variable at offset 0 and so on, then you would have to change the memory mapping each time you enter a function (you could not allocate all variables to unique addresses if you want recursion). This is doable, but with current hardware it's very slow. Furthermore, the address translation performed by the virtual memory is not free either; it's actually quite complicated to implement this efficiently.
Addressing off ebp (or any other register) costs having a mux (to select the register) and an adder (to add the offset to the register). The time taken for this can often be overlapped with other operations.
If you want to be able to modify the static value, you have to copy it to the stack. If you don't (saying it's 'const'), then a good C compiler will not copy it to the stack.
I know this is a "heavier" question, but I think it's interesting too. It was part of my previous question about compiler functions, but back then I explained it very badly, and many people answered only my first question, so here it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and ensure that each task has its proper place in memory. So every process gets its own address space starting from 0.
When multitasking comes into play, the kernel has to save all important registers to the task's stack, I believe, then save the current stack pointer, change the page entry to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers, and continue by jumping to the popped instruction pointer address.
Because of this nice feature (paging), every process thinks it has a nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
But when there is no more segmentation for the process, why do compilers still create variables on the stack, or, when global, in a separate memory space, rather than directly in program code?
Let me give an example. I have this C code: int a=10;
which gets translated into (Intel syntax): mov [position of a],#10
But then you actually occupy more bytes in RAM than needed, because the first few bytes hold the actual instruction, and after that instruction is done, there is a new byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (which would slow the process down), isn't the value 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.
I know that const variables do this, but they are not really variables when you cannot change them.
I hope I explained my question well, but I am still learning English, so forgive my syntactical and even semantic errors.
EDIT:
I have read your answers, and it seems that based on them I can refine my question:
Someone said here that a global variable is actually a value attached directly to the program. I mean, when a variable is global, is it attached to the end of the program, or just created like a local one at the time of execution, but on the heap instead of on the stack?
If it is the first case, attached to the program itself, why do local variables even exist? I know you will tell me it is because of recursion, but that is not the case. When you call a function, you can push any memory space onto the stack, so there is no program there.
I hope you understand me: memory is always used inefficiently when some value (even 0) is created on the stack by an instruction, because you need space in the program for that instruction and then for the actual variable. Like so: push #5 //instruction that says to create a local variable with the integer 5
And then this instruction just puts the number 5 on the stack. Please help me, I really want to know why it is this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if Windows does this, however)
Your proposal doesn't allow for either of these cases.
So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why do compilers still create variables on the stack, [...] rather than directly in program code?
Simple.
The code segment isn't writable. First, for safety reasons. Second, most CPUs do not like having the code segment written into, as it breaks many existing optimizations used to accelerate execution.
The state of a function has to be private to the function due to things like recursion and multi-threading.
isn't the value 10 just coded directly into the program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting garbage (to the CPU, it is garbage) into the code segment would simply diminish (or flat out cancel) the effect of these techniques, and they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.
when there is no need to switch any segment
So if there is no overhead from switching segments, why put it into the code segment at all? There is no problem with keeping it in the data segment.
Especially in case of read-only data segment, it makes sense to put all read-only data of the program into one place - since it can be shared by all instances of the running application, saving physical RAM.
Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.
No, not really. Most code is relocatable or position independent. Relocatable code is patched with real memory addresses when the OS loads it into memory. In fact, special techniques are used to avoid patching the code, so that the code segment too can be shared by all running instances of the application.
The ABI is responsible for defining what the compiler and linker are supposed to do for a program to be executable by a complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn't the value 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
not quite sure what your confusion is?
int a = 10; means make a spot in memory, and put the value 10 at the memory address
if you want a to be 10
#define a 10
though more typically
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot in general convert code which references 'a' to use the constant 10 directly, because it may have no way of proving that 'a' is never changed behind its back (even a const variable can be written through a cast pointer, although doing so is undefined behavior). For example, one way 'a' can be changed without the compiler knowing is through a pointer which points to 'a'. Pointer values are not fixed at compile time, so the compiler cannot always determine whether some pointer will point to and modify 'a'.