I'm developing something in an embedded context with Zephyr.
Essentially I'm dealing with a boot-loop caused by a stack overflow. The stack overflow goes away when I change an unused parameter of a function call deep inside my main. To make sure that the problem is not with the inside of the function, I hard-coded its implementation to be return 0;.
The offending line being like such creates a boot loop:
uint8_t port;
ret = foo(&port, NULL, NULL);
But the line missing the de-referenced port has the code run normally:
uint8_t port;
ret = foo(NULL, NULL, NULL);
Mind you, as I've already said, the implementation of foo is hard-coded to return 0. The parameters are at no point used. Furthermore, I'm sure the line is never actually reached at runtime (in this case) as it lives behind some conditionals requiring my interaction to actually go through.
I've started to give up and blame things on faulty memory or ESD damage but when I tried the same code with the same changes on a spare piece of hardware I had laying around the same thing happens. What is it that I'm missing? I genuinely don't know what else I could do to find out why this is happening and how to fix it. I don't have an access to a debugger for this microcontroller (SAMD21) so I'm at a bit of a loss... Any ideas (or at least sympathy)?
When you remove that parameter does it run without any errors or are there other errors? If you are writing to the wrong memory (e.g. memory that was allocated with a size of zero) somewhere in your program, changes to unrelated parts of the program's code, such as changing the size of a struct, or the parameters of a function, could change where a fatal error occurs and what kind of fatal error it is.
Nevermind, I've found the culprit - a simple stack overflow. I was one byte away from it before the addition of the uint8_t port variable declaration into main. The variable when not used as a parameter in foo() was being optimised away by the compiler. Having one fewer byte on the call stack apparently was enough to prevent the overflow.
Solution: increase stack size and be more careful with clogging it up with unnecessary items.
I am trying to understand the buffer overflow exploit and more specifically, how it can be used to run own code - e.g. by starting our own malicious application or anything similar.
While I do understand the idea of the buffer overflow exploit using the gets() function (overwriting the return address with a long enough string and then jumping to the said address), there are a few things I am struggling to understand in real application, those being:
Do I put my own code into the string just behind the return address? If so, how do I know the address to jump to? And if not, where do I jump and where is the actual code located?
Is the actual payload that runs the code my own software that's running and the other program just jumps into it or are all the instructions provided in the payload? Or more specifically, what does the buffer overflow exploit implementation actually look like?
What can I do when the address (or any instruction) contains 0? gets() function stops reading when it reads 0 so how is it possible to get around this problem?
As a homework, I am trying to exploit a very simple program that just asks for an input with gets() (ASLR turned off) and then prints it. While I can find the memory address of the function which calls it and the return, I just can't figure out how to actually implement the exploit.
You understand how changing the return address lets you jump to an arbitrary location.
But as you have correctly identified you don't know where you have loaded the code you want to execute. You just copied it into a local buffer(which was mostly some where on the stack).
But there is something that always points to this stack and it is the stack pointer register. (Lets assume x64 and it would be %rsp).
Assuming your custom code is on the top of the stack. (It could be at an offset but that too can be managed similarly).
Now we need an instruction that
1. Allows us to jump to the esp
2. Is located at a fixed address.
So most binaries use some kind of shared libraries. On windows you have kernel32.dll. In all the programs this library is loaded, it is always mapped at the same address. So you know the exact location of every instruction in this library.
All you have to do is disassemble one such library and find an instruction like
jmp *%rsp // or a sequence of instructions that lets you jump to an offset
Then the address of this instruction is what you will place where the return address is supposed to be.
The function will return then and then jump to the stack (ofcourse you need an executable stack for this). Then it will execute your arbitrary code.
Hope that clears some confusion on how to get the exploit running.
To answer your other questions -
Yes you can place your code in the buffer directly. Or if you can find the exact code you want to execute (again in a shared library), you can simply jump to that.
Yes, gets would stop at \n and 0. But usually you can get away by changing your instructions a bit to write code that doesn't use these bytes at all.
You try different instructions and check the assembled bytes.
I am programming an ATtiny2313 using avrdude and a makefile. I believe the stack pointer is not properly initialised, since when I call a function, the program appears to freeze. I found the following assembly code:
.include "tn2313def.inc"
ldi r16, low(RAMEND) ; Main program start
out SPL,r16 ;Set Stack Pointer to top of RAM
which I think might work, but I don't know how I can incorporate it into the c code that I created. ie. do I need to include a special header file or somehow denote that it is assembly and not c. I am relatively new to programming and I would appreciate any help either as to how to implement this code properly or another way of making my current c code initialise a stack pointer.
Thank you in advance.
Stephen
It really depends on how you've got your makefile configured as to whether the stack pointer will be initialised. If you're using gcc and the normal compile and link options, the linker ensures that some startup code crtX.o is also included in your executable. The linker automatically chooses the correct crtX.o file for your processor and compile options.
Amongst other things, the code in the crtX.o files will clear the bss segment to be all zeros as required by the C standard, configure your stack pointer and provide interrupt vectors in the correct location for those which have not been overridden.
Remember that the ATTiny2313 only has 128 bytes of SRAM. This area must be big enough for any initialised data you have in your program and the stack. Just the process of calling a simple function will use up quite a number of bytes of RAM to save the registers on the stack before calling the function.
So, I'd suggest to do these things:
Use the standard makefile if one is provided by your compiler, it will ensure that the standard startup code is included and that the stack/RAM is set up correctly before main() is called.
Turn on the linker map and symbol file output and verify that you actually have some space free that can be used for the stack.
The Atmel IDE has a reasonable simulator, so try running your code in the simulator. You'll be able to watch stack usage as you are calling the function and location any odd behaviour.
You may just happen to have a stack overflow (which is why you came to stackoverflow.com right?
very quick question for you. When I store some automatic variable in C, asm output is like this: MOV ESP+4,#25h , and I just want to know why can´t compiler calculate that ESP+4 adress itself.
I thought this through, and I really cant find reason for this. I mean, isnt compiler aware of the esp value? It should be. And when using another object file, this should not be problem either, since variables could just be represent by adress and linked later, when all automatic variables are known, and therefore proper adress could be assigned. Thanks.
No, it cannot know the value of esp in advance.
Take for example a recursive function, ie. a function that calls itself. Assume such a function has several parameters that are passed in via the stack. This means that each argument takes some space on the stack, thereby changing the value of the esp register.
Now, when the function is entered, the exact value of esp will depend on how many times the function has called itself previously, and there is no way the compiler could know this at compile time. If you doubt this, take a function such as this:
void foobar(int n)
{
if (rand() % n != 17)
foobar(n + 1);
}
There's no way the compiler would be smart enough in advance to figure out if the function will call itself once more.
If the compiler wanted to determine esp in advance, it would effectively have to create a version of the function for each possible value for esp.
The above explanation only takes into account one function. In a real-world scenario, a program has many functions which interdepend on one another, which results in fairly complex "call graphs". This together with (among other things) unpredicable program logic means the compiler would have to create a huge array of versions of each function, just to optimise on esp -- which clearly doesn't make sense.
P.S.: Now something else. You don't actually need to optimise [esp+N] at all, because it should not take any more CPU time than the simpler [esp]... at least not on Intel Pentium CPUs. You could say that they already contain optimizations for exactly this and even more complicated scenarios. If you're interested in the Intel CPUs, I suggest you look up the documentation for something called the MOD R/M and the SIB byte of a machine instruction, e.g. here for the SIB byte or here or, of course, in Intel's official CPU developer documentation.
No, the compiler is not aware of the value of ESP at runtime - it's the stack pointer. It is potentially different every time the function is called. Perhaps the simplest example to think about is a recursive function - every time it calls itself, the stack gets a little bit deeper to accommodate the local variables for the new call. Every stack frame has its own local variable, every stack frame is at a different position on the stack, and therefore has its own address (in ESP, normally).
The Stack Pointer cannot be calculated at compile time. For a simple example why this is not possible, just think of a recursive function: The same variable has a different address for each call, but it's always the same code that is run.
No, the compiler doesn't know the value ahead of time. In a few extremely basic programs (where there's only one possible "route" from main to any other particular function being called) it could, but I don't know of a compiler that attempts to compute this. If you have any recursion, or a function is called from more than one place, the the stack pointer will have different values depending on where it was called from.
There's not much point to doing so in any case -- since the stack pointer is so heavily used, most CPUs are designed to make indirect addressing from the stack pointer extremely efficient. In fact, it's often more efficient than supplying an absolute address would be.
This is really rather fundamental to the way the stack works. To reason it out for yourself, imagine how you'd implement a recursive function.
I know this is more "heavy" question, but I think its interesting too. It was part of my previous questions about compiler functions, but back than I explained it very badly, and many answered just my first question, so ther it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and secure that each task has propriate place in memory. So, every process gets its own place starting from 0.
When multitasking goes into effect, Kernel has to save all important registers to the task´s stack i believe than save the current stack pointer, change page entry to switch to another proces´s physical adress space, load new process stack pointer, pop saved registers and continue by call to poped instruction pointer adress.
Becouse of this nice feature (paging) every process thinks it has nice flat memory within reach. So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
But, when there is no more segmentation for the process, why does still compilers create variables on the stack, or when global directly in other memory space, than directly in program code?
Let me give an example, I have a C code:int a=10;
which gets translated into (Intel syntax):mov [position of a],#10
But than, you actually ocupy more bytes in RAM than needed. Becouse, first few bytes takes the actuall instruction, and after that instruction is done, there is new byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (thus slowing the process speed) isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
I know, that const variables do this, but they are not really variables, when you cannot change them.
I hope I eplained my question well, but I am still learning English, so forgive my sytactical and even semantical errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
So, someone told here that global variable is actually that piece of values attached directly into program, I mean, when variable is global, is it atached to the end of program, or just created like the local one at the time of execution, but instead of on stack on heap directly?
If the first case - attached to the program itself, why is there even existence of local variables? I know, you will tell me becouse of recursion, but that is not the case. When you call function, you can push any memory space on stack, so there is no program there.
I hope you do understand me, there always is ineficient use of memory, when some value (even 0) is created on stack from some instruction, becouse you need space in program for that instruction and than for the actual var. Like so: push #5 //instruction that says to create local variable with integer 5
And than this instruction just makes number 5 to be on stack. Please help me, I really want to know why its this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive decent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if windows does this, however)
Your proposal doesn't allow for either of these cases.
So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why does still compilers create variables on the stack, [...] than directly in program code?
Simple.
Code segment isn't writable. For safety reasons first. Second,
most CPUs do not like to have code segment being written into as it
breaks many existing optimization used to accelerate execution.
State of the function has to be private to the function due to
things like recursion and multi-threading.
isn´t just a value of 10 coded directly into program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting the garbage (to CPU that is the garbage) into the code segment would simply diminish (or flat out cancel) the effect of the techniques. And they are responsible for the lion share of the performance gains CPUs had showed in the past decade.
when there is no need to switch any segment
So if there is no overhead of switching segment, why then put that into the code segment? There are no problems to keep it in data segment.
Especially in case of read-only data segment, it makes sense to put all read-only data of the program into one place - since it can be shared by all instances of the running application, saving physical RAM.
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
No, not really. Most of the code is relocatable or position independent. The code is patched with real memory addresses when OS loads it into the memory. Actually special techniques are used to actually avoid patching the code so that the code segment too could be shared by all running application instances.
The ABI is responsible for defining how and what compiler and linker supposed to do for program to be executable by the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
not quite sure what your confusion is?
int a = 10; means make a spot in memory, and put the value 10 at the memory address
if you want a to be 10
#define a 10
though more typically
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot convert code which references 'a' to use the constant 10 directly, because it has no way of knowing whether 'a' may be changed behind its back (even const variables can be changed). For example, one way 'a' can be changed without the compiler knowing is, if you have a pointer which points 'a'. Pointers are not fixed at runtime, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify 'a'.