Comprehensive and clear NOP sled technique explanation needed

Comprehensive and clear NOP sled technique explanation needed - c

I have browsed the internet a lot and still could not understand the way it works
No success with this link either:
How does a NOP sled work?
Okay, let's say we have a buffer char a[8]; in the function foo()
and the stack frame for foo() looks like this (32-bit):
Now, what we want to do, is to overwrite 'return' value, saved on the stack when foo() was called
They say that the problem is, we cannot predict the location of saved 'return' value
But why? For example, if function foo() has the buffer defined prior to all other local variables, like in the picture above, then we always know that we need to fill 8 bytes allocated for buffer, then 4 bytes of saved EBP and then we got 'return'
Alternatively, if there are other local variables defined before the buffer, say 2 ints, then we consider 8 bytes allocated for buffer, then 8 bytes of those 2 local ints, then 4 bytes of saved EBP and then we got 'return'
We always know how far is the saved 'return' from our overflown buffer, don't we ? :/
If we don't, please explain. That's my question 1.
My question 2 is, okay, let's assume that we don't know where the value for 'return' is saved. Then if we have a large array of NOPs coming before the shellcode, we have a high chance of writing NOP (i.e. 0x90) value into location of 'return', hence overwriting the original value
Now, when the function foo() returns, the value of 0x90 is loaded into EIP (instruction pointer) and instead of resuming its normal flow, the program is going to be executed from address 0x90, isn't it?
I think I've misunderstood something here
Please, don't close my question, even though there is a similar question that I've provided the link to, at the very beginning,
I guess this question will be far more comprehensive and clearer

First we assume there is no address space randomization and the stack is executable.
We always know how far is the saved 'return' from our overflown buffer, don't we ? :/
No, we don't know. There is an unspecified number of padding bytes between the local variables and the saved EBP and the return address. This number of bytes may depend on the compiler version.
Then if we have a large array of NOPs coming before the shellcode, we have a high chance of writing NOP (i.e. 0x90) value into location of 'return', hence overwriting the original value
The idea is to have all NOP values after the return address. This way we can overwrite the return address with an "approximate" address of the shellcode in the stack. The exact shellcode address may depend on the OS version but also on program arguments and the size and number of the environment variables because they are in the stack address space.

Related

overcome stack randomization by brute force

I'm reading a book that shows how buffer overflow attacks and one technique to stop it is called Stack Randomization, below is quoted from the book:
a persistent attacker can overcome randomization by brute force, repeatedly attempting attacks with different addresses. A common trick is to include a long sequence of nop (pronounced “no op,” short for “no operation”) instructions before the actual exploit code. Executing this instruction has no effect, other than incrementing the program counter to the next instruction. As long as the attacker can guess an address somewhere within this sequence, the program will run through the sequence and then hit the exploit code. If we set up a 256-byte(28) nop sled, then the randomization over n = 223 can be cracked by enumerating 215 = 32,768 starting addresses
I understand the first part, but don't get the second part about enumerating starting addresses. For example, %rsp points to the current start address as picture below shows (only show 8 bytes instead of 256 bytes for simplicity)
I think what the author mean is, try and guess different address to the stack memory where the %rsp is pointing to. And the padding between return address and %rsp are all nop, and then overwrite the return address with the guessed address which is highly likely points to part of padding(nop). But since Stack Randomization allocats a random amount of space between 0 and n bytes on the stack at the start of a program, so we can only say the probability of successful attack is 215/223 = 0.78%, how can we say try 32,768(a fixed number) addresses then it will be cracked? it is the same as toss a coin, you can only say the probablity of getiing a head is 0.5, you cannot say you will get a head in the second round, becuase you might get two tails

Stack Randomization allocats a random amount of space between 0 and n bytes on the stack at the start of a program
No, it doesn't allocate. It randomizes where in virtual address space the stack is mapped, leaving other pages unmapped. This is with page granularity.
Guessing wrong will typically result in the victim program segfaulting (or whatever the OS does on an invalid page fault). This is noisy and obvious to any intrusion-detection. And if the program does eventually restart so you can try again, its stack address will be re-randomized, as you suggest.
Wrong guesses that land in valid memory but not your NOP sled will also typically crash soon, on an invalid instruction or something that decodes to an invalid memory access.
So yes, you're right, you can't just enumerate the address space, it's only a probabilistic thing. Trying enough times will mean it's likely you succeed, not guaranteed. And trying to enumerate the possible ASLR entropy isn't particularly useful, or any more likely to succeed sooner than guessing the same page every time I think.
But trying different offsets within a single page is useful because OS-driven stack ASLR only has page granularity.
There is some amount of stack space used by env vars and command line args which will vary from system to system, but tend to be constant across invocations of the same program, I think. Unless the vulnerable function is reached from different call chains, or there's a variable-sized stack allocation in a parent function, in which case the buffer's offset within page could vary every run.
Although of course most processes these days run with non-executable stacks, making direct code-injection impossible for stack buffers. A ROP attack (injecting a return address to an existing sequence of bytes that decode to something interesting) is possible if there are any known static data or code addresses. If not, you have to deal with ASLR of code/data, but you don't get to inject NOP sleds.

Why do all C functions converted to assembler start and end with some identical operations? [duplicate]

I know data in nested function calls go to the Stack.The stack itself implements a step-by-step method for storing and retrieving data from the stack as the functions get called or returns.The name of these methods is most known as Prologue and Epilogue.
I tried with no success to search material on this topic. Do you guys know any resource ( site,video, article ) about how function prologue and epilogue works generally in C ? Or if you can explain would be even better.
P.S : I just want some general view, not too detailed.

There are lots of resources out there that explain this:
Function prologue (Wikipedia)
x86 Disassembly/Calling Conventions (WikiBooks)
Considerations for Writing Prolog/Epilog Code (MSDN)
to name a few.
Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:
Keeping track of where to return to, when calling a function
Storage of local variables in the context of a function call
Passing arguments from calling function to callee.
The prolouge is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilog is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)
The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)
Then the prologue is executed:
push ebp ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp ; Set the stack-frame base pointer to be the current
; location on the stack.
sub esp, N ; Grow the stack by N bytes to reserve space for local variables
At this point, we have:
...
ebp + 4: Return address
ebp + 0: Calling function's old ebp value
ebp - 4: (local variables)
...
The epilog:
mov esp, ebp ; Put the stack pointer back where it was when this function
; was called.
pop ebp ; Restore the calling function's stack frame.
ret ; Return to the calling function.

C Function Call Conventions and the Stack explains well the concept of a call stack
Function prologue briefly explains the assembly code and the hows and whys.
The gen on function perilogues

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.
Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.
I will quote the below from the accepted answer & try to extend it's explanation.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep
track of the function's stack frame. The esp register is used by the
processor to point to the most recent addition (the top value) on the
stack.
The call instruction does two things: First it pushes the return
address onto the stack, then it jumps to the function being called.
Immediately after the call, esp points to the return address on the
stack.
The last line in the quote above says immediately after the call, esp points to the return address on the stack.
Why's that?
So let's say that our code that's getting currently executed has the following situation, as shown in the (really badly drawn) diagram below
So our next instruction to be executed is, say at the address 2. This is where the EIP is pointing. The current instruction has a function call (that would internally translate to the assembly call instruction).
Now ideally, because the EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But since there's sort of a diversion from the current execution flow path, (that is now expected because of the call) the EIP's value would change. Why? Because now another instruction, that may be somewhere else, say at the address 1234 (or whatever), may need to get executed. But in order to complete the execution flow of the program as was intended by the programmer, after the diversion activities are done, the control must return back to the address 2 as that is what should have been executed next should the diversion have not happened. Let us call this address 2 as the return address in the context of the call that is being made.
Problem 1
So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.
There could have been many choices of storing it in any of the available registers, or some memory location etc. But for (I believe good reason) it was decided that the return address would be stored onto the stack.
So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:
So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.
Problem 2
So, although the return address of ours, is still stored on the stack, at location 293, after the called function finishes off executing, how would the execution flow know that it should now goto 293 & that's where it would find the return address?
So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:
The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.
With the above strategy in place, we solve for the Problem 2 discussed above. How? So now when the execution flow wants to know where from should it fetch the return address, it would :
first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).
Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.
Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.
It will finally pop this 2 out into the EIP, that's the instruction that should have ideally been executed should the diversion have not happened, remember.
And the steps that we just saw being performed, with all the jugglery, to store the return address temporarily & then retrieve it is exactly what gets done with the function prolog (before the function call) & the epilog (before the function ret). The how was already answered, we just answered the why as well.
Just an end note: For the sake of brevity, I have not taken care of the fact that the stack addresses may grow the other way round.

Every function has an identical prologue(The starting of function code) and epilogue ( The ending of a function).
Prologue: The structure of Prologue is look like:
push ebp
mov esp,ebp
Epilogue: The structure of Prologue is look like:
leave
ret
More in detail : what is Prologue and Epilogue

Understanding Format String Exploits

I'm learning various exploits and I can't quite get my head around Format String exploits. I've got a fairly simple program set up in an environment that allows the exploit.
int woah(char *arg){
char buf[200];
snprintf(buf, sizeof buf, arg);
return 0;
}
I'm able to control the arg being passed into the function which will be how the attack will happen with the end result of the program running my shellcode and giving me root. Making the program crash is easy, just feed it "%s%s" and it segfaults. We want to do more than that so we feed it something like "AAAA%x%x%x%x%x%x%x". Looking at the program in gdb we look at the buffer right after the snprinf and we can see:
"AAAA849541414141353934....blah blah blah"
That's good! We can see see the A's on the stack as well as the 41s which is A in hex. But then what comes next? I get that the general idea here is to overwrite the instruction pointer with four bytes by having the address at the start of our string that we feed in.....and then somewhere along the line we have it pointing to our shellcode.
How would I find the address of the seip/return address to overwrite?

When snprintf() is called, a stack frame - memory region - is created to execute it's function statements. However, before this happens, the compiler needs to know the previous caller of the function - return point address. This address is included in the stack frame so when the stack frame unwinds, that is the function is finishing up its work, it has go back to that address so the program can continue to run. What you are trying to do is overwrite this address with your shellcode address. Research more on stack frames, ESP, EBP, EIP to get the idea.

buffer overflow exploits - Why is the shellcode put before the return address

The code I'm reffering to is found here: Link to code
I read that the buffer overflow exploit uses a buffer that looks something like this:
| NOP SLED | SHELLCODE | REPEATED RETURN ADDRESS |
From what I understand the exploit happens when the buffer is put onto the stack as a function parameter and overwrites the function's return address.
I also understand that the repeated return address points to the NOP sled in the same buffer on the stack.
What I don't understand are the following:
Why does the return address have to point to the shellcode in the same buffer? Why not have the reapeated return addresses point to another part of the memory where the NOP sled and the Shellcode can be found?
How is the return address on the buffer perfectly aligned with the original one so the "ret" command will read the correct address and not read it from the middle for example.

The return address doesn't have to point to code in the same buffer, it is just often easier to do it this way. If you can put the shell code in and return address into the same buffer then this is the simplest. If the buffer that can be overflowed is too small to fit the shell code, it is feasible to put the shell code into another buffer and then jump to that when the vulnerable buffer is overflowed.
Also, protections such as Data Execution Prevention or (NX) prevent code being executed from a stack. In this case, techniques such as Return-Oriented Programming can be used to circumvent DEP. This technique involves using legitimate, executable code segments to run code the attacker wants to.
This can be tricky and may require some fiddling around with the payload. Usually the start of a buffer is at an address that is word aligned. In this case, ensuring the return address is correctly aligned means writing a buffer that is a multiple of the CPU word (4 bytes for 32-bit machines, 8 bytes for 64-bit). If the buffer is not word aligned, then an attacker may just experiment by adding or removing byte at a time until he thinks it is.
The reason it is simpler to do everything in one buffer is because not much will change between the injection of the shell code and the jumping to the newly modified return address. At the point of attack, it is very unlikely that an attacker can reference memory of another process, so we must look at buffers in process.
Putting shell code into a different buffer requires the attacker to understand how long the buffer will stay in place. Do different function calls cause one of the buffers to be deallocated? Is one of the buffers on the heap instead of the stack? So long as your single buffer is large enough, it is much simpler to put your NOP sled and shell code near the start and then just fill the rest with the return address. Compared to finding one buffer to populate with shell code and another to populate with the address of the previous buffer. Some shell code may also reference the stack pointer which means it needs to be set correctly.

attacking the stack

There is a binary file i am working on . It has a function with the address starting at 123. I need to get my code to execute this function.
Binary file accepts a byte array of size 'n' and does not check for bounds. Entire task is actually to overflow the buffer and cause bad things to happen.
Again, the job is to call address 123 and get it to execute. I was under impression that if a buffer size is , say "4", and i pass 9 characters .., 5 characters will be placed on the stack and executed. (is it true?)
Additionally, in order for me to get to address to be executed, i'd like to say "call 123". From what i understand "call" is "e8", no?
This problem is a bit confusing for me. If someone could help me better understand it, i would very much appreciate it
(Yes, this is a homework question)

I strongly recommend that you read Smashing the Stack for Fun and Profit. It describes in great detail exactly the steps you need to do to do this.

The stack does not contain code, but it does contain the return address for the function. Typical stack structure is:
<stack data> <old frame pointer> <return address>
<old frame pointer> is omitted sometimes and I think would have to for this, so all you have to provide is the data to fill the array then 123.

A common strategy is to find a call ESP opcode in a fixed memory address, and overwrite return address with this address. In this way the execution will continue on the stack. It will work if there is no DEP activated (or supported).
In your case you could also look for an abs_jump 123 in memory.
EDIT:
#Loren It would work only if stack can be executed. call esp has only two bytes opcode, so there is high probability to find it somewhere in memory. The second approach does not execute code from stack, but requires a 5 byte opcode, that is very unlikely to be found.