I'm trying to understand the basics of addressing in PE files, and I made a simple application with a couple of functions that call malloc, linked statically against the msvcr110 library. I opened the resulting executable in IDA Pro, found the offset of the malloc function (which is not imported), added the base address, and tried to call it like so:
HMODULE hCurrentModule = GetModuleHandle(NULL); // get current module base address
DWORD_PTR hMallocAddr = (0x0048AD60 + (DWORD_PTR)hCurrentModule);
char *pointer;
__asm //calling malloc
{
push 80
mov eax,dword ptr[static_addr]
call eax
add esp,2
mov [pointer],eax
}
I then checked the rebuilt program in IDA Pro to make sure that the malloc offset remains the same, and it's still 0x0048AD60. The problem is that offset+hCurrentModule gives me an incorrect address, and the program crashes when I call it. For example, my hMallocAddr is 0x0186AD60, but in the MSVC debug session the disassembly window shows malloc at 0x0146AD60. What is wrong here?
0x0048AD60 is not the offset of malloc but the actual address of the function when the EXE is loaded at its default load address of 0x00400000. Subtract the default load address from it to get the offset from the start of the image (here 0x0008AD60), then add that offset to the actual module base.
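A minimal sketch of that arithmetic, in case it helps. The helper name rebase_address is made up for illustration, and the preferred base of 0x00400000 is the default mentioned above (check the actual ImageBase of your file if it was linked with a different one):

```c
#include <stdint.h>

/* Hypothetical helper: convert an address shown by a disassembler (which
   assumes the image's preferred base) into the address at the real base. */
uintptr_t rebase_address(uintptr_t listed_address,
                         uintptr_t preferred_base,
                         uintptr_t actual_base)
{
    /* offset of the function from the start of the image */
    uintptr_t offset = listed_address - preferred_base;
    return actual_base + offset;
}
```

In the question's program this would be called as rebase_address(0x0048AD60, 0x00400000, (uintptr_t)GetModuleHandle(NULL)). With the numbers from the question, the module base works out to 0x013E0000 (0x0186AD60 - 0x0048AD60), and the helper then yields 0x0146AD60, which is the address the debugger showed.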
I see one thing that I don't understand: in the first instruction you push a value, but never pop it. When you add 2 to esp, are you trying to fix the stack? Could the compiler be "helping" you by optimizing that as an 8-bit value?
No guarantee, but those are the things I see from a first glance; but again, I'm not there and can't see the debug screen
{
push 80 ;Where do you pop this ?
mov eax,dword ptr[static_addr]
call eax
add esp,2 ;Is this the "pop" ? Possible bug, is "80" a 16 bit value ?
mov [pointer],eax
}
Along the same lines, I'm not totally certain how your app is structured, but are you safe in using eax without pushing it beforehand and popping it afterward? No clue if that makes a difference; it's just something from a cursory look at the code.
I want to understand how my data string ends up in rdx. As I understand it, the mov instruction puts the data found at an address into the target, so the content at rbp-0x28 is put into rdx. I checked what's at rbp-0x28 and it is not the data string ('AAAAAAA'). If, however, I let the instruction execute with ni, then rdx contains the string. I don't know how the string ends up in rdx, as it is not contained in rbp-0x28 beforehand. I know that my data is at 0x7fffffffe58f, but I'm not sure how or when it's loaded into rdx. Any help is greatly appreciated!
This depends a lot on which compiler and debugger you're using, as well as the architecture and calling convention. I ran your code with Apple's Clang compiler and lldb and got the expected results. There are minor variations between my output and yours, but it's relatively easy to follow. Since you only posted partial output of your function's disassembly at offset +0x12, I'll assume that before that point, whichever register held the first argument to the function call (in my case RDI) was stored to [rbp-0x28].
This was my output.
mov rsi, qword ptr[rbp-0x30] is the equivalent of your mov rdx,[rbp-0x28]. I think you're under Microsoft's x64 ABI calling convention, so your first argument is passed through rcx. Prior to that instruction there is mov [rbp-0x30], rdi, which I believe in your case will be mov [rbp-0x28],rcx.
At the next instruction, mov rdi,rcx, I set a breakpoint again. Here I read the contents of rsi, which in your case would be rdx. It printed rsi = 0x00007ffeefbff94a
At that memory address I found 'AAAAAAA'. Next I read the register rbp, which printed rbp = 0x00007ffeefbff740. Then I read the memory at 0x00007ffeefbff740-0x30 (in your case it would be -0x28), which is 0x7ffeefbff710, and it held the same address that was stored in rsi:
0x7ffeefbff94a (little endian), which we know points to the string 'AAAAAAA'. So I'm going to assume that what you're expecting at RBP-0x28 is the string itself. Instead, that slot holds a pointer to the string. Also make sure to compute your offsets correctly. Follow these steps:
Breakpoint at lea rax,[rbp-0x20]
Check the value of rdx, view the memory at that address and it should give you the string.
Then check the value of rbp. Subtract 0x28 from it. View the memory at the offset.
This should give you the value of rdx. Which should in turn point to the string you're looking for.
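The same point can be checked from C. This is only a sketch: the local variable slot stands in for the stack slot at [rbp-0x28], and the function name is invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the raw bytes stored in a pointer-sized stack slot are the
   ADDRESS of the string, not its characters. */
int slot_holds_address(const char *data)
{
    const char *slot = data;          /* plays the role of [rbp-0x28] */
    uintptr_t raw;
    memcpy(&raw, &slot, sizeof raw);  /* what a memory dump of the slot shows */
    return raw == (uintptr_t)data;
}
```

Dumping the slot in the debugger therefore shows an address such as 0x7fffffffe58f; you only see 'AAAAAAA' after following that address.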
My program redirects a function to another function by writing a jmp instruction to the first few bytes of the function (only i386). It works like expected but it means that I can't call the original function anymore, because it will always jump to the new one.
There are two possible workarounds I could think of:
Create a new function, which overwrites the jmp instruction of the target function and call it. Afterwards the function writes back the jmp instruction.
But I'm not sure how to pass the arguments since there can be any number of them. And I wonder if the target function can jmp somewhere else and skip writing back the jmp instruction (like throw catch?).
Create a new function which executes the code I have overwritten with the jmp instruction. But I can't be sure that the overwritten data is a complete instruction. I'd have to know how many bytes I have to copy for a complete instructions.
So, finally, my questions:
Is there another way I didn't think of?
How do I find the size of an instruction? I already looked at binutils and found this but I don't know how to interpret it.
Here is a sample:
mov, 2, 0xa0, None, 1, Cpu64, D|W|CheckRegSize|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
The 2nd column shows the number of operands (2) and the last column has information about the operands, separated by a comma.
I also found this question which is pretty much the same but I can't be sure that the 7 bytes contain a whole instruction.
Writing a Trampoline Function
Any help is appreciated! Thanks.
Sebastian, you can use the exe_load_symbols() function in hotpatch to get a list of the symbols and their location in the existing exe and then see if you can overwrite that in memory. I have not tried it yet. You may be able to do it with the LD_PRELOAD environment variable as well instead of hotpatch.
--Vikas
How about something like this:
Let's say this is the original function:
Instruction1
Instruction2
Instruction3
...
RET
convert it to this:
JMP new_stuff
old:
Instruction2
Instruction3
...
RET
...
new_stuff:
CMP call_my_function,0
JNZ my_function
Instruction1
JMP old
my_function:
...
Of course you'd have to take the size of the original instructions into account (you could find that out by disassembling with objdump, for example) so that the first JMP fits perfectly (pad with NOPs if the JMP is shorter than the original instruction(s)).
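For the patching step itself, here is a rough sketch of how the 5-byte i386 near jump (opcode E9 followed by a 32-bit relative displacement) and the NOP padding could be written into a buffer. The function name and parameters are made up for this example, overwritten_len still has to come from a disassembler, and making the target page writable is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5   /* E9 + 4-byte displacement */

/* Write "jmp dst" into code[], as if code[0] lived at address src, and pad
   the remainder of the overwritten instructions with NOPs (0x90).
   overwritten_len is the total size of the whole instructions being replaced.
   Returns 0 on success, -1 if the jmp does not fit. */
int write_jmp_patch(uint8_t *code, size_t overwritten_len,
                    uint32_t src, uint32_t dst)
{
    if (overwritten_len < JMP_REL32_SIZE)
        return -1;

    /* rel32 is measured from the end of the jmp instruction */
    uint32_t rel = dst - (src + JMP_REL32_SIZE);

    code[0] = 0xE9;
    code[1] = (uint8_t)(rel & 0xFF);          /* little-endian displacement */
    code[2] = (uint8_t)((rel >> 8) & 0xFF);
    code[3] = (uint8_t)((rel >> 16) & 0xFF);
    code[4] = (uint8_t)((rel >> 24) & 0xFF);

    for (size_t i = JMP_REL32_SIZE; i < overwritten_len; i++)
        code[i] = 0x90;                       /* nop */
    return 0;
}
```

Note that any instruction moved out of the way (Instruction1 above) must not itself be position dependent, for example a relative call or jump, or it would need fixing up at its new location.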
I have an application which creates .text segment dumps of win32 processes. It then divides the code into basic blocks. A basic block is a set of instructions which are always executed one after another (jumps are always the last instructions of such basic blocks). Here is an example:
Basic block 1
mov ecx, dword ptr [ecx]
test ecx, ecx
je 00401013h
Basic block 2
mov eax, dword ptr [ecx]
call dword ptr [eax+08h]
Basic block 3
test eax, eax
je 0040100Ah
Basic block 4
mov edx, dword ptr [eax]
push 00000001h
mov ecx, eax
call dword ptr [edx]
Basic block 5
ret 000008h
Now I would like to group such basic blocks in functions - say which basic blocks form a function. What's the algorithm? I have to remember that there might be many ret instructions inside one function. How to detect fast_call functions?
The simplest algorithm for grouping blocks into functions would be:
note all addresses to which calls are made with call some_address instructions
if the first block after such an address ends with ret, you're done with the function, else
follow the jump in the block to another block and so on until you've followed all possible execution paths (remember about conditional jumps, each of which splits a path into two) and all the paths have finished with ret. You'll need to recognize jumps that organize loops so your program itself does not hang by entering an infinite loop
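A minimal sketch of that traversal, assuming the blocks are already in an array and each one records at most two successors (jump target and fall-through). The struct layout and names are hypothetical, not taken from your application:

```c
#define MAX_SUCCESSORS 2

/* Hypothetical basic-block record. succ[i] is the index of a successor
   block, or -1 for "none". A block ending in ret has no successors. */
struct block {
    int succ[MAX_SUCCESSORS];
    int function_id;          /* -1 until the block is assigned */
};

/* Mark every block reachable from `entry` as belonging to function `id`.
   The function_id check doubles as the visited set, so loops in the
   control flow cannot cause infinite recursion. */
void assign_function(struct block *blocks, int entry, int id)
{
    if (entry < 0 || blocks[entry].function_id != -1)
        return;
    blocks[entry].function_id = id;
    for (int i = 0; i < MAX_SUCCESSORS; i++)
        assign_function(blocks, blocks[entry].succ[i], id);
}
```

You would call it once per call target, with a fresh id each time. In this simple form a block shared by two functions stays with whichever function reached it first.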
Problems:
a number of calls can be made indirectly by reading function pointers from memory, e.g. you'd have call [some_address] instead of call some_address
some indirect calls can be made to calculated addresses
functions that call other functions before returning may have jmp some_address instead of call some_address immediately followed by ret
call some_address can be simulated with a combination of push some_address + ret OR push some_address + jmp some_other_address
some functions may share code at their end (e.g. they have different entry points, but one or more exit points are the same)
You may use some heuristic to determine where functions start by looking for the most common prolog instruction sequence:
push ebp
mov ebp, esp
Again, this may not work if functions are compiled with the frame pointer suppressed (i.e. they'd use esp instead of ebp to access their parameters on the stack, it's possible).
The compiler (e.g. MSVC++) may also pad the inter-function space with the int 3 instruction and that too can serve as a hint for an upcoming function beginning.
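As a sketch of the prolog heuristic, the scan below looks for the byte pattern of push ebp (55) followed by mov ebp, esp, which has two valid encodings (8B EC and 89 E5). The function name is invented for the example, and a raw byte scan like this can produce false positives when the pattern happens to occur inside another instruction or in data.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first "push ebp; mov ebp, esp" sequence at or
   after `start`, or -1 if there is none. */
long find_prolog(const uint8_t *code, size_t len, size_t start)
{
    for (size_t i = start; i + 3 <= len; i++) {
        if (code[i] != 0x55)                  /* push ebp */
            continue;
        if ((code[i + 1] == 0x8B && code[i + 2] == 0xEC) ||   /* mov ebp, esp */
            (code[i + 1] == 0x89 && code[i + 2] == 0xE5))     /* same, other encoding */
            return (long)i;
    }
    return -1;
}
```

Combining a hit with a preceding run of int 3 bytes (CC) or a preceding ret makes the guess somewhat more reliable.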
As for differentiating between the various calling conventions, it's perhaps the easiest to look at the symbols (of course, if you have them). MSVC++ generates different name prefixes and suffixes, e.g.:
_function - cdecl
_function@number - stdcall
@function@number - fastcall
If you cannot extract this information from the symbols, you must analyze code to see how parameters are passed to functions and whether functions or their callers remove them from the stack.
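When the symbols are available, the three decorations listed above can be told apart with a few string checks. This is only a sketch for C names (C++ mangled names look different), and the enum and function names are made up:

```c
#include <string.h>

enum callconv { CC_UNKNOWN, CC_CDECL, CC_STDCALL, CC_FASTCALL };

/* Classify a decorated 32-bit C symbol name:
   _name -> cdecl, _name@N -> stdcall, @name@N -> fastcall. */
enum callconv classify_decorated_name(const char *name)
{
    const char *at = strrchr(name, '@');   /* last '@', if any */
    /* a valid suffix is a non-leading '@' followed only by digits */
    int has_suffix = at != NULL && at != name && at[1] != '\0' &&
                     strspn(at + 1, "0123456789") == strlen(at + 1);

    if (name[0] == '@' && has_suffix)
        return CC_FASTCALL;
    if (name[0] == '_' && has_suffix)
        return CC_STDCALL;
    if (name[0] == '_' && at == NULL)
        return CC_CDECL;
    return CC_UNKNOWN;
}
```

The number after the last @ is the size in bytes of the arguments, which is also how many bytes a stdcall or fastcall callee removes from the stack.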
You could use the presence of enter to denote the beginning of a function, or certain code which sets up a frame.
push ebp
mov ebp, esp
sub esp, (bytes for "local" stack space)
Later you'll find the opposite code (or leave) before the ret instruction:
mov esp, ebp
pop ebp
You can also use the number of bytes for local stack space to identify local variables.
Identifying thiscall, fastcall, etc, will take some analysis of the code just prior to calls which use the initial location and an evaluation of the registers used/cleaned up.
Have a look at software like windasm or ollydbg. The call and ret operations denote function calls. However code does not run sequentially and jumps can be made all over the place. call dword ptr [edx] depends on the edx register and thus you won't be able to know where it goes unless you do runtime debugging.
To recognize fastcall functions you have to look at how parameters are passed. Fastcall puts the first two pointer-sized parameters in the ecx and edx registers, whereas stdcall pushes them on the stack. See this article for an explanation.
So I'm disassembling some code (binary bomb lab) and need some help figuring out what's going on.
Here's an IDA screen shot:
(there's some jump table stuff and another comparison below, but I feel a bit more comfortable about that stuff (I think))
Now, I think I know what's going on in this phase, as I've read:
http://teamterradactyl.blogspot.com/2007/10/binary-bomb.html (scroll down to phase 3)
However, I'm used to a different form of the assembly.
The biggest thing I don't understand is all this var_28 = dword ptr -28h stuff at the top.
When sscanf gets called, how does it know where to put each token? And there are only going to be three tokens (which is what the link above says, although I see a %d, %d... so maybe two, I think three though). Basically, can anyone tell me what each of these var_x (and arg_0) will point to after sscanf is called?
They are just relative addressing to the stack pointer right...? But how are these addresses getting filled with the tokens from sscanf?
NOTE: This is homework, but it says not to add the homework tag, because it's obsolete or something. The homework is to figure out the secret phrase to enter via the command line to get past each phase.
NOTE2: I don't really know how to use IDA, my friend just told me to open the bomb file in IDA. Perhaps there's an easy way for me to experiment and figure it out in IDA, but I don't know how.
Local variables are stored just below the frame pointer. Arguments are above the frame pointer. x86 uses BP/EBP/RBP as a frame pointer.
A naïve disassembly would just disassemble lea eax, [ebp+var_10] as lea eax, [ebp-10h]. This instruction is referencing a local variable whose address is 10h (16 bytes) below where the frame pointer points. LEA means Load Effective Address: it's loading the address of the variable at [ebp - 10h] in eax, so eax now contains a pointer to that variable.
IDA apparently is trying to give meaningful names to local variables, but since apparently there is no debug info available it ends up using dummy names. Anyway:
var_10= dword ptr -10h
is just IDA's way of telling you that it has created an alias var_10 for -10h.
Using this example coming from wikipedia, in which DrawSquare() calls DrawLine(),
(Note that this diagram has high addresses at the bottom and low addresses at the top.)
Could anyone explain to me what ebp and esp are in this context?
From what I see, I'd say the stack pointer always points to the top of the stack, and the base pointer to the beginning of the current function? Or what?
edit: I mean this in the context of windows programs
edit2: And how does eip work, too?
edit3: I have the following code from MSVC++:
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
hInstance= dword ptr 8
hPrevInstance= dword ptr 0Ch
lpCmdLine= dword ptr 10h
nShowCmd= dword ptr 14h
All of them seem to be dwords, thus taking 4 bytes each. So I can see there is a gap from hInstance to var_4 of 4 bytes. What are they? I assume it is the return address, as can be seen in wikipedia's picture?
(editor's note: removed a long quote from Michael's answer, which doesn't belong in the question, but a followup question was edited in):
This is because the flow of the function call is:
* Push parameters (hInstance, etc.)
* Call function, which pushes return address
* Push ebp
* Allocate space for locals
My question (last, I hope!) now is: what exactly happens from the instant I push the arguments of the function I want to call up to the end of the prolog? I want to know how ebp and esp evolve during those moments (I already understand how the prolog works; I just want to know what happens after I have pushed the arguments on the stack and before the prolog).
esp is as you say it is, the top of the stack.
ebp is usually set to esp at the start of the function. Function parameters and local variables are accessed by adding and subtracting, respectively, a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which enables stack walking in a debugger and viewing other frames' local variables.
Most function prologs look something like:
push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.
Then later in the function you may have code like (presuming both local variables are 4 bytes)
mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local
FPO or frame pointer omission optimization which you can enable will actually eliminate this and use ebp as another register and access locals directly off of esp, but this makes debugging a bit more difficult since the debugger can no longer directly access the stack frames of earlier function calls.
EDIT:
For your updated question, the missing two entries in the stack are:
nShowCmd = dword ptr +14h
lpCmdLine = dword ptr +10h
hPrevInstance = dword ptr +0Ch
hInstance = dword ptr +08h
return address = dword ptr +04h <==
savedFramePointer = dword ptr +00h <==
var_4 = dword ptr -04h
var_8 = dword ptr -08h
var_C = dword ptr -0Ch
This is because the flow of the function call is:
Push parameters (hInstance, hPrevInstance, lpCmdLine, nShowCmd)
Call function, which pushes return address
Push ebp
Allocate space for locals
ESP (Stack Pointer) is the current stack pointer, which will change any time a word or address is pushed or popped on/off the stack. EBP (Base Pointer) is a more convenient way for the compiler to keep track of a function's parameters and local variables than using the ESP directly.
Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack by the calling function (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP, Instruction Pointer) onto the stack.
Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the Frame Pointer, because it points to the center of the function call frame.
Upon exit, all the function has to do is set ESP to the value of EBP (which deallocates the local variables from the stack, and exposes the entry EBP on the top of the stack), then pop the old EBP value from the stack, and then the function returns (popping the return address into EIP).
Upon returning back to the calling function, it can then increment ESP in order to remove the function arguments it pushed onto the stack just prior to calling the other function. At this point, the stack is back in the same state it was in prior to invoking the called function.
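To make the sequence concrete, here is a toy model of it in C. Everything in it (the toy_cpu struct, the function names, the word-indexed stack) is invented for illustration. Since the stack is indexed in 4-byte words, index ebp+1 corresponds to byte offset +04h, ebp+2 to +08h, and so on.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Toy 32-bit stack: indexed in words, grows toward lower indices. */
struct toy_cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;   /* index of the word on top of the stack */
    uint32_t ebp;
};

void toy_push(struct toy_cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
uint32_t toy_pop(struct toy_cpu *c)          { return c->mem[c->esp++]; }

/* Caller pushes two arguments (last one first), "call" pushes the return
   address, then the callee's prolog runs. */
void enter_function(struct toy_cpu *c, uint32_t arg1, uint32_t arg2,
                    uint32_t ret_addr, uint32_t local_words)
{
    toy_push(c, arg2);
    toy_push(c, arg1);
    toy_push(c, ret_addr);   /* done by the call instruction */
    toy_push(c, c->ebp);     /* push ebp */
    c->ebp = c->esp;         /* mov ebp, esp */
    c->esp -= local_words;   /* sub esp, N */
}

/* Epilog (mov esp, ebp / pop ebp / ret), followed by the caller removing
   its two arguments (add esp, 8). Returns the popped return address. */
uint32_t leave_function(struct toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = toy_pop(c);
    uint32_t ret = toy_pop(c);
    c->esp += 2;
    return ret;
}
```

After enter_function, mem[ebp] holds the saved ebp, mem[ebp+1] the return address, mem[ebp+2] and mem[ebp+3] the arguments, and the locals sit below ebp. After leave_function, esp and ebp are back to their values from before the call.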
You have it right. The stack pointer points to the top item on the stack and the base pointer points to the "previous" top of the stack before the function was called.
When you call a function, its local variables are stored on the stack and the stack pointer is decremented to make room for them (the x86 stack grows toward lower addresses). When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer (which was the "previous" top before the function call).
Doing memory allocation this way is very, very fast and efficient.
EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. I'll try to add some info you might be interested in when using Visual Studio.
Storing the caller EBP as the first local variable is called a standard stack frame, and this may be used for nearly all calling conventions on Windows. Differences exist whether the caller or callee deallocates the passed parameters, and which parameters are passed in registers, but these are orthogonal to the standard stack frame problem.
Speaking of Windows programs, you probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, which makes it nearly impossible to walk the stack without using the dbghelp library and the PDB file for the executable.
Frame Pointer Omission means that the compiler does not store the old EBP in a standard place and uses the EBP register for something else, so you will have a hard time finding the caller's EIP without knowing how much space the local variables need for a given function. Of course Microsoft provides an API that allows you to do stack walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.
To avoid FPO in your compilation units, you need to avoid using /O2 or explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will have a hard time doing stack walks without dbghelp.dll.
First of all, the stack pointer points to the bottom of the stack, since x86 stacks build from high address values to lower address values. The stack pointer points to the most recently pushed value; a push (or call) first decrements it and then stores the new value there. Its operation is equivalent to the C/C++ statements:
// push eax
*--esp = eax;
// pop eax
eax = *esp++;
// a function call, in this case, the caller must clean up the function parameters
mov eax,some value
push eax
call some address // this pushes the next value of the instruction pointer onto the
// stack and changes the instruction pointer to "some address"
add esp,4 // remove eax from the stack
// a function
push ebp // save the old stack frame
mov ebp, esp
... // do stuff
pop ebp // restore the old stack frame
ret
The base pointer marks the top of the current frame. With the prolog shown above, ebp points to the saved old value of ebp (so you can restore the prior frame pointer), ebp+4 holds your return address, and ebp+8 is the first stack parameter of your function (or, for compilers that pass it on the stack, the this value of a class method). ebp-4 is the first local variable of your function.
Long time since I've done Assembly programming, but this link might be useful...
The processor has a collection of registers which are used to store data. Some of these are direct values while others are pointing to an area within RAM. Registers do tend to be used for certain specific actions and every operand in assembly will require a certain amount of data in specific registers.
The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. The stack pointer will point at the next location where new data can be pushed to the stack, where it will stay until it's popped back again.
Base registers or segment registers just point to the address space of a large amount of data. Combined with a second register, the base pointer will divide the memory into huge blocks while the second register will point at an item within this block. Base pointers therefore point to the base of blocks of data.
Do keep in mind that Assembly is very CPU specific. The page I've linked to provides information about different types of CPU's.