Unexpected behaviour after calling __stack_chk_fail - c

In x86, GCC generates the following instructions when it wants to call __stack_chk_fail:
; start of the basic block
00000757 call sub_590 ; __stack_chk_fail#plt
0000075c add byte [ds:eax], al
0000075e add byte [ds:eax], al
; start point of another function
Similar behavior happens in ARM:
; start of the basic block
00001000 bl __stack_chk_fail#PLT
00001004 dd 0x0000309c ; data entry, NOT executable indeed!
In static analysis tools, when one wants to build a CFG, the CFG algorithm can't determine the last instruction of the basic block in which __stack_chk_fail is called.
It would seem reasonable to have some sort of return instruction after the call to __stack_chk_fail, to prevent the CPU from executing instructions (or potentially data entries) that it shouldn't.
In these cases, the CFG generator assumes it's a regular function call and continues traversing into another function's code (in the former example) or into data entries (in the latter one), which is totally unwanted.
So, my question is: why doesn't GCC insert a return (or branch) instruction at the end of the basic block?
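One way a CFG builder could cope with this is sketched below in C. glibc declares __stack_chk_fail as a noreturn function, so the compiler has no reason to emit anything after the call; the sketch simply treats a call to a known noreturn symbol as the end of the basic block. The function name is_noreturn_target, the name table, and the handling of "#plt"/"@plt" suffixes are assumptions made for this illustration, not part of any particular tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, non-exhaustive table of symbols that never return. */
static const char *const NORETURN_NAMES[] = {
    "__stack_chk_fail", "abort", "exit", "_exit"
};

/* Returns true if a call to `symbol` should terminate the basic block.
 * A disassembler-style suffix such as "#plt" or "@plt" is ignored. */
bool is_noreturn_target(const char *symbol)
{
    assert(symbol != NULL);
    size_t len = strcspn(symbol, "#@"); /* length of the bare name */
    size_t count = sizeof NORETURN_NAMES / sizeof NORETURN_NAMES[0];

    for (size_t i = 0; i < count; i++) {
        if (strlen(NORETURN_NAMES[i]) == len &&
            strncmp(symbol, NORETURN_NAMES[i], len) == 0)
            return true;
    }
    return false;
}
```

With a check like this, the traversal would stop at 00000757 in the x86 listing and at 00001000 in the ARM listing instead of walking into padding or literal data.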

Related

How to find all the reachable labels in assembly files?

I'm working on a tool that separates assembly code into sections and labels, and I'm trying to add a recursive mode.
If I print the code under one specific label, and that code refers to other labels, recursive mode should print the code of the referenced labels as well.
For Example:
.file sample.s
...
A:
...
call B
...
B:
...
C:
...
For the code above, if I print label A in recursive mode, the code under both A and B should be printed.
To do this, I have to find every label reference on each line.
Some instructions are obviously important, like call, lea and jmp, but it's not easy to list all the cases.
Any ideas? Thanks for your help!
So you want to print all code reachable from a given label, except by returning further up the call tree? (i.e. all other basic blocks of this function, all child functions, and tail-call siblings).
The normal / simplest way for execution to get from one label to another is to simply fall through. Like
mov ecx, 123
looptop: ; do {
...
dec ecx
jnz looptop ; }while(--ecx)
Unless the last instruction before the next label is an unconditional jump (like jmp or ret, but not call which can eventually return), you should also be following execution into that next block. A ret should end processing, jmp could be followed if you want, jnz might fall through.
For conditional branches, you presumably need to follow both sides.
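The rules above can be sketched as a small classifier in C. It is only an illustration under stated assumptions: mnemonics are lowercase x86 names, the enum and function names are invented for this example, and less common terminators (iret, hlt, ud2 and so on) are not handled.

```c
#include <assert.h>
#include <string.h>

/* How control can leave an instruction (hypothetical names). */
enum flow {
    FLOW_FALLTHROUGH, /* next instruction only                  */
    FLOW_JUMP,        /* branch target only (jmp)               */
    FLOW_CONDITIONAL, /* branch target and next instruction     */
    FLOW_CALL,        /* callee, then normally next instruction */
    FLOW_STOP         /* nothing to follow (ret)                */
};

enum flow classify_flow(const char *mnemonic)
{
    assert(mnemonic != NULL);
    if (strcmp(mnemonic, "ret") == 0)  return FLOW_STOP;
    if (strcmp(mnemonic, "jmp") == 0)  return FLOW_JUMP;
    if (strcmp(mnemonic, "call") == 0) return FLOW_CALL;
    /* Every other x86 mnemonic starting with 'j' (jnz, je, jcxz, ...)
     * is a conditional jump; loop/loope/loopne behave the same way. */
    if (mnemonic[0] == 'j' || strncmp(mnemonic, "loop", 4) == 0)
        return FLOW_CONDITIONAL;
    return FLOW_FALLTHROUGH;
}
```

A recursive printer would then enqueue the label operand for FLOW_JUMP, FLOW_CONDITIONAL and FLOW_CALL, and keep reading into the next label unless the last instruction was FLOW_JUMP or FLOW_STOP.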
Trying to trace through indirect jumps after code loads a function-pointer into a register with a RIP-relative LEA or a MOV is probably too hard. Do you really want to be able to trace foo(callback_func, 123) and be able to print the code for foo and the code it might call at callback_func?
If the arg is passed in a register (like x86-64 calling conventions) and it doesn't store it to the stack and reload it, then it's fairly easy to match that up with a jmp rdi after seeing there have been no intervening writes to RDI in between. But if it is more complex, like a debug build storing RDI to the stack and reloading somewhere else, you basically need an x86-64 simulator to trace the values.
I think it might be better to not even attempt tracing through indirect jumps, rather than having something that sometimes works (simple cases), sometimes doesn't. So probably you should forget about lea, unless you're thinking about dumping data declarations for static data referenced with LEA or MOV.
Some int 0x80 or syscall are noreturn (e.g. _exit, or sigreturn), but most aren't. The behaviour depends on the RAX/EAX value (and on the OS). Usually EAX gets set pretty soon before a system call, so you might want to special case the noreturn ones, otherwise you'll fall through past an exit into other code that shouldn't necessarily execute.
Same applies for library function calls like call exit.
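As a rough illustration of special-casing the noreturn system calls, the C sketch below checks a syscall number that the tool has already recovered from the preceding mov to EAX/RAX. The numbers are the Linux x86-64 ones (15 rt_sigreturn, 60 exit, 231 exit_group); 32-bit int 0x80 uses a different table, and the function name is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the Linux x86-64 system call `nr` never returns
 * to the instruction after `syscall`. Assumes `nr` was recovered
 * from the value most recently loaded into EAX/RAX. */
bool linux_x86_64_syscall_is_noreturn(long nr)
{
    assert(nr >= 0);
    switch (nr) {
    case 15:  /* rt_sigreturn */
    case 60:  /* exit */
    case 231: /* exit_group */
        return true;
    default:
        return false;
    }
}
```

If the value in EAX cannot be determined statically, treating the syscall as returning is the conservative choice: it may print too much code, but it will not drop reachable code.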

Determining return address of function on ARM Cortex-M series

I want to determine the return address of a function in Keil. I opened diassembly section at debugging mode in Keil uvision. What is shown is some assembly code like this:
My intention is to inject simple binary code into the microcontroller by means of a buffer overflow. See: Buffer overflow
I want to determine the return address of the "test" function. Is it a must to know how to read assembly code, or is there a trick for finding the return address?
I am a newbie to assembly.
R14, also known as LR, holds the return address. You can see it on the left in the picture: it is 0x08000287.
When a function is called, R14 will be overwritten with the address following the call ("BL" or "BLX") instruction. If that function doesn't call any other functions, R14 will often be left holding the return address for its duration. Further, if the function tail-calls another function, the tail call may be replaced with a branch ("B" or "BX"), with R14 holding the return address of the original caller. If a function makes a non-tail call to another function, it will be necessary to save R14 "somewhere" (typically the stack, but possibly to another previously-used caller-saved register) at some time before that, and retrieve that value from the stack at some later time, but if optimizations are enabled the location where R14 is saved will generally be unpredictable.
Some compilers may have a mode that would stack things consistently enough to be usable, but code will be very compiler-dependent. The technique most likely to be successful may be to do something like:
extern int getStackAddress(uint8_t **addr); // Always returns zero
void myFunction(...whatever...)
{
uint8_t *returnAddress;
if (getStackAddress(&returnAddress)) return; // Put this first.
}
where the getStackAddress would be a machine-code function that stores R14 to the address in R0, loads R0 with zero, and then branches to R14. There are relatively few code sequences that would be likely to follow that, and if a code examines instructions at the address stored in returnAddress and recognizes one of these code sequences, it would know that the return address for myFunction is stored in a spot appropriate for the sequence in question. For example, if it sees:
tst r0,r0
beq ...
pop {r0,pc}
It would know that the caller's address is second on the stack. Likewise if it sees:
cmp r0,#0
bne somewhere
somewhere: ; Compute address based on lower byte of bne
pop {r0,r1,r2,r4,r5,pc}
then it would know that the caller's address is sixth.
There are a few instructions compilers could use to test a register against zero, and some compilers might use beq while others use bne, but for the code above compilers would be likely to use one of these patterns, and so counting how many bits are set in the pop instruction would reveal the whereabouts of the return address on the stack. One wouldn't know until runtime whether this test would actually work, but in cases where it claims to identify the return address it should actually be correct.
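The bit-counting step can be sketched in C as follows. It assumes the 16-bit Thumb encoding of POP, in which the top byte is 0xBD when PC is in the list and the low byte is the r0-r7 register mask; the function name is hypothetical, and 32-bit POP/LDM encodings are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given a 16-bit Thumb instruction, return the 1-based position of the
 * word that POP loads into PC (1 = top of stack), or 0 if the
 * instruction is not a 16-bit "pop {..., pc}". */
unsigned thumb_pop_pc_slot(const uint16_t *insn)
{
    assert(insn != NULL);
    if ((*insn & 0xFF00u) != 0xBD00u)
        return 0; /* not POP with PC in the register list */

    unsigned low_regs = 0;
    for (uint16_t mask = *insn & 0x00FFu; mask != 0; mask >>= 1)
        low_regs += mask & 1u; /* count r0-r7 entries popped before PC */

    return low_regs + 1; /* PC is always popped last */
}
```

For pop {r0,pc} (0xBD01) this gives 2, and for pop {r0,r1,r2,r4,r5,pc} (0xBD37) it gives 6, matching the two cases described above.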
You can find all the answers in the Cortex-M documentation
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337h/Chdedegj.html

GCC C and ARM Assembly Stack Cleanup

If I call an ARM assembly function from C, sometimes I need to pass in many arguments. If they do not fit in registers r0, r1, r2, r3 it is generally expected that 5-th, 6-th ... x-th arguments are pushed onto stack so that ARM assembly can read them from it.
So in the ARM function I receive some arguments that are on the stack. After finishing the assembly function I can either remove these arguments from stack or leave them there and expect that the C program will deal with them later.
If we are talking about GCC C and ARM assembly who is usually responsible for cleaning up the stack?
The function that made the call (A)
Or the function that was called (B)
I understand that when developing we could agree on either convention. But what is generally used as the default in this particular case (ARM assembly and GCC C)?
And how would generally a low level piece of code describe which behavior it implements? It seems that there should be some kind of standard description for this. If there isn't one it seems that you pretty much just have to try them both and look at which one does not crash.
If someone is interested in how the code could look like:
arm_function:
stmfd sp, {r4-r12, lr} # Save registers that are not the first three registers, SP->PASSED ARGUMENTS
ldmfd sp, {r4-r6} # Load 3 arguments that were passed through the stack, SP->PASSED ARGUMENTS
sub sp, sp, #40 # Adjust the stack pointer so it points to saved registers, STACK POINTER->SAVED REGISTERS->PASSED ARGUMENTS
#The main function body.
ldmfd sp!, {r4-r12, lr} # Load saved registers STACK POINTER->PASSED ARGUMENTS
add sp, sp, #12 # Increment stack pointer to remove passed arguments, SP->NOTHING
# If the last code line would not be there, the caller would need to remove the arguments from stack.
UPDATE:
It seems that for C/C++ choice A. is pretty standard. Compilers usually use calling conventions like cdecl that work pretty similar to code in the answers below. More information can be found in this link about calling conventions. Changing C/C++ calling convention for a function does not seem to be so common/easy. With older C standard I could not manage to change it, so it looks like using A should be a decent default choice.
The current ARM procedure call standard is AAPCS.
The language-specific ABI can be found here. Relevant will be the document about C, but others should be similar (why reinvent the wheel?).
A good start for reading might be page 14 in the AAPCS.
It basically requires the caller to clean up the stack, as this is the simplest way: push additional arguments onto the stack, call the function and after return simply adjust the stack pointer by adding an offset (the number of bytes pushed on the stack; this is always a multiple of 4, the "natural" 32-bit ARM word size).
But if you use gcc, you can just avoid handling the stack yourself by using inline assembler. This provides features to pass C variables (etc.) to the assembler code. This will also automatically load a parameter into a register if required. Just have a look at the gcc documentation. It is a bit hard to figure out in detail, but I prefer this to having raw assembler stubs somewhere.
OK, I added this as there might be problems understanding the principle:
caller:
...
push {r5} // argument which does not fit into r0..r3 anymore
bl callee
add sp, sp, #4 // adjust SP
callee:
push {r5-r7, lr} // temp. variables, return address
sub sp, sp, #8 // local variables
// processing
add sp, sp, #8 // restore previous stack frame
pop {r5-r7, pc} // restore temp. variables and return (replaces bx)
You can verify this by just disassembling some sample C functions. Note that the pre- and postamble may vary if no temp registers are used or the function does not call another function (no need to stack lr for this).
Also, the caller might have to stack r0..r3 before the call. But that is a matter of compiler optimizations.
Disassembly can be done with gdb and objdump for example.
I use -mabi=aapcs for gcc invocation; not sure if gcc would otherwise use a different standard. Note that all object files have to use the same standard.
Edit:
Just had a peek in the AAPCS and that states that the SP need only 4 byte alignment. I might have confused this with the Cortex-M interrupt handling system which (for whatever reason, possibly for M7 which has 64 bit busses) aligns the SP to 8 bytes by default (software-config option).
However, SP must be 8 byte aligned at a public interface. Ok, the standard actually is more complicated than I remembered. That's why I prefer gcc caring about this stuff.
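The caller-side arithmetic described in this answer can be sketched in C. This is a simplified model that assumes every argument is a single 32-bit word (no 64-bit or floating-point arguments, which follow extra rules); the function names are invented for the illustration.

```c
#include <assert.h>

enum { AAPCS_REG_ARGS = 4, AAPCS_WORD_BYTES = 4 };

/* Bytes of stack occupied by word-sized arguments that do not fit
 * into r0-r3. This is the offset the caller adds back to SP. */
unsigned stack_arg_bytes(unsigned n_word_args)
{
    assert(n_word_args <= 64); /* arbitrary sanity bound for the sketch */
    if (n_word_args <= AAPCS_REG_ARGS)
        return 0;
    return (n_word_args - AAPCS_REG_ARGS) * AAPCS_WORD_BYTES;
}

/* Same value rounded up to a multiple of 8, for a caller that pads
 * the argument area to keep SP 8-byte aligned at the call. */
unsigned stack_arg_bytes_aligned(unsigned n_word_args)
{
    return (stack_arg_bytes(n_word_args) + 7u) & ~7u;
}
```

For the seven-argument function in the question this gives 12 bytes, which is the add sp, sp, #12 shown there; under the AAPCS that adjustment belongs in the caller.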
If space is allocated on the stack by the caller (for argument passing), the stack cleanup is also done in the caller. How does that happen, you may ask? For ARM, @Olaf has already explained it completely; on x86 it usually looks like this:
sub esp, 8 ; make some room
... ; move arguments on stack
call func
add esp, 8 ; clean the stack
or
push eax ; push the arguments
push ebx ; or pusha, then after call, popa
call func
add esp, 8 ; assuming registers are 4 bytes each
How the interaction between caller and callee takes place on a given system is explained in its ABI (Application Binary Interface). You may find it useful.

Saving registers state in COM program

I disassembled a simple DOS .COM program and there was some code which saves and restores registers values
PUSH AX ; this is the first instruction
PUSH CX
....
POP CX
POP AX
MOV AX, 0x4C00
INT 21h ; call DOS interrupt 21h => END
This is very similar to function prologue and epilogue in C programs. But prologues are added automatically by compiler, and the program above was written manually in assembler, so the programmer took full responsibility for saving and restoring values in this code.
My question is what will happen if I unintentionally forgot to save some registers in my program?
And what if I intentionally replace these instructions with NOPs in a hex editor? Will this crash the program? And why is the called function responsible for saving the outer context on the stack? From my point of view this should be done in the calling function, to prevent problems if I use third-party libraries or poorly written code that may break my program's execution.
One problem of making the calling function save all of its working registers before calling another function is that sometimes a function is interrupted (i.e. a hardware interrupt) without its knowledge. In DOS, for example, there was that pesky 55 millisecond timer tick. About 18.2 times per second, a hardware interrupt would transfer control from whatever code was executing to the timer tick handler. This happened automatically unless your program specifically disabled interrupts.
The timer tick handler would then save all of the registers it was going to use, do its work, and then restore the registers it saved before returning.
Sure, you could say that interrupt handlers are special, but why? Even with the paucity of registers on the 8086 (AX, BX, CX, DX, SI, DI, Flags -- did I forget anything? I purposely didn't include the segment registers), making a function save its entire state before transferring control means that you'd be using a lot of unnecessary stack space and execution cycles to save things because they might be modified. But if the called function is responsible for saving just the registers it uses, and it only uses AX and CX, then it can save just those two registers. It makes for smaller and faster code, and much less stack space usage.
When you start talking about call hierarchies that are many levels deep, the difference between pushing 8 registers rather than 2 registers adds up pretty quickly.
Consider the x86-64, with its 16 general purpose registers. Do you really think a function should be forced to save all 16 of those registers before calling another function, even when the called function only uses two of them? Saving 16 64-bit registers requires 128 bytes of stack space, as opposed to saving two registers, which requires only 16 bytes.
The primary point of writing things in assembly language these days is to write faster and smaller code than what a compiler can write. A guiding principle is don't do more work than you have to. That means it's up to you to know what registers your assembly language function is using, and to save those registers on entry and restore them on exit.
If you don't want to guard against forgetting what to push or pop I would advise sticking to a higher level language.
In assembler, if the function is your own then you should save and restore all registers you use within the function except those which return an output from the function. If others wrote the function, look up its documentation. If in doubt, save/restore registers before/after calling the function (except those which are supposed to return a value).
Since the DOS Terminate function does not rely on any register settings (other than AX) for its operation (*) both pushes/pops in the code you have posted seem superfluous. You should however be aware that the programmer could have pushed these values for the purpose of using them locally! So replacing both these pushes by NOP in HEX editor is surely a bad idea. You could however replace both pops by NOP because at that point in the program the restoration of AX/CX as well as balancing the stack are unnecessary because of (*).
Since your question is about saving registers on the program level the answer must be that pushing/popping registers for the sake of saving them is useless. Nothing bad will happen if you unintentionally forgot to save some registers in your program.

ARM: link register and frame pointer

I'm trying to understand how the link register and the frame pointer work in ARM. I've been to a couple of sites, and I wanted to confirm my understanding.
Suppose I had the following code:
int foo(void)
{
// ..
bar();
// (A)
// ..
}
int bar(void)
{
// (B)
int b1;
// ..
// (C)
baz();
// (D)
}
int baz(void)
{
// (E)
int a;
int b;
// (F)
}
and I call foo(). Would the link register contain the address of the code at point (A) and the frame pointer contain the address of the code at point (B)? And could the stack pointer be anywhere inside bar(), after all the locals have been declared?
Some register calling conventions are dependent on the ABI (Application Binary Interface). The FP is required in the APCS standard and not in the newer AAPCS (2003). For the AAPCS (GCC 5.0+) the FP does not have to be used but certainly can be; debug info is annotated with stack and frame pointer use for stack tracing and unwinding code with the AAPCS. If a function is static, a compiler really doesn't have to adhere to any conventions.
Generally all ARM registers are general purpose. The lr (link register, also R14) and pc (program counter, also R15) are special and enshrined in the instruction set. You are correct that the lr would point to A. The pc and lr are related. One is "where you are" and the other is "where you were". They are the code aspect of a function.
Typically, we have the sp (stack pointer, R13) and the fp (frame pointer, R11). These two are also related. This Microsoft layout does a good job describing things. The stack is used to store temporary data or locals in your function. Any variables in foo() and bar(), are stored here, on the stack or in available registers. The fp keeps track of the variables from function to function. It is a frame or picture window on the stack for that function. The ABI defines a layout of this frame. Typically the lr and other registers are saved here behind the scenes by the compiler as well as the previous value of fp. This makes a linked list of stack frames and if you want you can trace it all the way back to main(). The root is fp, which points to one stack frame (like a struct) with one variable in the struct being the previous fp. You can go along the list until the final fp which is normally NULL.
So the sp is where the stack is and the fp is where the stack was, a lot like the pc and lr. Each old lr (link register) is stored in the old fp (frame pointer). The sp and fp are a data aspect of functions.
Your point B is the active pc and sp. Point A is actually the fp and lr; unless you call yet another function and then the compiler might get ready to setup the fp to point to the data in B.
Following is some ARM assembler that might demonstrate how this all works. This will be different depending on how the compiler optimizes, but it should give an idea,
; Prologue - setup
mov ip, sp ; get a copy of sp.
stmdb sp!, {fp, ip, lr, pc} ; Save the frame on the stack. See Addendum
sub fp, ip, #4 ; Set the new frame pointer.
...
; Maybe other functions called here.
; Older caller return lr stored in stack frame.
bl baz
...
; Epilogue - return
ldm sp, {fp, sp, lr} ; restore stack, frame pointer and old link.
... ; maybe more stuff here.
bx lr ; return.
This is what foo() would look like. If you don't call bar(), then the compiler does a leaf optimization and doesn't need to save the frame; only the bx lr is needed. This is most likely why you are confused by web examples: it is not always the same.
The take-away should be,
pc and lr are related code registers. One is "Where you are", the other is "Where you were".
sp and fp are related local data registers. One is "Where local data is", the other is "Where the last local data is".
They work together along with parameter passing to create function machinery.
It is hard to describe a general case because we want compilers to be as fast as possible, so they use every trick they can.
These concepts are generic to all CPUs and compiled languages, although the details can vary. The use of the link register and frame pointer is part of the function prologue and epilogue, and if you understood everything, you know how a stack overflow works on an ARM.
See also: ARM calling convention.
MSDN ARM stack article
University of Cambridge APCS overview
ARM stack trace blog
Apple ABI link
The basic frame layout is,
fp[-0] saved pc, where we stored this frame.
fp[-1] saved lr, the return address for this function.
fp[-2] previous sp, before this function eats stack.
fp[-3] previous fp, the last stack frame.
many optional registers...
An ABI may use other values, but the above are typical for most setups. The indexes above are for 32 bit values as all ARM registers are 32 bits. If you are byte-centric, multiply by four. The frame is also aligned to at least four bytes.
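To make the linked list of frames concrete, here is a minimal C sketch that walks such a chain in a simulated stack. It assumes the layout listed above (saved lr at fp[-1], previous fp at fp[-3]), models memory as an array of 32-bit words with frame pointers expressed as word indices, and uses 0 as the end-of-chain marker; the function name and this memory model are assumptions for the illustration, not how a real unwinder reads target memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the saved return addresses from an APCS-style frame chain.
 * `stack` is simulated memory, `fp` is a word index into it.
 * Returns the number of frames visited (at most `max`). */
size_t walk_frames(const uint32_t *stack, size_t nwords, uint32_t fp,
                   uint32_t *lrs, size_t max)
{
    assert(stack != NULL && lrs != NULL);
    size_t n = 0;

    while (fp != 0 && n < max) {
        if (fp < 3 || fp >= nwords)
            break;                 /* malformed chain: stop walking */
        lrs[n++] = stack[fp - 1];  /* fp[-1]: saved lr              */
        fp = stack[fp - 3];        /* fp[-3]: previous fp           */
    }
    return n;
}
```

The max bound also protects against a corrupted chain that loops back on itself, which is worth having in any code that follows pointers found on a stack.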
Addendum: This is not an error in the assembler; it is normal. An explanation is in the ARM generated prologs question.
Disclaimer: I think this is roughly right; please correct as needed.
As indicated elsewhere in this Q&A, be aware that the compiler may not be required to generate (ABI) code that uses frame pointers. Frames on the call stack can often require useless information to be put there.
If the compiler options call for 'no frames' (a pseudo option flag), then the compiler can generate smaller code that keeps call stack data smaller. The calling function is compiled to only store the needed calling info on the stack, and the called function is compiled to only pop the needed calling information from the stack.
This saves execution time and stack space - but it makes tracing backwards in the calling code extremely hard (I gave up trying to...)
Info about the size and shape of the calling information on the stack is only known by the compiler and that info was thrown away after compile time.
