Suppose you are running a program on a processor with interrupt handling enabled, and the Instruction Pointer ends up pointing to zero. How can we find out what caused the Instruction Pointer to point to 0?
I'm not clear on whether it is related to the location of the ISRs. As far as I know, on some processors IP=0 is the reset address. But why would a running program go to that address?
What are all the possible reasons for IP pointing to 0?
Basically all jmp instructions and ret can jump to 0. Examples:
jnz 0 ;; encoded as relative jump JNZ -(next IP)
jmp 00000000 ;; absolute jump
mov ebx, 0
jmp ebx ;; indirect jump
call 0
mov ecx,0
push ecx
ret ;; jump through stack
In C one can (try to) jump through a NULL or uninitialized function pointer, or corrupt the stack so that a return address becomes 0. Some esoteric tricks would be to install an exception (signal) handler that points to NULL, or to use longjmp.
In the x86 architecture (real mode), the pointers to the interrupt handlers start at address 0:0, but one doesn't jump there. Instead, the table contains 'segment:offset' pairs that are jumped to indirectly.
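To make the table layout concrete, here is a minimal sketch in Python. It assumes the usual real-mode layout (4 bytes per vector, offset word first, then segment word); the memory image and the F000:FEA5 handler address are made up for illustration.

```python
import struct

def read_ivt_entry(memory, vector):
    """Return (segment, offset) stored for `vector` in a real-mode IVT image."""
    # Each entry is 4 bytes at physical address 4 * vector: offset first, then segment.
    offset, segment = struct.unpack_from("<HH", memory, 4 * vector)
    return segment, offset

def physical_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset (20-bit wrap ignored)."""
    return (segment << 4) + offset

# Hypothetical memory image: vector 8 points at F000:FEA5.
memory = bytearray(1024)
struct.pack_into("<HH", memory, 4 * 8, 0xFEA5, 0xF000)

segment, offset = read_ivt_entry(memory, 8)
print(f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}")
```

The CPU reads the pair and jumps to the handler it names, so address 0 itself only holds data.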
Debugging methods include bisecting the code with breakpoints until you find the instruction that triggers the error when it runs. Inspecting the stack should tell you which function was executed last. Sometimes the stack is still valid enough to show the complete call trace.
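The idea of finding the last instruction that ran before IP became 0 can be sketched with a toy machine in Python. This is not a real x86 emulator: every instruction occupies one address, and the instruction names (mov, push, jmp_reg, ret) and the function run_until_zero are invented for this illustration.

```python
from collections import deque

def run_until_zero(program, start, max_steps=1000, history=4):
    """Run a toy machine; return recently executed addresses if IP becomes 0."""
    regs, stack, ip = {}, [], start
    trace = deque(maxlen=history)          # ring buffer of executed addresses
    for _ in range(max_steps):
        if ip == 0:
            return list(trace)             # last entry is the culprit instruction
        op, *args = program[ip]
        trace.append(ip)
        next_ip = ip + 1
        if op == "mov":                    # mov reg, imm
            regs[args[0]] = args[1]
        elif op == "push":                 # push reg
            stack.append(regs[args[0]])
        elif op == "jmp_reg":              # indirect jump through a register
            next_ip = regs[args[0]]
        elif op == "ret":                  # pop the return address and jump to it
            next_ip = stack.pop()
        ip = next_ip
    return None                            # IP never became 0 within max_steps

# The "jump through stack" example from above, at made-up addresses.
program = {
    0x100: ("mov", "ecx", 0),
    0x101: ("push", "ecx"),
    0x102: ("ret",),
}
print([hex(a) for a in run_until_zero(program, 0x100)])
```

A real debugger or hardware trace buffer gives you the same kind of information: the last entry in the history is the instruction that transferred control to 0.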
What happens if I say 'call ' instead of jump? Since there is no return statement written, does control just pass over to the next line below, or is it still returned to the line after the call?
start:
mov $0, %eax
jmp two
one:
mov $1, %eax
two:
cmp $1, %eax
call one
mov $10, %eax
The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto, the labels you use to mark places you can goto work exactly the same as asm labels, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
If you leave out the ret at the end of a block of code, execution keeps going and decodes whatever comes next as instructions. (Maybe padding, see "What would happen if a system executes a part of the file that is zero-padded?", if that was the last function in an asm source file, or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
mov $1, %eax
# missing ret so we fall through
two:
cmp $1, %eax
# call one # emulate it instead with push+jmp
pushl $.Lreturn_address
jmp one
.Lreturn_address:
mov $10, %eax
# fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
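If you can't single-step real code right now, the equivalence can be sketched with a toy interpreter in Python. It is only a model: every instruction occupies one address, and the instruction names and the run function are invented for this illustration, not taken from any real emulator.

```python
def run(program, start):
    """Execute a toy program until halt; return (eax, remaining stack)."""
    regs, stack, ip = {"eax": 0}, [], start
    while program[ip][0] != "halt":
        op, *args = program[ip]
        next_ip = ip + 1
        if op == "mov":          # mov reg, imm
            regs[args[0]] = args[1]
        elif op == "push":       # push imm
            stack.append(args[0])
        elif op == "pop":        # pop reg
            regs[args[0]] = stack.pop()
        elif op == "jmp":        # direct jump
            next_ip = args[0]
        elif op == "jmp_reg":    # indirect jump through a register
            next_ip = regs[args[0]]
        elif op == "call":       # push the return address, then jump
            stack.append(ip + 1)
            next_ip = args[0]
        elif op == "ret":        # pop the return address, then jump to it
            next_ip = stack.pop()
        ip = next_ip
    return regs["eax"], stack

with_call = {
    0: ("call", 10),
    1: ("halt",),
    10: ("mov", "eax", 1),
    11: ("ret",),
}
emulated = {
    0: ("push", 2),            # return address = the instruction after the jmp
    1: ("jmp", 10),
    2: ("halt",),
    10: ("mov", "eax", 1),
    11: ("pop", "ecx"),        # ret emulated as pop + indirect jmp
    12: ("jmp_reg", "ecx"),
}
print(run(with_call, 0) == run(emulated, 0))
```

Both programs end with the same eax and an empty stack; the only difference is the clobbered scratch register in the emulated version.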
Note that functions can end with a tail-call instead of a ret:
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into situations where you have a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite our own args into args for the function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.
Your intuition is correct: the control just passes to the next line below after the function returns.
In your case, after call one, execution jumps to mov $1, %eax and then continues down to the cmp and ends up in an infinite loop, because you reach call one again.
Beyond just an infinite loop, your program will eventually go beyond its memory constraints, since every call pushes a return address (the address of the instruction after the call) onto the stack and nothing ever pops it. Eventually, you'll overflow the stack.
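Here is a rough Python model of that loop, assuming a stack with room for a fixed number of return addresses. The addresses 0 to 5 and the simulate function are made up; a real stack overflow would show up as a fault (e.g. a segfault) rather than a return value.

```python
def simulate(stack_slots):
    """Toy model of the question's code; returns how many calls ran before overflow."""
    program = {
        0: ("mov", 0),      # start: mov $0, %eax
        1: ("jmp", 3),      #        jmp two
        2: ("mov", 1),      # one:   mov $1, %eax
        3: ("cmp",),        # two:   cmp (only sets flags, no branch follows)
        4: ("call", 2),     #        call one
        5: ("mov", 10),     #        mov $10, %eax (never reached)
    }
    stack, ip, calls = [], 0, 0
    while True:
        op, *args = program[ip]
        if op == "call":
            if len(stack) == stack_slots:
                return calls            # no room left for another return address
            stack.append(ip + 1)        # return address that no ret ever pops
            calls += 1
            ip = args[0]
        elif op == "jmp":
            ip = args[0]
        else:
            ip += 1                     # mov and cmp just fall through

print(simulate(1000))
```

Every trip around the loop costs one stack slot, so the number of calls before the overflow equals the number of slots available.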
I want to understand how my data string ends up in rdx. As I understand it, the mov instruction copies the data found at the source address into the target, so the content at rbp-0x28 is put into rdx. I checked what's at rbp-0x28 and it is not the data string ('AAAAAAA'). If, however, I let the instruction execute with ni, then rdx contains the string. I don't know how the string ends up in rdx, as it is not contained in rbp-0x28 beforehand. I know that my data is located at 0x7fffffffe58f, but I'm not sure how or when it's loaded into rdx. Any help is greatly appreciated!
This depends a lot on which compiler and debugger you're using, as well as the architecture and calling convention. I ran your code with Apple's Clang compiler and lldb and got the expected results. There are minor variations between my output and yours, but it's relatively easy to follow. Since you only posted partial debug output of your function, starting at offset +0x12, I'll assume that an earlier instruction moved the pointer from whichever register held the first argument to the function call (in my case RDI) into [rbp-0x28].
This was my output.
mov rsi, qword ptr [rbp-0x30] is the equivalent of your mov rdx,[rbp-0x28]. I think you're under Microsoft's x64 ABI calling convention, so your first argument is passed through rcx. But prior to that instruction there is mov [rbp-0x30], rdi, which I believe in your case will be mov [rbp-0x28],rcx.
I set another breakpoint at the next instruction, mov rdi,rcx. There I read the contents of rsi, which in your case would be rdx. It printed rsi = 0x00007ffeefbff94a
At that memory address I found 'AAAAAAA'. Next I read the register rbp, which printed rbp = 0x00007ffeefbff740. Then I read the memory at 0x00007ffeefbff740-0x30 (in your case it would be -0x28), which is 0x7ffeefbff710, and it held the same address that was stored in rsi:
0x7ffeefbff94a (little endian), which we know points to the string 'AAAAAAA'. So I'm going to assume that what you're expecting at RBP-0x28 is the string itself. Instead, it is the stack slot that holds a pointer to the string. Also make sure to calculate your offsets correctly. Follow these steps:
Breakpoint at lea rax,[rbp-0x20]
Check the value of rdx, view the memory at that address and it should give you the string.
Then check the value of rbp. Subtract 0x28 from it. View the memory at the offset.
This should give you the value of rdx. Which should in turn point to the string you're looking for.
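The difference between "the slot holds the string" and "the slot holds a pointer to the string" can be modelled in a few lines of Python. This is only a sketch of flat memory: the addresses are the ones from my session above, while BASE, the window size and the helper names are assumptions made for the example.

```python
import struct

BASE = 0x7ffeefbff700            # start of the simulated stack window (assumption)
memory = bytearray(0x300)        # covers BASE .. BASE + 0x2ff

def write(addr, data):
    """Store raw bytes at a simulated address."""
    memory[addr - BASE: addr - BASE + len(data)] = data

def read_qword(addr):
    """Read 8 bytes as a little-endian integer, like mov reg, qword ptr [addr]."""
    return struct.unpack_from("<Q", memory, addr - BASE)[0]

def read_cstring(addr):
    """Read bytes up to the terminating NUL, like printing a string in a debugger."""
    start = addr - BASE
    return bytes(memory[start:memory.index(0, start)]).decode("ascii")

rbp = 0x7ffeefbff740
string_addr = 0x7ffeefbff94a
write(string_addr, b"AAAAAAA\x00")                 # the string itself
write(rbp - 0x30, struct.pack("<Q", string_addr))  # the local slot holds only its address

rsi = read_qword(rbp - 0x30)     # mov rsi, qword ptr [rbp-0x30]
print(hex(rsi), read_cstring(rsi))
```

Dumping the 8 bytes at rbp-0x30 shows 4a f9 bf ef fe 7f 00 00, which is the pointer in little-endian byte order, not the characters of the string.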
We are using an ARM9 with ucos. The last instruction of the common porting function OS_CPU_ARM_ExceptHndlr_BrkTask behaves strangely in our system.
Instruction: LDMFD SP!,{R0-R12,LR,PC}^
Let's suppose the SP is 0x10002000, and the following 15 DWORDs (which will be copied to R0-R12, LR, PC) have values from 1 to 15. We find the PC (R15) is changed and jumps to 15, but the SP (R13) is changed to a strange value (an address far outside the stack memory space). I expected it would become 0x1000203C (0x10002000+4*15).
Why is R13 changed this way?
This instruction loads r14, like the other registers, from the stack. The write to PC causes the jump. This is not a branch and link, which would set the link register to the return address.
Additionally, this instruction is actually an exception return (because of the ^). So, depending on the mode you are returning from, r14 might be banked, and after the exception return you might see a different r14 than the one that was loaded from memory. r13 (SP) is banked per mode in the same way, so the SP you see after the return may belong to a different mode than the SP that received the write-back.
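A toy Python model of the banking effect, under stated assumptions: the instruction runs in IRQ mode, the saved status register selects SVC mode, and the SVC stack pointer value is made up. It does not model the actual ucos port, only the way an exception return can change which banked SP and LR you are looking at.

```python
def ldmfd_exception_return(cpu, memory):
    """Toy model of LDMFD SP!, {R0-R12, LR, PC}^ executed in an exception mode."""
    mode = cpu["mode"]
    sp = cpu["sp"][mode]
    words = [memory[sp + 4 * i] for i in range(15)]   # R0-R12, LR, PC
    cpu["r"] = words[:13]
    cpu["lr"][mode] = words[13]        # LR of the mode the instruction runs in
    cpu["sp"][mode] = sp + 4 * 15      # write-back updates that same mode's SP
    cpu["pc"] = words[14]
    cpu["mode"] = cpu["spsr_mode"]     # ^ with PC in the list restores the saved mode

cpu = {
    "mode": "irq", "spsr_mode": "svc",                # hypothetical modes
    "sp": {"irq": 0x10002000, "svc": 0x2000F000},     # one banked SP per mode
    "lr": {"irq": 0, "svc": 0x8000},                  # one banked LR per mode
    "r": [0] * 13, "pc": 0,
}
memory = {0x10002000 + 4 * i: i + 1 for i in range(15)}  # the values 1..15

ldmfd_exception_return(cpu, memory)
print(hex(cpu["sp"]["irq"]), hex(cpu["sp"][cpu["mode"]]), cpu["pc"])
```

In this model the IRQ-mode SP does become 0x1000203C as expected, but the SP visible after the return is the SVC-mode one, which is a completely different value.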
I have an application which creates .text segment dumps of win32 processes. Then it divides the code into basic blocks. A basic block is a set of instructions which are always executed one after another (jumps are always the last instructions of such basic blocks). Here is an example:
Basic block 1
mov ecx, dword ptr [ecx]
test ecx, ecx
je 00401013h
Basic block 2
mov eax, dword ptr [ecx]
call dword ptr [eax+08h]
Basic block 3
test eax, eax
je 0040100Ah
Basic block 4
mov edx, dword ptr [eax]
push 00000001h
mov ecx, eax
call dword ptr [edx]
Basic block 5
ret 000008h
Now I would like to group such basic blocks into functions, that is, determine which basic blocks form a function. What's the algorithm? I have to keep in mind that there might be many ret instructions inside one function. How can I detect fastcall functions?
The simplest algorithm for grouping blocks into functions would be:
note all addresses to which calls are made with call some_address instructions
if the first block after such an address ends with ret, you're done with the function, else
follow the jump in the block to another block and so on until you've followed all possible execution paths (remember about conditional jumps, each of which splits a path into two) and all the paths have finished with ret. You'll need to recognize jumps that organize loops so your program itself does not hang by entering an infinite loop
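The steps above can be sketched as a graph traversal in Python. This assumes you have already built a map from each block's start address to its successor blocks (jump targets plus fall-through); the addresses and the function name group_into_functions are hypothetical, and the visited set is what keeps loops from hanging the analysis.

```python
def group_into_functions(blocks, entry_points):
    """Map each function entry to the sorted block addresses reachable from it.

    `blocks` maps a block's start address to its successor block addresses
    (jump targets and fall-through); a block ending in ret has none.
    """
    functions = {}
    for entry in entry_points:
        seen, worklist = set(), [entry]
        while worklist:
            addr = worklist.pop()
            if addr in seen or addr not in blocks:
                continue                 # already visited (loop) or unknown target
            seen.add(addr)
            worklist.extend(blocks[addr])
        functions[entry] = sorted(seen)
    return functions

# Hypothetical control-flow graph loosely modelled on the five blocks in the question.
blocks = {
    0x00: [0x40, 0x10],   # je to the ret block, or fall through
    0x10: [0x20],         # ends with an indirect call, then falls through
    0x20: [0x00, 0x30],   # je back to the first block (a loop), or fall through
    0x30: [0x40],         # ends with an indirect call, then falls through
    0x40: [],             # ret
    0x50: [],             # a second, one-block function
}
for entry, members in group_into_functions(blocks, [0x00, 0x50]).items():
    print(hex(entry), [hex(m) for m in members])
```

The entry points would come from the targets of direct call instructions; blocks reachable from two entries simply show up in both functions.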
Problems:
a number of calls can be made indirectly by reading function pointers from memory, e.g. you'd have call [some_address] instead of call some_address
some indirect calls can be made to calculated addresses
functions that call other functions right before returning may have jmp some_address instead of call some_address immediately followed by ret (a tail call)
call some_address can be simulated with a combination of push some_address + ret OR push some_address + jmp some_other_address
some functions may share code at their end (e.g. they have different entry points, but one or more exit points are the same)
You may use some heuristic to determine where functions start by looking for the most common prolog instruction sequence:
push ebp
mov ebp, esp
Again, this may not work if functions are compiled with the frame pointer suppressed (i.e. they'd use esp instead of ebp to access their parameters on the stack, which is possible).
The compiler (e.g. MSVC++) may also pad the inter-function space with the int 3 instruction and that too can serve as a hint for an upcoming function beginning.
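Both hints can be sketched as a plain byte scan in Python. The byte patterns are the two standard encodings of push ebp / mov ebp, esp (55 8B EC and 55 89 E5) and 0xCC for int 3; the function names and the sample bytes are made up. Since this scans raw bytes without disassembling, it can report false positives when a pattern happens to occur inside another instruction or in data.

```python
PROLOGUES = (bytes.fromhex("558bec"), bytes.fromhex("5589e5"))  # push ebp; mov ebp, esp

def find_prologues(code, base=0):
    """Return addresses in `code` where a frame-pointer prologue starts."""
    hits = []
    for pattern in PROLOGUES:
        pos = code.find(pattern)
        while pos != -1:
            hits.append(base + pos)
            pos = code.find(pattern, pos + 1)
    return sorted(hits)

def after_int3_padding(code, base=0):
    """Return addresses of the first non-0xCC byte following a run of int 3 padding."""
    return [base + i for i in range(1, len(code))
            if code[i - 1] == 0xCC and code[i] != 0xCC]

# Hypothetical dump: two tiny functions separated by three int 3 bytes.
code = bytes.fromhex("558bec 5dc3 cccccc 558bec 33c0 5dc3")
print([hex(a) for a in find_prologues(code, 0x401000)])
print([hex(a) for a in after_int3_padding(code, 0x401000)])
```

Candidates found this way are best treated as hints to confirm against the call targets you collected, not as definitive function starts.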
As for differentiating between the various calling conventions, it's perhaps the easiest to look at the symbols (of course, if you have them). MSVC++ generates different name prefixes and suffixes, e.g.:
_function - cdecl
_function@number - stdcall
@function@number - fastcall
If you cannot extract this information from the symbols, you must analyze code to see how parameters are passed to functions and whether functions or their callers remove them from the stack.
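If you do have the symbols, the decoration check is a few regular expressions. A minimal sketch in Python, assuming 32-bit MSVC decoration of plain C functions (C++ mangled names are not handled); the function name calling_convention is made up.

```python
import re

def calling_convention(symbol):
    """Guess the calling convention of a C function from its decorated name."""
    if re.fullmatch(r"@\w+@\d+", symbol):
        return "fastcall"      # @function@number
    if re.fullmatch(r"_\w+@\d+", symbol):
        return "stdcall"       # _function@number
    if re.fullmatch(r"_\w+", symbol):
        return "cdecl"         # _function
    return "unknown"           # e.g. C++ mangled names starting with ?

for name in ("_puts", "_Sleep@4", "@fast_add@8", "?foo@@YAHH@Z"):
    print(name, calling_convention(name))
```

The number after the @ is the size of the arguments in bytes, which also tells you how much the callee pops with ret N.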
You could use the presence of enter to denote the beginning of a function, or certain code which sets up a frame.
push ebp
mov ebp, esp
sub esp, (bytes for "local" stack space)
Later you'll find the opposite code (or leave) before the ret:
mov esp, ebp
pop ebp
You can also use the number of bytes for local stack space to identify local variables.
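Reading the local stack size out of the prologue can be sketched like this in Python. It assumes the function starts with push ebp / mov ebp, esp followed directly by sub esp, N in one of its two common encodings (83 EC ib or 81 EC id); the function name local_frame_size is hypothetical, and real code may place other instructions in between.

```python
def local_frame_size(code):
    """Return the byte count reserved by sub esp, N after a standard prologue, or None."""
    if code[:3] not in (bytes.fromhex("558bec"), bytes.fromhex("5589e5")):
        return None                                   # no push ebp / mov ebp, esp
    body = code[3:]
    if body[:2] == bytes.fromhex("83ec") and len(body) >= 3:
        return body[2]                                # sub esp, imm8 (small frames)
    if body[:2] == bytes.fromhex("81ec") and len(body) >= 6:
        return int.from_bytes(body[2:6], "little")    # sub esp, imm32
    return 0                                          # frame set up, no locals reserved

print(local_frame_size(bytes.fromhex("558bec83ec10")))        # sub esp, 10h
print(local_frame_size(bytes.fromhex("5589e581ec00010000")))  # sub esp, 100h
```

Dividing the result by 4 gives a rough upper bound on the number of dword-sized locals, though the compiler may also reserve space for alignment or temporaries.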
Identifying thiscall, fastcall, etc, will take some analysis of the code just prior to calls which use the initial location and an evaluation of the registers used/cleaned up.
Have a look at software like windasm or ollydbg. The call and ret operations denote function calls. However code does not run sequentially and jumps can be made all over the place. call dword ptr [edx] depends on the edx register and thus you won't be able to know where it goes unless you do runtime debugging.
To recognize fastcall functions you have to look at how the parameters are passed. Fastcall puts the first two pointer-sized parameters in the ecx and edx registers, whereas stdcall pushes them on the stack. See this article for an explanation.