How to find all the reachable labels in assembly files? - c

I'm working on a tool that aims to separate assembly code into different sections and labels, and I'm trying to add a recursive mode.
If I'd like to print the code of one specific label, and that code refers to symbols of other labels, recursive mode should print the referred-to labels' code at the same time.
For Example:
.file sample.s
...
A:
...
call B
...
B:
...
C:
...
For the code above, if I'd like to print the code in label A in recursive mode, the code in labels A and B should be printed at the same time.
To do this, I have to find all the label-reference symbols on each line.
Some instructions may be important, like call, lea, and jmp, but it's not easy to list all the conditions.
Any ideas? Thanks for your help!

So you want to print all code reachable from a given label, except by returning further up the call tree? (i.e. all other basic blocks of this function, all child functions, and tail-call siblings).
The normal / simplest way for execution to get from one label to another is to simply fall through. Like
mov ecx, 123
looptop: ; do {
...
dec ecx
jnz looptop ; }while(--ecx)
Unless the last instruction before the next label is an unconditional jump (like jmp or ret, but not call which can eventually return), you should also be following execution into that next block. A ret should end processing, jmp could be followed if you want, jnz might fall through.
For conditional branches, you presumably need to follow both sides.
Trying to trace through indirect jumps after code loads a function-pointer into a register with a RIP-relative LEA or a MOV is probably too hard. Do you really want to be able to trace foo(callback_func, 123) and be able to print the code for foo and the code it might call at callback_func?
If the arg is passed in a register (like x86-64 calling conventions) and it isn't stored to the stack and reloaded, then it's fairly easy to match that up with a jmp rdi after seeing there have been no intervening writes to RDI. But if it's more complex, like a debug build storing RDI to the stack and reloading it somewhere else, you basically need an x86-64 simulator to trace the values.
I think it might be better to not even attempt tracing through indirect jumps, rather than having something that sometimes works (simple cases), sometimes doesn't. So probably you should forget about lea, unless you're thinking about dumping data declarations for static data referenced with LEA or MOV.
Some int 0x80 or syscall are noreturn (e.g. _exit, or sigreturn), but most aren't. The behaviour depends on the RAX/EAX value (and on the OS). Usually EAX gets set pretty soon before a system call, so you might want to special case the noreturn ones, otherwise you'll fall through past an exit into other code that shouldn't necessarily execute.
Same applies for library function calls like call exit.

Related

Why does this assembly code segfaults upon printf [duplicate]

What happens if I say 'call' instead of jump? Since there is no return statement written, does control just pass over to the next line below, or is it still returned to the line after the call?
start:
mov $0, %eax
jmp two
one:
mov $1, %eax
two:
cmp %eax, $1
call one
mov $10, %eax
The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto (example), the labels you use to mark places you can goto to work exactly the same as asm labels, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
If you leave out the ret at the end of a block of code, execution keeps going and decodes whatever comes next as instructions. (Maybe What would happen if a system executes a part of the file that is zero-padded? if that was the last function in an asm source file, or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
mov $1, %eax
# missing ret so we fall through
two:
cmp %eax, $1
# call one # emulate it instead with push+jmp
pushl $.Lreturn_address
jmp one
.Lreturn_address:
mov $10, %eax
# fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
Note that functions can end with a tail-call instead of a ret:
(see this on Godbolt)
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into situations where you have a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite our own args into args for the function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.
Your intuition is correct: the control just passes to the next line below after the function returns.
In your case, after call one, your function will jump to mov $1, %eax and then continue down to cmp %eax, $1 and end up in an infinite loop as you will call one again.
Beyond just an infinite loop, your function will eventually go beyond its memory constraints since a call command writes the current rip (instruction pointer) to the stack. Eventually, you'll overflow the stack.

ARMv8 illegal instruction [duplicate]


MPLAB/XC8 can't jump in ASM?

I have a project for the PIC18F25K50 of mixed C and Assembly; most of what I want to do I can easily manage (and must, for efficiency) in Assembly, but for some parts where I care more about ease of development I use C. I actually have a couple of these projects, and I keep encountering the same issue: I can't use ASM to jump to a label. Every single jump instruction - CALL, GOTO, BNC, and so on - will fail if given a label, setting PC to some random-but-consistent value where there are no instructions, causing the program to hang. Using an address works fine: BC $+4 skips the next line.
An example of what does not work is this:
#asm
_waitUS:
GLOBAL _waitUS
waitLoop:
//12 cycles = 1 microsecond:
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
DECFSZ WREG, F, ACCESS
GOTO waitLoop
RETURN
#endasm
void main() {
//DEBUG:
waitUS(6);
}
Now, this may not work overall, and I am begging you to focus on the issue of jumping - this is still in prototyping because I can't even get the function called. The program does compile without issue.
As soon as waitUS(6) is called, the PC jumps from - in my case - 0x7C96 to 0x52. Swapping the C call out for MOVLW 6; CALL _waitUS breaks in exactly the same way.
If I strictly use C for calling/jumping (as I had to in the previous project), it works fine, and figures out where it's going.
I've been searching for an answer to this for a few weeks now, and still haven't seen anyone else with this problem, even though every project I make (including plaintext in notepad, compiling via command line) has the exact same issue. What the heck is up with this?
Edit: Having discovered the program memory view, I was able to get a better idea of what it's doing. The compiler does know where the functions are, and it is trying to jump to the right location. Apparently, CALL just doesn't know where it's going.
Example:
Address 0x7C92 contains CALL 0x2044, 0. That is precisely what it ought to contain; that is where the desired function starts. However, upon running this instruction, PC is altered to 0x205E, missing half of the function.
Attempting to be clever, I decided to tack on several NOPs to the start of the function after its label, lining the real code up with 0x205E. Unfortunately, it seems any change alters where its unpredictable jumping will land, and it then landed at 0x2086 instead.
Incidentally, when it starts running at random places, it will often run across a GOTO - and it will jump to the specified location as intended. This only works within the same function, as trying to use GOTO instead of CALL ends up in the same incorrect location, despite what the compiled result demands.
The PDF document at http://ww1.microchip.com/downloads/en/DeviceDoc/33014K.pdf has many examples on how to code the PIC18.
Here is one such example:
RST CODE 0x0 ;The code section named RST
;is placed at program memory
;location 0x0. The next two
;instructions are placed in
;code section RST.
pagesel start ;Jumps to the location labelled
goto start ;'start'.
PGM CODE ;This is the beginning of the
;code section named PGM. It is
;a relocatable code section
;since no absolute address is
;given along with directive CODE.
start
movlw D'10'
movwf delay_value
xorlw 0x80
call delay
goto start
end

redirecting a c function at runtime and calling the original function

My program redirects a function to another function by writing a jmp instruction to the first few bytes of the function (only i386). It works like expected but it means that I can't call the original function anymore, because it will always jump to the new one.
There are two possible workarounds I could think of:
Create a new function, which overwrites the jmp instruction of the target function and call it. Afterwards the function writes back the jmp instruction.
But I'm not sure how to pass the arguments, since there can be any number of them. And I wonder if the target function can jmp somewhere else and skip writing back the jmp instruction (like throw/catch?).
Create a new function which executes the code I have overwritten with the jmp instruction. But I can't be sure that the overwritten data is a complete instruction. I'd have to know how many bytes I have to copy for a complete instructions.
So, finally, my questions:
Is there another way I didn't think of?
How do I find the size of an instruction? I already looked at binutils and found this but I don't know how to interpret it.
Here is a sample:
mov, 2, 0xa0, None, 1, Cpu64, D|W|CheckRegSize|No_sSuf|No_ldSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword }
The 2nd column shows the number of operands (2), and the last column has information about the operands, separated by a comma.
I also found this question which is pretty much the same but I can't be sure that the 7 bytes contain a whole instruction.
Writing a Trampoline Function
Any help is appreciated! Thanks.
Sebastian, you can use the exe_load_symbols() function in hotpatch to get a list of the symbols and their location in the existing exe and then see if you can overwrite that in memory. I have not tried it yet. You may be able to do it with the LD_PRELOAD environment variable as well instead of hotpatch.
--Vikas
How about something like this:
Let's say this is the original function:
Instruction1
Instruction2
Instruction3
...
RET
convert it to this:
JMP new_stuff
old:
Instruction2
Instruction3
...
RET
...
new_stuff:
CMP call_my_function,0
JNZ my_function
Instruction1
JMP old
my_function:
...
Of course you'd have to take the size of the original instructions into account (you could find that out by disassembling with objdump, for example) so that the first JMP fits perfectly (pad with NOPs if the JMP is shorter than the original instruction(s)).

Is 2 pass on the source file necessary for assembler and linker?

I've heard many times that assemblers and linkers need to traverse their input files at least 2 times. Is this really necessary? Why can't it be done in one pass?
The assembler translates a symbolic assembler language into a binary representation.
In the input language (assembler), labels are symbolic too.
In the binary output language they are typically a distance in bytes, relative to the current position or some other fixed point (e.g. jump so many bytes ahead or back).
The first pass just determines the offset of every assembler instruction from the start of the code (or some other fixed point), which fixes the position of the labels.
This allows the correct jump distances for branch instructions to be calculated in the second pass.
A one-pass assembler would be possible, but you would only be able to jump to labels you had already declared (backward), not forward.
One example when this is necessary is when two functions call each other.
int sub_a(int v);
int sub_b(int v);
int sub_a(int v) {
int u = v;
if ( 0 < u ) {
u = sub_b( v - 1 );
}
return u - 1;
}
int sub_b(int v) {
int u = v;
if ( 0 < u ) {
u = sub_a( v - 1 );
}
return u - 1;
}
It is then necessary to do a two-pass scan, as any ordering of the functions will leave a dependency on a function that hasn't been scanned yet.
It may even take more than two. Consider:
...
jmp outside
...
jmp there
...
jmp here
...
there:
In particular, for instruction sets that have some form of a near jump and some form of a far jump, the assembler doesn't always want to waste a far jump on every branch/jmp. Take the code above: by the time it gets to the jmp here line, it knows how many instructions lie between the here label and the jmp here instruction, so it can make a pretty good estimate of whether it will need to encode that as a near or far jump. Normally the far version of a jump takes more bytes to implement, causing all the instructions and labels that follow to shift.
When it encounters the jmp there instruction, it does not know near or far and has to come back later on a separate pass (through the data). When it encounters the label there, it could go back and look to see whether there has been a reference to it up to this point, and patch up that reference. That is another pass through the data: pass 2. Or you just make one complete pass through the source code, then start going back and forth through the data more times.
Let's say the jmp outside does not resolve to a label. Now, depending on the instruction set, the assembler has to respond. In some instruction sets - say the msp430, where a far jump simply means an absolute address anywhere in memory space, with no segments or anything like that - you could simply assume a far jump and leave the address for the linker to fill in later. In some instruction sets, like ARM, you have to allocate some memory within near reach of the instruction, often hiding it behind unconditional branches (this can be a bad thing and fail). Basically, you need to allocate a place where the whole address of the external item can be stored, encode the instruction to load from that nearby memory location, and let the linker fill in the address later.
Back to here and there. What if, on the first pass, you assumed all of the unknown jumps were near and computed addresses based on that? And what if, on that pass, here ended up exactly 128 bytes from the jmp here instruction, for an instruction set whose near jumps have a reach of only 128 bytes, so you assume jmp here is also near. And, to make this painful, what if when there was found, the distance from jmp there to there was 127 bytes, your maximum near jump forward? But outside is not found! It has to be far, so you need to burn some more bytes; now here is too far from jmp here, which needs more bytes; now jmp there is too far and it needs more bytes. How many passes through the data did it take to figure those three things out? More than two. One pass to start. The second pass marks outside as far; that pass still assumes jmp there is near, but when it gets to jmp here it discovers that has to be a far jump, causing the there address to change. On the third pass it discovers that jmp there needs to be far, which affects everything after that instruction. For this simple code, that is it: everything is resolved.
Think about a bubble sort: you keep looping through the data doing swaps until a flag says "I made no changes on that last pass", indicating everything is resolved and you are done. You have to play the same game with an assembler. For instruction sets like ARM, you need to do things like find places to tuck away addresses and constants/immediates that don't encode into a single instruction. That is, if the assembler wants to do that work for you; it could easily just declare an error and say the destination is too far for the instruction chosen. ARM assemblers allow you to be lazy and do things like:
ldr r0,=0x1234567
...
ldr r1,=lab7
...
lab7:
The assembler looks at that = and knows it has to determine: can I encode that constant/immediate in the instruction (changing your ldr to a mov for you), or do I need to find a place wedged into your code to put the word, and then encode the instruction with a near address offset?
Even without dealing with near and far, simply resolving addresses in the outside/there/here example above takes two passes. The first pass reads everything; jmp here happens to learn where here is on that pass, but you still have to make a second pass through the program (not necessarily from disk; you can keep the info in memory) because there might be a jump to here that precedes the here: label. The second pass will find the jmp outside and know there is no outside label in the program, marking it as unresolved or external depending on the rules of the assembler. The second pass resolves the jmp there as a known label, and it doesn't touch the jmp here because that was resolved on the first pass. This is your classic two-pass problem/solution.
The linker has the same problem: it has to pass through all its inputs (think of each object file as one complicated line of source code). It finds all the labels, both those defined in the objects and those left unresolved in them. If it finds an "I need an outside label" in the second of 10 files, it has to pass through all 10 files, then go back through the data, either on disk or in memory, to resolve all the forward-referenced labels. It won't know on the first occurrence of jmp outside that there is no outside label; the second pass through is when it finds jmp outside, looks through the list it keeps of found labels (which could be considered a third pass), finds no outside label, and declares an error.
