MPLAB/XC8 can't jump in ASM? - c

I have a project for the PIC18F25K50 that mixes C and Assembly; most of what I want to do I can easily manage (and must, for efficiency) in Assembly, but some parts where I care more about ease of development use C. I actually have a couple of these, and I keep encountering the same issue: I can't use ASM to jump to a label. Every single jump instruction (CALL, GOTO, BNC, and so on) will fail if given a label, setting PC to some random-but-consistent value where there are no instructions, causing the program to hang. Using an address works fine: BC $+4 skips the next line.
An example of what does not work is this:
#asm
_waitUS:
GLOBAL _waitUS
waitLoop:
//12 cycles = 1 microsecond:
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
NOP
DECFSZ WREG, F, ACCESS
GOTO waitLoop
RETURN
#endasm
void main() {
//DEBUG:
waitUS(6);
}
Now, this may not work overall, and I am begging you to focus on the issue of jumping - this is still in prototyping because I can't even get the function called. The program does compile without issue.
As soon as waitUS(6) is called, the PC jumps from - in my case - 0x7C96 to 0x52. Swapping the C call out for MOVLW 6; CALL _waitUS breaks in exactly the same way.
If I strictly use C for calling/jumping (as I had to in the previous project), it works fine, and figures out where it's going.
I've been searching for an answer to this for a few weeks now, and still haven't seen anyone else with this problem, even though every project I make (including plaintext in notepad, compiling via command line) has the exact same issue. What the heck is up with this?
Edit: Having discovered the program memory view, I was able to get a better idea of what it's doing. The compiler does know where the functions are, and it is trying to jump to the right location. Apparently, CALL just doesn't know where it's going.
Example:
Address 0x7C92 contains CALL 0x2044, 0. That is precisely what it ought to, that is where the desired function starts. However, upon running this instruction, PC is altered to 0x205E, missing half of the function.
Attempting to be clever, I decided to tack on several NOPs to the start of the function after its label, lining the real code up with 0x205E. Unfortunately, it seems any change alters where its unpredictable jumping will land, and it then landed at 0x2086 instead.
Incidentally, when it starts running at random places, it will often run across a GOTO - and it will jump to the specified location as intended. This only works within the same function, as trying to use GOTO instead of CALL ends up in the same incorrect location, despite what the compiled result demands.

The PDF document at http://ww1.microchip.com/downloads/en/DeviceDoc/33014K.pdf has many examples on how to code the PIC18.
Here is one such example:
RST CODE 0x0 ;The code section named RST
;is placed at program memory
;location 0x0. The next two
;instructions are placed in
;code section RST.
pagesel start ;Jumps to the location labelled
goto start ;’start’.
PGM CODE ;This is the beginning of the
;code section named PGM. It is
;a relocatable code section
;since no absolute address is
;given along with directive CODE.
start
movlw D'10'
movwf delay_value
xorlw 0x80
call delay
goto start
end

Related

How to find all the reachable labels in assembly files?

I'm working on a tool that separates assembly code into its sections and labels, and I'm trying to add a recursive mode.
If I print the code under one specific label and that code references other labels, recursive mode should print the code under those referenced labels as well.
For Example:
.file sample.s
...
A:
...
call B
...
B:
...
C:
...
For the code above, if I print label A in recursive mode, the code under both A and B should be printed.
To do this, I have to find every label reference on each line.
Some instructions are obviously relevant, like call, lea, and jmp, but it's not easy to list every case.
Any ideas? Thanks for your help!
So you want to print all code reachable from a given label, except by returning further up the call tree? (i.e. all other basic blocks of this function, all child functions, and tail-call siblings).
The normal / simplest way for execution to get from one label to another is to simply fall through. Like
mov ecx, 123
looptop: ; do {
...
dec ecx
jnz looptop ; }while(--ecx)
Unless the last instruction before the next label is an unconditional jump (like jmp or ret, but not call which can eventually return), you should also be following execution into that next block. A ret should end processing, jmp could be followed if you want, jnz might fall through.
For conditional branches, you presumably need to follow both sides.
Trying to trace through indirect jumps after code loads a function-pointer into a register with a RIP-relative LEA or a MOV is probably too hard. Do you really want to be able to trace foo(callback_func, 123) and be able to print the code for foo and the code it might call at callback_func?
If the arg is passed in a register (like x86-64 calling conventions) and it doesn't store it to the stack and reload it, then it's fairly easy to match that up with a jmp rdi after seeing there have been no intervening writes to RDI in between. But if it is more complex, like a debug build storing RDI to the stack and reloading somewhere else, you basically need an x86-64 simulator to trace the values.
I think it might be better to not even attempt tracing through indirect jumps, rather than having something that sometimes works (simple cases), sometimes doesn't. So probably you should forget about lea, unless you're thinking about dumping data declarations for static data referenced with LEA or MOV.
Some int 0x80 or syscall are noreturn (e.g. _exit, or sigreturn), but most aren't. The behaviour depends on the RAX/EAX value (and on the OS). Usually EAX gets set pretty soon before a system call, so you might want to special case the noreturn ones, otherwise you'll fall through past an exit into other code that shouldn't necessarily execute.
Same applies for library function calls like call exit.
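To make the traversal rules above concrete, here is a minimal sketch in C. It assumes the file has already been split into labelled blocks; the Block struct, MAX_TARGETS, find_block and mark_reachable are hypothetical names for illustration, not part of any existing tool. Direct call/jmp/jcc targets are followed, fall-through is followed unless the block ends in jmp or ret, and indirect jumps are ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed view of one labelled block. */
enum { MAX_TARGETS = 4 };

typedef struct {
    const char *label;                /* label that starts the block */
    const char *targets[MAX_TARGETS]; /* labels named by call/jmp/jcc, NULL-terminated */
    bool falls_through;               /* false if the block ends in jmp or ret */
} Block;

/* Returns the index of the block with this label, or -1 for a symbol
   that is not defined in this file (e.g. a library function). */
static int find_block(const Block *blocks, size_t n, const char *label)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(blocks[i].label, label) == 0)
            return (int)i;
    return -1;
}

/* Depth-first walk: sets seen[i] for every block reachable from blocks[start]. */
static void mark_reachable(const Block *blocks, size_t n, size_t start, bool *seen)
{
    if (seen[start])
        return;
    seen[start] = true;

    /* Follow every direct reference (call, jmp, conditional branch). */
    for (size_t t = 0; t < MAX_TARGETS && blocks[start].targets[t]; t++) {
        int idx = find_block(blocks, n, blocks[start].targets[t]);
        if (idx >= 0)
            mark_reachable(blocks, n, (size_t)idx, seen);
    }

    /* Follow fall-through into the next block unless this one ends in jmp/ret. */
    if (blocks[start].falls_through && start + 1 < n)
        mark_reachable(blocks, n, start + 1, seen);
}
```

For the A/B/C example in the question, with A and B each ending in ret, starting at A marks A and B and leaves C unmarked. Special-casing noreturn calls such as exit would amount to clearing falls_through for blocks that end in them.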

Custom Bootloader for Kinetis MKE06Z microcontrollers on IAR EWARM issue

First I'd like to introduce myself, as I'm new to the site. I'm an Electronic Engineer, specialized in embedded systems design and development. I've been gathering info from the site for a long time, and I think that there's a lot of people with great deal of knowledge. I'm hoping some other of you may have stumbled upon this or a similar issue.
I've been having some trouble in the implementation of a custom bootloader for a Kinetis MKE06Z microcontroller, not in the bootloader itself but in the relocation of the application code and the behavior after jumping to it. The application is completely coded in C.
The bootloader executes everything as expected, determines if it should run or jump to user application. This is the sequence that implements the jump:
__disable_interrupt();
SCB->VTOR = RELOCATION_VECTOR_ADDR & 0x3FFFFE00;
JumpToUserApplication(RELOCATION_VECTOR_ADDR);
where:
void JumpToUserApplication(uint32_t userStartup)
{
/* set up stack pointer */
asm("LDR r1, [r0]");
asm("MOV r13, r1");
/* jump to application reset vector */
asm("ADDS r0,r0,#0x04 ");
asm("LDR r0, [r0]");
asm("BX r0");
}
as implemented in Freescale's AN4767.
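For readers less used to the assembly, this is a small host-side sketch in C of what those two LDR instructions fetch. On Cortex-M parts the first word of the vector table is the initial stack pointer and the second is the reset handler address, which must have bit 0 set because the core only executes Thumb code. The BootEntry struct and the function names are made up for illustration, and nothing here performs the actual jump.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two words the jump sequence loads from the application's vector table. */
typedef struct {
    uint32_t initial_sp;    /* word 0: value loaded into r13 (SP) */
    uint32_t reset_handler; /* word 1: address passed to BX */
} BootEntry;

/* Mirrors "LDR r1, [r0]" and "LDR r0, [r0, #4]" on a plain array. */
static BootEntry read_boot_entry(const uint32_t *vector_table)
{
    BootEntry e = { vector_table[0], vector_table[1] };
    return e;
}

/* BX to an address with bit 0 clear would try to leave Thumb state,
   which faults on Cortex-M, so a valid reset vector is always odd. */
static bool is_thumb_address(uint32_t addr)
{
    return (addr & 1u) != 0;
}
```

Checking both values in the debugger just before the BX (stack pointer inside RAM, reset handler odd and inside the application's flash region) is a cheap way to rule out a bad relocation address before looking at the startup code.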
So far, so good. Once the jump is executed, I trace the application behavior (on the Disassembly Window) and find out after some instructions, it gets stuck at some specific address with a jump instruction, which ends up being an infinite loop. I then run it step by step to determine which was the instruction that causes this malfunction. It's very strange, as it is running OK and suddenly jumps to a RAM address. A couple of cycles and then jumps to the infinite loop. I took note of the addresses with the instruction causing this strange jump and the one with the infinite loop. I look at the core registers and find out there is an exception, and notice it's the number 0x03 (Hard Fault). Then switch to debugging the user application.
Once in the user application, I start debugging. The user application works fine running like this (no jump from the bootloader). Then I look for the relevant addresses and discover that the routine causing the hard fault when jumping from the bootloader is from IAR: __iar_data_init3. The thing is, it's part of a precompiled library and I'm not sure if it's safe to remove it (by removing the __iar_program_start and replacing it directly with the call to main in the startup file).
The real question is: why does the application behave like that after the jump from the bootloader but not if there is no such jump? Why does this routine jump to a RAM address (when it shouldn't)?
Of course, it may be a little too specific, but hopefully there's someone who can help me.
It seems that something IAR does with the linker configuration is not very clear to me, but has something to do with this problem. The thing is I relocated .text segment:
define symbol __ICFEDIT_intvec_start__ = 0x00001800;
define symbol __ICFEDIT_region_ROM_start__ = 0x00002000;
define symbol __ICFEDIT_region_ROM_end__ = 0x0000FFFF;
define region APP_ROM = mem:[from (__ICFEDIT_region_ROM_start__) to (__ICFEDIT_region_ROM_end__)];
place at address mem:__ICFEDIT_intvec_start__ { readonly section .intvec };
place at start of APP_ROM { readonly section .text };
It seems that the linker doesn't appreciate this and something makes the app misbehave when it is jumped to from another app. Instead, keeping the original .icf file and editing only the .intvec start within the GUI solved the problem, but the code then starts right next to the vector table. Not an issue, but I wanted to relocate the code a little farther.
Thanks.

Randomizing registers

If certain conditions are not met I want to crash my program by jumping to a random location. I also want to randomize the registers by statements like
asm("rdtsc \n");
asm ("movq %rax, %r15 \n");
...
asm ("xor %rbp, %r13 \n");
...
Is there a better/stealthier method to do this? I am concerned, because rdtsc is not a frequent statement in programs. Calling it continually generates similar results too. Beside this, can I somehow clear/randomize the stack content too?
If you just want to crash, your random choice of destination might jump somewhere legal. Just run the ud2 instruction (0F 0B), which is guaranteed to cause an invalid-instruction exception (leading to SIGILL) on every future x86 CPU. i.e. it's reserved, so no future instruction-set extension will ever use that two-byte sequence at the beginning of an instruction.
If you care about high-quality randomness to frustrate any potential backtrace or core dump, then call a random number generator to fill a buffer of random data (or just one 32bit random value which you repeat). Fill all the registers with that garbage data. In 32bit code, you could use a popa instruction to fill all the registers with that garbage data. In 64bit mode, you have to load them manually.
Then scribble over the stack with that data, so your program eventually stops with a segfault when you try to write to an unmapped address (because you've gone outside the stack area).
You could do that scribbling with a rep stosd or something.
As far as "stealthier", you'll need to be much more elaborate about what your threat model is, and what you're trying to stop anyone from learning / doing. i.e. defend against someone modifying your binary to not crash this way?
In addition to Peter Cordes's suggestions, I would add that the OP wants the code responsible for this obfuscation to stay out of scope (stealthier). The instruction causing the crash needs to be somewhere else, otherwise the obfuscation code will be obvious from a crash dump and the code will be easy to patch to remove the bomb.
A rather easy solution is to locate the RET opcode from a common library function such as read or strlen and JUMP there by pushing the address on the stack and executing a RET statement. This solution is not perfect: advanced debuggers exist that store the execution trace and will be able to backtrack to the obfuscator from the crash location. In order to defeat that, you may prefer to enter an infinite loop instead of crashing, but that loop can be easily found and removed.
You can also embed some complex code in your app that computes for a while by executing many different functions in a random manner and use that as a honey pot to jump to from the obfuscator.

Saving registers state in COM program

I disassembled a simple DOS .COM program and there was some code which saves and restores registers values
PUSH AX ; this is the first instruction
PUSH CX
....
POP CX
POP AX
MOV AX, 0x4C00 ; AH = 0x4C (terminate), AL = 0x00 (exit code)
INT 0x21 ; call DOS interrupt 21h => END
This is very similar to function prologue and epilogue in C programs. But prologues are added automatically by compiler, and the program above was written manually in assembler, so the programmer took full responsibility for saving and restoring values in this code.
My question is what will happen if I unintentionally forget to save some registers in my program?
And what if I intentionally replace these instructions with NOPs in a hex editor? Will this lead to a program crash? And why is the called function responsible for saving the outer context on the stack? From my point of view this should be done somehow in the calling function, to prevent problems if I use 3rd party libraries and poorly written code which may break my program's execution.
One problem with making the calling function save all of its working registers before calling another function is that sometimes a function is interrupted (i.e. by a hardware interrupt) without its knowledge. In DOS, for example, there was that pesky 55 millisecond timer tick. About 18.2 times per second, a hardware interrupt would transfer control from whatever code was executing to the timer tick handler. This happened automatically unless your program specifically disabled interrupts.
The timer tick handler would then save all of the registers it was going to use, do its work, and then restore the registers it saved before returning.
Sure, you could say that interrupt handlers are special, but why? Even with the paucity of registers on the 8086 (AX, BX, CX, DX, SI, DI, Flags -- did I forget anything? I purposely didn't include the segment registers), making a function save its entire state before transferring control means that you'd be using a lot of unnecessary stack space and execution cycles to save things because they might be modified. But if the called function is responsible for saving just the registers it uses, and it only uses AX and CX, then it can save just those two registers. It makes for smaller and faster code, and much less stack space usage.
When you start talking about call hierarchies that are many levels deep, the difference between pushing 8 registers rather than 2 registers adds up pretty quickly.
Consider the x86-64, with its 16 general purpose registers. Do you really think a function should be forced to save all 16 of those registers before calling another function, even when the called function only uses two of them? Saving 16 64-bit registers requires 128 bytes of stack space, as opposed to saving two registers, which requires only 16 bytes.
The primary point of writing things in assembly language these days is to write faster and smaller code than what a compiler can write. A guiding principle is don't do more work than you have to. That means it's up to you to know what registers your assembly language function is using, and to save those registers on entry and restore them on exit.
If you don't want to guard against forgetting what to push or pop I would advise sticking to a higher level language.
In assembler, if the function is your own then you should save and restore all registers you use within the function except those which return an output from the function. If others wrote the function, look up its documentation. If in doubt, save/restore registers before/after calling the function (except those which are supposed to return a value).
Since the DOS Terminate function does not rely on any register settings (other than AX) for its operation (*) both pushes/pops in the code you have posted seem superfluous. You should however be aware that the programmer could have pushed these values for the purpose of using them locally! So replacing both these pushes by NOP in HEX editor is surely a bad idea. You could however replace both pops by NOP because at that point in the program the restoration of AX/CX as well as balancing the stack are unnecessary because of (*).
Since your question is about saving registers on the program level the answer must be that pushing/popping registers for the sake of saving them is useless. Nothing bad will happen if you unintentionally forgot to save some registers in your program.

Is 2 pass on the source file necessary for assembler and linker?

I've heard many times that the assembler and linker need to traverse their input files at least twice. Is this really necessary? Why can't it be done in one pass?
The assembler translates a symbolic assembler language into a binary representation.
In the input language (assembler), labels are symbolic too.
In the binary output language they are typically a distance in bytes, relative to the current position or some other fixed point (e.g. jump so many bytes ahead or back).
The first pass just determines the offset from the start of the code or some other fixed point of all assembler instructions to fixate the position of the labels.
This allows the correct jump distances for branch instructions to be calculated in the second pass.
A one-pass assembler would be possible, but you would only be able to jump to labels you had already declared ("back"), not forward.
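A minimal sketch of those two passes in C, assuming every instruction has a fixed, known size and that displacements are measured from the end of the jump instruction. The Line struct and both function names are hypothetical, and a real assembler would use a proper symbol table rather than a linear search.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *label;  /* label defined at this line, or NULL */
    const char *target; /* label this line jumps to, or NULL */
    int size;           /* encoded size in bytes (assumed fixed here) */
    int address;        /* filled in by pass 1 */
    int displacement;   /* filled in by pass 2 */
} Line;

/* Pass 1: walk the lines once and record the address of every line,
   which also fixes the address of every label. */
static void pass1_assign_addresses(Line *lines, size_t n)
{
    int pc = 0;
    for (size_t i = 0; i < n; i++) {
        lines[i].address = pc;
        pc += lines[i].size;
    }
}

/* Pass 2: all label addresses are known now, so forward references resolve.
   Returns 0 on success, -1 if some target label is never defined. */
static int pass2_resolve(Line *lines, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!lines[i].target)
            continue;
        int found = 0;
        for (size_t j = 0; j < n; j++) {
            if (lines[j].label && strcmp(lines[j].label, lines[i].target) == 0) {
                /* distance from the end of the jump to the label */
                lines[i].displacement =
                    lines[j].address - (lines[i].address + lines[i].size);
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return 0;
}
```

If the first line is a 2-byte jump to a label that sits at address 7, pass 2 gives it a displacement of +5 even though the label appears later in the source, which is exactly the forward reference a single pass cannot handle.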
One example when this is necessary is when two functions call each other.
int sub_a(int v);
int sub_b(int v);
int sub_a(int v) {
int u = v;
if ( 0 < u ) {
u = sub_b( v - 1 );
}
return u - 1;
}
int sub_b(int v) {
int u = v;
if ( 0 < u ) {
u = sub_a( v - 1 );
}
return u - 1;
}
A two-pass scan is then necessary, as any ordering of the functions will have a dependency on a function that hasn't been scanned yet.
It may even take more than two.
here:
...
jmp outside
...
jmp there
...
jmp here
...
there:
This applies in particular to instruction sets that have some form of a near jump and some form of a far jump. The assembler doesn't always want to waste a far jump on every branch/jmp. Take the code above, for example: when it gets to the jmp here line, it knows how many instructions are between the here label and the jump to it, so it can make a pretty good estimate of whether it needs to encode that as a near or a far jump. Normally the far version of a jump takes more bytes to encode, causing all the instructions and labels that follow to shift.
When it encounters the jmp there instruction, it does not know whether it is near or far and has to come back later on a separate pass (through the data). When it encounters the label there, it could go back and look to see whether there has been a reference to it up to this point, and patch up that reference. That is another pass through the data, pass 2. Or you just make one complete pass through the source code, then start going back and forth through the data more times.
Let's say the jmp outside does not resolve to a label. Now, depending on the instruction set, the assembler has to respond. On some instruction sets, say the msp430, a far jump simply means an absolute address in memory, covering all of the memory space, with no segments or anything like that. There you could simply assume a far jump and leave the address for the linker to fill in later. On some instruction sets, like ARM, you have to allocate some memory within near reach of the instruction, often hiding things behind unconditional branches (this can be a bad thing and fail). Basically you need to allocate a place where the whole address of the external item can be stored, encode the instruction to load from that nearby memory location, and let the linker fill in the address later.
Back to here and there. What if, on the first pass, you assumed that all of the unknown jumps were near and computed addresses based on that? Suppose that on that pass here was exactly 128 bytes from the jmp here instruction, for an instruction set whose near jump has a reach of only 128 bytes. So you assume jmp here is also near. To make this painful, suppose that when there was found, the distance from jmp there to there was 127 bytes, which was your maximum near jump forward. But outside is not found! It has to be far, so you need to burn some more bytes. Now the distance from here to jmp here is too far, so that jump needs more bytes; now jmp there is too far as well, and it needs more bytes. How many passes through the data did it take to figure those three things out? More than two. One pass to start. The second pass marks outside as far; the assumption still has jmp there as near on the second pass, and when it gets to jmp here it discovers that this has to be a far jump, causing the there address to change. On the third pass it discovers that jmp there needs to be far, and that affects everything after that instruction. For this simple code, that is it: everything is resolved.
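The cascade described here can be sketched as a loop that repeats until a pass makes no change. The sizes (2-byte near, 4-byte far), the reach of -128..127 bytes measured from the end of the jump, and the names Item and relax are all assumptions chosen for illustration; real instruction sets and assemblers differ in the details.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NEAR_SIZE = 2, FAR_SIZE = 4, NEAR_MIN = -128, NEAR_MAX = 127 };

typedef struct {
    int size;    /* current encoded size in bytes */
    int target;  /* index of the item jumped to, or -1 if not a jump */
    bool is_far; /* set once the jump has been widened */
} Item;

/* Starts with every jump near and widens jumps until nothing changes.
   addr[] receives the final address of each item; returns the pass count. */
static int relax(Item *items, size_t n, int *addr)
{
    int passes = 0;
    bool changed;
    do {
        changed = false;
        passes++;

        /* Recompute all addresses from the current sizes. */
        int pc = 0;
        for (size_t i = 0; i < n; i++) {
            addr[i] = pc;
            pc += items[i].size;
        }

        /* Widen any near jump whose target is now out of reach. */
        for (size_t i = 0; i < n; i++) {
            if (items[i].target < 0 || items[i].is_far)
                continue;
            int disp = addr[items[i].target] - (addr[i] + items[i].size);
            if (disp < NEAR_MIN || disp > NEAR_MAX) {
                items[i].is_far = true;
                items[i].size = FAR_SIZE;
                changed = true;
            }
        }
    } while (changed);
    return passes;
}
```

Because a jump is only ever widened and never shrunk back, the loop is guaranteed to stop. With a near jump that just fits (126 bytes forward) sitting in front of a second jump that has to become far, the second jump grows on pass 1, that growth pushes the first jump out of range on pass 2, and pass 3 confirms that nothing else moves.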
Think about a bubble sort. You keep looping through the data doing swaps until you have a flag that says, I made no changes on that last pass, indicating everything is resolved and we are done. You have to play the same game with an assembler. For instruction sets like ARM you need to do things like try to find places to tuck away addresses and constants/immediates that don't encode into a single instruction. That is, if the assembler wants to do that work for you. It could just as easily declare an error and say the destination is too far for the instruction chosen. ARM assemblers allow you to be lazy and do things like:
ldr r0,=0x1234567
...
ldr r1,=lab7
...
lab7:
The assembler looks at that = and knows it has to determine: can I encode that constant/immediate in the instruction (changing your ldr to a mov for you), or do I need to find a place wedged in your code to place the word, and then encode the instruction with a near address offset?
Even without dealing with near and far, simply resolving addresses for the outside, there, here example above takes two passes. The first pass reads everything; jmp here happens to know where here is on the first pass. But you have to make a second pass through the program (not necessarily from the disk; you can keep the info in memory), because there might be a jump to here that precedes the here: label. The second pass will find the jmp outside and know there is no outside label in the program, marking it as unresolved or external depending on the rules of the assembler. The second pass resolves the jmp there as referring to a known label, and it doesn't touch the jmp here because that was resolved on the first pass. This is your classic two-pass problem/solution.
The linker has the same problem. It has to pass through all the objects; think of each object as a complicated line of source code. It finds all the labels, both the ones defined in the objects and the ones left unresolved in them. If it finds that it needs an "outside" label in the second file out of 10 files, it has to pass through all 10 files, then go back through the data, either on file or in memory, to resolve all the forward-referenced labels. It won't know on the first occurrence of jmp outside that there is no outside label. It is on the second pass, when it finds jmp outside and looks through the list it keeps of found labels (that could be considered a third pass), that it finds no outside label and declares an error.

Resources