MSP430F5xxx RTOS restore context assembler not clear - c

I'm trying to port a FunkOS RTOS from MSP430F2xxx to MSP430F5529. I'm using CCS 10.4 with TI v20.2.5 LTS compiler. I ported most of the code but I have problem with the RTOS taking over the control. After I initialize all the tasks I call Task_StartTasks function. My problem is with the assembler part of this function.
void Task_StartTasks(void)
{
Task_SetScheduler(TRUE);
Task_Switch();
// Restore the context...
asm(" mov.w &pstCurrentTask, r12");
asm(" mov.w #r12, r1");
asm(" pop r15");
asm(" pop r14");
asm(" pop r13");
asm(" pop r12");
asm(" pop r11");
asm(" pop r10");
asm(" pop r9");
asm(" pop r8");
asm(" pop r7");
asm(" pop r6");
asm(" pop r5");
asm(" pop r4");
asm(" bic.w #0x00F0, 0(SP)");
asm(" reti");
}
pstCurrentTask is a global pointer to following structure:
typedef struct Task_Struct
{
/*! This is the basic task control block in the RTOS. It contains parameters
and state information required for a task, including stack, priority,
timeouts, entry funcitons, and task pending semaphore.
*/
//--[Task Control Block Entries]-----------------------------------------
WORD *pwTopStack; //!< Pointer to current stack top
WORD *pwStack; //!< Stack pointer, defined by the task.
USHORT usStackSize; //!< Size of the stack in MAU
//--[Task Definitions]---------------------------------------------------
BYTE *pacName; //!< Pointer to the name of the task (ASCII)
TASK_FUNC pfTaskFunc; //!< Pointer to the entry function
UCHAR ucPriority; //!< Task priority
TASK_STATE eState; //!< Current task state
USHORT usTimeLeft; //!< Ticks remaining in blocked/sleep state
BOOL bTimeout; //!< Indicates that an IO operation timed out
struct Task_Struct *pstNext; //!< Pointer to the next task (handled by scheduler)
} TASK_STRUCT;
Task_SetScheduler and Task_Switch make sure that pstCurrentTask is pointing to the correct task structure. As far as I understand this:
asm(" mov.w &pstCurrentTask, r12");
asm(" mov.w #r12, r1");
Moves the value of pstCurrentTask (in this case this is just an address to the structure?) to the R1 which for MSP430 is stack pointer (Why?). Then all registers are popped and the magic happens here.
I don't understand what is going on here:
asm(" bic.w #0x00F0, 0(SP)");
It would be great if someone could explaing the assembler here.

Don't miss the #. The stack pointer is (re)set to pstCurrentTask->pwTopStack (since it's the first field in the structure, dereferencing the pointer to the structure will do the trick without any extra offset needed), presumably the pointer to the original stack was stored here after the registers were pushed and is now put back into place.
Then the registers are popped. At the very end reti causes two more registers to be popped: the status register and the program counter (instruction pointer), the latter resulting in a jump/return to the stored value.
But just before that happens, the bic.w clears some bits in this value on the stack, namely it turns off low-power-mode by clearing CPUOFF, OSCOFF, SCG0, SCG1. It operates on the value that is currently on the top of the stack (dereferencing SP with offset zero) which is the soon-to-be status register. This means that even if the stored status register had those bits set, indicating low power mode, they won't be set anymore when it is popped again as part of reti.
If you were to translate that line to C, this is what it would look like:
SP[0] &= ~0x00f0;
// 0x00f0 comes from (CPUOFF | OSCOFF | SCG0 | SCG1)
Note that the 0x00f0 isn't a peripheral or anything like that. It's just a bitmask used on the status register. In the manual, check chapter 2.3.1 on page 40 (entering and exiting lower power mode). A very similar command is used there, but with a sum of named constants instead of a numerical value, in our case that would be CPUOFF+OSCOFF+SCG0+SCG1 instead of 0x00f0. If you look at chapter 3.2.3 on page 46 (status register) you can see why - those 4 flags are at bits 4-7 of the status register, i.e. their values are 0x0010, 0x0020, 0x0040 and 0x0080 respectively, which when added or ORed together gives you 0x00f0.

Related

Why does the stack frame also store instructions(besides data)? What is the precise mechanism by which instructions on stack frame get executed?

Short version:
0: 48 c7 c7 ee 4f 37 45 mov $0x45374fee, %rdi
7: 68 60 18 40 00 pushq $0x401860
c: c3 retq
How can these 3 lines of instruction(0,7,c), saved in the stack frame, get executed? I thought stack frame only store data, does it also store instructions? I know data is read to registers, but how do these instructions get executed?
Long version:
I am self-studying 15-213(Computer Systems) from CMU. In the Attack lab, there is an instance (phase 2) where the stack frame gets overwritten with "attack" instructions. The attack happens by then overwriting the return address from the calling function getbuf() with the address %rsp points to, which I know is the top of the stack frame. In this case, the top of the stack frame is in turn injected with the attack code mentioned above.
Here is the question, by reading the book(CSAPP), I get the sense that the stack frame only stores data the is overflown from the registers(including return address, extra arguments, etc.). But I don't get why it can also store instructions(attack code) and be executed. How exactly did the content in the stack frame, which %rsp points to, get executed? I also know that %rsp stores the return address of the calling function, the point being it is an address, not an instruction? So exactly by which mechanism does an supposed address get executed as an instruction? I am very confused.
Edit: Here is a link to the question(4.2 level 2):
http://csapp.cs.cmu.edu/3e/attacklab.pdf
This is a post that is helpful for me in understanding: https://github.com/magna25/Attack-Lab/blob/master/Phase%202.md
Thanks for your explanation!
ret instruction gets a pointer from the current position of the stack and jumps to it. If, while in a function, you modify the stack to point to another function or piece of code that could be used maliciously, the code can return to it.
The code below doesn't necessarily compile, and it is just meant to represent the concept.
For example, we have two functions: add(), and badcode():
int add(int a, int b)
{
return a + b;
}
void badcode()
{
// Some very bad code
}
Let's also assume that we have a stack such as the below when we call add()
...
0x00....18 extra arguments
0x00....10 return address
0x00....08 saved RBP
0x00....00 local variables and etc.
...
If during the execution of add, we managed to change the return address to address of badcode(), on ret instruction we will automatically start executing badcode(). I don't know if this answer your question.
Edit:
An instruction is simply an array of numbers. Where you store them is irrelevant (mostly) to their execution. A stack is essentially an abstract data structure, it is not a special place in RAM. If your OS doesn't mark the stack as non-executable, there is nothing stopping the code on the stack from being returned to by the ret.
Edit 2:
I get the sense that the stack frame only stores data that is overflown
from the registers(including return address, extra arguments, etc.)
I do not think that you know how registers, RAM, stack, and programs are incorporated. The sense that stack frame only stores data that is overflown is incorrect.
Let's start over.
Registers are pieces of memory on your CPU. They are independent of RAM. There are mainly 8 registers on a CPU. a, c, d, b, si, di, sp, and bp. a is for accumulator and it generally used for arithmetic operations, likewise b stands for base, c stands for counter, d stands for data, si stands for source, di stands for destination, sp is the stack pointer, and bp is the base pointer.
On 16 bit computers a, b, c, d, si, di, sp, and bp are 16 bits (2 byte). The a, b, c, and d are often shown as ax, bx, cx, and dx where the x stands for extension from their original 8 bit versions. They can also be referred to as eax, ecx, edx, ebx, esi, edi, esp, ebp for 32 bit (e again stands for extended) and rax, rcx, rdx, rbx, rsi, rdi, rsp, rbp for 64 bit.
Once again these are on your CPU and are independent of RAM. CPU uses these registers to do everything that it does. You wanna add two numbers? put one of them inside ax and another one inside cx and add them.
You also have RAM. RAM (standing for Random Access Memory) is a storage device that allows you to access and modify all of its values using equal computation power or time (hence the term random access). Each value that RAM holds also has an address that determines where on the RAM this value is. CPU can use numbers and treat such numbers as addresses to access memory addresses of RAM. Numbers that are used for such purposes are called pointers.
A stack is an abstract data structure. It has a FILO (first in last out) structure which means that to access the first datum that you have stored you have to access all of the other data. To manipulate the stack CPU provides us with sp which holds the pointer to the current position of the stack, and bp which holds the top of the stack. The position that bp holds is called the top of the stack because the stack usually grows downwards meaning that if we start a stack from the memory address 0x100 and store 4 bytes in it, sp will now be at the memory address 0x100 - 4 = 0x9C. To do such operations automatically we have the push and pop instructions. In that sense a stack could be used to store any type of data regardless of the data's relation to registers are programs.
Programs are pieces of structured code that are placed on the RAM by the operating system. The operating system reads program headers and relevant information and sets up an environment for the program to run on. For each program a stack is set up, usually, some space for the heap is given, and instructions (which are the building blocks of a program) are placed in arbitrary memory locations that are either predetermined by the program itself or automatically given by the OS.
Over the years some conventions have been set to standardize CPUs. For example, on most CPU's ret instruction receives the system pointer size amount of data from the stack and jumps to it. Jumping means executing code at a particular RAM address. This is only a convention and has no relation to being overflown from registers and etc. For that reason when a function is called firstly the return address (or the current address in the program at the time of execution) is pushed onto the stack so that it could be retrieved later by ret. Local variables are also stored in the stack, along with arguments if a function has more than 6(?).
Does this help?
I know it is a long read but I couldn't be sure on what you know and what you don't know.
Yet Another Edit:
Lets also take a look at the code from the PDF:
void test()
{
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
Phase 2 involves injecting a small amount of code as part of your exploit string.
Within the file ctarget there is code for a function touch2 having the following C representation:
void touch2(unsigned val)
{
vlevel = 2; /* Part of validation protocol */
if (val == cookie) {
printf("Touch2!: You called touch2(0x%.8x)\n", val);
validate(2);
} else {
printf("Misfire: You called touch2(0x%.8x)\n", val);
fail(2);
}
exit(0);
}
Your task is to get CTARGET to execute the code for touch2 rather than returning to test. In this case,
however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Let's think about what you need to do:
You need to modify the stack of test() so that two things happen. The first thing is that you do not return to test() but you rather return to touch2. The other thing you need to do is give touch2 an argument which is your cookie. Since you are giving only one argument you don't need to modify the stack for the argument at all. The first argument is stored on rdi as a part of x86_64 calling convention.
The final code that you write has to change the return address to touch2()'s address and also call mov rdi, cookie
Edit:
I before talked about RAM being able to store data on addresses and CPU being able to interact with them. There is a secret register on your CPU that you are not able to reach from you assembly code. This register is called ip/eip/rip. It stands for instruction pointer. This register holds a 16/32/64 bit pointer to an address on RAM. this particular address is the address that the CPU will execute in its clock cycle. With that in my we can say that what a ret instruction is doing is
pop rip
which means get the last 64 bits (8 bytes for a pointer) on the stack into this instruction pointer. Once rip is set to this value, the CPU begins executing this code. The CPU doesn't do any checks on rip whatsoever. You can technically do the following thing (excuse me, my assembly is in intel syntax):
mov rax, str ; move the RAM address of "str" into rax
push rax ; push rax into stack
ret ; return to the last pushed qword (8 bytes) on the stack
str: db "Hello, world!", 0 ; define a string
This code can call/execute a string. Your CPU will be very upset tho, that there is no valid instruction there and will probably stop working.

AVR C compilers behavior. Memory management

Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
absoluteAdr = functionDataStartAdr + localShiftOfVariable.
And do they increase data stack point when variable declared by it's length or stack pointer increased in end/start of function for all it's variables lengths.
Let's have a look at avr-gcc, which is freely available including its ABI:
Do AVR C compilers make program memorize the address in SRAM where function started to store its data (variables, arrays) in data stack in one of index registers in order to get absolute address of local variable by formula:
Yes, no, it depends:
Static Storage
For variables in static storage, i.e. variables as defined by
unsigned char func (void)
{
static unsigned char var;
return ++var;
}
the compiler generates a symbol like var.123 with appropriate size (1 byte in this case). The linker / locator will then assign the address.
func:
lds r24,var.1505
subi r24,lo8(-(1))
sts var.1505,r24
ret
.local var.1505
.comm var.1505,1,1
Automatic
Automatic variables are held in registers if possible, otherwise the compiler allocates space in the frame of the function. It may even be the case that variables are optimized out, and in that case they do not exist anywhere in the program:
int add (void)
{
int a = 1;
int b = 2;
return a + b;
}
→
add:
ldi r24,lo8(3)
ldi r25,0
ret
There are 3 types of entities that are stored in the frame of a function, all of which might be present or absent depending on the program:
Callee-saved registers that are saved (PUSH'ed) by the function prologue and restored (POP'ed) by the epilogue. This is needed when local variables are allocated to callee-saved registers.
Space for local variables that cannot be allocated to registers. This happens when the variable is too big to be held in registers, there are too many auto variables, or the address of a variable is taken (and taking the address cannot be optimized out). This is because you cannot take the address of a register1.
void use_address (int*);
void func (void)
{
int a;
use_address (&a);
}
The space for these variables is allocated in the prologue and deallocated in the epilogue. Shrink-wrapping is not implemented:
func:
push r28
push r29
rcall .
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 2 */
/* stack size = 4 */
movw r24,r28
adiw r24,1
rcall use_address
pop __tmp_reg__
pop __tmp_reg__
pop r29
pop r28
ret
In this example, a occupies 2 bytes which are allocated by rcall . (it was compiled for a device with 16-bit program counter). Then the compiler initialized the frame-pointer Y (R29:R28) with the value of the stack pointer. This is needed because on AVR, you cannot access memory via SP; the only memory operations that involve SP are PUSH and POP. Then the address of that variable which is Y+1 is passed in R24. After the call of the function, the epilogue frees the frame and restores R28 and R29.
Arguments that have to be passed on the stack:
void xfunc (int, ...);
void call_xfunc (void)
{
xfunc (42);
}
These arguments are pushed and the callee is picking them up from the stack. These arguments are pushed / popped around the call, but can also be accumulated by means of -maccumulate-args.
call_func:
push __zero_reg__
ldi r24,lo8(42)
push r24
rcall xfunc
pop __tmp_reg__
pop __tmp_reg__
ret
In this example, the argument has to be passed on the stack because the ABI says that all arguments of a varargs function have to be passed on the stack, including the named ones.
For a description on how exactly the frame is being layed out and arguments are being passed, see [Frame Layout and Argument Passing] (https://gcc.gnu.org/wiki/avr-gcc#Frame_Layout).
1 Some AVRs actually allow this, but you never (like in NEVER) want to pass around the address of a general purpose register!
Compilers are not managing the RAM, compilers at compilation time calculate the required size for each data sections like bss, data, text, rodata, .. etc and generate relocatable object file for each translation unit
The linker comes after and generate one object file and assign the relocatable addresses to absolute ones mapped according to the Linker configuration File LCF.
In run time, the mechanism depends on the architecture itself. normally, each function call has a frame in the stack where it's arguments, return address and local variables are defined. the stack extend with a creation of variables and for low cost AVR microcontrollers, there is no memory management protection regarding the stack increase or the overlapping between the stack and another memory section -normally the heap-. even if there is OS managing the protection from the tasks to exceed its allocated stack, without a memory management unit, all what OS can do is to assert a RESET with illegal memory access reason.

moving the vector table on an STM32F405

I want to move my code in the flash memory on an STM32F405.
I changed the linker script to change the start of flash like so:
FLASH (rx) : ORIGIN = 0x08008000, LENGTH = 1024K-32K
If I am correct the vector table will also be located at 0x08008000. I would like to create a bootloader for a start I would like to run my application in the new memory location. Do my bootloader and application have sepperate vector tables? How can I initialize the stack pointer to 0x8008000?
Yes, your bootloader will have a separate vector table to your main code. The last thing your bootloader, or the first thing your main code should do is remap the vector table, using the SCB->VTOR register. The vector table is 4 bytes in from the start of the image, so using your numbers, SCB->VTOR should be 0x08008004. The first 4 bytes of the image are the value the stack pointer should be initialised with.
You don't want to initalise your stack pointer to 0x8008000, that address is in flash and will cause a hard fault as soon as you try to push something, if that is where your application starts then the memory at 0x08008000 contains the address you should use as the stack pointer.
To set it I have always used an asm function which just loads SP with the value passed to the function in R0, something like the following.
SetSP PROC
EXPORT SetSP
MOV SP, R0
BX LR
ENDP
To call from a C context:
extern void SetSP(uint32_t address);
uint32_t sp = *((uint32_t *)0x08008000);
SetSP(sp);
That dereferences a pointer to 0x08008000, to get the initial stack pointer, then sets it.

lpc 1768 Secondary Boot Loader error

I am working on lpc 1768 SBL which includes the following code to jump to user application.
#define NVIC_VectTab_FLASH (0x00000000)
#define USER_FLASH_START (0x00002000)
void NVIC_SetVectorTable(DWORD NVIC_VectTab, DWORD Offset)
{
NVIC_VECT_TABLE = NVIC_VectTab | (Offset & 0x1FFFFF80);
}
void execute_user_code(void)
{
void (*user_code_entry)(void);
/* Change the Vector Table to the USER_FLASH_START
in case the user application uses interrupts */
NVIC_SetVectorTable(NVIC_VectTab_FLASH, USER_FLASH_START);
user_code_entry = (void (*)(void))((USER_FLASH_START)+1);
user_code_entry();
}
It was working without any errors. After adding some heap memory to the code, the machine is stuck. I tried out different values for heap. Some of them are working. After some deep debugging ,I could find out that machine was not stuck when there is a value which is divisible by 64 is at first locations of application bin file.
ie,
When I select heap memory as 0x00002E90 ,it generates stack base as 0x10005240 . Then stack base + stack size(0x2900) gives a value = 0x10007B40.
I found this is loaded at first locations of application bin file. This value is divisible by 64 and the code is running without stuck.
But ,when I select heap memory as 0x00002E88 ,it generates stack base as 0x10005238 . Then stack base + stack size(0x2900) gives a value = 0x10007B38.
This value is not divisible by 64 and the code is stuck.
The disassembly is as follows in this case.
When stepping from address 0x0000 2000 ,it goes to hard fault handler. But in the earlier case it doesn't go to hard fault. It continues and works as well.
I cannot understand the instruction DCW and why it goes to hard fault.
Can anyone tell me the reason behind this?
Executing the vector table is what you do on older ARM7/ARM9 parts (or bigger Cortex-A ones) where the vectors are instructions, and the first entry will be a jump to the reset handler, but on Cortex-M, the vector table is pure data - the first entry is your initial stack pointer, and the second entry is the address of the reset handler - so trying to execute it is liable to go horribly wrong..
As it happens, in this case you can actually get away with executing most of that vector table by sheer chance, because the memory layout leads to each halfword of the flash addresses becoming fairly innocuous instructions:
2: 1000 asrs r0, r0, #32
4: 20d9 movs r0, #217 ; 0xd9
6: 0000 movs r0, r0
8: 20f5 movs r0, #245 ; 0xf5
a: 0000 movs r0, r0
...
Until you eventually bumble through all the remaining NOPs to 0x20d8 where you pick up the real entry point. However, the killer is that initial stack pointer, because thanks to the RAM being higher up, you get this:
0: 7b38 ldrb r0, [r7, #12]
The lower byte of 0x7bxx is where the base register is encoded, so by varying the address you have a crapshoot as to which register that is, and furthermore whether whatever junk value is left in there also happens to be a valid address to load from. Do you feel lucky?
Anyway, in summary: Rather than call the address of the vector table directly, you need to load the second word from it, then call whatever address that contains.

Compiler Error C2432 : illegal reference to 16-bit data in 'identifier'

So I was trying to dump the contents of the Interrupt Vector Table on 32 bit Widows 7 using the following code excerpt. It does not compile with Visual Studio as Visual Studio has probably withdrawn support for 16 Bit compilation. I built it in Pelles C, however the executable would crash when I try to run it. The problem, as I figured from some research over the internet, has to do with the 16 bit register reference (to ES). I do not however clearly understand the issue. I would really appreciate if someone could help me out with getting this to work on win32
#include <stdio.h>
#define WORD unsigned short
#define IDT_001_ADDR 0 // start address of the first IVT vector
#define IDT_255_ADDR 1020 // start address of the last IVT vector
#define IDT_VECTOR_SZ 4 // size of the each IVT vector
int main(int argc, char **argv) {
WORD csAddr; // code segment of given interrupt
WORD ipAddr; // starting IP for given interrupt
short address; // address in memory (0-1020)
WORD vector ; // IVT entry ID (0..255)
vector = 0x0;
printf("n-- -Dumping IVT from bottom up ---n");
printf("Vector\tAddresst\n");
for(address=IDT_001_ADDR; address<=IDT_255_ADDR; address=address+IDT_VECTOR_SZ,vector++) {
printf("%03d\t%08d\t", vector , address);
// IVT starts at bottom of memory, so CS is always 0x0
__asm {
PUSH ES
mov AX, 0
mov ES,AX
mov BX, address
mov AX, ES:[BX]
mov ipAddr ,AX
inc BX
inc BX
mov AX, ES:[BX]
mov csAddr, AX
pop ES
};
printf("[CS:IP] = [%04X,%04X]n" ,csAddr, ipAddr);
}
}
Thanks in advance
The issue with es (or any segment register) is that in real mode (which your dos is "faking" with vm86), the value in the segment register is multiplied by 16 and added to the offset to get a linear address - which is the physical address. In protected mode (your win32) the segment registers are "selectors", an index into an array of structures (descriptors) containing (among other things) a "base" which is added to an offset to get a linear address. The value zero is explicitly the invalid selector, so it crashes. The good news is that the "base" of most segment registers (fs an exception) is zero, so you can address the memory you want without touching es.
The bad news is virtual memory. Paging is enabled, so the linear address calculated by base + offset may not be a physical address. If you're lucky, your OS may have kindly "identity mapped" low memory, so that linear memory equals physical memory. If you're really lucky, your OS may let you at it from user code.
Try removing all references to es and see what happens. The results, if any, would be more recogizable in hex (%x), not decimal. Your best bet might be to do the whole thing in 16-bit and forget win32.

Resources