As I understand it, each process has its own address space, called the virtual address space or program memory,
and every process has a region called the stack, which is used to store a function's local variables and parameters.
Also, when an exception occurs, the processor (say an ARM Cortex-A) switches to a privileged mode and then branches to the exception handler.
As I understand it, most applications run in non-privileged user mode, and this mode has a special register, the stack pointer, that holds the address of the top of the stack. But this is a single register, so it can't hold the top-of-stack addresses of several processes at the same time. Could you please explain what actually happens?
As with all registers, it's saved and put away in a data structure associated with the process once the OS decides it's time for another process to run (a "context switch"); it's as if the OS took a snapshot of the current processor state.
When the process is scheduled again, all registers are restored (including the instruction pointer) and execution resumes as if nothing happened.
As I understand it, most applications run in non-privileged user mode, and this mode has a special register, the stack pointer, that holds the address of the top of the stack
The stack pointer is not specific to user mode; the processor always has (and can use) it, regardless of the mode.
Related
I'm trying to follow a tutorial that implements a task scheduler on the STM32F407 Discovery board.
There are four functions, each of which runs for 1 ms at a time before the scheduler switches to the next one.
The tutorial describes the whole flow like this: we save the register values of each function, namely xPSR, PC, LR and R0...R13, and at context-switch time we load the saved value for the next function into PSP (the process stack pointer). The context switch happens inside systick_handler, which is triggered every 1 ms.
What I don't understand is this: I thought the registers are global, not private like the variables inside a function. So how is he saving these register values for each function? If anyone can briefly explain just the context-switching part of this code, I will feel much more confident about what I'm doing: https://github.com/niekiran/CortexMxProgramming/blob/master/Source_code/015_task_scheduler/Src/main.c
Thank you
Imagine you could take a photograph of the CPU at some point in time, and that the photograph could show you the individual 1s and 0s in the CPU at that instant. If you had a way to restore the 1s and 0s from your photograph back into the CPU at some point in the future, and you could then let the CPU run, then assuming RAM and ROM contents were unaltered it would continue doing what it had been doing at the point the photograph was taken.
This is essentially what the context switch is doing. It is saving all of the "volatile context" of the CPU: the contents of all of the general purpose registers (including the program counter which tells it which instruction it was executing, roughly speaking, and the stack pointer) as well as the processor status register (PSR). This is sufficient information to allow the CPU to resume again from this exact point at some future time.
On the Cortex-M, there are two stack pointers, and these exist to make this process easier. One or the other of them is always accessible as sp (r13). The way this example is configured, handler-mode code uses the MSP (main stack pointer) and thread-mode code uses the PSP (process stack pointer). The registers r0-r3, r12, lr (r14), pc (r15) and the PSR are pushed to the active stack on entry to handler mode. That just leaves r4-r11 and the stack pointer (r13 in thread mode, but now accessed via the special-purpose PSP register because the handler is using the MSP).
So the context switch grabs the value of PSP, and then pushes r4-r11 to the task's own stack before saving the updated value of the task's stack pointer in its task control block. Now the entire volatile context of the CPU at the point where it entered handler mode has been saved to the stack of the task that was running, and the stack pointer has been saved in the TCB. All that remains is to find a new task to run, get its stack pointer out of its TCB, use it to pop r4-r11, and then update PSP before returning. On exit from handler mode, r0-r3, r12, lr, pc and the PSR will all be popped automatically by the hardware.
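To make the sequence above concrete, here is a host-side C simulation of it. It is a sketch under stated assumptions, not the tutorial's code: `cpu_t`, `tcb_t`, `hw_stack()`, `hw_unstack()`, `init_task()`, `restore_task()`, `start_first_task()` and `context_switch()` are hypothetical names, the "registers" are plain struct fields, and the two `hw_*` helpers stand in for the stacking and unstacking that Cortex-M hardware performs by itself on exception entry and return. On a real board the software part is a handful of assembly instructions that read PSP, push r4-r11, store PSP in the TCB, then do the reverse for the next task.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register file (hypothetical): on a real Cortex-M these are r0-r15 and xPSR. */
typedef struct {
    uint32_t r[13];              /* r0-r12 */
    uint32_t *psp;               /* process stack pointer, i.e. sp (r13) in thread mode */
    uint32_t lr, pc, xpsr;       /* r14, r15, program status register */
} cpu_t;

#define NTASKS 2
#define STACK_WORDS 64

typedef struct { uint32_t *saved_psp; } tcb_t;   /* hypothetical TCB: just the saved PSP */

static uint32_t task_stack[NTASKS][STACK_WORDS];  /* each task's private stack */
static tcb_t tcb[NTASKS];
static int current;

/* Hardware part of exception entry: push xPSR, pc, lr, r12, r3-r0 (full-descending stack). */
static void hw_stack(cpu_t *c) {
    *--c->psp = c->xpsr; *--c->psp = c->pc; *--c->psp = c->lr; *--c->psp = c->r[12];
    for (int i = 3; i >= 0; i--) *--c->psp = c->r[i];
}

/* Hardware part of exception return: pop the same eight words in reverse order. */
static void hw_unstack(cpu_t *c) {
    for (int i = 0; i <= 3; i++) c->r[i] = *c->psp++;
    c->r[12] = *c->psp++; c->lr = *c->psp++; c->pc = *c->psp++; c->xpsr = *c->psp++;
}

/* Build a fake frame so a new task "returns" into its entry point the first time. */
static void init_task(int t, uint32_t entry_pc) {
    assert(t >= 0 && t < NTASKS);
    uint32_t *sp = &task_stack[t][STACK_WORDS];   /* one past the top of the stack */
    *--sp = 0x01000000u;                          /* xPSR with the Thumb bit set */
    *--sp = entry_pc;                             /* pc */
    *--sp = 0;                                    /* lr */
    for (int i = 0; i < 13; i++) *--sp = 0;       /* r12, r3-r0, then r11-r4 */
    tcb[t].saved_psp = sp;
}

/* Software part of the restore: load PSP from the TCB, pop r4-r11, then "hardware" unstacks. */
static void restore_task(cpu_t *c) {
    c->psp = tcb[current].saved_psp;
    for (int i = 4; i <= 11; i++) c->r[i] = *c->psp++;
    hw_unstack(c);
}

static void start_first_task(cpu_t *c) {
    current = 0;
    restore_task(c);
}

/* The handler: save the running task on its own stack, pick the next one, restore it. */
static void context_switch(cpu_t *c) {
    hw_stack(c);                                         /* automatic on real hardware */
    for (int i = 11; i >= 4; i--) *--c->psp = c->r[i];   /* push r4-r11 to the task's stack */
    tcb[current].saved_psp = c->psp;                     /* save the task's SP in its TCB */
    current = (current + 1) % NTASKS;                    /* round-robin choice of next task */
    restore_task(c);
}
```

The same `cpu_t` is shared by every task, just as the real registers are; what keeps each task's values apart is that they are parked on that task's own stack array while it is not running, with only the stack pointer kept in the TCB.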
So yes, the registers are 'global', kind of, in that the same registers are used by every task. But when a task isn't running, the contents of those registers are stored on its stack, and restored back into the registers when it is next ready to run. That's the purpose of a context switch.
If I use pthreads in a POSIX environment and a context switch is about to happen, the current value of the esp register has to be stored somewhere so that it can be retrieved when execution switches back to this thread, because esp will be overwritten by another thread's saved SP value. I think it is impossible to have a separate esp register for every thread (correct me if I am wrong). So in what data structure is the SP value of the current thread stored right before the context switch happens?
I tried looking at the struct pthread * obtained by casting the pthread_t value, but nothing in it changed when, for example, I called a function that changes the current SP of the thread under test (i.e. comparing its contents before and after calling the test function).
This depends entirely on how the POSIX threads library is implemented. If the threads are implemented by the OS, the values of all registers are stored in the thread's [process] context block before a context switch.
If the threads are implemented in a library, the registers have to be stored in some data structure managed by the library. Such a library implementation needs to save all the general registers, but it does not need to (and cannot) save the process-specific kernel registers.
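For the library case, the old ucontext API shows what "some data structure managed by the library" can look like: `swapcontext()` stores the current registers, including the stack pointer and instruction pointer, in a `ucontext_t` and loads another one. These functions are obsolescent in POSIX (removed in POSIX.1-2008) but still provided by glibc. (glibc's own pthreads, NPTL, are kernel-scheduled threads, so there the SP of a descheduled thread is saved by the kernel rather than in struct pthread.) A minimal sketch, assuming Linux/glibc; `worker()`, `run_demo()` and the `trace` array are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ucontext.h>

/* Each user-level "thread" is a ucontext_t (its saved registers, including SP
   and the instruction pointer) plus a stack that the "library" allocated. */
static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int trace[8];
static int ntrace;

static void worker(void) {
    trace[ntrace++] = 1;
    swapcontext(&worker_ctx, &main_ctx);   /* save worker's registers, resume main */
    trace[ntrace++] = 3;
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Interleaves main and worker; returns the number of recorded steps. */
static int run_demo(void) {
    ntrace = 0;
    int rc = getcontext(&worker_ctx);
    assert(rc == 0);
    worker_ctx.uc_stack.ss_sp = worker_stack;     /* worker runs on its own stack */
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);   /* main's SP etc. are stored in main_ctx */
    trace[ntrace++] = 2;
    swapcontext(&main_ctx, &worker_ctx);   /* worker resumes where it left off */
    trace[ntrace++] = 4;
    return ntrace;
}
```

Calling `run_demo()` records the steps 1, 2, 3, 4 in that order: each switch saves the outgoing context's stack pointer into its `ucontext_t` and loads the other one.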
For a single-threaded program, I want to check whether or not a given virtual address is in the process's stack. I want to do that from inside the process, which is written in C.
I am thinking of reading /proc/self/maps to find the line labelled [stack] to get start and end address for my process's stack. Thinking about this solution led me to the following questions:
(1) /proc/self/maps shows a stack of 132 KB for my process, while the maximum stack size (ulimit -s) is 8 MB on my system. How does Linux know that a page fault occurring because we are beyond the current end of the stack belongs to the stack (and that the stack must be grown), rather than being an access to another memory area of the process?
(2) Does Linux ever shrink the stack back? In other words, when returning from deep function calls, for example, does the OS reduce the virtual memory area corresponding to the stack?
(3) How much virtual space does the OS initially allocate for the stack?
(4) Is my solution correct, and is there a cleaner way to do this?
Lots of the stack setup details depend on which architecture you're running on, executable format, and various kernel configuration options (stack pointer randomization, 4GB address space for i386, etc).
At the time the process is exec'd, the kernel picks a default stack top (for example, on the traditional i386 arch it's 0xc0000000, i.e. the end of the user-mode area of the virtual address space).
The type of executable format (ELF vs a.out, etc) can in theory change the initial stack top. Any additional stack randomization and any other fixups are then done (for example, the vdso [system call springboard] area generally is put here, when used). Now you have an actual initial top of stack.
The kernel now allocates whatever space is needed to construct the argument and environment vectors and so forth for the process, initializes the stack pointer, creates initial register values, and starts the process. I believe this provides the answer to (3): the kernel allocates only enough space to contain the argument and environment vectors, and other pages are allocated on demand.
Other answers, as best as I can tell:
(1) When a process attempts to store data in the area below the current bottom of the stack region, a page fault is generated. The kernel fault handler determines where the next populated virtual memory region within the process' virtual address space begins. It then looks at what type of area that is. If it's a "grows down" area (at least on x86, all stack regions should be marked grows-down), and if the process' stack pointer (ESP/RSP) value at the time of the fault is less than the bottom of that region and if the process hasn't exceeded the ulimit -s setting, and the new size of the region wouldn't collide with another region, then it's assumed to be a valid attempt to grow the stack and additional pages are allocated to satisfy the process.
(2) Not 100% sure, but I don't think there's any attempt to shrink stack areas. Presumably normal LRU page sweeping would make now-unused areas candidates for paging out to the swap area if they're really not being re-used.
(4) Your plan seems reasonable to me: /proc/NN/maps should give you the start and end addresses of the stack region as a whole. This would be the largest your stack has ever been, I think. The part of the stack actually in use, OTOH, should lie between your current stack pointer and the end of the region (ordinarily nothing should be using the area of the stack below the stack pointer).
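A minimal C sketch of the /proc/self/maps approach for the main thread, assuming Linux: it scans for the line tagged [stack] and checks whether an address falls in its [start, end) range. `in_main_stack()` is a hypothetical helper name. As noted above, the mapping can be larger than the part of the stack currently in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if addr lies inside the [stack] mapping of /proc/self/maps,
   0 if it does not, and -1 if the file can't be read or has no [stack] line.
   Only meaningful for the main thread. */
static int in_main_stack(const void *addr) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;
    char line[4096];
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        /* each line starts with "start-end" in hex, e.g. "7ffd...-7ffd... rw-p ..." */
        if (strstr(line, "[stack]") && sscanf(line, "%lx-%lx", &start, &end) == 2) {
            uintptr_t a = (uintptr_t)addr;
            result = (a >= start && a < end);
            break;
        }
    }
    fclose(f);
    return result;
}
```

For example, the address of a local variable in `main()` should report 1, while the address of a static variable should report 0.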
My answer is for Linux on x64 with kernel 3.12.23 only. It might or might not apply to other versions or architectures.
(1)+(2) I'm not sure here, but I believe it is as Gil Hamilton said before.
(3) You can see the amount in /proc/pid/maps (or /proc/self/maps if you target the calling process). However, not all of it is actually usable as stack for your application: the argument (argv[]) and environment (__environ[]) vectors usually consume quite a bit of space at the bottom (highest addresses) of that area.
To find the area the kernel actually designated as "stack" for your application, you can look at /proc/self/stat. Its fields are documented in the proc(5) man page, and one of them is "startstack". Together with the size of the mapped area, you can compute the amount of stack currently reserved. Along with "kstkesp", you could determine the amount of free or used stack space (keep in mind that any operation done by your thread will most likely change those values).
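A sketch of reading "startstack" from /proc/self/stat in C, assuming Linux. The comm field (field 2) is wrapped in parentheses and may itself contain spaces, so the parser starts after the last ')' and counts space-separated fields from there up to field 28. `read_startstack()` is a hypothetical name. It deliberately ignores kstkesp (field 29), which recent kernels may report as 0 for a running process.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns field 28 (startstack) of /proc/self/stat, or 0 on failure. */
static unsigned long read_startstack(void) {
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f) return 0;
    char buf[4096];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *p = strrchr(buf, ')');     /* end of field 2 (comm) */
    if (!p) return 0;
    /* Each step moves p to the space that precedes field 3, 4, ..., 28. */
    for (int field = 3; field <= 28; field++) {
        p = strchr(p + 1, ' ');
        if (!p) return 0;
    }
    return strtoul(p + 1, NULL, 10);
}
```

Because the stack grows downwards, the local variables of `main()` sit below this address, while argv[] and the environment sit above it.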
Also note that this works only for the process's main thread! Other threads don't get a labeled "[stack]" mapping; they use anonymous mappings or might even end up on the heap. (Use the pthreads API to find those values, or remember the stack start in the thread's main function.)
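For threads other than the main one, glibc offers the non-portable GNU extension `pthread_getattr_np()`; the resulting attributes can be queried with `pthread_attr_getstack()`, which returns the lowest stack address and the stack size. A sketch, assuming Linux/glibc (link with -pthread on older glibc versions); `in_current_thread_stack()` and `thread_probe()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is inside the calling thread's stack, 0 if not, -1 on error. */
static int in_current_thread_stack(const void *addr) {
    pthread_attr_t attr;
    void *lo;
    size_t size;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return -1;
    int rc = pthread_attr_getstack(&attr, &lo, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) return -1;
    uintptr_t a = (uintptr_t)addr, base = (uintptr_t)lo;   /* lo is the lowest address */
    return a >= base && a < base + size;
}

/* Thread body: stores 1 in *arg if its own local is on its stack and the
   caller's variable (on the creating thread's stack) is not. */
static void *thread_probe(void *arg) {
    int local = 0;
    *(int *)arg = in_current_thread_stack(&local) == 1 &&
                  in_current_thread_stack(arg) == 0;
    return NULL;
}
```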
(4) As explained in (3), your solution is mostly OK, but not entirely accurate.
I want to learn and fill gaps in my knowledge with the help of this question.
So, a user is running a (kernel-level) thread, and it now calls yield (a system call, I presume).
The scheduler must now save the context of the current thread in the TCB (which is stored somewhere in the kernel), choose another thread to run, load its context and jump to its CS:EIP.
To narrow things down, I am working on Linux running on top of x86 architecture. Now, I want to get into the details:
So, first we have a system call:
1) The wrapper function for yield pushes the system call arguments onto the stack, pushes the return address, and raises an interrupt with the system call number placed in some register (say EAX).
2) The interrupt changes the CPU mode from user to kernel and jumps to the interrupt vector table and from there to the actual system call in the kernel.
3) I guess the scheduler gets called now, and it must save the current state in the TCB. Here is my dilemma: since the scheduler uses the kernel stack and not the user stack for its work (which means SS and SP have to be changed), how does it store the state of the user thread without modifying any registers in the process? I have read on forums that there are special hardware instructions for saving state, but then how does the scheduler get access to them, and who runs these instructions, and when?
4) The scheduler now stores the state into the TCB and loads another TCB.
5) When the scheduler runs the original thread, the control gets back to the wrapper function which clears the stack and the thread resumes.
Side questions: Does the scheduler run as a kernel-only thread (i.e. a thread which can run only kernel code)? Is there a separate kernel stack for each kernel-thread or each process?
At a high level, there are two separate mechanisms to understand. The first is the kernel entry/exit mechanism: this switches a single running thread from running usermode code to running kernel code in the context of that thread, and back again. The second is the context switch mechanism itself, which switches in kernel mode from running in the context of one thread to another.
So, when Thread A calls sched_yield() and is replaced by Thread B, what happens is:
Thread A enters the kernel, changing from user mode to kernel mode;
Thread A in the kernel context-switches to Thread B in the kernel;
Thread B exits the kernel, changing from kernel mode back to user mode.
Each user thread has both a user-mode stack and a kernel-mode stack. When a thread enters the kernel, the current values of the user-mode stack pointer (SS:ESP) and instruction pointer (CS:EIP) are saved to the thread's kernel-mode stack, and the CPU switches to the kernel-mode stack; with the int $0x80 syscall mechanism, this is done by the CPU itself. The remaining register values and flags are then also saved to the kernel stack.
When a thread returns from the kernel to user-mode, the register values and flags are popped from the kernel-mode stack, then the user-mode stack and instruction pointer values are restored from the saved values on the kernel-mode stack.
When a thread context-switches, it calls into the scheduler (the scheduler does not run as a separate thread; it always runs in the context of the current thread). The scheduler code selects a process to run next and calls the switch_to() function. This function essentially just switches the kernel stacks: it saves the current value of the stack pointer into the TCB for the current thread (called struct task_struct in Linux), and loads a previously saved stack pointer from the TCB for the next thread. At this point it also saves and restores some other thread state that isn't usually used by the kernel, such as the floating point/SSE registers. If the threads being switched don't share the same virtual memory space (i.e. they're in different processes), the page tables are also switched.
So you can see that the core user-mode state of a thread isn't saved and restored at context-switch time - it's saved and restored to the thread's kernel stack when you enter and leave the kernel. The context-switch code doesn't have to worry about clobbering the user-mode register values - those are already safely saved away in the kernel stack by that point.
What you missed in step 2 is that the stack gets switched from the thread's user-level stack (where you pushed the args) to the thread's protected-level stack. The context of the thread interrupted by the syscall is actually saved on this protected stack. Inside the ISR, just before entering the kernel, this protected stack is switched again to the kernel stack you are talking about. Once inside the kernel, kernel functions such as the scheduler's functions use the kernel stack. Later, once a thread has been elected by the scheduler and the system returns to the ISR, it switches back from the kernel stack to the protected-level stack of the newly elected thread (or of the former one, if no higher-priority thread is active), which contains that thread's context. The context is therefore restored from this stack automatically (how exactly depends on the underlying architecture). Finally, a special instruction restores the last sensitive registers, such as the stack pointer and the instruction pointer. Back in userland...
To sum up, a thread (generally) has two stacks, and the kernel itself has one. The kernel stack is wiped at the end of each kernel entry. It's interesting to point out that since 2.6 the kernel itself is threaded for some processing, so a kernel thread has its own protected-level stack besides the general kernel stack.
Some resources:
3.3.3 Performing the Process Switch of Understanding the Linux Kernel, O'Reilly
5.12.1 Exception- or Interrupt-Handler Procedures of Intel's manual 3A (System Programming Guide). Chapter numbers may vary between editions, so searching for "Stack Usage on Transfers to Interrupt and Exception-Handling Routines" should get you to the right one.
Hope this helps!
The kernel itself has no stack at all. The same is true for a process: it has no stack either. Threads are the only system entities treated as execution units, so only threads can be scheduled and only threads have stacks. But there is one point that kernel-mode code exploits heavily: at every moment the system works in the context of the currently active thread, so the kernel can reuse the stack of the currently active thread. Note that at any given moment only one of them, kernel code or user code, can execute. So when the kernel is invoked it simply reuses the thread's stack, and cleans up before returning control to the interrupted activity in the thread. The same mechanism works for interrupt handlers, and it is also exploited by signal handlers.
In turn, the thread stack is divided into two isolated parts: one is called the user stack (because it is used when the thread executes in user mode), and the other is called the kernel stack (because it is used when the thread executes in kernel mode). Once a thread crosses the border between user and kernel mode, the CPU automatically switches it from one stack to the other. The kernel and the CPU track the two stacks differently. For the kernel stack, the CPU permanently keeps a pointer to the top of the thread's kernel stack. This is easy, because this address is constant for the thread: each time the thread enters the kernel it finds an empty kernel stack, and each time it returns to user mode it cleans the kernel stack. By contrast, the CPU does not keep a pointer to the top of the user stack while the thread runs in kernel mode. Instead, on kernel entry the CPU creates a special "interrupt" stack frame on top of the kernel stack and stores the user-mode stack pointer in that frame. When the thread exits the kernel, the CPU restores ESP from this "interrupt" stack frame, immediately before the kernel stack is cleaned up. (On legacy x86, the int/iret instruction pair handles entering and leaving kernel mode.)
On entry to kernel mode, immediately after the CPU has created the "interrupt" stack frame, the kernel pushes the contents of the remaining CPU registers onto the kernel stack. Note that it saves only those registers that can be used by kernel code. For example, the kernel doesn't save the SSE registers, simply because it never touches them. Similarly, just before asking the CPU to return control to user mode, the kernel pops the previously saved contents back into the registers.
Note that systems such as Windows and Linux have the notion of a system thread (frequently called a kernel thread, which I know is confusing). System threads are a special kind of thread: they execute only in kernel mode and therefore have no user part of the stack. The kernel employs them for auxiliary housekeeping tasks.
A thread switch is performed only in kernel mode. That means both threads, outgoing and incoming, run in kernel mode, both use their own kernel stacks, and both kernel stacks hold "interrupt" frames with pointers to the tops of the user stacks. The key point of a thread switch is a switch between the threads' kernel stacks, as simple as:
pushad                             ; save context of outgoing thread on top of its kernel stack
; here the kernel uses the kernel stack of the outgoing thread
mov [TCB_of_outgoing_thread], esp  ; store the outgoing thread's kernel stack pointer in its TCB
mov esp, [TCB_of_incoming_thread]  ; load the incoming thread's saved kernel stack pointer
; here the kernel uses the kernel stack of the incoming thread
popad                              ; restore context of incoming thread from top of its kernel stack
Note that there is only one function in the kernel that performs a thread switch. Because of this, each time the kernel has switched stacks it finds the context of the incoming thread on top of the stack, simply because every time before a stack switch the kernel pushes the context of the outgoing thread onto its stack.
Note also that every time after a stack switch, and before returning to user mode, the kernel updates the CPU's notion of the top of the kernel stack with the new thread's value. This ensures that when the newly active thread enters the kernel in the future, the CPU will switch it to its own kernel stack.
Note also that not all registers are saved on the stack during a thread switch; some, like the FPU/MMX/SSE registers, are saved in a dedicated area in the TCB of the outgoing thread. The kernel uses a different strategy here for two reasons. First, not every thread in the system uses them, so pushing their contents onto the stack and popping them off again for every thread is inefficient. Second, there are special instructions for "fast" saving and loading of their contents, and these instructions don't use the stack.
Note also that the kernel part of the thread stack actually has a fixed size and is allocated as part of the TCB (true for Linux, and I believe for Windows too).
I have a simple bootloader which initializes and prepares SDRAM. Then it loads an application from Flash and starts it at some address in RAM. After the application has finished executing, the system restarts. There is no system stack.
Now I would like the bootloader to receive control back after an application finishes executing. The bootloader (let's call it the OS) must also read the application's return code.
How can an application return a value to the calling OS, and how does the calling OS get control back? I suppose it can be done using interrupts: the OS has a special resident function attached to some interrupt, and every application calls this interrupt at the end of its execution. But how can the OS read a return code if there is no system stack?
Normally you would leave a return code in one or more registers, but since you're in control, you can leave it wherever you like!
When an application is interrupted, the interrupt handling routine needs to save the application's state somewhere, which will probably mean copying from shadow registers to a predefined location in memory.
If an application surrenders control back to the OS (through a software interrupt / system call), then you need to define your own calling convention for which registers the arguments are placed in, and the event handler needs to follow it before passing control back to the OS. You probably want to make the calling convention match that of your C compiler as closely as possible, to keep things easy for yourself.
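One way to keep things easy, sketched in C: if the application's entry point is an ordinary C function returning an int, the bootloader can call it through a function pointer, and the compiler's calling convention (for example the AAPCS on ARM, which returns an int in r0) then handles both the return value and the return to the bootloader. This assumes the application keeps using the stack the bootloader set up rather than resetting the stack pointer. `app_entry_t`, `run_application()` and `fake_app_main()` are hypothetical names; on the target, the entry pointer would be the application's load address cast to `app_entry_t`, and `fake_app_main()` only stands in for that image so the sketch runs on a host.

```c
#include <assert.h>

/* Hypothetical signature agreed between bootloader and application. */
typedef int (*app_entry_t)(void);

/* Stand-in for the loaded application image (on the target it lives in SDRAM). */
static int fake_app_main(void) {
    return 42;   /* the application's return code */
}

/* "OS" side: an ordinary C call, so the return address is saved and the
   return value is delivered according to the compiler's calling convention. */
static int run_application(app_entry_t entry) {
    assert(entry != 0);
    int rc = entry();   /* on the target, entry would come from the load address */
    return rc;          /* control is back in the bootloader here */
}
```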
One solution is for the program to write its exit code at a fixed, known location in memory - the "OS" can then read it.
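A sketch of the fixed-location approach in C. Everything here is an assumption for illustration: the `struct boot_mailbox` layout, the magic value and the helper names are hypothetical, and a static object stands in for the reserved RAM address so the code runs on a host. On real hardware the mailbox would have to sit in a region that neither the bootloader's nor the application's startup code zeroes or reuses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mailbox agreed on by bootloader and application. On the target
   it would be placed at a fixed, reserved RAM address instead of a static. */
struct boot_mailbox {
    uint32_t magic;       /* marks the record as valid */
    int32_t  exit_code;   /* written by the application */
};
#define MAILBOX_MAGIC 0xB007C0DEu

static struct boot_mailbox mailbox_storage;   /* host stand-in for that address */
static volatile struct boot_mailbox *const MAILBOX = &mailbox_storage;

/* Application side: called just before it gives control back. */
static void app_report_exit(int32_t code) {
    MAILBOX->exit_code = code;
    MAILBOX->magic = MAILBOX_MAGIC;   /* written last, so a half-written record is never valid */
}

/* Bootloader side: returns 1 and stores the code if a valid record exists, else 0. */
static int os_read_exit(int32_t *code) {
    if (MAILBOX->magic != MAILBOX_MAGIC) return 0;
    *code = MAILBOX->exit_code;
    MAILBOX->magic = 0;               /* consume it so a stale value is not read twice */
    return 1;
}
```

The magic word lets the "OS" tell a real exit code apart from whatever happened to be in that RAM after power-up.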