I need some help. I have a project to build an alternative scheduler for freeRTos, with a different algorithm, and try to replace it in the OS.
My questions are:
Is it possible in normal time? (for about few months)
How do I recognize the code of the scheduler in the whole OS code?
Given that FreeRTOS is only a few thousands lines of code it is certainly possible within a few months. If you know how to write a scheduler, of course.
However, FreeRTOS doesn't even have a real scheduler. It maintains a list of runnable tasks, and at every scheduling point (return from interrupt or explicit yield), it takes the highest priority task from that list.
To add more answers to question 2:
Task controls are in tasks.c, portable/port.c contains context switches.
Have a look at the source organization doc; a given function name gives away which file it's defined it. There really isn't too many places where they can be either. Use grep :)
Related
I was designing a custom scheduler for linux-6.0.19 version. I am a beginner in Linux kernel development. I tried to implement it and am currently facing a difficulty that I cannot figure out.
I want to know how the usual Linux scheduler integrations deal with dead tasks. The issue which I am facing right now is when a process terminates, the kernel crashes (dereferenced null pointer).
Let us consider the following situation, suppose a process is currently on the corresponding runqueue substructure (of my scheduler) of the actual runqueue. Now, this program terminates. What happens after this?
Some code should possibly set the p->__state to TASK_DEAD and then is the process dequeued by a call to dequeue_task? Or as per the p->__state, the necessary dequeue of the task shall be done in the pick_next_task_my_sched or put_prev_task_my_sched accordingly? [If some other section of the kernel actually does the dequeue, then our pick_next_task_my_sched should not be concerned about picking up a dead-task. Also, on_rq shall be set to 0 and put_prev_task_my_sched should not be putting it again into the queue.]
In the file /kernel/sched/core.c I found the following mention in the function finish_task_switch:
A task struct has one reference for the use as "current". If a task dies, then it sets TASK_DEAD in tsk->state and calls schedule one last time. The schedule call will never return, and the scheduled task must drop that reference.
I am unable to figure out the code flow, which accomplishes this. Can anyone, point me to the code regions which does this part? Drop that reference : is the task dequeued? Because when I tried to figure out, the code flow, I just saw that certain fields of the cfs related data structures were being reset. I did not find an explicit call to dequeue_task.
I am using FreeRtos and have multiple tasks using the same code at the same priority level. To test my code I pass the same data into each task. When optimization is above -O0 and timeslicing is turned on, there is some sort of problem where the context is not being saved correctly.
My understanding is the each Task has its own stack, and on the context switch from one to another, the stack pointer will be updated accordingly, assuring that each Task stays independent. This isn't happening for me. When I run each task individually, I get one answer, but if I test by running all three tasks, I get one answer correctly and the others are slightly off. There is some sort of crossover of data between the tasks making them not truly independent.
Any idea where this issue could be coming from? I am not using any global variables and my code is reentrant as far as I can tell.
In case anyone runs into this I discovered the problem.
I am running FreeRtos on an Arm Cortex-A9 chip. To prevent processor register corruption a task must not use any floating point registers unless it has a floating point context. In the case of my project, the tasks were not created by default with floating point context.
I added
portTASK_USES_FLOATING_POINT()
to the beginning of my task. That corrected the error and the multitasking works now.
Note that I also had to add this to my UnitTest task that was calling the original three "broken" tasks, as posting to a Queue is error prone as well.
You can see more here: https://www.freertos.org/Using-FreeRTOS-on-Cortex-A-Embedded-Processors.html
and here: https://www.freertos.org/FreeRTOS_Support_Forum_Archive/April_2017/freertos_FreeRtos_native_Floats_and_Task_switching_03b24664j.html
BACKGROUND
I'm integrating micropython into my custom cooperative multitasking OS (no, my company won't change to pre-preemptive)
Micropython uses garbage collection and this takes much more time than my alloted time slice even when there's nothing to collect i.e. I called it twice in a row, timed it and still takes A LOT of time.
OBVIOUS SOLUTION
Yes I could refactor micropython source but then whenever there's a change . . .
IDEAL SOLUTION
The ideal solution would involve calling some function void pause(&func_in_call_stack) that would jump out, leaving the stack intact, all the way to the function that is at the top of the call stack, say main. And resume would . . . resume.
QUESTION
Is it possible, using C and assembly, to implement pause?
UPDATE
As I wrote this, I realize that the C-based exception handling code nlr_push()/nlr_pop() already does most of what I need.
Your question is about implementing context switching. As we've covered fairly exhaustively in comments, support for context switching is among the key characteristics of any multitasking system, and of a multitasking OS in particular. Inasmuch as you posit no OS support for context switching, you are talking about implementing multitasking for a single-tasking OS.
That you describe the OS as providing some kind of task queue ("to relinquish control, a thread must simply exit its run loop") does not change this, though to some extent we could consider it a question of semantics. I imagine that a typical task for such a system would operate by creating and executing a series of microtasks (the work of the "run loop"), providing a shared, mutable memory context to each. Such a run loop could safely exit and later be reentered, to resume generating microtasks from where it left off.
Dividing tasks into microtasks at boundaries defined by affirmative application action (i.e. your pause()) would depend on capabilities beyond those provided by ISO C. Very likely, however, it could be done with the help of some assembly, plus some kind of framework support. You need at least these things:
A mechanism for recording a task's current execution context -- stack, register contents, and maybe other details. This is inherently system-specific.
A task-associated place to store recorded execution context. There are various ways in which such a thing could be established. Promising alternatives include (i) provided by the OS; (ii) provided by some kind of userland multi-tasking system running on top of the OS; (iii) built into the task by the compiler.
A mechanism for restoring recorded execution context -- this, too, will be system-specific.
If the OS does not provide such features, then you could consider the (now removed) POSIX context system as a model interface for recording and restoring execution context. (See makecontext(), swapcontext(), getcontext(), and setcontext().) You would need to implement those yourself, however, and you might want to wrap them to present a simpler interface to applications. Details will be highly dependent on hardware and underlying OS.
As an alternative, you might implement transparent multitasking support for such a system by providing compilers that emit specially instrumented code (i.e. even more specially instrumented than you otherwise need). For example, consider compilers that emit bytecode for a VM of your own design. The VMs in which the resulting programs run would naturally track the state of the program running within, and could yield after each sequence of a certain number of opcodes.
Task
I have a small kernel module I wrote for my RaspBerry Pi 2 which implements an additional system call for generating power consumption metrics. I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it. Otherwise, the call just skips the bulk of its body and returns success.
Background Work
I've read into the issue at length, and I've found a similar question on SO, but there are numerous problems with it, from my perspective (noted below).
Question
The linked question notes that struct task_struct contains a pointer element to struct cred, as defined in linux/sched.h and linux/cred.h. The latter of the two headers doesn't exist on my system(s), and the former doesn't show any declaration of a pointer to a struct cred element. Does this make sense?
Silly mistake. This is present in its entirety in the kernel headers (ie: /usr/src/linux-headers-$(uname -r)/include/linux/cred.h), I was searching in gcc-build headers in /usr/include/linux.
Even if the above worked, it doesn't mention if I would be getting the the real, effective, or saved UID for the process. Is it even possible to get each of these three values from within the system call?
cred.h already contains all of these.
Is there a safe way in the kernel module to quickly determine which groups the user belongs to without parsing /etc/group?
cred.h already contains all of these.
Update
So, the only valid question remaining is the following:
Note, that iterating through processes and reading process's
credentials should be done under RCU-critical section.
... how do I ensure my check is run in this critical section? Are there any working examples of how to accomplish this? I've found some existing kernel documentation that instructs readers to wrap the relevant code with rcu_read_lock() and rcu_read_unlock(). Do I just need to wrap an read operations against the struct cred and/or struct task_struct data structures?
First, adding a new system call is rarely the right way to do things. It's best to do things via the existing mechanisms because you'll benefit from already-existing tools on both sides: existing utility functions in the kernel, existing libc and high-level language support in userland. Files are a central concept in Linux (like other Unix systems) and most data is exchanged via files, either device files or special filesystems such as proc and sysfs.
I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it.
You can't do this in the kernel. Not only is it wrong from a design point of view, but it isn't even possible. The kernel knows nothing about user names. The only knowledge about users in the kernel in that some privileged actions are reserved to user 0 in the root namespace (don't forget that last part! And if that's new to you it's a sign that you shouldn't be doing advanced things like adding system calls). (Many actions actually look for a capability rather than being root.)
What you want to use is sysfs. Read the kernel documentation and look for non-ancient online tutorials or existing kernel code (code that uses sysfs is typically pretty clean nowadays). With sysfs, you expose information through files under /sys. Access control is up to userland — have a sane default in the kernel and do things like calling chgrp, chmod or setfacl in the boot scripts. That's one of the many wheels that you don't need to reinvent on the user side when using the existing mechanisms.
The sysfs show method automatically takes a lock around the file, so only one kernel thread can be executing it at a time. That's one of the many wheels that you don't need to reinvent on the kernel side when using the existing mechanisms.
The linked question concerns a fundamentally different issue. To quote:
Please note that the uid that I want to get is NOT of the current process.
Clearly, a thread which is not the currently executing thread can in principle exit at any point or change credentials. Measures need to be taken to ensure the stability of whatever we are fiddling with. RCU is often the right answer. The answer provided there is somewhat wrong in the sense that there are other ways as well.
Meanwhile, if you want to operate on the thread executing the very code, you can know it wont exit (because it is executing your code as opposed to an exit path). A question arises what about the stability of credentials -- good news, they are also guaranteed to be there and can be accessed with no preparation whatsoever. This can be easily verified by checking the code doing credential switching.
We are left with the question what primitives can be used to do the access. To that end one can use make_kuid, uid_eq and similar primitives.
The real question is why is this a syscall as opposed to just a /proc file.
See this blogpost for somewhat elaborated description of credential handling: http://codingtragedy.blogspot.com/2015/04/weird-stuff-thread-credentials-in-linux.html
I am aware that one cannot listen for, detect, and perform some action upon encountering context switches on Windows machines via managed languages such as C#, Java, etc. However, I was wondering if there was a way of doing this using assembly (or some other language, perhaps C)? If so, could you provide a small code snippet that gives an idea of how to do this (as I am relatively new to kernel programming)?
What this code will essentially be designed to do is run in the background on a standard Windows UI and listen for when a particular process is either context switched in or out of the CPU. Upon hearing either of these actions, it will send a signal. To clarify, I am looking to detect only the context switches directly involving a specific process, not any context switches. What I ultimately would like to achieve is to be able to notify another machine (via the internet signal) whenever a specific process begins making use of the CPU, as well as when it ceases doing so.
My first attempt at doing this involved simply calculating the CPU usage percentage of the specific process, but this ultimately proved to be too course-grained to catch the most minute calculations. For example, I wrote a test program that simply performed the operation 2+2 and placed the answer inside of an int. The CPU usage method did not pick up on this. Thus, I am looking for something lower level, hence the origin of this question. If there are potential alternatives, I would be more than happy to field them.
There's Event Tracing for Windows (ETW), which you can configure to receive messages about a variety of events occurring in the system.
You should be able to receive messages about thread scheduling events. The CSwitch class of events is for that.
Sorry, I don't know any good ETW samples that you could easily reuse for your task. Read MSDN and look around.
Simon pointed out a good link explaining why ETW can be useful. Very enlightening: http://randomascii.wordpress.com/2012/05/11/the-lost-xperf-documentationcpu-scheduling/
Please see the edits below. In particular #3, ETW appears to be the way to go.
In theory you could install your own trap handler for the old int 2Eh and the new sysenter. However, in practice this isn't going to be as easy anymore as it used to be because of Patchguard (since Vista) and signing requirements. I'm not aware of any other generic means to detect context switches, meaning you'd have to roll your own. All context switches of the OS go through call gates (the aforementioned trap handlers) and ReactOS allows you to peek behind the scenes if you feel uncomfortable with debugging/disassembling.
However, in either case there shouldn't be a generic way to install something like this without kernel mode privileges (usually referred to as ring 0) - anything else would be a security flaw in Windows. I'm not aware of a Windows-supplied method to achieve what you want either.
The book "Undocumented Windows NT" has a pretty good chapter about the exact topic (although obviously targeted at the old int 2Eh method).
If you can live with hooking only certain functions, you may be able to get away with some filter driver(s) or user-mode API hooking. Depends on your exact requirements.
Update: reading your updated question, I think you need to read up on the internals, in particular on the concept of IRQLs (not to be confused with IRQs from DOS times) and the scheduler. The problem is that there can - and usually will - be literally hundreds of context switches every second. However, your watcher process (the one watching for context switches) will, like any user-mode process be preemptable. This means that there is no way for you to achieve real-time signaling or anything close to it, which puts a big question mark on the method.
What is it actually that you want to achieve? The number of context switches doesn't really give you anything. Every single SEH exception will cause a context switch. What is it that you are interested in? Perhaps performance counters cater your needs better?
Update 2: the sheer amount of context switches even for a single thread will be flabbergasting within a single second. So assuming you'd install your own trap handler, you'd still end up (adversely) affecting all other threads on the system (after all you'd catch every context switch and then see whether it's the process/threads you care about and then do your thing or pass it on).
If you could tell us what you ultimately want to achieve, not with the means already pre-defined, we may be able to suggest alternatives.
Update 3: so apparently I was wrong in one respect here. Windows comes with something on board that signals context switches. And ETW can be harnessed to tap into those. Thanks to Simon for pointing out.