Programmatically Detect Context Switch via Assembly - c

I am aware that one cannot listen for, detect, and perform some action upon encountering context switches on Windows machines via managed languages such as C#, Java, etc. However, I was wondering if there was a way of doing this using assembly (or some other language, perhaps C)? If so, could you provide a small code snippet that gives an idea of how to do this (as I am relatively new to kernel programming)?
What this code will essentially be designed to do is run in the background on a standard Windows UI and listen for when a particular process is either context switched in or out of the CPU. Upon detecting either of these actions, it will send a signal. To clarify, I am looking to detect only the context switches directly involving a specific process, not all context switches. What I ultimately would like to achieve is to be able to notify another machine (via a signal sent over the internet) whenever a specific process begins making use of the CPU, as well as when it ceases doing so.
My first attempt at doing this involved simply calculating the CPU usage percentage of the specific process, but this ultimately proved too coarse-grained to catch the most minute computations. For example, I wrote a test program that simply performed the operation 2+2 and placed the answer inside an int. The CPU usage method did not pick up on this. Thus, I am looking for something lower level, hence the origin of this question. If there are potential alternatives, I would be more than happy to field them.

There's Event Tracing for Windows (ETW), which you can configure to receive messages about a variety of events occurring in the system.
You should be able to receive messages about thread scheduling events. The CSwitch class of events is for that.
Sorry, I don't know any good ETW samples that you could easily reuse for your task. Read MSDN and look around.
Simon pointed out a good link explaining why ETW can be useful. Very enlightening: http://randomascii.wordpress.com/2012/05/11/the-lost-xperf-documentationcpu-scheduling/
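As a rough starting point rather than a polished sample, here is a minimal sketch (assuming the usual ETW controller API from evntrace.h; error handling and cleanup are bare-bones) of starting the NT Kernel Logger session with context-switch events enabled via EVENT_TRACE_FLAG_CSWITCH. Consuming the CSwitch events and filtering them down to the threads of your target process would then be done with OpenTrace()/ProcessTrace() and an event-record callback, which is omitted here.

```c
/* Link against advapi32.lib. Must be run elevated; only one kernel
   logger session can exist on the system at a time. */
#define INITGUID              /* so SystemTraceControlGuid gets defined */
#include <windows.h>
#include <wmistr.h>
#include <evntrace.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    ULONG size = sizeof(EVENT_TRACE_PROPERTIES) + sizeof(KERNEL_LOGGER_NAME);
    EVENT_TRACE_PROPERTIES *props = (EVENT_TRACE_PROPERTIES *)calloc(1, size);

    props->Wnode.BufferSize    = size;
    props->Wnode.Guid          = SystemTraceControlGuid;   /* kernel provider  */
    props->Wnode.Flags         = WNODE_FLAG_TRACED_GUID;
    props->Wnode.ClientContext = 1;                        /* QPC timestamps   */
    props->EnableFlags         = EVENT_TRACE_FLAG_CSWITCH; /* context switches */
    props->LogFileMode         = EVENT_TRACE_REAL_TIME_MODE;
    props->LoggerNameOffset    = sizeof(EVENT_TRACE_PROPERTIES);

    TRACEHANDLE session = 0;
    ULONG status = StartTrace(&session, KERNEL_LOGGER_NAME, props);
    if (status != ERROR_SUCCESS) {
        fprintf(stderr, "StartTrace failed: %lu\n", status);
        return 1;
    }

    printf("Kernel logger running; CSwitch events are being generated.\n");
    /* ... consume events here with OpenTrace()/ProcessTrace() ... */

    ControlTrace(session, KERNEL_LOGGER_NAME, props, EVENT_TRACE_CONTROL_STOP);
    free(props);
    return 0;
}
```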

Please see the edits below. In particular #3, ETW appears to be the way to go.
In theory you could install your own trap handler for the old int 2Eh and the new sysenter. However, in practice this isn't going to be as easy as it used to be because of PatchGuard (since Vista) and driver signing requirements. I'm not aware of any other generic means to detect context switches, meaning you'd have to roll your own. All context switches of the OS go through call gates (the aforementioned trap handlers), and ReactOS allows you to peek behind the scenes if you feel uncomfortable with debugging/disassembling.
However, in either case there shouldn't be a generic way to install something like this without kernel mode privileges (usually referred to as ring 0) - anything else would be a security flaw in Windows. I'm not aware of a Windows-supplied method to achieve what you want either.
The book "Undocumented Windows NT" has a pretty good chapter about the exact topic (although obviously targeted at the old int 2Eh method).
If you can live with hooking only certain functions, you may be able to get away with some filter driver(s) or user-mode API hooking. Depends on your exact requirements.
Update: reading your updated question, I think you need to read up on the internals, in particular on the concept of IRQLs (not to be confused with IRQs from DOS times) and the scheduler. The problem is that there can - and usually will - be literally hundreds of context switches every second. However, your watcher process (the one watching for context switches) will, like any user-mode process, be preemptible. This means that there is no way for you to achieve real-time signaling or anything close to it, which puts a big question mark on the method.
What is it actually that you want to achieve? The number of context switches doesn't really give you anything. Every single SEH exception will cause a context switch. What is it that you are interested in? Perhaps performance counters cater to your needs better?
Update 2: the sheer number of context switches within a single second, even for a single thread, will be flabbergasting. So assuming you'd install your own trap handler, you'd still end up (adversely) affecting all other threads on the system (after all, you'd catch every context switch, see whether it involves the process/threads you care about, and then do your thing or pass it on).
If you could tell us what you ultimately want to achieve, not with the means already pre-defined, we may be able to suggest alternatives.
Update 3: so apparently I was wrong in one respect here. Windows comes with something on board that signals context switches, and ETW can be harnessed to tap into it. Thanks to Simon for pointing it out.

Related

Is there a way to have precise timed events in GTK/GLib?

I want to have a function that would run every N milliseconds, and I want it to run precisely (relatively; I don't need atomic-clock precision).
From what I can see, the GLib manual says that g_timeout_add() does not guarantee precision and can be delayed due to other events.
Is there any other way to have precise timed events with GTK/GLib? I would rather not use platform-specific code, as I want my program to work on both Windows and Linux with as few platform-related code changes as possible.
How precise is "not atomic clock"? In the end, timing precision is going to be limited by factors like the platform's context-switching behaviour. Unless you're using custom kernels or specialist hardware, there might not be much you can do about that.
g_timeout_add() is doubly problematic, because its operation is tangled up with the GTK event handling mechanism, which was never designed for precision.
In the end, your best bets might be either
Use a conventional, signal-based timer (e.g., from setitimer), or
Spawn a new thread and just usleep() a fixed time between actions.
Both these approaches are problematic in GTK, because it's hard to update the user interface from outside the GTK main context thread. Some fairly complicated locking and inter-thread communication is usually required.
If practicable -- and I have no idea whether it would be -- I would suggest delegating the timing part to some separate process, and have the GTK application interact with it using, e.g., sockets.
Without more detail, g_usleep() would probably be your best bet, but keep in mind that it blocks the current thread, so if you want other tasks to proceed in parallel you'll need to spawn a new thread to run it in.
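If you do go the thread-plus-g_usleep() route, here is a minimal sketch (assuming GTK 3; the 100 ms period and widget names are illustrative) that keeps all widget access on the main loop by handing results back with g_idle_add():

```c
#include <gtk/gtk.h>

static GtkWidget *status_label;

/* Runs in the GTK main loop, so it is safe to touch widgets here. */
static gboolean update_ui(gpointer data)
{
    gtk_label_set_text(GTK_LABEL(status_label), (const char *)data);
    g_free(data);
    return G_SOURCE_REMOVE;               /* one-shot callback */
}

/* Timer thread: sleep a fixed interval, then queue UI work on the main loop. */
static gpointer timer_thread(gpointer data)
{
    int tick = 0;
    for (;;) {
        g_usleep(100 * 1000);             /* ~100 ms; jitter depends on the OS */
        g_idle_add(update_ui, g_strdup_printf("tick %d", tick++));
    }
    return NULL;
}

int main(int argc, char **argv)
{
    gtk_init(&argc, &argv);

    GtkWidget *window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    status_label = gtk_label_new("waiting...");
    gtk_container_add(GTK_CONTAINER(window), status_label);
    g_signal_connect(window, "destroy", G_CALLBACK(gtk_main_quit), NULL);
    gtk_widget_show_all(window);

    g_thread_new("timer", timer_thread, NULL);
    gtk_main();
    return 0;
}
```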

how to jump out of and resume at arbitrary locations in c-code without refactoring

BACKGROUND
I'm integrating micropython into my custom cooperative multitasking OS (no, my company won't change to preemptive).
Micropython uses garbage collection, and this takes much more time than my allotted time slice even when there's nothing to collect, i.e., I called it twice in a row, timed it, and it still takes a lot of time.
OBVIOUS SOLUTION
Yes, I could refactor the micropython source, but then whenever there's a change . . .
IDEAL SOLUTION
The ideal solution would involve calling some function void pause(&func_in_call_stack) that would jump out, leaving the stack intact, all the way to the function that is at the top of the call stack, say main. And resume would . . . resume.
QUESTION
Is it possible, using C and assembly, to implement pause?
UPDATE
As I wrote this, I realize that the C-based exception handling code nlr_push()/nlr_pop() already does most of what I need.
Your question is about implementing context switching. As we've covered fairly exhaustively in comments, support for context switching is among the key characteristics of any multitasking system, and of a multitasking OS in particular. Inasmuch as you posit no OS support for context switching, you are talking about implementing multitasking for a single-tasking OS.
That you describe the OS as providing some kind of task queue ("to relinquish control, a thread must simply exit its run loop") does not change this, though to some extent we could consider it a question of semantics. I imagine that a typical task for such a system would operate by creating and executing a series of microtasks (the work of the "run loop"), providing a shared, mutable memory context to each. Such a run loop could safely exit and later be reentered, to resume generating microtasks from where it left off.
Dividing tasks into microtasks at boundaries defined by affirmative application action (i.e. your pause()) would depend on capabilities beyond those provided by ISO C. Very likely, however, it could be done with the help of some assembly, plus some kind of framework support. You need at least these things:
A mechanism for recording a task's current execution context -- stack, register contents, and maybe other details. This is inherently system-specific.
A task-associated place to store recorded execution context. There are various ways in which such a thing could be established. Promising alternatives include (i) provided by the OS; (ii) provided by some kind of userland multi-tasking system running on top of the OS; (iii) built into the task by the compiler.
A mechanism for restoring recorded execution context -- this, too, will be system-specific.
If the OS does not provide such features, then you could consider the (now removed) POSIX context system as a model interface for recording and restoring execution context. (See makecontext(), swapcontext(), getcontext(), and setcontext().) You would need to implement those yourself, however, and you might want to wrap them to present a simpler interface to applications. Details will be highly dependent on hardware and underlying OS.
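To make that concrete, here is a minimal sketch of the pause()/resume idea using the obsolescent <ucontext.h> interface (names such as pause_task(), the 64 KiB stack size, and the slice count are illustrative): the task yields from arbitrarily deep in its call stack back to main, and main later resumes it exactly where it left off.

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx;   /* where pause_task() returns to */
static ucontext_t task_ctx;   /* the interrupted task, resumed later */

/* Call from anywhere inside the task's call stack: saves the current
 * context and jumps back to main, leaving the task's stack intact. */
static void pause_task(void)
{
    swapcontext(&task_ctx, &main_ctx);
}

static void task_body(void)
{
    for (int step = 0; step < 3; ++step) {
        printf("task step %d\n", step);
        pause_task();             /* yield; resumed by swapcontext() in main */
    }
}

int main(void)
{
    static char stack[64 * 1024];          /* dedicated stack for the task */

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp   = stack;
    task_ctx.uc_stack.ss_size = sizeof stack;
    task_ctx.uc_link          = &main_ctx; /* jump here when task_body() ends */
    makecontext(&task_ctx, task_body, 0);

    for (int slice = 0; slice < 4; ++slice) {   /* run the task in slices */
        printf("main: resuming task (slice %d)\n", slice);
        swapcontext(&main_ctx, &task_ctx);
    }
    return 0;
}
```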
As an alternative, you might implement transparent multitasking support for such a system by providing compilers that emit specially instrumented code (i.e. even more specially instrumented than you otherwise need). For example, consider compilers that emit bytecode for a VM of your own design. The VMs in which the resulting programs run would naturally track the state of the program running within, and could yield after each sequence of a certain number of opcodes.

Making process survive failure in its thread

I'm writing an app that has many independent threads. Since I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a way to do this properly?
Currently I intercept the aforementioned signals, and in their signal handler I call pthread_exit(NULL). It seems to work, but since pthread_exit is not an async-signal-safe function, I'm a bit concerned about this solution.
I know that splitting this app into multiple processes would solve the problem, but in this case it's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.
The PROPER way to do this is to let the whole process die, and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe that doesn't involve letting the whole process die. (Note also that sometimes just the act of continuing from these sorts of errors is "undefined behaviour" - it doesn't mean that you are definitely going to fall over, just that it MAY be a problem).
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.
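To illustrate the "let the whole process die and restart it" approach, here is a minimal supervisor sketch (run_risky_work() is a placeholder for the real workload, and the retry limit is arbitrary): the dangerous multi-threaded work runs in a child process, and the parent simply starts a fresh copy if the child is killed by SIGSEGV/SIGBUS/SIGFPE.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real crash-prone, multi-threaded workload. */
static void run_risky_work(void)
{
    volatile int *bad = NULL;
    *bad = 42;                             /* deliberately die with SIGSEGV */
}

int main(void)
{
    for (int attempt = 0; attempt < 3; ++attempt) {
        pid_t pid = fork();
        if (pid == 0) {                    /* child: do the dangerous work */
            run_risky_work();
            _exit(EXIT_SUCCESS);
        }
        if (pid < 0) {
            perror("fork");
            return EXIT_FAILURE;
        }

        int status;
        waitpid(pid, &status, 0);          /* parent: wait for the child */
        if (WIFSIGNALED(status)) {
            fprintf(stderr, "worker killed by signal %d, restarting\n",
                    WTERMSIG(status));
            continue;                      /* start a fresh copy */
        }
        break;                             /* clean exit: stop supervising */
    }
    return 0;
}
```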
Low-latency code design involves a careful "be aware of the system you run on" type of coding and deployment. That means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgget to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the PThreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow ... because they involve something that takes thousands of CPU cycles ... namely, context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern - both producers as well as consumers spin in tight loops permanently polling a single or at worst a small number of atomically-updated memory locations that say where the next item-to-be-processed is found and/or to mark a processed item complete. Bind all producers / consumers to separate CPU cores so that they will never context switch.
In this type of usecase, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference because TLBs and data caches are "always hot" (you never context switch).
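As a rough illustration of that kind of hot-hot handoff (a bare single-producer/single-consumer ring buffer, not the Disruptor itself; RING_SIZE and the item count are arbitrary), both sides below spin on C11 atomic indices instead of blocking. In a real deployment you would additionally pin each thread to its own core, e.g. with sched_setaffinity().

```c
/* Build with: cc -std=c11 -O2 -pthread spsc.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 1024                    /* must be a power of two */

static uint64_t ring[RING_SIZE];
static _Atomic uint64_t head;             /* next slot the producer fills */
static _Atomic uint64_t tail;             /* next slot the consumer reads */

static void produce(uint64_t value)
{
    uint64_t h = atomic_load_explicit(&head, memory_order_relaxed);
    while (h - atomic_load_explicit(&tail, memory_order_acquire) >= RING_SIZE)
        ;                                 /* spin: queue full */
    ring[h & (RING_SIZE - 1)] = value;
    atomic_store_explicit(&head, h + 1, memory_order_release);
}

static uint64_t consume(void)
{
    uint64_t t = atomic_load_explicit(&tail, memory_order_relaxed);
    while (atomic_load_explicit(&head, memory_order_acquire) == t)
        ;                                 /* spin: queue empty */
    uint64_t v = ring[t & (RING_SIZE - 1)];
    atomic_store_explicit(&tail, t + 1, memory_order_release);
    return v;
}

static void *producer(void *arg)
{
    for (uint64_t i = 0; i < 1000000; ++i)
        produce(i);
    return NULL;
}

int main(void)
{
    pthread_t p;
    pthread_create(&p, NULL, producer, NULL);

    uint64_t last = 0;
    for (uint64_t i = 0; i < 1000000; ++i)
        last = consume();

    pthread_join(p, NULL);
    printf("last item: %llu\n", (unsigned long long)last);
    return 0;
}
```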
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed / timed out messages, but such garbage collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system call latency is a few micros even in best case.
From my point of view, you're trying to mix oil and water here; you're required to use library code not specifically written for use in low-latency deployments / library code not under your control, combined with the requirement to do message dispatch with nanosecond latencies. There is no way to make e.g. pthread_cond_signal() give you nanosecond latency, because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...

Is kernel/sched.c/context_switch() guaranteed to be invoked every time a process is switched in?

I want to alter the Linux kernel so that every time the current PID changes - i.e., a new process is switched in - some diagnostic code is executed (detailed explanation below, if curious). I did some digging around, and it seems that every time the scheduler chooses a new process, the function context_switch() is called, which makes sense (this is just from a cursory analysis of sched.c/schedule() ).
The problem is, the Linux scheduler is basically black magic to me right now, so I'd like to know if that assumption is correct. Is it guaranteed that, every time a new process is selected to get some time on the CPU, the context_switch() function is called? Or are there other places in the kernel source where scheduling could be handled in other situations? (Or am I totally misunderstanding all this?)
To give some context, I'm working with the MARSS x86 simulator trying to do some instrumentation and measurement of certain programs. The problem is that my instrumentation needs to know which executing process certain code events correspond to, in order to avoid misinterpreting the data. The idea is to use some built-in message passing systems in MARSS to pass the PID of the new process on every context switch, so it always knows what PID is currently in execution. If anyone can think of a simpler way to accomplish that, that would also be greatly appreciated.
Yes, you are correct.
schedule() calls context_switch(), which is responsible for switching from one task to another once the new process has been selected.
context_switch() basically does two things. It calls switch_mm() and switch_to().
switch_mm() - switch to the virtual memory mapping for the new process
switch_to() - switch the processor state from the previous process to the new process (save/restore registers, stack info and other architecture specific things)
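As a minimal sketch of where your diagnostic code could go (assuming an older tree where context_switch() lives in kernel/sched.c; the exact signature and surrounding code vary between kernel versions, so treat this as illustrative), you could log the incoming PID with trace_printk() right before the actual switch:

```c
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next)
{
    /* ... existing prepare_task_switch()/switch_mm() code ... */

    /* Diagnostic hook: report which task is being switched in. */
    trace_printk("switching in pid=%d comm=%s\n",
                 task_pid_nr(next), next->comm);

    /* ... existing switch_to(prev, next, prev)/finish_task_switch() code ... */
}
```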
As for your approach, I guess it's fine. It's important to keep things nice and clean when working with the kernel, and to keep it relatively simple until you gain more knowledge.

How to trap read write system calls?

Whenever I attempt to write anything to my pendrive, a write system call is generated. What I want to do is trap this write call and request the user to input a pre-decided password (which I can define in the code itself).
Please tell me whether this is possible or not, and if yes, how should I do it?
The Windows DDK has an example of hooking file reads/writes/copies in filesys\minifilter, with both pre- and post-op callbacks; that should have you set for the kernel side of things. For the GUI part you'll need something to do a non-blocking spin till the driver signals an event, and you'll probably also want a pipe or a mapped memory view to pass data around.
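For a feel of the kernel side, here is a minimal sketch of a minifilter pre-write callback in the spirit of the DDK/WDK minifilter samples (the g_password_ok flag is illustrative; a real driver would learn about a successful password check from a user-mode service over a filter communication port, and the callback would be registered for IRP_MJ_WRITE in the FLT_OPERATION_REGISTRATION table):

```c
#include <fltKernel.h>

static volatile LONG g_password_ok = 0;     /* set once user mode confirms */

FLT_PREOP_CALLBACK_STATUS
PreWriteCallback(PFLT_CALLBACK_DATA Data,
                 PCFLT_RELATED_OBJECTS FltObjects,
                 PVOID *CompletionContext)
{
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);

    /* Fail the write until user mode has confirmed the password. */
    if (!g_password_ok) {
        Data->IoStatus.Status      = STATUS_ACCESS_DENIED;
        Data->IoStatus.Information = 0;
        return FLT_PREOP_COMPLETE;          /* do not pass the write down */
    }
    return FLT_PREOP_SUCCESS_NO_CALLBACK;   /* let the write proceed */
}
```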
EasyHook is supposed to give you the ability to hook kernel functions. I have not tried it, so your mileage may vary. Be sure to hook functions cautiously - you may degrade the performance of your machine to a point where it's unusable. What you want is to interact with the user, meaning that you must put the hooked function on hold, and issue a callback into user space. This is probably not an exercise for mere mortals.
At any rate, good luck!
