For a simple sampling profiler I suspend a target thread, capture its current stack trace, then resume it.
Now I would like to highlight a sample differently if the thread was in a waiting state.
So I want to know if the thread was blocking (waiting via WaitForSingleObject, pausing via Sleep, ...) at the time it was suspended.
I can get this information via NtQuerySystemInformation(SystemProcessInformation), but that returns far more than I need: the information for every thread of every process.
I also looked at Performance Counters, but I'm not sure whether that API can do this when I only have the thread ID/handle.
UPDATE:
IInspectable gave me a hint about Wait Chain Traversal. While it seemed a good fit, it returns ObjectStatus==WctStatusBlocked for all suspended threads, which isn't unreasonable but doesn't solve my problem. It is also very slow, I assume because it collects the data for the whole chain while I only care about the first element.
While not exactly what I wanted, QueryThreadCycleTime is close enough.
So each time the thread is suspended, QueryThreadCycleTime is called, which returns the number of CPU clock cycles used by this thread up to this point.
If the difference from the previous call is below a certain limit, the current sample is considered waiting.
It's not perfect: the first sample taken after the thread entered a waiting state is not detected as waiting, and the same limit might not work for all CPUs.
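The threshold check described above can be sketched in plain C, separate from the Windows calls. This is only an illustration under assumptions: the names sample_state, sample_is_waiting and wait_limit are made up here, and cycles_now stands for the value QueryThreadCycleTime reported while the thread was suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread sampling state; not part of any Windows API. */
typedef struct {
    uint64_t prev_cycles;  /* cycle count seen at the previous sample */
    bool     has_prev;     /* false until the first sample is recorded */
} sample_state;

/* Returns true if the current sample should be shown as "waiting".
   cycles_now: cycle count read while the thread was suspended.
   wait_limit: tuning threshold (an assumption; likely CPU dependent). */
bool sample_is_waiting(sample_state *st, uint64_t cycles_now, uint64_t wait_limit)
{
    assert(st != NULL);
    bool waiting = false;
    if (st->has_prev) {
        /* Few cycles consumed since the last sample: treat as waiting. */
        waiting = (cycles_now - st->prev_cycles) < wait_limit;
    }
    st->prev_cycles = cycles_now;
    st->has_prev = true;
    return waiting;
}
```

Note that the first sample after the thread starts waiting still shows a large delta, which matches the limitation mentioned above.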
pthread_yield is documented as "causes the calling thread to relinquish the CPU", but on a modern OS/scheduler, the relinquishing of the CPU happens automatically at the appropriate times (i.e. whenever the thread calls a blocking operation, and/or when the thread's quantum has expired). Is pthread_yield() therefore vestigial/useless except in the special case of running under a co-operative-only task scheduler? Or are there some use-cases where calling it would still be correct/useful even under a modern pre-emptive scheduler?
pthread_yield() gives you a chance to do a short sleep -- not a timed sleep. You relinquish the remainder of time slice to some other thread or process, but you don't put the thread in a wait queue.
Also, a while ago I read about how schedulers prioritize interactive processes. These are the processes that the user interacts with directly and whose sluggishness you feel most (the system feels less slow if the UI is responsive). One of the properties of interactive processes is that they have little to do and mostly don't use their entire time slice. So if a process keeps yielding before its time slice is up, the scheduler assumes it is interactive and boosts its priority. There were exploits that used this trick to effectively use 99% of the CPU while the offending process was shown at 0%.
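One place where yielding is still used under a pre-emptive scheduler is a short polling loop: the thread gives up the rest of its time slice on each pass instead of sleeping for a fixed time. The sketch below is only an illustration. It uses sched_yield(), the standardized call (pthread_yield is non-standard), and the names producer, wait_by_yielding and payload are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int ready = 0;   /* set by the producer when the data is ready */
static int payload = 0;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store(&ready, 1);   /* publishes payload to the polling thread */
    return NULL;
}

/* Polls until the producer is done, yielding the CPU on every pass.
   Returns the payload and reports the number of yields through *yields. */
int wait_by_yielding(long *yields)
{
    pthread_t t;
    long n = 0;
    atomic_store(&ready, 0);
    payload = 0;
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    while (!atomic_load(&ready)) {
        sched_yield();         /* stay runnable, but let others go first */
        n++;
    }
    pthread_join(t, NULL);
    if (yields != NULL)
        *yields = n;
    return payload;
}
```

The thread never enters a wait queue here, so it keeps consuming CPU whenever nothing else is runnable. For longer waits a blocking primitive is usually the better choice.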
I want to yield a multithreaded process in Linux. I know a thread can be yielded by calling sched_yield. I guess, on the other hand, the whole process can be yielded by calling sleep(0), since sleep works at process level. Am I right?
sched_yield will yield the thread that is currently running, relinquishing the rest of its timeslice. The processor then context switches to the next thread. Whether that thread is another which belongs to your process is unknown. It could be, it might not be.
To yield the whole process you would therefore need to yield each thread that existed in that process. sleep works similarly. It will sleep for that particular thread, not the whole process.
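A quick way to check the per-thread behaviour of sleep() is to let one thread sleep while another measures how long its own work takes. This is a rough sketch, not a rigorous test, and the names sleeper, now_ms and main_thread_work_ms are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>
#include <unistd.h>

static void *sleeper(void *arg)
{
    (void)arg;
    sleep(1);   /* blocks only the thread that calls it */
    return NULL;
}

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Starts a thread that sleeps for one second, then times a small amount
   of work on the calling thread. Returns the elapsed milliseconds for
   that work, or -1.0 if the thread could not be created. */
double main_thread_work_ms(void)
{
    pthread_t t;
    volatile unsigned long sum = 0;
    if (pthread_create(&t, NULL, sleeper, NULL) != 0)
        return -1.0;
    double start = now_ms();
    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i;
    double elapsed = now_ms() - start;
    pthread_join(t, NULL);   /* only this join waits for the sleeper */
    return elapsed;
}
```

If sleep() stopped the whole process, the measured time would be on the order of a second. With per-thread sleep it should be far below that.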
Wrong.
sleep(3)
sleep() makes the calling thread sleep until seconds seconds have
elapsed or a signal arrives which is not ignored.
EDIT
From the comments I see people use an outdated site for the manual pages. Stop using that site, use the kernel.org pages which should be up-to-date.
I am an embedded programmer attempting to simulate a real time preemptive scheduler in a Win32 environment using Visual Studio 2010 and MingW (as two separate build environments). I am very green on the Win32 scheduling environment and have hit a brick wall with what I am trying to do. I am not trying to achieve real time behaviour - just to get the simulated tasks to run in the same order and sequence as they would on the real target hardware.
The real time scheduler being simulated has a simple objective - always execute the highest priority task (thread) that is able to run. As soon as a task becomes able to run, it must preempt the currently running task if it has a higher priority than the currently running task. A task can become able to run due to an external event it was waiting for, or a time out/block time/sleep time expiring - with a tick interrupt generating the time base.
In addition to this preemptive behaviour, a task can yield or volunteer to give up its time slice because it is executing a sleep or wait type function.
I am simulating this by creating a low priority Win32 thread for each task that is created by the real time scheduler being simulated (the thread effectively does the context switching the scheduler would do on a real embedded target), a medium priority Win32 thread as a pseudo interrupt handler (handles simulated tick interrupts and yield requests that are signalled to it using a Win32 event object), and a higher priority Win32 thread to simulate the peripheral that generates the tick interrupts.
When the pseudo interrupt handler establishes that a task switch should occur it suspends the currently executing thread using SuspendThread() and resumes the thread that executes the newly selected task using ResumeThread(). Of the many tasks and their associated Win32 threads that may be created, only one thread that manages the task will ever be out of the suspended state at any one time.
It is important that a suspended thread suspends immediately that SuspendThread() is called, and that the pseudo interrupt handling thread executes as soon as the event telling it that an interrupt is pending is signalled - but this is not the behaviour I am seeing.
As an example problem that I already have a workaround for: when a task/thread yields, the yield event is latched in a variable and the interrupt handling thread is signalled, as there is a pseudo interrupt (the yield) that needs processing. In a real time system of the kind I am used to programming, I would expect the interrupt handling thread to execute immediately that it is signalled, because it has a higher priority than the thread that signals it. What I am seeing in the Win32 environment is that the thread that signals the higher priority thread continues for some time before being suspended - either because it takes some time before the signalled higher priority thread starts to execute, or because it takes some time for the suspended task to actually stop running - I'm not sure which. In any case this can easily be corrected by making the signalling Win32 thread block on a semaphore after signalling the Win32 interrupt handling thread, and having the interrupt handling Win32 thread unblock the thread when it has finished its function (a handshake). This effectively uses thread synchronisation to force the scheduling pattern to what I need. I am using SignalObjectAndWait() for this purpose.
Using this technique the simulation works perfectly when the real time scheduler being simulated is functioning in co-operative mode - but not (as is needed) in preemptive mode.
The problem with preemptive task switching is, I guess, the same: the task continues to execute for some time after it has been told to suspend before it actually stops running, so the system cannot be guaranteed to be left in a consistent state when the thread that runs the task suspends. In the preemptive case though, because the task does not know when it is going to happen, the same technique of using a semaphore to prevent the Win32 thread continuing until it is next resumed cannot be used.
Has anybody made it this far down this post - sorry for its length!
My questions then are:
How can I force Win32 (XP) scheduling to start and stop tasks immediately that the suspend and resume thread functions are called - or - how can I force a higher priority Win32 thread to start executing immediately that it is able to do so (the object it is blocked on is signalled)? Effectively, how do I force Win32 to reschedule its running threads?
Is there some way of asynchronously stopping a task to wait for an event when it's not in the task/thread's sequential execution path?
The simulator works well in a Linux environment where POSIX signals are used to effectively interrupt threads - is there an equivalent in Win32?
Thanks to anybody who has taken the time to read this long post, and especially thanks in advance to anybody that can hold my 'real time engineers' hand through this Win32 maze.
If you need to do your own scheduling, then you might consider using fibers instead of threads. Fibers are like threads, in that they are separate blocks of executable code, however fibers can be scheduled in user code whereas threads are scheduled by the OS only. A single thread can host and manage scheduling of multiple fibers, and fibers can even schedule each other.
Firstly, what priority values are you using for your threads?
If you set the high priority thread to THREAD_PRIORITY_TIME_CRITICAL it should run pretty much immediately --- only those threads associated with a real-time process will have higher priority.
Secondly, how do you know that the suspend and resume aren't happening immediately? Are you sure this is the problem?
You cannot force a thread to wait on something from outside without suspending the thread to inject the wait code; if SuspendThread isn't working for you then this isn't going to help.
The closest to a signal is probably QueueUserAPC, which will schedule a callback to run the next time the thread enters an "alertable wait state", e.g. by calling SleepEx or WaitForSingleObjectEx or similar.
#Anthony W - thanks for the advice. I was running the Win32 threads that simulated the real time tasks at THREAD_PRIORITY_ABOVE_NORMAL, and the threads that ran the pseudo interrupt handler and the tick interrupt generator at THREAD_PRIORITY_HIGHEST. The threads that were suspended I was changing to THREAD_PRIORITY_IDLE in case that made any difference. I just tried your suggestion of using THREAD_PRIORITY_TIME_CRITICAL but unfortunately it didn't make any difference.
With regards to your question am I sure that the suspend and resume not happening immediately is the problem - well no I'm not. It is my best guess in an environment I am unfamiliar with. My thinking regarding the failure of suspend and resume to work immediately stems from my observation when a task yields. If I make the call to yield (signal [using a Win32 event] a higher priority Win32 thread to switch to the next real time task) I can place a break point after the yield and that gets hit before a break point in the higher priority thread. It is unclear whether a delay in signalling the event and the higher priority task running, or a delay in suspending the thread and the thread actually stopping running was causing this - but the behaviour was definitely observed. This was fixed using a semaphore handshake, but that cannot be done for preemptions caused by tick interrupts.
I know the simulation is not running as I expect because a set of tests that check the sequence of scheduling of real time tasks is failing. It is always possible the scheduler has a problem, or the test has a problem, but the test will run for weeks without failing on a real real time target so I'm inclined to think the test and the scheduler are ok. A big difference is on the real time target the tick frequency is 1 ms, whereas on the Win32 simulated target it is 15ms with quite a lot of variation even then.
#Remy - I have done quite a bit of reading about fibers today, and my conclusion is that for simulating the scheduler in cooperative mode they would be perfect. However, as far as I can see they can only be scheduled by the fibers themselves calling the SwitchToFiber() function. Can a thread be made to block on a timer or sleep so it runs periodically, effectively preempting the fiber that was running at the time? From what I have read the answer is no because blocking one fiber will block all fibers running in the thread. If it could be made to work, could the periodically executing fiber then call the SwitchToFiber() function to select the next fiber to run before again sleeping for a fixed period? Again I think the answer is no because once it switches to another fiber it will no longer be executing and so will not actually call the Sleep() function until the next time the executing fiber switches back to it. Please correct my logic here if I have got the wrong idea of how fibers work.
I think it could work if the periodic functionality could remain in its own thread, separate from the thread that executed the fibers - but (again from what I have read) I don't think one thread can influence the execution of fibers running in a different thread. Again I would be grateful if you could correct my conclusions here if they are wrong.
[EDIT] - simpler than the hack below - it seems just ensuring all the threads run on the same CPU core also fixes the problem :o) After all that. The only problem then is the CPU runs at nearly 100% and I'm not sure if the heat is damaging to it.
[/EDIT]
Ahaa! I think I have a workaround for this - but it's ugly. The ugliness is kept in the port layer though.
What I do now is store the thread ID each time a thread is created to run a task (a Win32 thread is created for each real time task that is created). I then added the function below - which is called using trace macros. The trace macros can be defined to do whatever you want, and have proven very useful in this case. The comments in the code below explain. The simulation is not perfect, and all this does is correct the thread scheduling when it has already deviated from the real time scheduling whereas I would prefer it not to go wrong in the first place, but the positioning of the trace macros makes the code containing this solution pass all the tests:
void vPortCheckCorrectThreadIsRunning( void )
{
    xThreadState *pxThreadState;

    /* When switching threads, Windows does not always seem to run the selected
    thread immediately.  This function can be called to check if the thread
    that is currently running is the thread that is responsible for executing
    the task selected by the real time scheduler.  The demo project for the Win32
    port calls this function from the trace macros which are seeded throughout
    the real time kernel code at points where something significant occurs.
    Adding this functionality allows all the standard tests to pass, but users
    should still be aware that extra calls to this function could be required
    if their application requires absolute fixes and predictable sequencing (as
    the port tests do).  This is still a simulation - not the real thing! */
    if( xTaskGetSchedulerState() != taskSCHEDULER_NOT_STARTED )
    {
        /* Obtain the real time task to Win32 mapping state information. */
        pxThreadState = ( xThreadState * ) *( ( unsigned long * ) pxCurrentTCB );

        if( GetCurrentThreadId() != pxThreadState->ulThreadId )
        {
            SwitchToThread();
        }
    }
}
It looks like Linux doesn't implement pthread_suspend and pthread_continue, but I really need them.
I have tried cond_wait, but it is too slow. The work being threaded mostly executes in 50us but occasionally takes upwards of 500ms. The problem with cond_wait is two-fold. First, the mutex locking takes a time comparable to the microsecond executions, and I don't need locking. Second, I have many worker threads and I don't really want to make N condition variables for when they need to be woken up.
I know exactly which thread is waiting for which work and could just pthread_continue that thread. A thread knows when there is no more work and can easily pthread_suspend itself. This would use no locking, avoid the stampede, and be faster. Problem is....no pthread_suspend or _continue.
Any ideas?
Make the thread wait for a specific signal.
Use pthread_sigmask and sigwait.
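A minimal sketch of that suggestion, assuming SIGUSR1 is free to use as the wake-up signal. The names worker, run_sigwait_demo and work_done are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_int work_done = 0;

/* Worker: parks in sigwait() until SIGUSR1 is sent to it, then "works". */
static void *worker(void *arg)
{
    sigset_t *set = arg;
    int sig = 0;
    if (sigwait(set, &sig) == 0 && sig == SIGUSR1)
        atomic_fetch_add(&work_done, 1);
    return NULL;
}

/* Returns the number of work items completed (1 expected), or -1 on error. */
int run_sigwait_demo(void)
{
    pthread_t t;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block SIGUSR1 before creating the worker so it inherits the mask.
       A blocked signal stays pending until sigwait() picks it up. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    atomic_store(&work_done, 0);
    if (pthread_create(&t, NULL, worker, &set) != 0)
        return -1;
    /* "Continue" the worker; fine even if it has not reached sigwait() yet. */
    if (pthread_kill(t, SIGUSR1) != 0)
        return -1;
    pthread_join(t, NULL);
    return atomic_load(&work_done);
}
```

Standard signals are not counted, so two wake-ups sent before the worker waits collapse into one. If every wake-up must be delivered, a counting mechanism is needed.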
Have the threads block on a pipe read. Then dispatch the data through the pipe. The threads will awaken as a result of the arrival of the data they need to process. If the data is very large, just send a pointer through the pipe.
If specific data needs to go to specific threads you need one pipe per thread. If any thread can process any data, then all threads can block on the same pipe and they will awaken round robin.
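A rough sketch of the one-pipe-per-thread variant, sending only a pointer through the pipe. The job struct, the NULL "exit" convention and the function names are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

/* Hypothetical work item; only a pointer to it travels through the pipe. */
typedef struct { int input; int result; } job;

static void *pipe_worker(void *arg)
{
    int read_fd = *(int *)arg;
    job *j;
    /* read() blocks until a pointer arrives; a NULL pointer means "exit". */
    while (read(read_fd, &j, sizeof j) == (ssize_t)sizeof j && j != NULL)
        j->result = j->input * 2;
    return NULL;
}

/* Sends one job through the pipe and returns its result, or -1 on error. */
int run_pipe_demo(int input)
{
    int fds[2];
    pthread_t t;
    job j = { input, 0 };
    job *msg = &j, *stop = NULL;
    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, pipe_worker, &fds[0]) != 0)
        return -1;
    /* Writes of at most PIPE_BUF bytes are atomic, so a pointer is never split. */
    if (write(fds[1], &msg, sizeof msg) != (ssize_t)sizeof msg ||
        write(fds[1], &stop, sizeof stop) != (ssize_t)sizeof stop)
        return -1;
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return j.result;
}
```

Each wake-up costs a write and a read system call, so whether this beats a condition variable for 50us work items would have to be measured.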
It seems to me that such a solution (that is, using "pthread_suspend" and "pthread_continue") is inevitably racy.
An arbitrary amount of time can elapse between the worker thread finishing work and deciding to suspend itself, and the suspend actually happening. If the main thread decides during that time that that worker thread should be working again, the "continue" will have no effect and the worker thread will suspend itself regardless.
(Note that this doesn't apply to methods of suspending that allow the "continue" to be queued, like the sigwait() and read() methods mentioned in other answers).
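To illustrate what a queued "continue" means, here is a sketch using a POSIX counting semaphore. This is a swapped-in technique that nobody in this thread proposed. It is shown only because sem_post() is remembered even when the worker is not waiting yet, and all names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>

static sem_t wake;            /* counts pending "continue" requests */
static int items_processed;

static void *sem_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        sem_wait(&wake);      /* "suspend": returns at once if a post is pending */
        items_processed++;
    }
    return NULL;
}

/* Posts n wake-ups before the worker even exists, then returns how many
   of them the worker consumed (n expected), or -1 on error. */
int run_semaphore_demo(int n)
{
    pthread_t t;
    items_processed = 0;
    if (sem_init(&wake, 0, 0) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        sem_post(&wake);      /* "continue" issued before the worker waits */
    if (pthread_create(&t, NULL, sem_worker, &n) != 0)
        return -1;
    pthread_join(t, NULL);
    sem_destroy(&wake);
    return items_processed;
}
```

With a plain suspend/continue pair the early wake-ups would be lost. Here the semaphore count keeps them until the worker arrives.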
Maybe try pthread_cancel, but be careful if any locks need to be released. Read the man page to understand the cancel state.
Why do you care which thread does the work? It sounds like you designed yourself into a corner and now you need a trick to get yourself out of it. If you let whatever thread happened to already be running do the work, you wouldn't need this trick, and you would need fewer context switches as well.
I spent a good long while looking for info on the differences between time.h::sleep() and pthread.h::pthread_yield() but was unable to find any solid reference material and so I am posting this question.
What is the difference between time.h::sleep() and pthread.h::pthread_yield()?
Update:
The reason I ask is that I was using sleep() to put each individual thread to sleep, and my application started having issues when there were 8 threads instead of 4. When I went online to check whether sleep() only affects the calling thread, I couldn't find any good reference stating whether sleep() affects the entire process or only the individual thread.
From pthread_yield:
The pthread_yield subroutine forces the calling thread to relinquish use of its processor, and to wait in the run queue before it is scheduled again. If the run queue is empty when the pthread_yield subroutine is called, the calling thread is immediately rescheduled.
From the sleep manpage:
sleep() makes the calling process sleep until seconds seconds have elapsed or a signal arrives which is not ignored.
If you don't want to have a real time delay in your threads and just want to allow other threads to do their work, then pthread_yield is better suited than sleep.
sleep() causes your program to stop executing for a certain length of time. No matter what else happens on the system, your thread will not start again until at least the length of time passed to sleep() has elapsed. pthread_yield() notifies the operating system that your thread is done working, and that it can switch execution to another thread. However, if there is no higher-priority thread that needs to do work at that time, your thread may start again immediately.
In other words, after sleep() your thread is guaranteed to stop running even if no one else needs to run, while pthread_yield() is just a polite way to give other threads a chance to run if they need to.
Update in response to question update: both sleep() and pthread_yield() affect only the calling thread.
sleep(s) takes the current thread of execution and suspends it until s seconds have passed (or it is woken up by a signal.)
In more practical terms, when you call sleep(), that thread will cease execution and just... wait until the specified time has passed. Once it passes, that thread is placed into the ready queue.
pthread_yield() says "take this thread, and put it into the ready queue." Your thread will stop executing and be in the 'waiting' state until it is selected to run by the scheduler. This does not guarantee that your thread will not immediately resume running, but it gives another thread a chance to run at a given point in its execution.
I am going to go out on a limb and say that sleep(0) would accomplish the same thing as a pthread_yield() - both stopping execution and placing the thread in the ready queue.