I do understand what an APC is, how it works, and how Windows uses it, but I don't understand when I (as a programmer) should use QueueUserAPC instead of, say, a fiber, or thread pool thread.
When should I choose to use QueueUserAPC, and why?
QueueUserAPC is a neat tool that can often be a shortcut for tasks that are otherwise handled with synchronization objects. It allows you to tell a particular thread to do something whenever it is convenient for that thread, i.e. when it finishes its current work and next enters an alertable wait (for example SleepEx or WaitForSingleObjectEx with the alertable flag set).
Let's say you have a main thread and a worker thread. The worker thread opens a socket to a file server and starts downloading a 10GB file by calling recv() in a loop. The main thread wants the worker thread to do something else in its downtime while it is waiting for net packets; it can queue a function to be run on the worker while it would otherwise be waiting and doing nothing. (Note that the wait must be alertable for the APC to be delivered; a plain blocking recv() will not run it, but overlapped I/O plus an alertable wait will.)
You have to be careful with APCs, because in the scenario I mentioned you would not want to make another blocking WinSock call from inside the APC (which would result in undefined behavior). You have to look carefully to find good uses of this functionality, because you can often do the same thing in other ways, for example by having the other thread check an event every time it is about to go to sleep rather than handing it a function to run while it waits. In this scenario, though, the APC is simpler.
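To make this concrete, here is a minimal sketch of the mechanism (the side-work routine, its payload values, and the quit flag are invented for the illustration): the worker parks in an alertable SleepEx, and anything queued with QueueUserAPC runs on that thread the next time it waits.

```c
#include <windows.h>
#include <stdio.h>

static volatile LONG quit;

/* APC routine: runs on the worker thread the next time that thread
   enters an alertable wait. */
static void CALLBACK DoSideWork(ULONG_PTR param)
{
    printf("side work %lu on thread %lu\n",
           (unsigned long)param, (unsigned long)GetCurrentThreadId());
}

static void CALLBACK QuitApc(ULONG_PTR param)
{
    (void)param;
    quit = 1;                     /* tell the worker loop to finish */
}

static DWORD WINAPI Worker(LPVOID arg)
{
    (void)arg;
    while (!quit)
        SleepEx(INFINITE, TRUE);  /* alertable wait: queued APCs run here */
    return 0;
}

int main(void)
{
    HANDLE worker = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
    QueueUserAPC(DoSideWork, worker, 1);  /* APCs are delivered in FIFO */
    QueueUserAPC(DoSideWork, worker, 2);  /* order, on the target thread */
    QueueUserAPC(QuitApc, worker, 0);
    WaitForSingleObject(worker, INFINITE);
    CloseHandle(worker);
    return 0;
}
```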
It is like having a call-desk employee sitting and waiting for phone calls, and giving that person little tasks to do during their downtime: "Here, solve this Rubik's cube while you're waiting." Although, when a phone call comes in, the person would not put down the Rubik's cube to answer the phone (the APC has to return before the thread can go back to waiting).
QueueUserAPC is also useful if there is a single thread (Thread A) that is in charge of some data structure, and you want to perform some operation on the data structure from another thread (Thread B), but you don't want to have the synchronization overhead / complexity of trying to share that data between two threads. By having Thread B queue the operation to run on Thread A, which solely maintains that structure, you are executing any arbitrary function you want on that data without having to worry about synchronization.
It is just another tool, like a thread pool. However, with a thread pool you cannot send a task to a particular thread; you have no control over where the work is done. Queuing a task may end up creating a whole new thread, and two queued tasks may run simultaneously on two different threads. With QueueUserAPC you are guaranteed that the tasks run in order, and on the thread you designate.
Say I have 3 threads.
The 1st thread is the controller, where I create the threads.
The 2nd and 3rd are where the data processing is done.
The catch is that the 2nd thread performs a different operation than the 3rd, so I need a way to differentiate between the two different pieces of logic. Would just creating the threads with different methods suffice?
No! Because should Thread 2 die or become unresponsive, I need the 3rd to take its place and start doing the job that the 2nd thread was doing, and then create a new one to act as the 3rd; but creating a new one to act as the 3rd is not my issue.
It's as I said: how do I suddenly make the 3rd thread start executing the 2nd one's logic, without starting a whole new thread in place of the previous 2nd thread and resetting its variables?
E.g. Thread 2 dies, Thread 3 now needs to start doing Thread 2's job, and I need to create a new thread to cover for Thread 3.
Note: a bonus question is how I can do this without losing Thread 2, so that in the eventuality that it stops being unresponsive I can still reuse it.
Any help appreciated!
You picked the wrong tool for the job. If you need to handle a task dying or becoming unresponsive, you must isolate tasks with processes, not threads.
Threads share an execution environment, so if a thread corrupts or damages that environment, it can affect all the other threads. For example, say your thread dies in the middle of adjusting shared structures. If the lock it held on those structures is never released, no other thread can ever access them. If you force the lock released, other threads may access them and find them in a corrupted state.
Use multiple processes for this kind of isolation.
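A minimal sketch of that advice, assuming a POSIX environment (the job bodies and names here are placeholders): the parent forks one process per job and re-forks whichever one dies, so a crashed worker cannot take the supervisor or its sibling down with it.

```c
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

#define NJOBS 2

static void do_job(int job)
{
    /* placeholder for the job-specific logic; this worker just "works"
       for a while and then dies, to exercise the restart path */
    sleep(2 + job);
}

static pid_t spawn(int job)
{
    pid_t pid = fork();
    if (pid == 0) {               /* child: isolated from the parent */
        do_job(job);
        _exit(1);                 /* simulate the worker dying */
    }
    return pid;
}

int main(void)
{
    pid_t pids[NJOBS];
    for (int j = 0; j < NJOBS; j++)
        pids[j] = spawn(j);

    for (;;) {                    /* supervisor loop */
        int status;
        pid_t dead = wait(&status);   /* blocks until some worker dies */
        for (int j = 0; j < NJOBS; j++)
            if (pids[j] == dead) {
                fprintf(stderr, "worker %d died, restarting it\n", j);
                pids[j] = spawn(j);   /* its crash could not corrupt us */
            }
    }
}
```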
I have a single thread C program implemented using event driven programming - a callback triggers every time the event happens.
The callback takes way too long to execute (it does a bunch of calculations) and this processing time is important: it is currently 500 microseconds and I need it to be less than 100.
Most of the calculations are independent, can be done in parallel.
I have a machine with many cores and was thinking if getting multiple threads to make the calculations in parallel could be possible / of help.
I think that the approach of spawning multiple threads at the beginning of the callback and then sending the different calculations to them will not work well, because creating the threads takes time.
Is it possible to have a few threads up, waiting to be used, and that every time the callback is triggered I can send the calculations there without having to generate the threads in each callback?
You can use a thread pool for this (often called a worker pool). The basic idea is to create some number of threads in advance and have them all sleep, waiting on a semaphore whenever there is no work to do.
Your code will be simpler if you can get away with one thread for each processing task, but you can also implement it (carefully) with a queue, where each worker handles the next job in the queue and sleeps when the job queue is empty. A sketch follows the steps below.
Either way, a single round of processing will look something like this:
1. assign or queue tasks to your worker pool
2. notify worker pool to wake up and begin processing tasks
3. wait for worker pool to signal all tasks complete (*)
(*) remember, "all tasks complete" is not the same as "task queue empty"
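Here is a minimal sketch of that round-trip using POSIX threads, with a condition variable standing in for the semaphore (names like run_round and job_fn are invented for the example): the workers are created once up front and reused for every event.

```c
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

typedef void (*job_fn)(int worker_id);

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  all_done   = PTHREAD_COND_INITIALIZER;
static job_fn pending_job;      /* job for the current round */
static int round_no;            /* bumped once per round */
static int remaining;           /* workers still busy this round */

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    int seen = 0;               /* last round this worker served */
    for (;;) {
        pthread_mutex_lock(&lock);
        while (round_no == seen)             /* sleep until a new round */
            pthread_cond_wait(&work_ready, &lock);
        seen = round_no;
        job_fn job = pending_job;
        pthread_mutex_unlock(&lock);

        job(id);                             /* the parallel calculation */

        pthread_mutex_lock(&lock);
        if (--remaining == 0)                /* last one out signals done */
            pthread_cond_signal(&all_done);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Called from the event callback: hand one job to all workers and block
   until every worker has finished it. */
static void run_round(job_fn job)
{
    pthread_mutex_lock(&lock);
    pending_job = job;
    remaining = NWORKERS;
    round_no++;
    pthread_cond_broadcast(&work_ready);
    while (remaining > 0)                    /* all tasks complete, not */
        pthread_cond_wait(&all_done, &lock); /* just queue empty */
    pthread_mutex_unlock(&lock);
}

static void calc(int id) { printf("worker %d computing\n", id); }

int main(void)
{
    pthread_t t[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    run_round(calc);   /* one event's worth of processing */
    run_round(calc);   /* threads are reused, not recreated */
    return 0;          /* demo: workers are left running and die at exit */
}
```

Each worker remembers the last round it served, so a broadcast cannot be missed or double-counted.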
Now your main timing bottlenecks will depend on the mutex/semaphore implementation and your OS thread scheduler. It may be appropriate to set a high priority on all your worker threads.
If you have events at regular intervals, a common improvement to the above is to also double-buffer (i.e. output the result for the previous event, and assign the workers to begin processing input for the current event). To achieve that, you would move step 3 to happen before step 1.
This may or may not be suitable for your purposes. But it can provide some extra leeway with timing, if you're still having trouble processing fast enough. Try something simple first. Problems like this can get hairy very quickly when you start introducing extra requirements.
I am learning about threads. And I need to understand how threads communicate between each other, so what does it mean when we say something like "let Thread A send a message to Thread B"?
I can think of the following:
Thread B is blocking on some sort of queue, and Thread A places a new entry in this queue, which causes Thread B to unblock and retrieve this entry.
Thread B is blocking on an event (for example, in the Windows API there is the Event object), and Thread A signals this event, which will cause Thread B to wake up (or is this called notifying a thread rather than sending a message to it?)
The "threads" world is subject of many ambiguity due to different nomenclature coming from different environments, sometimes using same words to mean different things.
Your first assertion makes sense in very general terms: the "message" is what makes the thread to wake-up and get some "input".
Depending on the OS and its own API, your second assertion makes sense and is nothing more then a way to implement the first using the Win32 API.
Another possible interpretation can be that the thread is blocked on a message loop (see GetMessage) and the other one calls PostThreadMessage.
In more general terms, you can think of a "message" as an "event" that carries some "state" with it: an event simply happens (and that is all the information it gives), while a message "happens" and has some parameter associated with it.
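As a concrete illustration of the first interpretation, here is a tiny blocking queue sketched with POSIX threads (msg_t and its fields are invented for the example): the entry placed in the queue is the "message", carrying both the wake-up and some state.

```c
#include <pthread.h>
#include <stdio.h>

typedef struct msg { int kind; int payload; struct msg *next; } msg_t;

static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;
static msg_t *head, *tail;

static void send_msg(msg_t *m)             /* Thread A */
{
    pthread_mutex_lock(&qlock);
    m->next = NULL;
    if (tail) tail->next = m; else head = m;
    tail = m;
    pthread_cond_signal(&qcond);           /* wake the receiver */
    pthread_mutex_unlock(&qlock);
}

static msg_t *recv_msg(void)               /* Thread B blocks here */
{
    pthread_mutex_lock(&qlock);
    while (!head)
        pthread_cond_wait(&qcond, &qlock);
    msg_t *m = head;
    head = m->next;
    if (!head) tail = NULL;
    pthread_mutex_unlock(&qlock);
    return m;                              /* the message: event + state */
}

static void *receiver(void *arg)
{
    (void)arg;
    msg_t *m = recv_msg();
    printf("got kind=%d payload=%d\n", m->kind, m->payload);
    return NULL;
}

int main(void)
{
    pthread_t b;
    pthread_create(&b, NULL, receiver, NULL);
    static msg_t m = { 42, 7, NULL };
    send_msg(&m);
    pthread_join(b, NULL);
    return 0;
}
```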
Here is a link to example Windows code that uses two threads to copy a file: the original thread reads, and a created thread writes. It uses a custom messaging system built on Windows mutexes and semaphores. Other than the overhead of creating and deleting the mutexes and semaphores, the actual functions are fairly small. I've worked on embedded multi-threaded devices using a similar messaging interface scheme.
http://rcgldr.net/misc/mtcopy.zip
I have a multi-threaded program in which one thread (Thread A) sleeps unconditionally for an infinite time. When an event happens in another thread (Thread B), it wakes up Thread A by signaling. Now I know there are multiple ways to do this.
When my program runs in a Windows environment, I use WaitForSingleObject in Thread A and SetEvent in Thread B. It works without any issues.
I can also use a file-descriptor-based model with poll or select; there is more than one way to do it.
However, I am trying to find the most efficient way: I want to wake up Thread A as soon as possible whenever Thread B signals. What do you think is the best option?
I am OK with exploring a driver-based option.
Thanks
As said, triggering a SetEvent in Thread B and a WaitForSingleObject in Thread A is fast.
However some conditions have to be taken into account:
Single core/processor: as Martin says, the waiting thread will preempt the signalling thread. With such a scheme you should take care that the signalling thread (B) goes idle right after the SetEvent. This can be done with a Sleep(0), for example.
Multi core/processor: one might think there is an advantage in putting the two threads onto different cores/processors, but this is not really such a good idea. If both threads are on the same core/processor, the time span between calling SetEvent and the return of WaitForSingleObject is much shorter.
Handling both threads on one core (SetThreadAffinityMask) also allows you to tune their behavior by means of their priority settings (SetThreadPriority). You may run the waiting thread at a higher priority, or you have to ensure that the signalling thread really does nothing after it has set the event.
You also have to deal with another synchronization matter: when is the next event going to happen? Will Thread A have completed its task by then? Most effectively, a second event can be used to solve this: when Thread A is done, it sets an event indicating that Thread B is allowed to set its event again. Thread B then first sets the event and waits for the feedback event, which meets the requirement that it go idle immediately.
If you want to allow Thread B to set the event even when Thread A is not finished and not yet in a wait state, you should consider using semaphores instead of events. This way the number of "calls/events" from Thread B is kept, and the wait function in Thread A can follow up, because it returns once for each time the semaphore has been released. Semaphore objects are about as fast as events.
Summary:
Keep both threads on the same core/CPU by means of SetThreadAffinityMask.
Extend the SetEvent/WaitForSingleObject pair with a second event to establish a handshake (sketched below).
Depending on the details of the processing, you may also consider semaphore objects.
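Here is a minimal sketch of that handshake, assuming auto-reset Win32 events (the three iterations and the thread names are purely for the demo):

```c
#include <windows.h>
#include <stdio.h>

static HANDLE go_event;    /* B -> A: "work is ready" */
static HANDLE done_event;  /* A -> B: "finished, you may signal again" */

static DWORD WINAPI thread_a(LPVOID arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        WaitForSingleObject(go_event, INFINITE);  /* sleep until signalled */
        printf("A: handling event %d\n", i);      /* ...do the work... */
        SetEvent(done_event);                     /* feedback to B */
    }
    return 0;
}

int main(void)
{
    go_event   = CreateEvent(NULL, FALSE, FALSE, NULL); /* auto-reset */
    done_event = CreateEvent(NULL, FALSE, FALSE, NULL);
    HANDLE a = CreateThread(NULL, 0, thread_a, NULL, 0, NULL);

    for (int i = 0; i < 3; i++) {              /* this is "thread B" */
        SetEvent(go_event);                    /* wake A ... */
        WaitForSingleObject(done_event, INFINITE); /* ...then go idle */
    }
    WaitForSingleObject(a, INFINITE);
    CloseHandle(a); CloseHandle(go_event); CloseHandle(done_event);
    return 0;
}
```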
I am programming using pthreads in C.
I have a parent thread which needs to create 4 child threads with id 0, 1, 2, 3.
When the parent thread gets data, it will split the data and assign it to 4 separate context variables, one for each sub-thread.
The sub-threads have to process this data, and in the meantime the parent thread should wait on them.
Once these sub-threads have finished executing, they will set the output in their corresponding context variables and wait (for reuse).
Once the parent thread knows that all these sub-threads have completed this round, it computes the global output and prints it out.
Now it waits for new data (the sub-threads are not killed yet; they are just waiting).
If the parent thread gets more data the above process is repeated - albeit with the already created 4 threads.
If the parent thread receives a kill command (assume a specific kind of data), it indicates to all the sub-threads and they terminate themselves. Now the parent thread can terminate.
I am a Masters research student and I am encountering the need for the above scenario. I know that this can be done using pthread_cond_wait and pthread_cond_signal. I have written the code, but it just runs indefinitely and I cannot figure out why.
My guess is that, the way I have coded it, I have over-complicated the scenario. It would be very helpful to know how this can be implemented. If there is a need, I can post a simplified version of my code to show what I am trying to do (even though I think my approach is flawed!)...
Can you please give me any insights into how this scenario can be implemented using pthreads?
As far as can be seen from your description, there seems to be nothing wrong with the principle.
What you are trying to implement is a worker pool, I guess; there should be a lot of implementations out there. But if the work that your threads are doing is a substantial computation (say at least a CPU-second or so), such a scheme is complete overkill. Modern implementations of POSIX threads are efficient enough that they support the creation of a lot of threads, really a lot, and the overhead is not prohibitive.
The only thing that is important, if your workers communicate through shared variables, mutexes etc. (and not via the return value of the thread), is that you start your threads detached, by using the attribute parameter to pthread_create.
Once you have such an implementation for your task, measure. Only then, if your profiler tells you that you spend a substantial amount of time in the pthread routines, start thinking about implementing (or using) a worker pool to recycle your threads.
One producer-consumer queue with 4 threads hanging off it. The thread that wants to queue the four tasks assembles the four context structs, each containing, as well as all the other data, a function pointer to an 'OnComplete' function. It then submits all four contexts to the queue, atomically incrementing a taskCount up to 4 as it does so, and waits on an event/condvar/semaphore.
The four threads get a context from the P-C queue and work away.
When done, the threads call the 'OnComplete' function pointer.
In OnComplete, the threads atomically count down taskCount. The thread that decrements it to zero signals the event/condvar/semaphore, and the originating thread runs on, knowing that all the tasks are done.
It's not that difficult to arrange things so that the assembly of the contexts and the synchronized waiting are done in a task as well, thus allowing the pool to process multiple 'fork and wait' operations at once for multiple requesting threads.
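Here is a sketch of that countdown with POSIX threads standing in for the queue and event machinery (ctx_t, on_complete, and the field names are invented): the task that decrements taskCount to zero wakes the originating thread.

```c
#include <pthread.h>
#include <stdio.h>

struct shared {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    int task_count;
};

typedef struct ctx {
    int id;                          /* plus whatever per-task data you need */
    void (*on_complete)(struct ctx *);
    struct shared *shared;
} ctx_t;

static void on_complete(ctx_t *c)
{
    struct shared *s = c->shared;
    pthread_mutex_lock(&s->lock);
    if (--s->task_count == 0)        /* last task signals the originator */
        pthread_cond_signal(&s->done);
    pthread_mutex_unlock(&s->lock);
}

static void *worker(void *arg)
{
    ctx_t *c = arg;
    printf("processing context %d\n", c->id);  /* the real work goes here */
    c->on_complete(c);
    return NULL;
}

int main(void)
{
    struct shared s = { PTHREAD_MUTEX_INITIALIZER,
                        PTHREAD_COND_INITIALIZER, 4 };
    ctx_t c[4];
    pthread_t t[4];
    for (int i = 0; i < 4; i++) {
        c[i] = (ctx_t){ i, on_complete, &s };
        /* a real pool would push c[i] onto the producer-consumer queue
           instead of spawning a thread per round */
        pthread_create(&t[i], NULL, worker, &c[i]);
    }
    pthread_mutex_lock(&s.lock);
    while (s.task_count > 0)          /* originating thread waits here */
        pthread_cond_wait(&s.done, &s.lock);
    pthread_mutex_unlock(&s.lock);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    puts("all four tasks complete");
    return 0;
}
```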
I have to add that operations like this are a huge pile easier in an OO language. Recent Java, for example, has a fork/join thread-pool class (ForkJoinPool) that does exactly this kind of thing, and C++ (or even C#, if you're into serfdom) is better than plain C here.