I have a fixed size array (example: struct bucket[DATASIZE]) where at the very beginning I load information from a file. Since I am concerned about scalability and execution time, no dynamic array was used.
Each time I process half of the array I am free to replace those spots with more data from the file. I don't have a clear idea on how I would do that but I thought about pthreads to start 2 parallel tasks: one would be the actual data processing and the other one would make sure to fill out the array.
However, all the examples that I've seen on pthreads show that they are all working on the same task but concurrently. Is there a way to have them do separate things? Any ideas, thoughts?
You can definitely have threads doing different tasks. The pattern you're after is very common - it's called a Producer-Consumer arrangement.
What you are trying to do seems very similar to the standard concurrent program called producer-consumer (look it up; you will surely find an example using pthreads). This program has one fixed-size buffer, which is filled by the producer and processed by the consumer.
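A minimal sketch of that arrangement, assuming a small ring buffer of integers in place of the question's struct bucket array. The names SLOTS, ITEMS and run_producer_consumer are hypothetical, and the "file" is simulated by the numbers 1..ITEMS. One thread refills free slots while the other processes filled ones:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SLOTS 8      /* hypothetical buffer size (DATASIZE in the question) */
#define ITEMS 100    /* hypothetical number of records in the "file" */

static int buffer[SLOTS];
static int head, tail, count;   /* ring-buffer state, guarded by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Producer: stands in for the thread that reads records from the file. */
static void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == SLOTS)              /* wait for a free slot */
            pthread_cond_wait(&not_full, &lock);
        buffer[tail] = i;
        tail = (tail + 1) % SLOTS;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Consumer: stands in for the data-processing thread; here it only sums. */
static void *consumer(void *arg)
{
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)                  /* wait for a filled slot */
            pthread_cond_wait(&not_empty, &lock);
        int value = buffer[head];
        head = (head + 1) % SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        *sum += value;                      /* "process" outside the lock */
    }
    return NULL;
}

/* Runs one producer and one consumer to completion; returns the sum. */
long run_producer_consumer(void)
{
    pthread_t prod, cons;
    long sum = 0;
    head = tail = count = 0;
    int rc = pthread_create(&prod, NULL, producer, NULL);
    assert(rc == 0);
    rc = pthread_create(&cons, NULL, consumer, &sum);
    assert(rc == 0);
    (void)rc;
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return sum;
}
```

Build with -pthread and call run_producer_consumer() from main. The two threads run different functions, which is all it takes to have them do separate things.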
Yes, that's an excellent use for pthreads: it's one of the very things that pthreads was made for.
You might think about fork( )ing twice, once to create the process to do the data manipulation; and then a second fork( ) to create the process that fills in the blanks. Use a mutex to let each process protect the array from the other process and it will work fine.
Why would your array need a mutex? How would you set it up? When would each process need to acquire the mutex and when would it need to release the mutex?
-- pete
Related
I know I am supposed to use mutexes but the way I currently use pthreads it would overly complicate the program...
Anyway, I basically have a variable which I use to denote whether a thread is currently performing work or not. In the main thread I run over it in a while loop to check which threads are no longer busy. Now, obviously, my thread can write to this same variable once it is done.
Is it allowed to read and write the same variable from 2 different threads if 1 thread is ONLY reading and 1 thread is ONLY writing? Reading an old value is not much of a concern, since it will just read the correct one on the next iteration.
So is it safe to do something like that?
In general, NO.
The following article explains why:
http://www.domaigne.com/blog/computing/mutex-and-memory-visibility/
Here is a list of API functions that act as memory barriers:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11
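One way to get a safe "busy" flag without a mutex is a C11 atomic. This is a swapped-in technique, not something the linked article prescribes. The sketch below assumes a C11 compiler with <stdatomic.h>; the names busy, result and run_flag_demo are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_bool busy;   /* hypothetical "worker is still working" flag */
static int result;         /* plain data produced by the worker */

static void *worker(void *arg)
{
    (void)arg;
    result = 42;                    /* do the work */
    atomic_store(&busy, false);     /* publish: the work is finished */
    return NULL;
}

/* Starts one worker, polls the flag, and returns the worker's result. */
int run_flag_demo(void)
{
    pthread_t t;
    atomic_store(&busy, true);
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    while (atomic_load(&busy))
        sched_yield();              /* poll without hogging the core */
    /* The atomic load that saw "false" orders this read after the
       worker's write to result, so the value is visible here. */
    int seen = result;
    pthread_join(t, NULL);
    return seen;
}
```

With a plain int flag instead of atomic_bool, the compiler and CPU would be free to reorder or cache the accesses, which is the visibility problem the article describes.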
How does one implement a multithreaded single-process model on Linux (Fedora) in C, where a single scheduler on a "main" core reads I/O availability (e.g. TCP/IP, UDP), and one execution thread per core (started at init) parses the data and then writes a small amount of information to shared memory? (It is my understanding that pthreads share data within a single process.)
I believe my options are:
Pthreads or the linux OS scheduler
I have a naive model in mind consisting of starting a certain number of these execution threads and a single scheduler thread.
What is the best solution one could think of, given that I can use this sort of model?
Completing Benoit's answer: to communicate between your master and your worker threads, you could use a condition variable. The workers do something like:
    while (true)
    {
        pthread_mutex_lock(&workQueueMutex);
        while (workQueue.empty())
            pthread_cond_wait(&workQueueCond, &workQueueMutex);
        /* if we get here then (a) we have work (b) we hold workQueueMutex */
        work = pop(workQueue);
        pthread_mutex_unlock(&workQueueMutex);
        /* do work */
    }
and the master:
/* I/O received */
    pthread_mutex_lock(&workQueueMutex);
    push(workQueue, work);
    pthread_cond_signal(&workQueueCond);
    pthread_mutex_unlock(&workQueueMutex);
This would wake up one idle worker to process the request immediately. If no worker is available, the work stays in the queue and is processed later.
Modifying the Linux scheduler is quite tough work; I would just forget about it. Pthreads is usually preferred. If I understand correctly, you want one core dedicated to the control plane and a pool of other cores dedicated to data-plane processing? Then create a pool of threads from your master thread and set up core affinity for these slave threads with pthread_setaffinity_np(...).
Indeed, threads of a process share the same address space, and global variables are accessible by any thread of that process.
It looks to me that you have a version of the producer-consumer problem with a single consumer aggregating the results of n producers. This is a pretty standard problem, so I definitely think that pthread is more than enough for you. You don't need to go and mess around with the scheduler.
As one of the answers states, a thread-safe queue like the one described here works nicely for this sort of issue. Your original idea of spawning a bunch of threads is a good one. You seem to be worried that the ability of the threads to share global state will cause you problems. I don't think that this is an issue if you keep shared state to a minimum and use sane locking discipline. Sharing state is fine as long as you do so responsibly.
Finally, unless you really know what you're doing, I would advise against manually messing with thread affinity. Just spawn the threads and let the scheduler handle when and on what core a thread runs. The thing to optimize is the number of threads you use. One for each core may not actually be the fastest approach if other threads are running.
Generally speaking, this is more or less exactly what the POSIX select and Linux-specific epoll functions are for.
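As a rough illustration of checking I/O availability with select, the sketch below waits on a pipe instead of a TCP or UDP socket so that it is self-contained. The helper names wait_readable and run_select_demo are hypothetical; a scheduler thread could use the same call on its sockets before handing data to the workers:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &readfds);
}

/* Returns 1 if the pipe reports "not readable" before a write and
   "readable" after it, 0 if not, -1 on setup failure. */
int run_select_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int before = wait_readable(fds[0], 10);   /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int after = wait_readable(fds[0], 10);    /* one byte is pending */
    close(fds[0]);
    close(fds[1]);
    return before == 0 && after == 1;
}
```

select takes the highest descriptor plus one as its first argument and may modify both the descriptor set and the timeout, so both are rebuilt on every call here.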
Suppose I create threads with pthreads, is it possible to send them new things to work on after they have been initialized, so I don't waste resources in creating new threads? For instance, I create 3 threads, thread 2 signals completion and I send it another "task" without killing it and starting a new one. Thanks.
The usual, simple form is an ordinary (work) queue. In principle, you maintain a queue structure, perhaps as a linked list, protected by a mutex. Typically, condition variables are used by the main/producer threads to notify worker threads that new work is available, so they don't have to poll.
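A minimal sketch of such a work queue, assuming tasks are plain integers to be squared and that the pool shuts down once the queue has drained. The names submit, run_pool, WORKERS and shutting_down are hypothetical. The same three threads keep picking up new tasks instead of being recreated:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define WORKERS 3   /* hypothetical pool size */

struct task { int value; struct task *next; };

static struct task *head, *tail;   /* linked-list queue, guarded by qlock */
static int shutting_down;          /* guarded by qlock */
static long total;                 /* combined result, guarded by qlock */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

/* Producer side: append a task and wake one sleeping worker. */
static void submit(int value)
{
    struct task *t = malloc(sizeof *t);
    if (!t)
        abort();
    t->value = value;
    t->next = NULL;
    pthread_mutex_lock(&qlock);
    if (tail)
        tail->next = t;
    else
        head = t;
    tail = t;
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

/* Worker: sleeps until a task arrives, runs it, then waits for the next. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (!head && !shutting_down)
            pthread_cond_wait(&qcond, &qlock);
        if (!head) {                       /* shutting down and drained */
            pthread_mutex_unlock(&qlock);
            return NULL;
        }
        struct task *t = head;
        head = t->next;
        if (!head)
            tail = NULL;
        pthread_mutex_unlock(&qlock);

        long squared = (long)t->value * t->value;   /* the "work" */
        free(t);

        pthread_mutex_lock(&qlock);
        total += squared;
        pthread_mutex_unlock(&qlock);
    }
}

/* Starts the pool, feeds it tasks 1..ntasks, and returns the sum of squares. */
long run_pool(int ntasks)
{
    pthread_t th[WORKERS];
    head = tail = NULL;
    shutting_down = 0;
    total = 0;
    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_create(&th[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 1; i <= ntasks; i++)
        submit(i);
    pthread_mutex_lock(&qlock);
    shutting_down = 1;
    pthread_cond_broadcast(&qcond);   /* wake every worker so it can exit */
    pthread_mutex_unlock(&qlock);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(th[i], NULL);
    return total;
}
```

For example, run_pool(10) should return 385, the sum of the squares of 1 through 10, no matter which worker ran which task.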
Some previous SO questions that may also be useful are:
How To Use Condition Variable
One producer, Two consumers and usage of pthread_cond_signal & pthread_mutex_lock
pthread conditional variable
Yes, and that is what servers like Apache do to increase their performance. The design pattern is called the Thread pool pattern and there are various implementations (this one for example) using pthreads.
Of course, you might want to keep your implementation as simple as possible, depending on what your goals are.
Of course. For example, you can use the producer-consumer pattern. Here is an example in C#, but it can be easily implemented in pthreads as well.
The search keyword for your question is "thread pooling" or "thread pool". Using these terms you will find plenty of information on this site and also on Google.
Is there something equivalent to SIGSTOP and SIGCONT for threads? I am using pthreads.
Thanks
An edit:
I am implementing a crude form of file access synchronization among threads. So if a file is already opened by a thread, and another thread wants to open it again, I need to halt or pause the second thread at that point of its execution. When the first thread has completed its work, it will check what other threads wanted to use a file it released and "wake" them up. The second thread then resumes execution from exactly that point. I use my own bookkeeping data structures.
I'm going to tell you how to do things instead of answering the question. (Look up the "X Y problem".)
You are trying to prevent two threads from accessing the same file at the same time. In other words, access is MUTually EXclusive. A "mutex" is designed to do this. In general, it is easier to find help if you search for what you are trying to do (prevent two threads from accessing the same resource simultaneously) rather than searching for how you want to do it (make one thread wait for the other).
Edit: It sounds like you actually want many readers but one writer. This is probably the second most common synchronization problem (after the "producer-consumer" problem). Use a pthread_rwlock: readers call pthread_rwlock_rdlock and writers call pthread_rwlock_wrlock.
If you're doing something this sophisticated, you really should start reading the relevant literature. If you think you can do multithreaded programming without some serious reading, you are much smarter than me and you don't need my help. I recommend "The Little Book of Semaphores" which is a free download (source). It's not about pthreads, but it's good stuff. The readers-writers problem you are asking about is found under §4.2 in the chapter "Classical Synchronization Problems" (heck, this problem is even mentioned in the blurb).
Multithreaded programming is HARD with capital letters and a bold font.
Well, there is pthread_kill.
But you almost certainly do not want to do this. What if the other thread holds (e.g.) a mutex for the heap, and you try to call new while it is stopped?
Since you do not know what the runtime is doing with mutexes, there is no way to avoid this kind of problem in general unless you completely avoid the standard library.
[edit]
Actually, come to think of it, I am not sure what happens if you target a specific thread with SIGSTOP, since that signal usually affects the whole process.
So to update my answer, I do not believe there is any standard mechanism for suspending a thread asynchronously... And for the reason mentioned above, I do not think you want one.
Depending on your application, Pthreads supports what can be considered more refined mechanisms, such as http://www.unix.com/man-page/all/3t/pthread_suspend/ and mutex mechanisms.
I want to write a high performance synchronized generator in C. I want to be able to feed events to it and have multiple threads be able to poll/read asynchronously, such that threads never receive duplicates.
I don't really know that much about how synchronization is typically done. Can someone give me a high level explanation of one or more techniques that I might be able to use?
Thanks!
You need a thread implementation; C does not have any built-in support for multiprocessing concepts. Threads are thus often implemented as libraries. Such a library will typically provide you with ways to synchronize the execution of multiple threads, ways to protect data, and so on.
The main concept in thread safety is the mutex (though there are different kinds of locks).
It is used to protect your memory from multiple accesses and race conditions.
A good example of its use would be a linked list. You can't allow two different threads to modify it at the same time. In your example, you could possibly use a linked list to create a queue, and each thread would consume some data from it.
Obviously there are other synchronization mechanisms, but this one is (by far ?) the most important.
You could have a look at this page (and referenced pages at the bottom) for more implementation details.
Thread safety only becomes a problem when variables are shared between threads. If you don't have any shared variables, it's not a problem. Every event can be read-only and dispatched to listeners randomly.
Thread safety is achieved by using whatever synchronisation primitives the multithreading implementation provides.
Your start point would probably be a linked list of events, a lock that protects it, and every thread takes the lock, consumes one event by adjusting the pointer to the first event and then releases the lock; appending events also locks the entire list. When the list is empty, the workers exit.
From there, various optimisations are possible:
Caching the pointer to the last event, so appending an event to the list becomes cheaper.
Adding a notification mechanism so worker threads can sleep while the list is empty. Typically, this is achieved with something called a condition variable.
Using multiple lists, so if the first list is locked, the worker can retrieve an event from another list without having to wait for the thread that has currently locked the list.
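The last optimisation can be sketched roughly as follows, assuming all events are queued before the workers start and the workers exit once every list is empty, as in the starting point above. The names lists, take_event, seen and run_event_demo are hypothetical. Each list has its own lock, and a worker uses pthread_mutex_trylock to skip a list that another thread is currently holding:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define LISTS      2    /* hypothetical number of event lists */
#define WORKERS    3    /* hypothetical number of worker threads */
#define MAX_EVENTS 64

struct event { int id; struct event *next; };
struct event_list { pthread_mutex_t lock; struct event *first; };

static struct event_list lists[LISTS] = {
    { PTHREAD_MUTEX_INITIALIZER, NULL },
    { PTHREAD_MUTEX_INITIALIZER, NULL },
};
static int seen[MAX_EVENTS];   /* how many times each event id was consumed */

static void add_event(int id)
{
    struct event *e = malloc(sizeof *e);
    if (!e)
        abort();
    struct event_list *l = &lists[id % LISTS];
    e->id = id;
    pthread_mutex_lock(&l->lock);
    e->next = l->first;
    l->first = e;
    pthread_mutex_unlock(&l->lock);
}

/* Removes the first event of l; the caller must hold l->lock. */
static struct event *pop_locked(struct event_list *l)
{
    struct event *e = l->first;
    if (e)
        l->first = e->next;
    return e;
}

/* Takes one event from any list; returns NULL once all lists are empty. */
static struct event *take_event(void)
{
    /* First pass: skip lists that another thread has locked right now. */
    for (int i = 0; i < LISTS; i++) {
        if (pthread_mutex_trylock(&lists[i].lock) != 0)
            continue;
        struct event *e = pop_locked(&lists[i]);
        pthread_mutex_unlock(&lists[i].lock);
        if (e)
            return e;
    }
    /* Second pass: block, so a busy list is not mistaken for an empty one. */
    for (int i = 0; i < LISTS; i++) {
        pthread_mutex_lock(&lists[i].lock);
        struct event *e = pop_locked(&lists[i]);
        pthread_mutex_unlock(&lists[i].lock);
        if (e)
            return e;
    }
    return NULL;
}

static void *worker(void *arg)
{
    (void)arg;
    struct event *e;
    while ((e = take_event()) != NULL) {
        seen[e->id]++;   /* only the thread that took e touches this slot */
        free(e);
    }
    return NULL;
}

/* Returns 1 if each of the n events was consumed exactly once, else 0. */
int run_event_demo(int n)
{
    pthread_t th[WORKERS];
    if (n < 0 || n > MAX_EVENTS)
        return 0;
    for (int i = 0; i < n; i++) {
        seen[i] = 0;
        add_event(i);
    }
    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_create(&th[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < n; i++)
        if (seen[i] != 1)
            return 0;
    return 1;
}
```

Because an event is unlinked while its list's lock is held, no two threads can receive the same event. Whether the extra lists pay off depends on how contended a single lock would be, so this is worth measuring before adopting.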