I have two different applications that have to work together. Process 1 acts as a time source and process 2 performs actions according to the time source provided by process 1. I need to run multiple copies of process 2. The goal is to have one time source process signaling 5-10 other processes at the same time, so that they all perform their work simultaneously.
Currently, I have this implemented in the following way:
The time source program starts, creates a shared memory segment, creates an empty list of PIDs, then unlocks the segment.
Each time one of the client programs starts, it goes to the shared memory, adds its own PID to the list, and then unlocks it.
The time source has a timer that goes off every 10ms. Every time the timer goes off, it cycles through the PID list and sends a signal to every process in it, back to back.
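For reference, the signaling side as described might look roughly like the sketch below. The signal number (SIGUSR1), the fixed-size PID array, and the process-shared mutex are assumptions for illustration, not details from the setup above.

#include <sys/types.h>
#include <signal.h>
#include <pthread.h>

#define MAX_CLIENTS 16

struct tick_shm {
    pthread_mutex_t lock;              /* initialized PTHREAD_PROCESS_SHARED */
    int             npids;
    pid_t           pids[MAX_CLIENTS]; /* clients register themselves here */
};

/* Run on each 10ms tick: signal every registered client back to back. */
void broadcast_tick(struct tick_shm *shm)
{
    pthread_mutex_lock(&shm->lock);
    for (int i = 0; i < shm->npids; i++)
        kill(shm->pids[i], SIGUSR1);
    pthread_mutex_unlock(&shm->lock);
}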
This approach mostly works well, but I am hoping that it can be improved. I currently have two sticking points:
Very rarely, the signal delivered to one of the client processes will be skewed by about 2 milliseconds. The end result is | 12ms | 8ms | instead of | 10ms | 10ms |.
The second issue is that all of the client programs are actually multithreaded and doing a lot of work (though only the original thread is responsible for handling the signal). If I have multiple client processes running at once, the delivery of the signals gets more sporadic and skewed, as if they are more difficult to deliver when the system is more taxed (even if the client process is ready and waiting for the interrupt).
What other approaches should I consider for doing this type of thing? I have considered the following (all in the shared memory segment):
Using volatile uint8_t flags (set by the time source process, cleared by the client).
Using semaphores, but if the time source process is running, and the client hasn't started yet, how do I keep from incrementing the semaphore over and over?
Condition variables, though there doesn't seem to be a solution that can be used in shared memory between unrelated processes.
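On the condition variable point: POSIX does allow mutexes and condition variables to be placed in shared memory and used by unrelated processes, via the PTHREAD_PROCESS_SHARED attribute (on systems that support _POSIX_THREAD_PROCESS_SHARED). A minimal initialization sketch, with illustrative names:

#include <pthread.h>

struct shared_sync {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    unsigned        tick;   /* bumped by the time source each period */
};

/* Run once by whichever process creates the shared memory segment. */
void shared_sync_init(struct shared_sync *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;

    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mtx, &ma);

    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cv, &ca);

    s->tick = 0;
}

Clients would then hold the mutex and pthread_cond_wait on cv, while the time source increments tick and calls pthread_cond_broadcast.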
Even if a process is in the waiting state, ready to receive a signal, that does not mean the kernel is going to schedule it immediately, especially when there are more tasks in the running state than there are available CPU cores.
Adjusting the priority (or nice level) of processes and threads will influence the kernel scheduler.
You can also play around with the different schedulers that are available in your kernel, and their parameters.
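For example, on Linux a process can request a real-time policy with sched_setscheduler() (this normally requires root or CAP_SYS_NICE); whether that is appropriate here is a judgment call:

#include <sched.h>
#include <stdio.h>

/* Ask for the SCHED_FIFO real-time policy for the calling process. */
int make_realtime(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}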
Related
I have written code for two threads where one is assigned priority 20 (lower) and the other priority 10 (higher). Upon executing my code, 70% of the time I get the expected results, i.e. the high_prio thread (with priority 10) executes first and then the low_prio thread (with priority 20).
Why is my code not able to get 100% correct results in all executions? Is there any conceptual mistake that I am making?
void *low_prio(void *arg){
    /* Something here */
    return NULL;
}

void *high_prio(void *arg){
    /* Something here */
    return NULL;
}

int main(){
    /* Thread with priority 10 calls high_prio */
    /* Thread with priority 20 calls low_prio */
    return 0;
}
Is there any conceptual mistake that I am making?
Yes — you have an incorrect expectation regarding what thread priorities do. Thread priorities are not meant to force one thread to execute before another thread.
In fact, in a scenario where there is no CPU contention (i.e. where there are always at least as many CPU cores available as there are threads that currently want to execute), thread priorities will have no effect at all -- because there would be no benefit to forcing a low-priority thread not to run when there is a CPU core available for it to run on. In this no-contention scenario, all of the threads will get to run simultaneously and continuously for as long as they want to.
The only time thread priorities may make a difference is when there is CPU contention -- i.e. there are more threads that want to run than there are CPU cores available to run them. At that point, the OS's thread-scheduler has to make a decision about which thread will get to run and which thread will have to wait for a while. In this instance, thread priorities can be used to indicate to the scheduler which thread it should prefer to allow to run.
Note that it's even more complicated than that, however -- for example, in your posted program, both of your threads are calling printf() rather a lot, and printf() invokes I/O, which means that the thread may be temporarily put to sleep while the I/O (e.g. to your Terminal window, or to a file if you have redirected stdout to file) completes. And while that thread is sleeping, the thread-scheduler can take advantage of the now-available CPU core to let another thread run, even if that other thread is of lower priority. Later, when the I/O operation completes, your high-priority thread will be re-awoken and re-assigned to a CPU core (possibly "bumping" a low-priority thread off of that core in order to get it).
Note that inconsistent results are normal for multithreaded programs -- threads are inherently non-deterministic, since their execution patterns are determined by the thread-scheduler's decisions, which in turn are determined by lots of factors (e.g. what other programs are running on the computer at the time, the system clock's granularity, etc).
I have a project with some soft real-time requirements. I have two processes (programs that I've written) that do some data acquisition. In either case, I need to continuously read in data that's coming in and process it.
The first program is heavily threaded, and the second one uses a library which should be threaded, but I have no clue what's going on under the hood. Each program is executed by the user and (by default) I see each with a priority of 20 and a nice value of 0. Each program uses roughly 30% of the CPU.
As it stands, both processes have to contend with a few background processes, and I want to give my two programs the best possible shot at the CPU. My main issue is that I have a device that I talk to that has a 64-byte hardware buffer, and if I don't read from it in time, I get an overflow. I have noted this condition occurring once every 2-3 hours of run time.
Based on my research (http://oreilly.com/catalog/linuxkernel/chapter/ch10.html) there appear to be three ways of playing around with the priority:
Set the nice value to a lower number, and therefore give each process more priority. I can do this without any modification to my code using the nice command (or the corresponding system call).
Use sched_setscheduler() to set the entire process to a particular scheduling policy.
Use pthread_setschedparam() to individually set each pthread (see the sketch below).
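For illustration, option 3 might look like this (a sketch; the SCHED_RR policy and the priority value are placeholders to be chosen for your workload):

#include <sched.h>
#include <pthread.h>

/* Move a single pthread to SCHED_RR at the given priority. */
int raise_thread_prio(pthread_t t, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return pthread_setschedparam(t, SCHED_RR, &sp);
}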
I have run into the following roadblocks:
Say I go with choice 3: how do I prevent lower-priority threads from being starved? Is there also a way to ensure that shared locks cause lower-priority threads to be promoted to a higher priority? Say I have a thread that's real-time, SCHED_RR, and it shares a lock with a default, SCHED_OTHER thread. When the SCHED_OTHER thread gets the lock, I want it to execute at a higher priority to free the lock. How do I ensure this?
If a thread of SCHED_RR creates another thread, is the new thread automatically SCHED_RR, or do I need to specify this? What if I have a process that I have set to SCHED_RR, do all its threads automatically follow this policy? What if a process of SCHED_RR spawns a child process, is it too automatically SCHED_RR?
Does any of this matter given that the code only uses up 60% of the CPU? Or are there still issues with the CPU being shared with background processes that I should be concerned with, and that could be causing my buffer overflows?
Sorry for the long winded question, but I felt it needed some background info. Thanks in advance for the help.
(1) pthread_mutex_setprioceiling: a priority-ceiling mutex boosts whichever thread holds the lock (a sketch follows this list).
(2) A newly created thread inherits the scheduling policy and priority of its creating thread unless its thread attributes (e.g. pthread_attr_setschedparam / pthread_attr_setschedpolicy) direct otherwise when you call pthread_create.
(3) Since you don't yet know what causes the overflows, it is in fairness hard for anyone to say with assurance.
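A sketch of point (1), assuming priority-ceiling mutexes are available (_POSIX_THREAD_PRIO_PROTECT); CEILING_PRIO is a made-up placeholder and should be at least the priority of the SCHED_RR thread:

#include <pthread.h>

#define CEILING_PRIO 50   /* hypothetical ceiling value */

/* Any thread holding this mutex runs at CEILING_PRIO until it unlocks,
 * so a SCHED_OTHER holder cannot stall a waiting SCHED_RR thread. */
int init_ceiling_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t a;
    pthread_mutexattr_init(&a);
    pthread_mutexattr_setprotocol(&a, PTHREAD_PRIO_PROTECT);
    pthread_mutexattr_setprioceiling(&a, CEILING_PRIO);
    return pthread_mutex_init(m, &a);
}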
I have a question to ask. I have a program (Process 1) which has three threads:
Thread 1 runs continuously, receives packets from a local socket (AF_UNIX, NON_BLOCK) and copies them to a buffer.
Thread 2 reads from the buffer and writes the information received to a file (disk).
Thread 3 compresses the file if the file grows larger than 5 MB.
There is another process (Process 2) which is continuously sending packets to the local socket read by Process 1. Number of packets (of around 100 bytes) sent per second can be as high as 3000-5000 packets per second. This setup runs on an embedded hardware with ARM v9 controller.
I have to ensure that none of the packets are lost and all of them are written to disk. With the current implementation, I receive sending errors at Process 2 from "sendto" (Resource unavailable) every now and then.
I disabled all locks and mutexes used for avoiding race conditions (removed all checks that prevent writing while reading and vice versa); even then I get sending errors from "sendto".
Then, as a second step, I disabled the writing to disk. Now Thread 1 of Process 1 can read as fast as possible from the local socket and there are no sending errors. My guess is that since the threads are running on an ARM controller with no hyperthreading, only one thread executes at any single point in time and the OS is handling the scheduling of the threads.
My question here is,
Is it possible to run the three threads in parallel (each of them executing simultaneously)? Is there a gcc construct or a compiler flag which can force the threads to run in parallel (in the foreground)? Can I change something in the program to achieve the above without splitting the functionality into multiple programs and using shared memory for the buffer?
No. You can't force any kind of thread order. So, your first question: is it possible for them to execute simultaneously? Yes. How can you do it? You can't; the operating system chooses when. You can set priorities and things like that, but Linux (or Windows) will still switch threads fairly unpredictably, without telling you and without letting you replace the scheduler. Think about all the threads from all the programs running on your computer; which ones can execute, and when? The answer is: who knows! There is no way to tell when your thread will block, even if it's holding a lock (which is probably why you're getting a resource-busy response). So how do you stop this from happening? Check whether the resource is still locked before trying to use it! Then it doesn't matter when the threads lock a resource; a non-blocking check of this kind is sketched below.
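Such a check could look like this with pthread_mutex_trylock (a sketch; the function name is illustrative):

#include <pthread.h>

/* Returns 1 if the resource was used, 0 if it was unavailable. */
int try_use_resource(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) != 0)
        return 0;   /* still locked (EBUSY) or error: skip this round */
    /* ... use the resource ... */
    pthread_mutex_unlock(m);
    return 1;
}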
Also, if it's IPC, why are you using sockets? Why not try a pipe? Then it doesn't matter if you lock it (unless more than one thread writes to the resource at a time).
In this case the sender is faster than the receiver, so the NON_BLOCK option on the socket may cause the send error (it returns an error when the sender would need to block). I have the following 2 suggestions:
In the sender (Process 2), you can resend the packets which resulted in a send error (a retry sketch follows this list).
Remove NON_BLOCK option on the socket.
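Suggestion 1 might be sketched like this, assuming a connected non-blocking socket (send_retry is a hypothetical helper, not from the answer):

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>

/* Retry the send whenever the socket buffer is momentarily full. */
ssize_t send_retry(int fd, const void *buf, size_t len)
{
    for (;;) {
        ssize_t n = send(fd, buf, len, 0);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;   /* a real error, give up */
        usleep(100);     /* back off briefly, then try again */
    }
}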
I am programming using pthreads in C.
I have a parent thread which needs to create 4 child threads with IDs 0, 1, 2, 3.
When the parent thread gets data, it will split the data and assign it to 4 separate context variables, one for each sub-thread.
The sub-threads have to process this data, and in the meantime the parent thread should wait on these threads.
Once these sub-threads are done executing, they will set the output in their corresponding context variables and wait (for reuse).
Once the parent thread knows that all these sub-threads have completed this round, it computes the global output and prints it out.
Now it waits for new data (the sub-threads are not killed yet, they are just waiting).
If the parent thread gets more data the above process is repeated - albeit with the already created 4 threads.
If the parent thread receives a kill command (assume a specific kind of data), it indicates to all the sub-threads and they terminate themselves. Now the parent thread can terminate.
I am a Masters research student and I am encountering the need for the above scenario. I know that this can be done using pthread_cond_wait and pthread_cond_signal. I have written the code, but it just runs indefinitely and I cannot figure out why.
My guess is that, the way I have coded it, I have over-complicated the scenario. It would be very helpful to know how this can be implemented. If there is a need, I can post a simplified version of my code to show what I am trying to do (even though I think that my approach is flawed!)...
Can you please give me any insights into how this scenario can be implemented using pthreads?
As far as can be seen from your description, there seems to be nothing wrong with the principle.
What you are trying to implement is a worker pool, I guess; there are a lot of implementations out there. If the work that your threads are doing is a substantial computation (say at least a CPU-second or so), such a scheme is complete overkill. Modern implementations of POSIX threads are efficient enough that they support the creation of a lot of threads, really a lot, and the overhead is not prohibitive.
The only thing that would be important if you have your workers communicate through shared variables, mutexes etc (and not via the return value of the thread) is that you start your threads detached, by using the attribute parameter to pthread_create.
Once you have such an implementation for your task, measure. Only then, if your profiler tells you that you spend a substantial amount of time in the pthread routines, start thinking of implementing (or using) a worker pool to recycle your threads.
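The detached-thread detail mentioned above, as a sketch (spawn_detached is an illustrative name):

#include <pthread.h>

/* Spawn a worker in detached state; 'worker' and 'ctx' stand in for
 * the caller's thread function and per-thread context. */
int spawn_detached(void *(*worker)(void *), void *ctx)
{
    pthread_t t;
    pthread_attr_t a;
    pthread_attr_init(&a);
    pthread_attr_setdetachstate(&a, PTHREAD_CREATE_DETACHED);
    return pthread_create(&t, &a, worker, ctx);
}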
One producer-consumer queue with 4 threads hanging off it. The thread that wants to queue the four tasks assembles the four context structs containing, as well as all the other data stuff, a function pointer to an 'OnComplete' func. Then it submits all four contexts to the queue, atomically incrementing a taskCount up to 4 as it does so, and waits on an event/condvar/semaphore.
The four threads get a context from the P-C queue and work away.
When done, the threads call the 'OnComplete' function pointer.
In OnComplete, the threads atomically count down taskCount. If a thread decrements it to zero, it signals the event/condvar/semaphore and the originating thread runs on, knowing that all the tasks are done.
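The countdown-and-wake step might be sketched like this with a condvar (names are illustrative, not from the answer):

#include <pthread.h>

struct fork_wait {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             remaining;   /* set to 4 before the tasks are queued */
};

/* Called from each task's OnComplete hook. */
void task_done(struct fork_wait *fw)
{
    pthread_mutex_lock(&fw->mtx);
    if (--fw->remaining == 0)
        pthread_cond_signal(&fw->cv);   /* last task wakes the originator */
    pthread_mutex_unlock(&fw->mtx);
}

/* Called by the originating thread after submitting the four contexts. */
void wait_all(struct fork_wait *fw)
{
    pthread_mutex_lock(&fw->mtx);
    while (fw->remaining > 0)
        pthread_cond_wait(&fw->cv, &fw->mtx);
    pthread_mutex_unlock(&fw->mtx);
}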
It's not that difficult to arrange it so that the assembly of the contexts and the synchro waiting is done in a task as well, allowing the pool to process multiple 'ForkAndWait' operations at once for multiple requesting threads.
I have to add that operations like this are a huge pile easier in an OO language. The latest Java, for example, has a 'ForkAndWait' threadpool class that should do exactly this kind of stuff, but C++ (or even C#, if you're into serfdom) is better than plain C.
I have an application that spawns multiple child processes, which then go on to spawn multiple threads. I can control the number of processes and threads that are spawned. The threads do a specific read/write operation to a NAS, and I record how long this takes.
What's odd is that the time it takes to perform the read/write operation is longer with multiple threads. I read /proc/stat before starting the application and when finished, and got this (after some math):
cpu0: 1.0050% usrtime, 2.5126% systime, 95.4774% idle, 0.5025% softirq
cpu1: 0.0000% usrtime, 0.0000% systime, 100.0000% idle, 0.0000% softirq
I also checked sched_getaffinity, and both CPUs are enabled for the child processes. Is there something that I must do, besides spawning multiple threads, to make use of the multiple cores?
You're hardly using your CPU at all. Going out to Network Attached Storage, your bottleneck is most likely your network connection. How much data are you pushing and how much bandwidth can your pipeline (and your NAS) tolerate?