I have tried to create a function that works similarly to a barrier function, except that it can handle the number of active threads changing. (I also can't seem to get it to work by destroying and reinitializing the barrier whenever a thread exits the function loop.)
My issue is that I can't get my replacement function to run properly, i.e. the program softlocks for some reason.
So far nothing I've tried has worked to ensure that the threads are synchronized and the program doesn't softlock.
I've tried using barriers, I've tried making the exiting threads enter barrier wait as well, to help with the barriers (but I couldn't figure out how to not softlock with the exiting threads, as I always ended up with some thread(s) invariably being left inside the barrier_wait function).
This is my replacement function for the pthread_barrier_wait function:
void SynchThreads()
{
    pthread_mutex_lock(&lock);
    if (threadsGoingToWait < maxActiveThreads)
    {
        threadsGoingToWait++;
        pthread_cond_signal(&condVar2);
        pthread_cond_wait(&condVar1, &lock);
    } else
    {
        threadsGoingToWait=1;
        pthread_cond_broadcast(&condVar1);
    }
    pthread_mutex_unlock(&lock);
}
To change the value of maxActiveThreads, I have the threads do the following before they exit the function loop:
pthread_mutex_lock(&tlock);
maxActiveThreads--;
if (maxActiveThreads>0)
{
    pthread_cond_wait(&condVar2, &tlock);
    pthread_cond_broadcast(&condVar1);
}
else pthread_cond_broadcast(&condVar2);
pthread_mutex_unlock(&tlock);
I have the pthread variables initialized before the thread creation as this:
pthread_barrier_init(&barrier, NULL, maxActiveThreads);
pthread_mutex_init(&lock, NULL);
pthread_mutex_init(&tlock, NULL);
pthread_cond_init(&condVar1, NULL);
pthread_cond_init(&condVar2, NULL);
I have no clue why the program is softlocking right now, since as far as I know, so long as there's at least 1 thread either remaining or in the waiting state, it should release the other threads from the cond_wait they're in.
Edit:
If I remove the condVar2 from being used, and instead end the function loop with a barrier_wait, the program no longer softlocks, however it still doesn't function as if it's being synchronized properly.
To give some more detail as to what I'm working on: I'm trying to parallelize a sequential Gaussian elimination function. The issues I've had so far are that either the matrix has the wrong values, or the vectors have the wrong values, or they all have the wrong values. I was hoping that distributing synchronization points as follows would fix the synchronization errors:
static void* gauss_par(void* params)
{
    /*getting the threads and the related data*/
    for (int k = startRow; k < N; k+=threadCount) /* Outer loop */
    {
        SynchThreads();
        /* Division step */
        SynchThreads();
        /* Vector y and matrix diagonal */
        SynchThreads();
        for (int i = k+1; i < N; i++)
        {
            /* Elimination step */
            SynchThreads();
            /* Vector b and matrix zeroing */
            SynchThreads();
        }
    }
}
As a preliminary, I see &lock in your SynchThreads() and &tlock in your other code snippet. These almost certainly do not go with each other, because proper protection via a mutex relies on all threads involved using the same mutex to guard access to the data in question. I'm having trouble coming up with a way, in C, that the expressions &lock and &tlock could evaluate to pointers of type pthread_mutex_t *, pointing to the same object. Unless maybe one of lock and tlock were a macro expanding to the other, which would be nasty.
With that said,
I have tried to create a function that works similarly to a barrier function, except that it can handle the number of active threads changing. (I also can't seem to get it to work by destroying and reinitializing the barrier whenever a thread exits the function loop.)
Destroying and reinitializing the (same) barrier object should work if you can ensure that no thread is ever waiting at the barrier when you do so or arrives while you are doing so. But in practice, that's often a difficult condition to ensure.
Barriers are a somewhat specialized synchronization tool, whereas the Swiss army knife of thread synchronization is the condition variable. This appears to be the direction you've taken, and it's probably a good one. However, I suspect you're looking at CVs from a functional perspective rather than a semantic one. This leads to all sorts of issues.
The functional view is along these lines: a condition variable allows threads to block until another thread signals them.
The semantic view is this: a condition variable allows threads to block until some testable condition is satisfied. Generally, that condition is a function of one or more shared variables, and access to those is protected by the same mutex as is used with the CV. The same CV can be used by different threads at the same time to wait for different conditions, and that can make sense when the various conditions are all related to the same data.
The semantic view guides you better toward appropriate usage idioms, and it helps you come to the right answers on questions about who should signal or broadcast to the CV, under what conditions, and even whether to use a CV at all.
Diagrammatically, the basic usage pattern for a CV wait is this:
                 |
+++++++++++++++++|++++++++++++++++++++++++++++++++
+                V            CRITICAL REGION    +
+           +- - - - - +                         +
+           : optional :                         +
+           + - - - - -+                         +
+                |                               +
+                V                               +
+  +-----------------------------+               +
+  | Is the condition satisfied? | <-+           +
+  +-----------------------------+   |           +
+      |                |            |           +
+      | Yes            | No         | on waking +
+      V                V            |           +
+ +- - - - - +     +---------+       |           +
+ : optional :     | CV wait |-------+           +
+ + - - - - -+     +---------+                   +
+      |                                         +
+++++++|++++++++++++++++++++++++++++++++++++++++++
       V
In particular, since the whole point is that threads don't proceed past the wait unless the condition is satisfied, it is essential that they check after waking whether they should, in fact, proceed. This protects against three things:
the thread was awakened via a signal / broadcast even though the condition it is waiting for was not satisfied;
the thread was awakened via a signal / broadcast but by the time it gets a chance to run, the condition it was waiting for is no longer satisfied; and
the thread woke despite the CV not having been signaled / broadcast to (so there's no reason to expect the condition to be satisfied, though it's ok to continue if it happens to be satisfied anyway).
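The wait pattern above can be sketched in C roughly as follows. This is a minimal illustration rather than code from the question: the names ready, ready_lock, ready_cv, wait_until_ready, make_ready and demo_wait are all hypothetical, and error checking is left out to keep the shape of the loop visible.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of them appear in the question. */
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv   = PTHREAD_COND_INITIALIZER;
static bool ready = false;          /* the shared state the condition tests */

/* Waiter: blocks until ready is true, re-checking after every wakeup. */
void wait_until_ready(void)
{
    pthread_mutex_lock(&ready_lock);
    while (!ready) {                /* "Is the condition satisfied?" */
        pthread_cond_wait(&ready_cv, &ready_lock);
    }
    pthread_mutex_unlock(&ready_lock);
}

/* Setter: changes the shared state under the mutex, then wakes waiters. */
void *make_ready(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    ready = true;
    pthread_mutex_unlock(&ready_lock);
    pthread_cond_broadcast(&ready_cv);
    return NULL;
}

/* Starts the setter, waits for the condition, and reports the final state. */
bool demo_wait(void)
{
    pthread_t setter;
    if (pthread_create(&setter, NULL, make_ready, NULL) != 0)
        return false;
    wait_until_ready();
    pthread_join(setter, NULL);
    return ready;
}
```

Because the waiter tests ready before its first wait and again after every wakeup, it behaves correctly whether the setter runs before or after the waiter takes the mutex, and a spurious wakeup simply sends it around the loop again.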
With that in mind, let's consider your pseudo-barrier function. In English, the condition for a thread passing through would be something like "all currently active threads have (re-)reached the barrier since the last time it released." I take maxActiveThreads to be the number of currently active threads (and thus, slightly misnamed). There are some simple but wrong ways to implement that condition, most of them falling over in the event that a thread passes through the barrier and then returns to it before some of the other threads have passed through. A simple counter of waiting threads is not enough, given threads' need to check the condition before proceeding.
One thing you can do is switch which variable you use for a waiter count between different passages through the barrier. That might look something like this:
#include <assert.h>
#include <pthread.h>

int wait_count[2];
int wait_counter_index;

void cycleBarrier() {
    // this function is to be called only while holding the mutex locked
    wait_counter_index = !wait_counter_index; // flip between 0 and 1
    wait_count[wait_counter_index] = 0;
}

void SynchThreads() {
    pthread_mutex_lock(&lock);
    // A thread-specific copy of the wait counter number upon arriving
    // at the barrier
    const int my_wait_index = wait_counter_index;
    if (++wait_count[my_wait_index] < maxActiveThreads) {
        do {
            pthread_cond_wait(&condVar1, &lock);
        } while (wait_count[my_wait_index] < maxActiveThreads);
    } else {
        // This is the thread that crested the barrier, and the first one
        // through it; prepare for the next barrier passage
        assert(wait_counter_index == my_wait_index);
        cycleBarrier();
        // Wake the waiting threads so that they re-check their condition
        pthread_cond_broadcast(&condVar1);
    }
    pthread_mutex_unlock(&lock);
}
That flip-flops between two waiter counts, so that each thread can check whether the count for the barrier passage it is trying to make has been reached, while also providing for threads arriving at the next barrier passage.
Do note also that although this can accommodate the number of threads decreasing, it needs more work if you need to accommodate the number of threads increasing. If the thread count is increased at a point where some threads have passed the barrier but others are still waiting to do so then the ones still waiting will erroneously think that their target thread count has not yet been reached.
Now let's consider your strategy when modifying maxActiveThreads. You've put a wait on condVar2 in there, so what condition does a thread require to be satisfied in order to proceed past that wait? The if condition guarding it suggests maxActiveThreads <= 0 would be the condition, but surely that's not correct. In fact, I don't think you need a CV wait here at all. Acquiring the mutex in the first place is all a thread ought to need to do to be able to reduce maxActiveThreads.
However, having modified maxActiveThreads does mean that other threads waiting for a condition based on that variable might be able to proceed, so broadcasting to condVar1 is the right thing to do. On the other hand, there's no particular need to signal / broadcast to a CV while holding its associated mutex locked, and it can be more efficient to avoid doing so. Also, it's cheap and safe to signal / broadcast to a CV that has no waiters, and that can allow some code simplification.
Lastly, what happens if all the other threads have already arrived at the barrier? In the above code, it is the one to crest the barrier that sets up for the next barrier passage, and if all the other threads have reached the barrier already, then that would be this thread. That's why I introduced the cycleBarrier() function -- the code that reduces maxActiveThreads might need to cycle the barrier, too.
That leaves us with this:
pthread_mutex_lock(&lock);
maxActiveThreads--;
if (wait_count[wait_counter_index] >= maxActiveThreads) {
    cycleBarrier();
}
pthread_mutex_unlock(&lock);
pthread_cond_broadcast(&condVar1);
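To see the two pieces working together, here is a small self-contained sketch. It makes several assumptions of its own: four threads, where thread i takes part in rounds 0..i and then leaves; a crest branch that broadcasts on condVar1 so that waiters are woken; and the names LeaveBarrier, worker, arrivals, errors and run_shrinking_barrier_demo, which are hypothetical. Error checking is omitted.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NTHREADS 4                      /* assumed thread count for the demo */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condVar1 = PTHREAD_COND_INITIALIZER;
static int maxActiveThreads;
static int wait_count[2];
static int wait_counter_index;
static atomic_int arrivals[NTHREADS];   /* arrivals per round, for checking */
static atomic_int errors;

static void cycleBarrier(void)          /* call only with lock held */
{
    wait_counter_index = !wait_counter_index;
    wait_count[wait_counter_index] = 0;
}

static void SynchThreads(void)
{
    pthread_mutex_lock(&lock);
    const int my_wait_index = wait_counter_index;
    if (++wait_count[my_wait_index] < maxActiveThreads) {
        do {
            pthread_cond_wait(&condVar1, &lock);
        } while (wait_count[my_wait_index] < maxActiveThreads);
    } else {
        cycleBarrier();
        pthread_cond_broadcast(&condVar1);   /* wake the waiters */
    }
    pthread_mutex_unlock(&lock);
}

static void LeaveBarrier(void)          /* hypothetical name */
{
    pthread_mutex_lock(&lock);
    maxActiveThreads--;
    if (wait_count[wait_counter_index] >= maxActiveThreads)
        cycleBarrier();
    pthread_mutex_unlock(&lock);
    pthread_cond_broadcast(&condVar1);
}

/* Thread i takes part in rounds 0..i, then leaves the barrier for good. */
static void *worker(void *arg)
{
    const int id = (int)(intptr_t)arg;
    for (int round = 0; round <= id; round++) {
        atomic_fetch_add(&arrivals[round], 1);
        SynchThreads();
        /* Every thread still active in this round must have arrived. */
        if (atomic_load(&arrivals[round]) != NTHREADS - round)
            atomic_fetch_add(&errors, 1);
    }
    LeaveBarrier();
    return NULL;
}

/* Runs the demo once; returns the number of ordering violations seen. */
int run_shrinking_barrier_demo(void)
{
    pthread_t threads[NTHREADS];
    maxActiveThreads = NTHREADS;
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&threads[i], NULL, worker,
                           (void *)(intptr_t)i) != 0)
            return -1;
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&errors);
}
```

The per-round arrival counters are only there to check the barrier: if any thread got through round r before all NTHREADS - r participants had arrived, the function would return a non-zero count.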
Finally, be well aware that all these mutex and cv functions can fail. For robustness, a program must check their return values and take appropriate action in the event of failure. I've omitted all that for clarity and simplicity, but I am rigorous about error checking in real code, and I advise you to be, too.
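One way to keep that checking from drowning the logic is a small wrapper macro. The sketch below is only one possible approach; CHECK_PTHREAD and try_lock_twice are hypothetical names, and aborting on failure is an assumption about what "appropriate action" means for a given program. Note that the pthread functions return the error number directly instead of setting errno.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: report the returned error number and give up. */
#define CHECK_PTHREAD(call)                                              \
    do {                                                                 \
        int rc_ = (call);                                                \
        if (rc_ != 0) {                                                  \
            fprintf(stderr, "%s failed: %s\n", #call, strerror(rc_));    \
            abort();                                                     \
        }                                                                \
    } while (0)

/* Locks a mutex, then tries to lock it again without blocking, and
 * returns the result of that second attempt. */
int try_lock_twice(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    CHECK_PTHREAD(pthread_mutex_lock(&m));
    /* Not wrapped in CHECK_PTHREAD: a busy mutex is an expected outcome
     * here, so the caller gets to look at the return value itself. */
    int rc = pthread_mutex_trylock(&m);
    CHECK_PTHREAD(pthread_mutex_unlock(&m));
    CHECK_PTHREAD(pthread_mutex_destroy(&m));
    return rc;
}
```

With a default mutex, the second attempt reports EBUSY, which shows the distinction worth keeping in mind: some non-zero results are failures, while others are ordinary answers that the code has to handle.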
An Ada language solution to this problem is shown below. This example implements a producer-consumer pattern supporting N producers and M consumers. The test of the solution implements one producer and 5 consumers, but the solution is the same for any positive number of producers and consumers, up to the limits of your hardware.
This example is implemented in three files. The first file is the package specification, which defines the interface to the producer and consumer task types. The second file is the package body which contains the implementation of the task types and their shared buffer. The shared buffer is implemented as an Ada protected object. Ada protected objects are passive elements of concurrency which implicitly control their own locking and unlocking based on the kind of methods implemented.
This example uses only protected entries, which acquire an exclusive read-write lock on the object only when their guard condition evaluates to true. The guard conditions are evaluated when a task calls an entry and are implicitly re-evaluated at the end of each entry body. Tasks suspended because the guard condition on an entry evaluates to FALSE are placed in an entry queue with FIFO ordering by default.
package specification:
package multiple_consumers is
task type producer is
entry Set_Id(Id : Positive);
entry stop;
end producer;
task type consumer is
entry Set_Id(Id : Positive);
entry stop;
end consumer;
end multiple_consumers;
package body:
with Ada.Text_IO; use Ada.Text_IO;
package body multiple_consumers is
type Index_Type is mod 10;
type Buffer_Type is array (Index_Type) of Integer;
------------
-- Buffer --
------------
protected Buffer is
entry Enqueue (Item : in Integer);
entry Dequeue (Item : out Integer);
private
Buff : Buffer_Type;
Write_Index : Index_Type := 0;
Read_Index : Index_Type := 0;
Count : Natural := 0;
end Buffer;
protected body Buffer is
entry Enqueue (Item : in Integer) when Count < Buff'Length is
begin
Buff (Write_Index) := Item;
Write_Index := Write_Index + 1;
Count := Count + 1;
end Enqueue;
entry Dequeue (Item : out Integer) when Count > 0 is
begin
Item := Buff (Read_Index);
Read_Index := Read_Index + 1;
Count := Count - 1;
end Dequeue;
end Buffer;
--------------
-- producer --
--------------
task body producer is
Value : Integer := 1;
Me : Positive;
begin
accept Set_Id (Id : in Positive) do
Me := Id;
end Set_Id;
loop
select
accept stop;
exit; -- Exit loop
else
select
Buffer.Enqueue (Value);
Put_Line
(" Producer" & Me'Image & " produced " & Value'Image);
Value := Value + 1;
or
delay 0.001;
end select;
end select;
end loop;
Put_Line ("Producer" & Me'Image & " stopped.");
end producer;
--------------
-- consumer --
--------------
task body consumer is
My_Value : Integer;
Me : Positive;
begin
accept Set_Id (Id : Positive) do
Me := Id;
end Set_Id;
loop
select
accept stop;
exit; -- exit loop
else
select
Buffer.Dequeue (My_Value);
Put_Line
("Consumer" & Me'Image & " consumed " & My_Value'Image);
or
delay 0.001;
end select;
end select;
end loop;
Put_Line ("Consumer" & Me'Image & " stopped.");
end consumer;
end multiple_consumers;
main procedure:
with multiple_consumers; use multiple_consumers;
procedure Main is
subtype Consumer_Idx is Positive range 1 .. 5;
Consumer_list : array (Consumer_Idx) of consumer;
P : producer;
begin
for I in Consumer_list'Range loop
Consumer_list (I).Set_Id (I);
end loop;
P.Set_Id (1);
delay 0.01; -- wait 0.01 seconds
P.stop;
for I in Consumer_list'Range loop
Consumer_list (I).stop;
end loop;
end Main;
Output:
Producer 1 produced 1
Consumer 1 consumed 1
Producer 1 produced 2
Consumer 2 consumed 2
Consumer 3 consumed 3
Producer 1 produced 3
Producer 1 produced 4
Producer 1 produced 5
Consumer 4 consumed 4
Consumer 5 consumed 5
Producer 1 produced 6
Producer 1 produced 7
Consumer 1 consumed 6
Consumer 2 consumed 7
Consumer 3 consumed 8
Producer 1 produced 8
Producer 1 produced 9
Producer 1 produced 10
Consumer 4 consumed 9
Consumer 5 consumed 10
Consumer 1 consumed 11
Producer 1 produced 11
Producer 1 produced 12
Consumer 2 consumed 12
Producer 1 produced 13
Consumer 3 consumed 13
Producer 1 produced 14
Consumer 4 consumed 14
Producer 1 produced 15
Consumer 5 consumed 15
Producer 1 produced 16
Consumer 1 consumed 16
Producer 1 produced 17
Consumer 2 consumed 17
Consumer 3 consumed 18
Producer 1 produced 18
Producer 1 produced 19
Consumer 4 consumed 19
Producer 1 produced 20
Consumer 5 consumed 20
Producer 1 produced 21
Consumer 1 consumed 21
Producer 1 produced 22
Consumer 2 consumed 22
Producer 1 produced 23
Consumer 3 consumed 23
Producer 1 produced 24
Consumer 4 consumed 24
Producer 1 produced 25
Consumer 5 consumed 25
Producer 1 produced 26
Consumer 1 consumed 26
Producer 1 produced 27
Consumer 2 consumed 27
Consumer 3 consumed 28
Producer 1 produced 28
Producer 1 produced 29
Consumer 4 consumed 29
Producer 1 produced 30
Consumer 5 consumed 30
Producer 1 produced 31
Consumer 1 consumed 31
Producer 1 produced 32
Consumer 2 consumed 32
Consumer 3 consumed 33
Producer 1 produced 33
Producer 1 produced 34
Consumer 4 consumed 34
Producer 1 produced 35
Consumer 5 consumed 35
Producer 1 produced 36
Consumer 1 consumed 36
Producer 1 produced 37
Consumer 2 consumed 37
Producer 1 produced 38
Consumer 3 consumed 38
Producer 1 produced 39
Consumer 4 consumed 39
Producer 1 produced 40
Consumer 5 consumed 40
Producer 1 produced 41
Consumer 1 consumed 41
Producer 1 produced 42
Consumer 2 consumed 42
Consumer 3 consumed 43
Producer 1 produced 43
Producer 1 produced 44
Consumer 4 consumed 44
Producer 1 produced 45
Consumer 5 consumed 45
Producer 1 produced 46
Consumer 1 consumed 46
Producer 1 produced 47
Consumer 2 consumed 47
Consumer 3 consumed 48
Producer 1 produced 48
Producer 1 produced 49
Consumer 4 consumed 49
Producer 1 produced 50
Consumer 5 consumed 50
Producer 1 produced 51
Consumer 1 consumed 51
Producer 1 produced 52
Consumer 2 consumed 52
Producer 1 produced 53
Consumer 3 consumed 53
Producer 1 produced 54
Consumer 4 consumed 54
Producer 1 produced 55
Consumer 5 consumed 55
Producer 1 produced 56
Consumer 1 consumed 56
Producer 1 produced 57
Consumer 2 consumed 57
Consumer 3 consumed 58
Producer 1 produced 58
Producer 1 produced 59
Consumer 4 consumed 59
Producer 1 produced 60
Consumer 5 consumed 60
Producer 1 produced 61
Consumer 1 consumed 61
Producer 1 produced 62
Consumer 2 consumed 62
Consumer 3 consumed 63
Producer 1 produced 63
Producer 1 produced 64
Consumer 4 consumed 64
Consumer 5 consumed 65
Producer 1 produced 65
Producer 1 produced 66
Consumer 1 consumed 66
Consumer 2 consumed 67
Producer 1 produced 67
Producer 1 produced 68
Consumer 3 consumed 68
Producer 1 produced 69
Consumer 4 consumed 69
Producer 1 produced 70
Consumer 5 consumed 70
Producer 1 produced 71
Consumer 1 consumed 71
Producer 1 produced 72
Consumer 2 consumed 72
Producer 1 produced 73
Consumer 3 consumed 73
Producer 1 produced 74
Consumer 4 consumed 74
Producer 1 produced 75
Consumer 5 consumed 75
Producer 1 produced 76
Consumer 1 consumed 76
Producer 1 stopped.
Consumer 1 stopped.
Consumer 2 stopped.
Consumer 3 stopped.
Consumer 4 stopped.
Consumer 5 stopped.
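For readers following the pthread discussion, the guarded entries of the Ada buffer correspond roughly to a mutex plus a condition variable with the guard written as the wait loop's condition. The C sketch below is an approximation under that assumption, not a translation of the Ada program: it has no stop entries or timed entry calls, and bounded_buffer, buffer_enqueue, buffer_dequeue, produce and sum_of_consumed_values are hypothetical names.

```c
#include <pthread.h>

#define CAPACITY 10                 /* same size as the Ada Buffer_Type */

typedef struct {
    int items[CAPACITY];
    int read_index, write_index, count;
    pthread_mutex_t lock;
    pthread_cond_t changed;         /* signalled whenever count changes */
} bounded_buffer;

static bounded_buffer buffer = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .changed = PTHREAD_COND_INITIALIZER,
};

/* Counterpart of "entry Enqueue ... when Count < Buff'Length". */
void buffer_enqueue(bounded_buffer *b, int item)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == CAPACITY)    /* guard: wait until there is room */
        pthread_cond_wait(&b->changed, &b->lock);
    b->items[b->write_index] = item;
    b->write_index = (b->write_index + 1) % CAPACITY;
    b->count++;
    pthread_mutex_unlock(&b->lock);
    pthread_cond_broadcast(&b->changed);
}

/* Counterpart of "entry Dequeue ... when Count > 0". */
int buffer_dequeue(bounded_buffer *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)           /* guard: wait until there is an item */
        pthread_cond_wait(&b->changed, &b->lock);
    int item = b->items[b->read_index];
    b->read_index = (b->read_index + 1) % CAPACITY;
    b->count--;
    pthread_mutex_unlock(&b->lock);
    pthread_cond_broadcast(&b->changed);
    return item;
}

static void *produce(void *arg)
{
    (void)arg;
    for (int value = 1; value <= 100; value++)
        buffer_enqueue(&buffer, value);
    return NULL;
}

/* One producer thread, with the caller acting as the only consumer. */
int sum_of_consumed_values(void)
{
    pthread_t producer;
    int sum = 0;
    if (pthread_create(&producer, NULL, produce, NULL) != 0)
        return -1;
    for (int i = 0; i < 100; i++)
        sum += buffer_dequeue(&buffer);
    pthread_join(producer, NULL);
    return sum;
}
```

What Ada does implicitly (re-evaluating the guards after each entry body and waking a queued task) is spelled out here as the broadcast after each change to count and the while loop around each wait.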
If a running process's executable is deleted, I've noticed that fork appears to fail: the child process never runs.
For example, consider the code below:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
    sleep(5);
    pid_t forkResult;
    forkResult = fork();
    printf("after fork %d \n", forkResult);
    return 0;
}
If I compile this and delete the resulting executable before fork is called, I never see fork return a pid of 0, meaning the child process never starts. I only have a Mac running Big Sur, so not sure if this repros on other OS's.
Does anyone know why this would be? My understanding is an executable should work just fine even if it's deleted while still running.
The expectation that the process should continue even if the binary was deleted is correct in general, but it does not fully hold on macOS. The example is tripping on a side effect of the System Integrity Protection (SIP) mechanism inside the macOS kernel. Before explaining exactly what is going on, we need to run several experiments that will help us better understand the whole scenario.
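The general Unix behaviour behind that expectation is that deleting a file only removes its name; the contents stay available to anything that still has the file open. The sketch below illustrates this with an ordinary data file rather than an executable, which is a simplification. The function name readable_after_unlink and the /tmp path template are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if data written before unlink() is still readable through the
 * open descriptor afterwards, 0 otherwise. */
int readable_after_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";   /* hypothetical location */
    const char msg[] = "still here";
    char buf[sizeof msg] = {0};

    int fd = mkstemp(path);                    /* creates and opens the file */
    if (fd < 0)
        return 0;
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        unlink(path);
        return 0;
    }
    if (unlink(path) != 0) {                   /* the name is removed here */
        close(fd);
        return 0;
    }
    int name_gone = (access(path, F_OK) != 0);

    /* The open descriptor still refers to the file's contents. */
    int ok = lseek(fd, 0, SEEK_SET) == 0 &&
             read(fd, buf, sizeof buf) == (ssize_t)sizeof msg &&
             memcmp(buf, msg, sizeof msg) == 0;
    close(fd);
    return name_gone && ok;
}
```

A running program's executable is kept alive in a similar way for as long as the process has it mapped, which is why the question's expectation is reasonable in general.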
Modified example to better demonstrate the issue
To demonstrate what is going on, I modified the example to count to 9 and then do the fork. After the fork, the child waits 1 second, prints the message "I am done", and exits after printing 0 as the PID. The parent continues counting to 14 and then prints the child's PID. The code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
    for(int i=0; i <10; i++)
    {
        sleep(1);
        printf("%i ", i);
    }
    pid_t forkResult;
    forkResult = fork();
    if (forkResult != 0) {
        for(int i=10; i < 15; i++) {
            sleep(1);
            printf("%i ", i);
        }
    } else {
        sleep(1);
        printf("I am done ");
    }
    printf("after fork %d \n", forkResult);
    return 0;
}
After compiling it, I ran the normal scenario:
╰> ./a.out
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 4385
So, the normal scenario works as expected. We see the count from 0 to 9 twice because that output had not been flushed before the fork() call, so the child received its own copy of the buffered stdout data.
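The duplicated count is a general stdio effect rather than anything specific to this problem, and it can be reproduced in isolation. The sketch below uses a temporary file instead of stdout so that the result can be counted; copies_after_fork is a hypothetical name, and the child closes the stream explicitly so that only this one buffer is flushed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes one byte to a buffered temporary stream, forks, and returns how
 * many copies of that byte end up in the file (or -1 on error). */
int copies_after_fork(int flush_before_fork)
{
    FILE *f = tmpfile();              /* regular file, so fully buffered */
    if (f == NULL)
        return -1;
    fputc('X', f);                    /* stays in the stdio buffer for now */
    if (flush_before_fork)
        fflush(f);                    /* buffer is empty when fork() runs */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(f);
        return -1;
    }
    if (pid == 0) {
        fclose(f);                    /* flushes the child's copy, if any */
        _exit(0);                     /* skip flushing any other streams */
    }
    waitpid(pid, NULL, 0);

    fflush(f);                        /* parent flushes its own copy */
    rewind(f);
    int copies = 0;
    for (int c; (c = fgetc(f)) != EOF; )
        if (c == 'X')
            copies++;
    fclose(f);
    return copies;
}
```

Without the flush, both processes hold the byte in their own copy of the buffer and each writes it out, so it appears twice; flushing before fork() leaves nothing in the buffer to duplicate. Calling fflush(stdout) just before fork() in the example above would remove the repeated count in the same way.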
Tracing the failing example
Now it is time for the negative scenario: we will wait for 5 seconds after the start and remove the binary.
╰> ./a.out & (sleep 5 && rm a.out)
[4] 8555
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 8677
[4] 8555 done ./a.out
We see that the output is only from the parent. The parent counted to 14 and shows a valid PID for the child, but the child is missing: it never printed anything. So the child failed after fork() had succeeded; otherwise fork() would have returned an error instead of a valid PID. Traces from ktrace reveal that the child was created under that PID and was woken up:
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.3 MACH_DISPATCH 1bc 0 84 4 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.2 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 41 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0(0.0) TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_thread_qos_and_relprio 88775d 20000 20200 6 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_update_thread 88775d 811200 140000100 1f 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.8) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(1.1) imp_thread_qos_and_relprio 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_thread_qos_workq_override 88775d 30000 20200 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.1) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(0.2) imp_thread_qos_workq_override 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623857 +04 1.3 TURNSTILE_turnstile_added_to_thread_heap 88775d 9931ba6049ddcc77 0 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623858 +04 1.0 MACH_MKRUNNABLE 88775d 25 0 5 888065 2 a.out(8677)
So the child process was dispatched with MACH_DISPATCH and made runnable with MACH_MKRUNNABLE. This is why the parent got a valid PID after the fork().
Furthermore, the ktrace for the normal scenario shows that the process issued BSC_exit and that imp_task_terminated occurred, which is the normal way for a process to exit. However, in the second scenario, where we deleted the file, the trace doesn't show BSC_exit. This means that the child was terminated by the kernel rather than exiting normally. And we know that the termination happened after the child was created properly, since the parent received a valid PID and the child was made runnable.
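The same distinction can also be observed from the parent without tracing, by collecting the child's status with waitpid(). The sketch below is illustrative only: the child kills itself with SIGKILL as a stand-in, because the traces above do not show which signal or mechanism the kernel actually used, and child_fate and the -1000 sentinel are hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child and reports how it ended:
 *   >= 0   exit status of a child that exited normally
 *   <  0   negated signal number of a child that was killed
 * Returns -1000 (hypothetical sentinel) if fork() or waitpid() fails. */
int child_fate(int kill_child)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        if (kill_child)
            raise(SIGKILL);           /* stand-in for a kernel-initiated kill */
        _exit(7);                     /* normal termination with status 7 */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```

Adding a waitpid() call like this to the parent in the failing scenario would show directly whether the child exited or was killed, and by which signal.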
This brings us closer to understanding what is going on here. But before we reach the conclusion, let's look at another, even more "twisted" example.
Even more strange example
What if we replace the binary on the filesystem after we started the process?
Here is the test to answer this question: we will start the process, remove the binary, and create an empty file with the same name in its place with touch.
╰> ./a.out & (sleep 5 && rm a.out; touch a.out)
[1] 6264
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 6851
[1] + 6722 done ./a.out
Wait a minute, this works!? What is going on here!?!?
This strange example gives us an important clue that will help us explain what is going on.
The root-cause of the issue
The reason why the third example works while the second one fails reveals a lot about what is going on here. As mentioned at the beginning, we are tripping on a side effect of SIP, more precisely on its runtime protection mechanism.
To protect system integrity, SIP examines running processes for system protection and special entitlements. From the Apple documentation: ...When a process is started, the kernel checks to see whether the main executable is protected on disk or is signed with a special system entitlement. If either is true, then a flag is set to denote that it is protected against modification. Any attempt to attach to a protected process is denied by the kernel...
When we removed the binary from the filesystem, the protection mechanism was not able to identify the type of process for the child or its special system entitlements, since the binary file was missing from the disk. This triggered the protection mechanism to treat the process as an intruder in the system and terminate it, hence we did not see BSC_exit for the child process.
In the third example, when we created a dummy entry on the filesystem with touch, SIP was able to detect that this is not a special process and that it has no special entitlements, and it allowed the process to continue. This is a very solid indication that we were tripping on the SIP runtime protection mechanism.
To prove that this is the case, I disabled SIP, which requires a restart in recovery mode, and executed the test:
╰> csrutil status
System Integrity Protection status: disabled.
╰> ./a.out & (sleep 5 && rm a.out)
[1] 1504
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 1626
Conclusion
So, the whole issue was caused by System Integrity Protection. More details can be found in the documentation.
All SIP needed was a file on the filesystem with the process name, so that the mechanism could run its verification and decide to allow the child to continue execution. This shows that we are observing a side effect rather than designed behavior, since the empty file was not even a valid executable, yet execution proceeded.
I am doing homework on multi-processing.
In this homework we are asked to use the fork() function to fork several child processes from one parent process. Each child process then runs the same simulation. The problem is related to a sports competition: the program should simulate the grouping of different teams.
Now I am focusing on the first part, the test mode, and I have run into some problems.
I got output like this:
Parent, pid 57361 : 2 children T Mode
Child 1 , pid 57362 : teams for pot 0 are : Liverpool Arsenal Barcelona ManCity Juventus Bayern Paris Zenit
Child 1 , pid 57362 : teams for pot 1 are : RealMadrid Atletico Chelsea Dortmund Napoli Shakhtar Tottenham Ajax
Child 1 , pid 57362 : teams for pot 2 are : Benfica Lyon Leverkusen Salzburg Olympiacos Brugge Valencia Internazionale
Child 1 , pid 57362 : teams for pot 3 are : Zagreb Lokomotiv Genk Galatasaray Leipzig SlaviaPraha Atalanta Lille
Child 1 , pid 57362 : Teams for group A are RealMadrid Brugge Galatasaray Paris
Child 1 , pid 57362 : pots for group A are 2 3 4 1
Child 1 , pid 57362 : country for group A are Spain Belgium Turkey France
Child 1 , pid 57362 : Teams for group B are Lille Olympiacos Tottenham Bayern
Child 1 , pid 57362 : pots for group B are 4 3 2 1
Child 1 , pid 57362 : country for group B are France Greece England Germany
Child 1 , pid 57362 : Teams for group C are ManCity Valencia Atalanta Shakhtar
Child 1 , pid 57362 : pots for group C are 1 3 4 2
Child 2 , pid 57363 : teams for pot 0 are : Liverpool Arsenal Barcelona ManCity Juventus Bayern Paris Zenit
Child 1 , pid 57362 : country for group C are England Spain Italy Ukraine
Child 2 , pid 57363 : teams for pot 1 are : RealMadrid Atletico Chelsea Dortmund Napoli Shakhtar Tottenham Ajax
Child 2 , pid 57363 : teams for pot 2 are : Benfica Lyon Leverkusen Salzburg Olympiacos Brugge Valencia Internazionale
Child 2 , pid 57363 : teams for pot 3 are : Zagreb Lokomotiv Genk Galatasaray Leipzig SlaviaPraha Atalanta Lille
Child 1 , pid 57362 : Teams for group D are Juventus Leverkusen Lokomotiv Atletico
Child 2 , pid 57363 : Teams for group A are RealMadrid Brugge Galatasaray Paris
Child 1 , pid 57362 : pots for group D are 1 3 4 2
Child 2 , pid 57363 : pots for group A are 2 3 4 1
Child 1 , pid 57362 : country for group D are Italy Germany Russia Spain
Child 2 , pid 57363 : country for group A are Spain Belgium Turkey France
Child 1 , pid 57362 : Teams for group E are Genk Napoli Liverpool Salzburg
Child 1 , pid 57362 : pots for group E are 4 2 1 3
Child 2 , pid 57363 : Teams for group B are Lille Olympiacos Tottenham Bayern
Child 1 , pid 57362 : country for group E are Belgium Italy England Austria
Child 1 , pid 57362 : Teams for group F are Barcelona Internazionale SlaviaPraha Dortmund
Child 2 , pid 57363 : pots for group B are 4 3 2 1
Child 1 , pid 57362 : pots for group F are 1 3 4 2
Child 1 , pid 57362 : country for group F are Spain Italy Czech Germany
Child 2 , pid 57363 : country for group B are France Greece England Germany
Child 1 , pid 57362 : Teams for group G are Leipzig Lyon Zenit Chelsea
Child 2 , pid 57363 : Teams for group C are ManCity Valencia Atalanta Shakhtar
Child 1 , pid 57362 : pots for group G are 4 3 1 2
Child 2 , pid 57363 : pots for group C are 1 3 4 2
Child 1 , pid 57362 : country for group G are Germany France Russia England
Child 2 , pid 57363 : country for group C are England Spain Italy Ukraine
Child 1 , pid 57362 : Teams for group H are Arsenal Benfica Zagreb Ajax
Child 2 , pid 57363 : Teams for group D are Juventus Leverkusen Lokomotiv Atletico
Child 1 , pid 57362 : pots for group H are 1 3 4 2
Child 2 , pid 57363 : pots for group D are 1 3 4 2
Child 1 , pid 57362 : country for group H are England Portugal Croatia Netherlands
Child 2 , pid 57363 : country for group D are Italy Germany Russia Spain
Child 1, pid 57362 : Valid GroupiChild 2 , pid 57363 : Teams for group E are Genk Napoli Liverpool Salzburg
ng
Child 2 , pid 57363 : pots for group E are 4 2 1 3
Child 2 , pid 57363 : country for group E are Belgium Italy England Austria
Child 2 , pid 57363 : Teams for group F are Barcelona Internazionale SlaviaPraha Dortmund
Child 2 , pid 57363 : pots for group F are 1 3 4 2
Child 2 , pid 57363 : country for group F are Spain Italy Czech Germany
Child 2 , pid 57363 : Teams for group G are Leipzig Lyon Zenit Chelsea
Child 2 , pid 57363 : pots for group G are 4 3 1 2
Child 2 , pid 57363 : country for group G are Germany France Russia England
Child 2 , pid 57363 : Teams for group H are Arsenal Benfica Zagreb Ajax
Child 2 , pid 57363 : pots for group H are 1 3 4 2
Child 2 , pid 57363 : country for group H are England Portugal Croatia Netherlands
Child 2, pid 57363 : Valid Grouping
In this program, I fork two processes, and the problem with the output is that Child 1 is interrupted partway through a line and Child 2 prints its message. As a result, the "ing" of "Valid Grouping" is separated from the rest by a line of Child 2's output.
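One common way to reduce this kind of mid-line interleaving is to format each complete line into a buffer and hand it to the kernel with a single write() call. The sketch below assumes that approach; emit_line is a hypothetical helper, not something from the program in question, and the 512-byte line limit is arbitrary. POSIX only guarantees that such a write is not interleaved for pipes and FIFOs (up to PIPE_BUF bytes), so for a terminal or a regular file this is an expectation about typical behaviour, not a guarantee.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: formats one line and passes it to the kernel in a
 * single write() call. Returns the number of bytes written, or -1. */
int emit_line(int fd, const char *fmt, ...)
{
    char line[512];                   /* arbitrary limit for this sketch */
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof line)   /* output was truncated to fit */
        len = (int)sizeof line - 1;
    return (int)write(fd, line, (size_t)len);
}
```

A child would then call something like emit_line(STDOUT_FILENO, "Child %d, pid %d: Valid grouping\n", 1, (int)getpid()) in place of several separate output calls for the same line. Lines from different children can still appear in any order relative to each other; only the splitting of a single line is addressed.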
The sample output should be like
Parent, pid 12352: 2 children, test mode
Child 1, pid 12353: teams for pot 1 are Liverpool Arsenal Barcelona ManCity Juventus Bayern Paris
Zenit
Child 1, pid 12353: teams for pot 2 are RealMadrid Atletico Chelsea Dortmund Napoli Shakhtar
Tottenham Ajax
Child 1, pid 12353: teams for pot 3 are Benfica Lyon Leverkusen Salzburg Olympiacos Brugge
Valencia Internazionale
Child 2, pid 12355: teams for pot 1 are Liverpool Arsenal Barcelona ManCity Juventus Bayern Paris
Zenit
Child 2, pid 12355: teams for pot 2 are RealMadrid Atletico Chelsea Dortmund Napoli Shakhtar
Tottenham Ajax
Child 2, pid 12355: teams for pot 3 are Benfica Lyon Leverkusen Salzburg Olympiacos Brugge
Valencia Internazionale
Child 2, pid 12355: teams for pot 4 are Zagreb Lokomotiv Genk Galatasaray Leipzig SlaviaPraha
Atalanta Lille
Child 2, pid 12355: teams for group A are RealMadrid Brugge Galatasaray Paris
Child 1, pid 12353: teams for pot 4 are Zagreb Lokomotiv Genk Galatasaray Leipzig SlaviaPraha
Atalanta Lille
Child 1, pid 12353: teams for group A are RealMadrid Brugge Galatasaray Paris
Child 1, pid 12353: pots for group A are 2 3 4 1
Child 1, pid 12353: countries for group A are Spain Belgium Turkey France
Child 1, pid 12353: teams for group B are Lille Olympiacos Tottenham Bayern
Child 1, pid 12353: pots for group B are 4 3 2 1
Child 1, pid 12353: countries for group B are France Greece England Germany
. . .
Child 2, pid 12355: countries for group F are Spain Italy Czech Germany
Child 2, pid 12355: teams for group G are Leipzig Lyon Zenit Chelsea
Child 2, pid 12355: pots for group G are 4 3 1 2
Child 1, pid 12353: pots for group H are 1 3 4 2
Child 1, pid 12353: countries for group H are Austria Portugal Croatia Netherlands
Child 1, pid 12353: Valid grouping
Child 2, pid 12355: countries for group G are Germany France Russia England
Child 2, pid 12355: teams for group H are Arsenal Benfica Zagreb Ajax
Child 2, pid 12355: pots for group H are 1 3 4 2
Child 2, pid 12355: countries for group H are Austria Portugal Croatia Netherlands
Child 2, pid 12355: Valid grouping
Here is my code.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <errno.h>
#define LENGTH (sizeof(TEAM)/sizeof(TEAM[0]))
#define GROUPNUM (sizeof(GROUP)/sizeof(GROUP[0]))
char *TEAM[] = {"Ajax", "Atalanta", "Atletico", "Barcelona", "Bayern",
"Benfica", "Brugge", "Chelsea", "Crvenazvezda", "Dortmund", "Galatasaray",
"Genk", "Internazionale", "Juventus", "Leipzig", "Leverkusen", "Liverpool",
"Lokomotiv", "Lille", "Lyon", "ManCity", "Napoli", "Olympiacos", "Paris",
"RealMadrid", "Salzburg", "Shakhtar", "SlaviaPraha", "Tottenham", "Valencia",
"Zagreb", "Zenit", "Arsenal"};
char *COUNTRY[] = {"Netherlands", "Italy", "Spain", "Spain", "Germany",
"Portugal", "Belgium", "England", "Serbia", "Germany", "Turkey", "Belgium",
"Italy", "Italy", "Germany", "Germany", "England", "Russia", "France",
"France", "England", "Italy", "Greece", "France", "Spain", "Austria",
"Ukraine", "Czech", "England", "Spain", "Croatia", "Russia", "England"};
char *GROUP[] ={"group A", "group B", "group C", "group D", "group E", "group F", "group G", "group H"};
const char* getCountry(char * team){
int i;
char* str;
for(i=0;i<LENGTH;i++){
str = TEAM[i];
if ( ( strcmp(team, str) ) == 0){
break;
}
}
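// if the team was not found, i == LENGTH and COUNTRY[i] would read past the end of the array
return (i < LENGTH) ? COUNTRY[i] : "Unknown";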
}
int canTheyMeet(char* str1, char* str2){
// getCountry returns const char*, so keep the results in const pointers
const char *country1 = getCountry(str1);
const char *country2 = getCountry(str2);
int flag=1;
if ( (strcmp(country1, country2)) == 0 )
flag = 0;
if ( (strcmp(country1, "Ukraine")) == 0 && (strcmp(country2, "Russia")) == 0 )
flag = 0;
if ( (strcmp(country2, "Ukraine")) == 0 && (strcmp(country1, "Russia")) == 0 )
flag = 0;
return flag;
}
// Return the 1-based pot number of a team (5 if the team is in no pot)
int whichPot(char * team, char *** result){
int i,j,flag=0;
for(i=0;i<4;i++){
for(j=0;j<8;j++){
char *str = result[i][j];
if( (strcmp(team, str)) == 0 ){
flag=1;
break;
}
}
if(flag==1)
break;
}
return i+1;
}
int isGroupValid(char ** group, char *** result){
int i,j;
int flag = 1;
for(i=0;i<3;i++){
for(j=i+1;j<4;j++){
char *str1 = group[i];
char *str2 = group[j];
if(!canTheyMeet(str1, str2) || whichPot(str1, result) == whichPot(str2, result)){
flag=0;
break;
}
}
}
return flag;
}
// test mode only: read the eight groups (four teams each) from argv
char *** generateGroup(char** argv, int length){
int m;
char *** result;
result = (char ***)malloc(sizeof(char**)*8);
for(m=0;m<8;m++){
result[m] = (char**)malloc(sizeof(char*)*4);
}
int k,j;
int i=35;
for(k=0;k<8;k++){
for(j=0;j<4;j++){
char* string = argv[i + j];
result[k][j] = string;
}
i+=4;
}
return result;
}
// assign the teams to the four pots
// each pot is an array of eight strings, so the result is a char***
char *** getTeams(char** argv, int length){
int m;
char *** result;
result = (char ***)malloc(sizeof(char**)*4);
for (m=0;m<4;m++){
result[m] = (char**)malloc(sizeof(char*)*8);
}
int i=3;
int j=0;
int k=0;
// iterate over argv to collect the teams of every pot
for(; k<4; k++) {
// four pots
for (; j < 8; j ++) {
// get string
char *string = argv[i + j];
result[k][j] = string;
}
i+=8;
j=0;
}
return result;
}
void TestMode(char ** argv, int length, int id){
char *** teamPot = getTeams(argv, length);
char *** groups = generateGroup(argv, length);
int i,j,validBit=1;
for(i=0;i<4;i++){
printf("Child %d , pid %d : teams for pot %d are : ", id, getpid(), i);
for(j=0;j<8;j++){
printf("%s ", teamPot[i][j]);
}
printf("\n");
}
for(i=0;i<8;i++){
printf("Child %d , pid %d : Teams for %s are ", id, getpid(), GROUP[i]);
for(j=0;j<4;j++){
printf("%s ", groups[i][j]);
}
printf("\n");
printf("Child %d , pid %d : pots for %s are ", id, getpid() ,GROUP[i]);
for(j=0;j<4;j++){
printf("%d ", whichPot(groups[i][j], teamPot));
}
printf("\n");
printf("Child %d , pid %d : country for %s are ", id, getpid() ,GROUP[i]);
for(j=0;j<4;j++){
printf("%s ", getCountry(groups[i][j]));
}
printf("\n");
char ** group = groups[i];
if(!isGroupValid(group, teamPot)){
printf("Child %d, pid %d : InValid Grouping\n" , id, getpid());
validBit=0;
break;
}
}
if(validBit)
printf("Child %d, pid %d : Valid Grouping\n", id, getpid());
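// free each row before the outer array (the strings themselves belong to argv)
for(i=0;i<4;i++)
free(teamPot[i]);
for(i=0;i<8;i++)
free(groups[i]);
free(teamPot);
free(groups);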
}
void GenerateMode(){}
int main(int argc, char **argv) {
printf("Hello, World! %zu\n", GROUPNUM);
int k=0;
int status=0;
// get number of child process
int numOfChild = 2;
// The parent Id
pid_t ppid = getpid();
printf("Parent, pid %d : %d children %s Mode \n", ppid, numOfChild, argv[2]);
pid_t pid;
for(k=0;k<numOfChild;k++){
if(fork() == 0){
if ((strcmp(argv[2], "T") == 0)){
TestMode(argv, argc, k+1);
}else if ((strcmp(argv[2], "G")) == 0){
GenerateMode();
}
// every child must exit here; otherwise it continues the loop and forks children of its own
exit(0);
}
}
for(k=0;k<numOfChild;k++){
wait(NULL);
}
return 0;
}
Do I need to use some kind of lock?
Thanks!
All forked processes write to stdout asynchronously, so there are no guarantees about the order in which their output appears. Calling setlinebuf(stdout) (equivalent to setvbuf(stdout, NULL, _IOLBF, 0)) once at the start of the program and calling fflush(stdout) after each printf should help, but will not necessarily solve your problem in all cases.
Specifying a buffer size larger than your anticipated longest line when calling setvbuf will also help:
char buffer[16000]; // 16000 is the buffer size in bytes
...
int main(int argc, char **argv)
{
setvbuf(stdout, buffer, _IOLBF, sizeof buffer);
...