Convert a process based program into a thread based version?

I currently have this program, which spawns an arbitrary number of child processes, and I'd like it to use threads instead of processes. I'm having trouble understanding how to convert what I have. Here is the code, which uses processes:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void childprocess(int num);
int main(int argc, char **argv) {
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s num-procs\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    int counter;
    pid_t pid = getpid();
    int x = atoi(argv[1]);
    printf("Parent Process, my PID is %d\n", pid);
    for(counter = 1; counter <= x; counter++){
        if(!fork()){
            printf("Child %d is born, my PID is %d\n", counter, getpid());
            childprocess(counter);
            printf("Child %d dies\n", counter);
            exit(0);
        }
    }
}
void childprocess(int num){
    srand(getpid());
    int max = rand() % 100;
    for(int i = 0; i < max; i++){
        printf("Child %d executes iteration: %d\n", num, i);
    }
}
Is it as simple as changing a few lines to make it use threads instead? I understand the theory behind using threads, but not how to actually write it in C given what I have here.

The general case depends on whether the threads are wholly independent of each other. If you go multi-threaded, you must ensure that any shared resource is accessed with appropriate protection.
The code shown has a shared resource in the use of srand() and rand(). You will get different results in a multi-threaded process compared with those from multiple separate processes. Does this matter? If not, you may be able to let the threads run, but the C standard says the srand() and rand() functions don't have to be thread-safe.
If it matters — and it probably does — you need to use a different PRNG (pseudo-random number generator) which allows you to have independent sequences (for example, POSIX nrand48() — there may well be better alternatives, especially if you use modern C++).
The use of getpid() won't work well in a multi-threaded process either. Each thread will get the same value (because the threads are all part of the same process), so the threads won't get independent sequences of random numbers for another reason.

Related

segfault with clone() and printf

I'm trying to experiment with how clone() is implemented for threads in Linux 3.10.0-327.3.1.el7.x86_64
I'm running this piece of code and having occasional segfaults. I know if I use CLONE_THREAD then there's no way to check if the thread finished, but why does printf cause problems? How does the Pthread library handle this issue? Without the printfs there's no segfault.
#define _GNU_SOURCE // needed for clone()
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define STACK_SIZE (1ULL<<22) //4MB
int tmp = 10;
int threadfunc(void *ptr)
{
    printf("i'm here\n");
    tmp++;
    return 0;
}
int main()
{
    char *child_stack = malloc(STACK_SIZE);
    int child_pid = clone(&threadfunc, child_stack+(STACK_SIZE)-1, CLONE_THREAD|CLONE_SIGHAND|CLONE_VM|CLONE_FS|CLONE_FILES|SIGCHLD, NULL);
    printf("my pid: %d\n", getpid());
    printf("child pid: %d\n", child_pid);
    printf("tmp is: %d\n", tmp);
    return 0;
}

The printf function doesn't understand that it's being called in a clone task; it is designed around the glibc pthread library.
What you're doing is similar to interrupting a printf with an asynchronous signal, and re-entering printf from the signal.
How the library handles the issue is that it uses clone as part of a more complicated implementation of threads, which "dresses up" the cloned task with a thread identity. This thread identity allows printf to lock a mutex around the stream operation.
What is the thread identity? What it means is that a thread-related function in the C library can inquire, in some way, "what thread is calling me?" to obtain an ID, which maps to a thread descriptor structure. The clone function alone won't set up this identity.

Thread and Forks

I am relatively new to threads and forks. To understand them a bit better, I have been writing simple programs. I have written two little programs: one prints a counter from two processes, and the other prints a counter from two threads.
What I noticed is that the fork version prints the counters interlaced, while the thread version prints one thread's counter and then the other's. So the threads are not so parallel, but behave more serially. Why is that? Am I doing something wrong?
Also, what exactly does pthread_join do? Even when I don't do pthread_join the program runs similarly.
Here is my code for the thread
#include <pthread.h>
#include <stdio.h>
void * thread1(void *a){
    int i =0;
    for(i=0; i<100; i++)
        printf("Thread 1 %d\n",i);
    return NULL;
}
void * thread2(void *b){
    int i =0;
    for(i=0; i<100; i++)
        printf("Thread 2 %d\n", i);
    return NULL;
}
int main()
{
    pthread_t tid1,tid2;
    pthread_create(&tid1,NULL,thread1, NULL);
    pthread_create(&tid2,NULL,thread2, NULL);
    pthread_join(tid1,NULL);
    pthread_join(tid2,NULL);
    return 0;
}
And here is my code for fork
int main(void)
{
    pid_t childPID;
    childPID = fork();
    if(childPID >= 0) // fork was successful
    {
        if(childPID == 0) // child process
        {
            int i;
            for(i=0; i<100;i++)
                printf("\n Child Process Counter : %d\n",i);
        }
        else //Parent process
        {
            int i;
            for(i=0; i<100;i++)
                printf("\n Parent Process Counter : %d\n",i);
        }
    }
    else // fork failed
    {
        printf("\n Fork failed, quitting!!!!!!\n");
        return 1;
    }
    return 0;
}
EDIT:
How can I make the threaded program behave more like the fork program, i.e., so that the counter prints are interleaved?

You are traveling down a bad road here. The lesson you should be learning is not to try to outthink the OS scheduler. No matter what you do - scheduling policies, priorities, or whatever knobs you turn - you cannot do it reliably.
You have backed your way into discovering the need for synchronization mechanisms - mutexes, semaphores, condition variables, thread barriers, etc. What you want to do is exactly why they exist and what you should use to accomplish your goals.
On your last question, pthread_join reclaims some resources from dead, joinable (i.e., not detached) threads and allows you to inspect any return value from the expired thread. In your program the calls mostly serve as a blocking mechanism. That is, main will block on those calls until the threads expire. Without the pthread_joins your main would end and the process would die, including the threads you created. If you don't want to join the threads and aren't doing anything useful in main, then call pthread_exit in main, as this allows main to exit while the threads continue processing.

First of all, a fork creates a second process while creating a thread creates a "dispatchable unit of work" within the same process.
Getting two different processes to be interleaved is usually a simple matter of letting the OS run. However, within a process you need to know more about how the OS chooses which of several threads to run.
You could probably get the output from the threads to be interleaved artificially, by calling sleep for a different length of time in each thread. That is, create thread A (code it to output one line and then sleep for some interval, say 100 ms), then create thread B (code it to output one line and then sleep for a different interval, say 50 ms), and so on.
I understand wanting to see how threads can run in parallel, similar to processes. But, is this a real requirement or just a "zoo" request?

C: how to use system() to launch N programs simultaneously while keeping N<N_processors

I have a program called mc that I want to execute thousands of times with different input files. To run it just once I type:
./mc -sim my_input_file
I'd like to write a small auxiliary program that runs mc a thousand times, but launches only N instances at a time (one execution per processor, assuming I have N processors).
What I've done so far is create an auxiliary program that essentially contains:
unsigned int N_processors(8);
unsigned int m(0);
for(unsigned int i(0); i<1000;i++){
    system("./mc -sim " + file[i] +"&"); /*file[i] is the ith input file*/
    m++;
    while(m>=N_processors){
        usleep(1e6);
        if(/*test condition*/){m--;}
    }
}
My problem is that I don't know what I should replace /*test condition*/ with.

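If you're on a Unix system, you would do better to use fork() instead of system() with "&".
That way you can check that a process has finished before launching another one.
Something like this:
// needs <cstdlib>, <string>, <unistd.h> and <sys/wait.h>
unsigned int N_processors(8);
unsigned int nbfork(0); // number of children currently running
int status = 0;
for(unsigned int i(0); i<1000;i++)
{
    pid_t pid = fork();
    if(pid == 0) // child process
    {
        // no trailing "&": the child blocks until mc is done, then exits
        // file[i] is the ith input file (assumed here to be a std::string)
        system(("./mc -sim " + file[i]).c_str());
        _exit(0);
    }
    else if(pid > 0) // parent process
    {
        nbfork++;
        if(nbfork >= N_processors)
        {
            wait(&status); // waiting for a child to finish before spawning another one
            nbfork--;
        }
    }
}
for(; nbfork > 0; nbfork--) wait(&status); // reap the remaining children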
Not tested

Understanding forks in C

I am having some trouble understanding the following simple C code:
int main(int argc, char *argv[]) {
    int n=0;
    fork();
    n++;
    printf("hello: %d\n", n);
}
My current understanding of a fork is that from that line of code on, it will split the rest of the code in 2, that will run in parallel until there is "no more code" to execute.
From that prism, the code after the fork would be:
a)
n++; //sets n = 1
printf("hello: %d\n", n); //prints "hello: 1"
b)
n++; //sets n = 2
printf("hello: %d\n", n); //prints "hello: 2"
What happens, though, is that both print
hello: 1
Why is that?
EDIT: Only now did it occur to me that, unlike threads, processes don't share the same memory. Is that right? If so, that would be the reason.

After fork() you have two processes, each with its own "n" variable.

fork() starts a new process, sharing no variables/memory locations.
It is very similar to what happens if you execute ./yourprogram twice in a shell, assuming the first thing the program does is forking.

When fork() returns, both processes may still refer to the same physical copy of n (copy-on-write). But at n++, each process gets its own copy, starting from n = 0. After n++, n is 1 in both processes, and the printf statement outputs this value.

Actually you spawn a new process of the same program. It is not the closure kind of thing. You could use pipes to exchange data between parent and child.

You did indeed answer your own question in your edit.

Examine this code and everything should be clearer (see the man pages if you don't know what a certain function does):
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
int count = 1;
int main(int argc, char *argv[]) {
    // set the "startvalue" to create the random numbers
    srand(time(NULL));
    // until fork() is called there is only one process; treat it as the father
    int pid = getpid();
    // as long as count is <= 50
    for (; count<=50; count++) {
        // create new process if count == 9
        if (count==9) {
            pid = fork();
            // reset start value for generating the random numbers
            srand(time(NULL)+pid);
        }
        if (count<=25) {
            // sleep for 300 ms
            usleep(3*100000);
        } else {
            // create a random number between 1 and 5
            int r = ( rand() % 5 ) + 1;
            // sleep for r * 100 ms
            usleep(r*100000);
        }
        if (pid==0) {
            printf("Child: count:%d pid:%d\n", count, pid);
        } else if (pid>0) {
            printf("Father: count:%d pid:%d\n", count, pid);
        }
    }
    return 0;
}
happy coding ;-)

The system call forks more than the execution thread: also forked is the data space. You have two n variables at that point.
There are a few interesting things that follow from all this:
A program that fork()s must consider unwritten output buffers. They can be flushed before the fork, or cleared after the fork, or the program can _exit() instead of exit() to at least avoid automatic buffer flushing on exit.
Fork is often implemented with copy-on-write in order to avoid unnecessarily duplicating a large data memory that won't be used in the child.
Finally, an alternate call, vfork(), has been revived in most current Unix versions, after vanishing for a period of time following its introduction in 3.0BSD. vfork() does not pretend to duplicate the data space, and so the implementation can be even faster than a copy-on-write fork(). (Its implementation in Linux may be due less to speed reasons than because a few programs actually depend on the vfork() semantics.)

Problem starting a process with CreateProcess()

The question is given like this:
Using the CreateProcess () in the Win32 API. In this instance, you will need to specify a separate program to be invoked from CreateProcess(). It is this separate program that will run as a child process outputting the Fibonacci sequence. Perform necessary error checking to ensure that a non-negative number is passed on the command line.
I have done the following. It doesn't show any error message. When I try to execute it, it exits automatically:
#include <sys/types.h>
#include <windows.h>
#define _WIN32_WINNT 0x0501
#include <stdio.h>
int main()
{
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    int a=0, b=1, n=a+b,i,ii;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    if(! CreateProcess("C:\\WINDOWS\\system32\\cmd.exe",NULL,NULL,NULL,FALSE,0,
                       NULL,NULL,&si,&pi))
        printf("\nSorry! CreateProcess() failed.\n\n");
    else{
        printf("Enter the number of a Fibonacci Sequence:\n");
        scanf("%d", &ii);
        if (ii < 0)
            printf("Please enter a non-negative integer!\n");
        else
        {
            {
                printf("Child is producing the Fibonacci Sequence...\n");
                printf("%d %d",a,b);
                for (i=0;i<ii;i++)
                {
                    n=a+b;
                    printf("%d ", n);
                    a=b;
                    b=n;
                }
                printf("Child ends\n");
            }
            {
                printf("Parent is waiting for child to complete...\n");
                printf("Parent ends\n");
            }
        }
    }
    WaitForSingleObject(pi.hProcess, 5000);
    printf("\n");
    // Close process and thread handles.
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}
What am I doing wrong?

I think you misunderstood your exercise. Also, you might want to look into handle inheritance when using CreateProcess. This might be above your skill level, but it is still a useful lesson:
http://support.microsoft.com/kb/q190351/

Why are you invoking cmd.exe as your process? The problem states that it's the process that you're launching that should be printing the Fibonacci sequence. You shouldn't be doing it in the main process/app.
What you should have is a separate program that takes a single argument and prints that many items from the Fibonacci sequence. Your main program should ask the user for a number and then launch the other program, passing this number as an argument.

The question says to spawn a process that outputs a Fibonacci sequence. Get/validate the user input in the main process, then spawn another program (written separately) to print the Fibonacci sequence. Pass the user input to the spawned process as a command-line parameter.