Using system() in parallel in C (Linux)


My code has a large for loop that contains system(). I.e.
for (i = 0; i < totalCycles; i++) {
    <some C code>
    // Bulk of the CPU time is taken up by the command below
    system(command_string);
    <some more C code>
}
This for loop can be easily parallelised insofar as an iteration does not depend on a previous iteration. Normally I would use MPI or OpenMP to parallelise the loop. However, I am not sure how to handle system(). There was a similar question here, but that was in the context of parallelising system() only, as opposed to system() within a larger loop.

Read the system(3) man page. It mentions fork(2) and execl(3), so read those man pages too.
You probably want to fork all of your processes (or, more likely, just enough of them) so that they run in parallel.
You should read Advanced Linux Programming. It has several chapters on the subject (that I won't summarize here).
You'll need to decide whether you want to get the output of your parallel processes. If so, I guess that you'll need to use pipe(2) and probably poll(2) to have some event loop handling the incoming (or outgoing) data.
BTW, unless your many processes are mostly idle, you probably don't want to fork a lot of processes. Having many more running processes than you have cores in your processors would probably make your system slow down or thrash.
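The advice above (fork and exec instead of system(), and keep the number of running children bounded) might be sketched as follows. This is only an illustration under stated assumptions: run_parallel, reap_one and max_jobs are names invented here, each command is handed to /bin/sh -c much as system() would do, and the output of the commands is not captured.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for any one child; return 1 if it exited with status 0, else 0.
   (Hypothetical helper for this sketch.) */
static int reap_one(void)
{
    int status;

    if (wait(&status) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Run cmds[0..n-1] through /bin/sh, keeping at most max_jobs children
   alive at once. Returns how many commands exited with status 0. */
size_t run_parallel(const char *const cmds[], size_t n, size_t max_jobs)
{
    size_t running = 0, ok = 0;

    if (max_jobs == 0)
        max_jobs = 1;
    for (size_t i = 0; i < n; i++) {
        if (running == max_jobs) {      /* pool is full: wait for a free slot */
            ok += reap_one();
            running--;
        }
        pid_t pid = fork();
        if (pid == 0) {                 /* child: become the shell command */
            execl("/bin/sh", "sh", "-c", cmds[i], (char *) NULL);
            _exit(127);                 /* only reached if execl failed */
        }
        if (pid > 0)
            running++;                  /* parent: one more job in flight */
        /* pid < 0: fork failed; this sketch simply counts it as not ok */
    }
    while (running > 0) {               /* drain the remaining children */
        ok += reap_one();
        running--;
    }
    return ok;
}
```

A reasonable starting value for max_jobs is probably the number of cores. The C code that surrounds system() in each loop iteration would have to run either in the parent before the fork or in the child before the exec, depending on what it needs.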

Related

How many threads will be created by given C fork code

Consider the code given below:
#include <stdio.h>
#include <unistd.h>
int main()
{
    fork();
    fork() && fork() || fork();
    fork();
    printf("forked\n");
    return 0;
}
The question is how many times forked will be printed. As per my analysis, it should be printed 20 times. Also this answer confirms the same.
However when I run the code on onlinegdb.com and ideone.com, they print it 18 and 5 times respectively. Why so?

Your code doesn't create any threads. On Linux, threads are POSIX threads (Pthreads), and you would use pthread_create(3) (which internally uses clone(2)) to create them.
Of course, your code is using (incorrectly) fork(2) so it creates processes (unless fork is failing). Notice that fork is difficult to understand (and to explain, so I won't even try here). You may need to read a lot about it, e.g. fork wikipage, ALP, and perhaps Operating Systems: Three Easy Pieces (both have several chapters explaining it).
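For the specific program in the question, the expected count of 20 can be checked without forking anything, by enumerating what each process sees. The sketch below assumes every fork() succeeds and relies on two facts: fork() returns nonzero (true) in the parent and 0 (false) in the child, and && and || short-circuit. The function names are invented for this illustration.

```c
/* Number of processes that leave `fork() && fork() || fork()` for each
   process that enters it, assuming every fork() succeeds. */
int after_and_or(void)
{
    int total = 0;

    for (int first = 0; first <= 1; first++) {
        if (!first) {            /* child of the 1st fork: && short-circuits, */
            total += 2;          /* the 3rd fork runs -> parent + child       */
            continue;
        }
        for (int second = 0; second <= 1; second++) {
            if (second)
                total += 1;      /* && is true, so || skips the 3rd fork */
            else
                total += 2;      /* && is false, so the 3rd fork runs    */
        }
    }
    return total;
}

/* Each plain `fork();` statement doubles the number of processes. */
int expected_prints(void)
{
    return 2 * after_and_or() * 2;
}
```

Here after_and_or() comes to 2 + 1 + 2 = 5, so the whole program ends with 2 * 5 * 2 = 20 processes, each printing once, provided no fork fails.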
You should handle the failure of fork. As explained here, there are three cases to consider for each fork, and you had better rewrite your code to do at most one fork per statement (an assignment like pid_t pida = fork();)
BTW, you should flush the standard streams (and the data in their buffers) before every fork. I recommend using fflush(3) by calling fflush(NULL); before every fork.
Notice that each process has its own (unique) pid (see getpid(2) and credentials(7)). You might understand things better if you print it, so try using something like printf("forked in %ld\n", (long) getpid());
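Put together, the recommendations above (flush first, one fork per statement, handle all three outcomes, print the pid) could look roughly like this. It is a minimal sketch, not a rewrite of the program in the question; fork_once and its code parameter are made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork once, let the child exit with `code`, and return the exit status
   that the parent observed (or -1 on failure). */
int fork_once(int code)
{
    fflush(NULL);                       /* flush stdio buffers before forking */
    pid_t pid = fork();                 /* one fork per statement */

    if (pid < 0) {                      /* case 1: fork failed */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                     /* case 2: we are the child */
        printf("forked in %ld (child)\n", (long) getpid());
        fflush(NULL);
        _exit(code);                    /* leave without running atexit handlers */
    }
    /* case 3: we are the parent, and pid is the child's pid */
    printf("forked in %ld (parent of %ld)\n", (long) getpid(), (long) pid);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the fflush(NULL) before fork, text still sitting in a stdio buffer would be duplicated into the child and could be printed twice.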
"when I run the code"
You really should run that code on your computer under Linux. Consider installing a Linux distribution (perhaps in some virtual machine) on your laptop or desktop. Notice that Linux is very developer- and student-friendly, and it is mostly made of free software whose source code you can study.
"they print it 18 and 5 times respectively. Why so?"
Free web services should limit the resources used by outside clients (on Linux, they probably would use setrlimit(2) for that purpose). Obviously, such sites, which give the ability to run nearly arbitrary C code, want to avoid fork bombs. Very likely, some of your forks failed on them (and since your original code doesn't check for failures, you did not notice that).
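Whether a given site really uses setrlimit(2) is a guess, but you can at least look at the limit that applies to your own process. The sketch below reads the soft RLIMIT_NPROC value with getrlimit(2); print_nproc_limit is a name chosen for this example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft limit on the number of processes for the current user.
   Returns 0 on success, -1 if getrlimit() failed. */
int print_nproc_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_NPROC soft limit: unlimited\n");
    else
        printf("RLIMIT_NPROC soft limit: %llu\n",
               (unsigned long long) rl.rlim_cur);
    return 0;
}
```

When the limit is reached, fork() returns -1 (typically with errno set to EAGAIN), which is exactly the case the original program never checks.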
Even on your own desktop, you could not create a lot of processes. As a rule of thumb, you might have a few hundred processes on your computer with most of them being idle (waiting, perhaps with poll(2) or a blocking read(2), etc., for some IO or for some timeout, see also time(7)), and only a dozen of them being runnable (by the process scheduler of your kernel). In other words, a process is quite a costly computing resource. If you have too many runnable processes you could experience thrashing.
Use ps(1) and top(1) (and also htop and pgrep(1)) to query the processes on your Linux system. If you want to do that programmatically, use /proc/ (and see proc(5) for more) - which is used by ps, top, pgrep, htop etc...

executing slow process with popen()

I'm creating a simple network scanning function using nmap and C. I want to use popen() to execute nmap, but nmap takes close to 30 seconds to complete because I'm scanning a wide range of IPs.
So, is there a way to check when the command has finished executing? I don't want my C code to hang at the system call. Instead, I would like to check a flag or something in a loop that tells me when popen/nmap has finished, so that other parts of my C program don't come to a halt. Is this possible?
Thanks!

I can think of 2 direct ways to do it
You could fork() directly and then establish a way to communicate between the two processes. This would be very difficult because IPC is not an easy matter.
You could create a new thread with pthread_create() and call popen() there. It would be a lot easier, and you could share data between threads by using an appropriate locking mechanism to prevent race conditions.

You can either use multiprocessing with fork (hard mode)
Or you can use multithreading using pthread (easy mode)
Either one allows you to do 2 things at once. Multiprocessing is hard because you must worry about interprocess communication (pipes), and the 2 tasks you're trying to do cannot share memory.
Multithreading is much easier because all you need is to link against the pthread library (-lpthread) and then specify which function runs on the separate thread.
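If all the program needs to know is whether the command has finished, the fork route needs no IPC at all: the parent can poll with waitpid(2) and WNOHANG. The following is a sketch under that assumption; start_command and poll_command are invented names, and the command's output is not captured (that would need a pipe).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `cmd` via /bin/sh without waiting for it.
   Returns the child's pid, or -1 if fork() failed. */
pid_t start_command(const char *cmd)
{
    pid_t pid = fork();

    if (pid == 0) {                     /* child: run the command */
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);                     /* only reached if execl failed */
    }
    return pid;
}

/* Non-blocking check on a child started above.
   Returns 0 while it is still running, 1 once it has finished (with its
   exit status, or -1 if it did not exit normally, in *exit_code), and
   -1 on error. */
int poll_command(pid_t pid, int *exit_code)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == 0)
        return 0;
    if (r < 0)
        return -1;
    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}
```

The main loop would call poll_command() once per iteration and do its other work while the result is 0.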

Splitting a large multi-thread binary into smaller individual processes/binaries

I'm not sure if the title accurately describes what I want to do but here's the rub:
We have a large and hairy codebase (not-invented-here courtesy of Elbonian Code Slaves) which currently compiles as one big binary which internally creates several pthreads for various specific tasks, communicating through IPC messages.
It's not ideal for a number of reasons, and several of the threads would be better as independent autonomous processes as they are all individual specific "workers" rather than multiple instances of the same piece of code.
I feel a bit like I'm missing some trick, is our only option to split off the various thread code and compile each as a standalone executable invoked using system() or exec() from the main blob of code? It feels clunky somehow.

If you want to take a part of your program that currently runs as a thread, and instead run it as a separate process launched by your main program, then you have two main options:
Instead of calling pthread_create(), fork() and in the child process call the thread-start function directly (do not use any of the exec-family functions).
Compile the code that the thread executes as a separate executable. Launch that executable at need by the standard fork / exec sequence. (Or you could use system() instead of fork/exec, but don't. Doing so needlessly brings the shell into it, and also gives you much less control.)
The former has the disadvantage that each process image contains a lot of code that it will never use, since each is a complete copy of everything. Inasmuch as under Linux fork() uses copy-on-write, however, that's mostly an address-space issue, not a resource-wastage issue.
The latter has the disadvantage that the main program needs to be able to find the child programs on the file system. That's not necessarily a hard problem, mind you, but it is substantially different from already having the needed code at hand. If there is any way that any of the child programs would be independently useful, however, then breaking them out as separate programs makes a fair amount of sense.
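The two options might be sketched as below. Everything named here is hypothetical: spawn_function stands for the first option (fork, then call the former thread-start function in the child), spawn_program for the second (fork, then exec a separately compiled worker), and demo_worker is a placeholder for a real worker function.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Option 1: fork and run the former thread-start function in the child. */
pid_t spawn_function(int (*worker)(void *), void *arg)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(worker(arg));     /* child never returns into the caller */
    return pid;                 /* parent: child's pid, or -1 on failure */
}

/* Option 2: fork and exec a separately compiled worker executable. */
pid_t spawn_program(const char *path, char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execv(path, argv);
        _exit(127);             /* only reached if execv failed */
    }
    return pid;
}

/* Wait for a child and return its exit status, or -1. */
int wait_exit_status(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Placeholder worker: returns the int that arg points to. */
int demo_worker(void *arg)
{
    return *(int *) arg;
}
```

With option 2 the parent has to know the path of the worker executable, which is the file-system lookup issue mentioned above. With option 1 the child starts with a copy of the parent's memory, so arg can point at ordinary data.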
Do note, by the way, that I do not in general accept your premise that it is inappropriate to implement specific for-purpose workers as threads. If you want to break out such tasks, however, then the above are your available alternatives.
Edited to add:
As @EOF pointed out, if you intend that after the revamp your main process will still be multi-threaded (that is, if you intend to convert only some threads to child processes) then you need to be aware of a significant restriction placed by POSIX:
If a multi-threaded process calls fork(), [...] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
On the other hand, I'm pretty sure the relevant definition of "multi-threaded" is that the process has multiple live threads at the time fork() is called. It should not present a problem if the child processes are all forked off before any additional threads are created, or after all but one thread is joined.

Serial communication C/C++ Linux thread safe?

My question is quite simple. Is reading and writing from and to a serial port under Linux thread-safe? Can I read and write at the same time from different threads? Is it even possible to do 2 writes simultaneously? I'm not planning on doing so but this might be interesting for others. I just have one thread that reads and another one that writes.
There is little to find about this topic.
In more detail: I am using write() and read() on a file descriptor that I obtained from open(), and I am doing so simultaneously.
Thanks all!
Roel

There are two aspects to this:
What the C implementation does.
What the kernel does.
Concerning the kernel, I'm pretty sure that it will either support this or return an appropriate error, otherwise this would be too easy to exploit. The C implementation of read() is just a syscall wrapper (see "what happens after read is called for a Linux socket"), so this doesn't change anything. However, I still don't see any guarantees documented there, so this is not reliable.
If you really want two threads, I'd suggest that you stay with stdio functions (fopen/fread/fwrite/fclose), because here you can leverage the fact that glibc synchronizes these calls with a mutex internally.
However, if you are doing a blocking read in one thread, the other thread could be blocked waiting to write something. This could be a deadlock. A solution for that is to use select() to detect when there is some data ready to be read or buffer space available for writing. This is done in a single thread, and while the initial code is a bit larger, in the end this approach is easier and cleaner, even more so if multiple streams are involved.
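A minimal sketch of the select() approach, for a single descriptor: fd_ready and the READY_* constants are names made up here, and fd would be the descriptor returned by open() on the serial device.

```c
#include <sys/select.h>
#include <unistd.h>

enum { READY_READ = 1, READY_WRITE = 2 };

/* Wait up to timeout_ms for fd to become readable and/or writable.
   Returns a mask of READY_READ / READY_WRITE, 0 on timeout, -1 on error. */
int fd_ready(int fd, int timeout_ms)
{
    fd_set rfds, wfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_SET(fd, &rfds);
    FD_SET(fd, &wfds);

    int r = select(fd + 1, &rfds, &wfds, NULL, &tv);
    if (r <= 0)
        return r;               /* 0: timed out, -1: error (see errno) */

    return (FD_ISSET(fd, &rfds) ? READY_READ : 0)
         | (FD_ISSET(fd, &wfds) ? READY_WRITE : 0);
}
```

A single-threaded loop would call fd_ready(), then read() only when READY_READ is set and write() only when READY_WRITE is set, so neither call blocks the other. In a real loop you would only ask for writability while there is something queued to send, otherwise select() returns immediately every time.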

pausing main thread execution other than sleep() in C

I need to pause the execution of the main thread without using a sleep statement.
Is there any function or status value that shows the alive status of other threads, like isalive() in Java?

pause() often works well; it suspends execution until a signal is received.

Standard C provides no way to pause the main thread, because standard C has no concept of threads. (That's changing in C201X, but that new version of the standard isn't quite finished, and there are no implementations of it.)
Even sleep() (which is a function, not a language-defined statement) is implementation-specific.
So it's not really possible to answer your question without knowing what environment you're using. Do you have multiple threads? If so, what threading library are you using? Pthreads? Win32 threads?
Why does sleep() not satisfy your requirements? (Probably because it pauses all threads, not just the current one.)
(Hint: Whenever you ask "How do I do X without using Y?", tell us why you can't use Y.)
Consult the documentation for whatever thread library you're using. It should provide a function that does what you need.

An extremely simple approach would be to use something like getchar().
Another approach could be waiting for a signal with pthread_cond_wait (or any other similar function in a different threading API).
Another approach could be sitting in a tight loop and using a semaphore (or something simpler like a global variable value) to wait for the other threads to finish.
Anyway, there are several options. You don't say enough about your problem to tell what your best choice is here.

select() is often a good choice.
On Linux, epoll(7) is often a good alternative to select().
And every program, "threaded" or not, always has a "main thread". If you're actually using threads, however, look at pthread_cond_wait().
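Assuming the threads in question are Pthreads (the question does not say), pausing the main thread with pthread_cond_wait() could look like the sketch below. The done flag also plays the role of the "is it still alive" status asked about. The names completion, completion_worker and run_and_wait are invented, and the arithmetic in the worker is only a stand-in for real work.

```c
#include <pthread.h>

struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int done;                   /* protected by lock; 1 once the worker is finished */
    int result;                 /* protected by lock */
};

static void *completion_worker(void *arg)
{
    struct completion *c = arg;
    int value = 6 * 7;          /* stand-in for the real work */

    pthread_mutex_lock(&c->lock);
    c->result = value;
    c->done = 1;
    pthread_cond_signal(&c->cond);      /* wake the waiting thread */
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Start a worker and block the calling thread, without sleep(), until the
   worker reports completion. Returns the worker's result, or -1. */
int run_and_wait(void)
{
    struct completion c;
    pthread_t tid;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.cond, NULL);
    c.done = 0;
    c.result = 0;

    if (pthread_create(&tid, NULL, completion_worker, &c) != 0)
        return -1;

    pthread_mutex_lock(&c.lock);
    while (!c.done)                     /* loop guards against spurious wakeups */
        pthread_cond_wait(&c.cond, &c.lock);
    int result = c.result;
    pthread_mutex_unlock(&c.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return result;
}
```

If the main thread only needs to wait for the worker to end, pthread_join() alone is enough. The condition variable is useful when the worker keeps running after it has signalled.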