Splitting a large multi-thread binary into smaller individual processes/binaries

Splitting a large multi-thread binary into smaller individual processes/binaries - c

I'm not sure if the title accurately describes what I want to do but here's the rub:
We have a large and hairy codebase (not-invented-here courtesy of Elbonian Code Slaves) which currently compiles as one big binary which internally creates several pthreads for various specific tasks, communicating through IPC messages.
It's not ideal for a number of reasons, and several of the threads would be better as independent autonomous processes as they are all individual specific "workers" rather than multiple instances of the same piece of code.
I feel a bit like I'm missing some trick, is our only option to split off the various thread code and compile each as a standalone executable invoked using system() or exec() from the main blob of code? It feels clunky somehow.

If you want to take a part of your program that currently runs as a thread, and instead run it as a separate process launched by your main program, then you have two main options:
Instead of calling pthread_create(), fork() and in the child process call the thread-start function directly (do not use any of the exec-family functions).
Compile the code that the the thread executes as a separate executable. Launch that executable at need by the standard fork / exec sequence. (Or you could use system() instead of fork/exec, but don't. Doing so needlessly brings the shell into it, and also gives you much less control.)
The former has the disadvantage that each process image contains a lot of code that it will never use, since each is a complete copy of everything. Inasmuch as under Linux fork() uses copy-on-write, however, that's mostly an address-space issue, not a resource-wastage issue.
The latter has the disadvantage that the main program needs to be able to find the child programs on the file system. That's not necessarily a hard problem, mind you, but it is substantially different from already having the needed code at hand. If there is any way that any of the child programs would be independently useful, however, then breaking them out as separate programs makes a fair amount of sense.
Do note, by the way, that I do not in general accept your premise that it is inappropriate to implement specific for-purpose workers as threads. If you want to break out such tasks, however, then the above are your available alternatives.
Edited to add:
As #EOF pointed out, if you intend that after the revamp your main process will still be multi-threaded (that is, if you intend to convert only some threads to child processes) then you need to be aware of a significant restriction placed by POSIX:
If a multi-threaded process calls fork(), [...] to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
On the other hand, I'm pretty sure the relevant definition of "multi-threaded" is that the process has multiple live threads at the time fork() is called. It should not present a problem if the child processes are all forked off before any additional threads are created, or after all but one thread is joined.

Related

Can GSubprocess be used in a thread safely?

I ran across some problems with GtkSubprocess, and I figured out that it is related to using threads, and is there a way to make it immune to concurrency problems?
I have this program that does some operations on a file, which are individually represented by GtkListBoxRows. When the GSubprocess finishes, and I attempt to remove the list box row, the program segfaults. BTW, each file has its own process, so if a user loads 10 files, there will be 10 threads (this is managed by GThreadPool). Interestingly, if I comment out the code that launches the process, and the code that blocks the thread function till the process finishes, the program does not segfault. So I deduced that GSubprocess is having problems with concurrency. The error produced varies a lot, so this must be due to time-related problems.
I wanted to use GSubprocess because it is relatively easy to get the output of the command, which I need. Will I need to move my invocations of GSubprocess outside of the thread function?

I found out that it is not safe, due to its internal implementation in the GTK+ source code. And you should not even use threads in an application as well, as stated here. Here is my workaround: create the process in the main loop, and wait for the process to terminate using the async version of the call. Thus you avoid threads.

How to get the list of all pthread Ids from main thread [duplicate]

I have a multi-threaded application in a POSIX/Linux environment - I have no control over the code that creates the pthreads. At some point the process - owner of the pthreads - receives a signal.
The handler of that signal should abort,cancel or stop all the pthreads and log how many pthreads where running.
My problem is that I could not find how to list all the pthreads running in process.

There doesn't seem to be any portable way to enumerate the threads in a process.
Linux has pthread_kill_other_threads_np, which looks like a leftover from the original purely-userland pthreads implementation that may or may not work as documented today. It doesn't tell you how many threads there were.
You can get a lot of information about your process by looking in /proc/self (or, for other processes, /proc/123). Although many unices have a file or directory with that name, the layout is completely different, so any code using /proc will be Linux-specific. The documentation of /proc is in Documentation/filesystems/proc.txt in the kernel source. In particular, /proc/self/task has a subdirectory for each thread. The name of the subdirectory is the LWP id; unfortunately, [1][2][3] there doesn't seem to be a way to associate LWP ids with pthread ids (but you can get your own thread id with gettid(2) if you work for it). Of course, reading /proc/self/task is not atomic; the number of threads is available atomically through /proc/self/status (but of course it might change before you act on it).
If you can't achieve what you want with the limited support you get from Linux pthreads, another tactic is to play dynamic linking tricks to provide your own version of pthread_create that logs to a data structure you can inspect afterwards.

You could wrap ps -eLF (or another command that more closely reads just the process you're interested in) and read the NLWP column to find out how many threads are running.

Given that the threads are in your process, they should be under your control. You can record all of them in a data structure and keep track.
However, doing this won't be race-condition free unless it's appropriately managed (or you only ever create and join threads from one thread).
Any threads created by libraries you use are their business and you should not be messing with them directory, or the library may break.
If you are planning to exit the process of course, you can just leave the threads running anyway, as calling exit() terminates them all.
Remember that a robust application should be crash-safe anyway, so you should not depend upon shutdown behaviour to avoid data loss etc.

Lighter weight alternatives to fork() in POSIX C?

In the man pages I've been reading, it seems popen, system, etc. tend to call fork(). In turn, fork() copies the process's entire memory state. This seems really heavy, especially when in many situations a child from a call to fork() uses little if any of the memory allocated for the parent.
So, my question is, can I get fork() like behavior without duplicating the whole memory state of the parent process? Or is there something I am missing, such that fork() is not as heavy as it appears (like, maybe calls tend to be optimized to avoid unnecessary memory duplication)?

fork(2) is, as all syscalls, a primitive operation (but some C libraries use clone(2) for it), from the point of view of user-space application. It is mostly a single machine instruction SYSCALL or SYSENTER to switch from user-mode to kernel-mode, then the (recent version of) Linux kernel is doing quite significant processing.
It is in practice quite efficient (e.g. less than a millisecond, and sometimes even less than a tenth of it) because the kernel is extensively using lazy copy-on-write techniques to share pages between parent & child processes. The actual copying would happen later, on page faults, when overwriting a shared page.
And forking has a huge advantage, since the starting of some other program is delegated to execve(2): it is conceptually simple: the only difference between the parent & child processes is the result of fork
BTW on POSIX systems such as Linux, fork(2) or the suitable clone(2) equivalent is the only way to create a process (there are some few weird exceptions that you should generally ignore: the kernel is making some processes like /sbin/init etc...), since vfork(2) is obsolete.

The problem is that to run the main function of a standardly linked executable, you need to call execve, and exec replaces the whole process image and so you need a new address space, which is what fork is for.
You can get around this by having your calee expose its main functionality in a shared library (but then it must not be called main), and then you can load the function with the main functionality without having to fork (provided there are no symbol conflicts).
That would be a more efficient alternative to system (basically with the efficiency of a function call).
Now popen involves pipes and to use pipes you need to have the pipe ends in different schedulable units. Threads, which use the same address space, can be used here as a lighter alternative to separate processes.

As you alluded to fork() is a bit of a mad syscall that has kind of stuck around for historical reasons. There's a great article about its flaws here, and also this post goes into some details and potential workarounds.
Although on Linux fork() is optimised to use copy-on-write for the memory, it's still not "free" because:
It still has to do some memory-related admin (new page tables, etc.)
If you're using RAII (e.g. in C++ or possibly Rust) then all the objects that are copied will be cleaned up twice. That might even lead to logic errors (e.g. deleting temporary files twice).
It's likely that the parent process will keep running, probably modifying lots of its memory, and then it will have to be copied.
The alternatives appear to be:
vfork()
clone()
posix_spawn()
vfork() was created for the common use case of doing fork() and then execve() to run a program. execve() replaces all of the memory of the current process with a new set, so there's no point copying the parent process's memory if your just about to obliterate it.
So vfork() doesn't do that. Instead it runs in the same memory space as the parent process and pauses it until it gets to execve(). The Linux man page for vfork() says that doing just about anything except vfork() then execve() is undefined behaviour.
posix_spawn() is basically a nice wrapper around vfork() and then execve().
clone() is similar to fork() but allows you to exactly specify what is copied (file descriptors, memory, etc.). It has a load of options, including one (CLONE_VM) which lets the child process run in the same address space as the parent, which is pretty wild! I guess that is the lightest weight way to make a new process because it doesn't involve any copying of memory at all!
But in practice I think in most situations you should either:
Use threads, or
Use posix_spawn().
(Note, I am just researching this now; I'm not an expert so I might have got some things wrong.)

Run program within C _without_ using a shell

I'm trying to run an application in C, but the only way I could find that is reasonably easy to use works like this:
system("command here");
It works, of course, but it's really slow (especially when repeating this a lot). I'm just wondering if there is a way of running a program without having to interact with a shell, something like python's subprocess module.
I have heard of execl, and I would use that (forking it first, of course), but I'm wondering if there is a simpler way that wouldn't require forking first.
EDIT: I also want to be able to know the return code of the program

As I'm sure you already know, system already employs the fork/exec strategy. I understand you want to circumvent the shell and are looking for a simple approach, I'm just saying you could just as easily write a function to wrap the fork/exec pattern as is done in system. Indeed it would probably be most straightforward to just do that. An alternative as Gabe mentioned in the comments is posix_spawn.
A faster (but apparently discouraged) alternative is vfork() / exec, but this is generally discouraged and is obsolete in the latest POSIX standards.
4.3BSD; POSIX.1-2001 (but marked OBSOLETE). POSIX.1-2008 removes the
specification of vfork().
It's meant to be immediately followed by an exec or _exit. Otherwise all kinds of weird bugs can arise since the virtual memory pages and page tables aren't duplicated (child uses same data/heap/stack segments). The parent/calling process blocks until the child execs or _exits. Regular fork's modern implementations have copy-on-write semantics which approach the speed of vfork, without the potential bugs incurred by vfork's memory sharing semantics.
If you want even further control over memory-sharing semantics and process inheritance, and the consequent potential speed-up (and are on Linux), look into clone() (wrapper for system-call sys_clone()) which is what some process-creating system calls delegate their work to. Be sure to carefully comb over all of the various flags.
You can use waitpid to get the exit status of the process.

If neither system() nor popen() provides the mechanism you need, then the easy way to do it is with fork() and execv() (or, perhaps, execl(), but the argument list must be fixed at compile time, not variable, to use it). Really! It is not hard to do fork() and exec(), and any alternative will encapsulate that processing.
The Python subprocess module is simply hiding fork() and exec() for you behind a convenient interface. That's probably appropriate for a high-level language like Python. C is a lower-level language and doesn't really need the complexity.
The hard way to do it is with posix_spawn(). You have to create arguments to describe all the actions you want done in the child between the fork() and the exec(), which is far harder to set up than it is to simply do the fork(), make the changes, and then use exec() after all. This (posix_spawn()) is what you get when you design the code to spawn a child process without visibly using fork() and exec() and ensure that it can handle almost any reasonable circumstance.
You'll need to consider whether you need to use wait() or waitpid() or a variant to determine when the child is complete. You may need to consider whether to handle the SIGCHLD signal (which will notify you when a child dies).

POSIX API call to list all the pthreads running in a process

I have a multi-threaded application in a POSIX/Linux environment - I have no control over the code that creates the pthreads. At some point the process - owner of the pthreads - receives a signal.
The handler of that signal should abort,cancel or stop all the pthreads and log how many pthreads where running.
My problem is that I could not find how to list all the pthreads running in process.

There doesn't seem to be any portable way to enumerate the threads in a process.
Linux has pthread_kill_other_threads_np, which looks like a leftover from the original purely-userland pthreads implementation that may or may not work as documented today. It doesn't tell you how many threads there were.
You can get a lot of information about your process by looking in /proc/self (or, for other processes, /proc/123). Although many unices have a file or directory with that name, the layout is completely different, so any code using /proc will be Linux-specific. The documentation of /proc is in Documentation/filesystems/proc.txt in the kernel source. In particular, /proc/self/task has a subdirectory for each thread. The name of the subdirectory is the LWP id; unfortunately, [1][2][3] there doesn't seem to be a way to associate LWP ids with pthread ids (but you can get your own thread id with gettid(2) if you work for it). Of course, reading /proc/self/task is not atomic; the number of threads is available atomically through /proc/self/status (but of course it might change before you act on it).
If you can't achieve what you want with the limited support you get from Linux pthreads, another tactic is to play dynamic linking tricks to provide your own version of pthread_create that logs to a data structure you can inspect afterwards.

You could wrap ps -eLF (or another command that more closely reads just the process you're interested in) and read the NLWP column to find out how many threads are running.

Given that the threads are in your process, they should be under your control. You can record all of them in a data structure and keep track.
However, doing this won't be race-condition free unless it's appropriately managed (or you only ever create and join threads from one thread).
Any threads created by libraries you use are their business and you should not be messing with them directory, or the library may break.
If you are planning to exit the process of course, you can just leave the threads running anyway, as calling exit() terminates them all.
Remember that a robust application should be crash-safe anyway, so you should not depend upon shutdown behaviour to avoid data loss etc.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight