C library function exit() with SLURM on cluster - c

I'm having a problem with the simple usage of exit().
The context is running a program on a cluster for parallel computing, so we have a Slurm system installed managing and watching over all processes.
The problem is now that, when calling exit(1) in my program, SLURM doesn't seem to register that and the CPUs stay busy, burning up my allocated CPU-hours uselessly, although the program has already terminated.
So my question is:
What does exit() do differently compared to a regular return 1 in main()?
Is there a simple way to fix my exit signal?

The default behaviour of SLURM is to allow the remaining processes in a job to complete, even if one process crashes or exits with a non-zero exit code. You can change this by setting KillOnBadExit=1 in your slurm.conf, or by passing -K (--kill-on-bad-exit=1) to srun.

Related

Can GSubprocess be used in a thread safely?

I ran across some problems with GSubprocess and figured out that they are related to using threads. Is there a way to make it immune to concurrency problems?
I have a program that performs some operations on files, each of which is represented by a GtkListBoxRow. When the GSubprocess finishes and I attempt to remove the list box row, the program segfaults. BTW, each file has its own process, so if a user loads 10 files, there will be 10 threads (this is managed by GThreadPool). Interestingly, if I comment out the code that launches the process and the code that blocks the thread function until the process finishes, the program does not segfault. So I deduced that GSubprocess is having problems with concurrency. The error produced varies a lot, so this must be due to timing-related problems.
I wanted to use GSubprocess because it makes it relatively easy to get the output of the command, which I need. Will I need to move my invocations of GSubprocess outside of the thread function?
I found out that it is not safe, due to its internal implementation in the GLib (GIO) source code. You should also avoid using threads in the application in the first place, as stated here. Here is my workaround: create the process in the main loop, and wait for the process to terminate using the async version of the call. That way you avoid threads altogether.

C difference between main thread and other threads

Is there a difference between the first thread and other threads created during runtime? I have a program where longjmp is used to abort, and a thread should be able to terminate the program (exit or abort don't work in my case). Could I safely use pthread_kill_other_threads_np and then longjmp?
I'm not sure what platform you're talking about, but pthread_kill_other_threads_np is not a standard function and not a remotely reasonable operation, any more than free_all_malloced_memory would be. Process termination inherently involves the termination of all threads atomically with respect to each other (they don't see each other terminate).
As for longjmp, while there is nothing wrong with longjmp, you cannot use it to jump to a context in a different thread.
It sounds like you have an XY problem here; you've asked about whether you can use (or how to use) particular tools that are not the right tool for whatever it is you want, without actually explaining what your constraints are.

Linux multithreading with own memory space possible?

I have a Linux C program that runs well on a Raspberry Pi 3. When I run it in a low-memory situation on another SBC (a Raspberry Pi Zero), it runs for about 2-3 days and then freezes. I believe it's a stack overflow situation.
I've put a thread to check periodically when the main program has frozen. Unfortunately it appears that if the main process crashes, it takes down all of the other threads in the process.
I can avoid this by having another process check on the first process, but I'd prefer a thread. Is it possible to have a thread that is safe and does not freeze if the main process freezes?
In short, no. By definition, threads share memory and belong to their process, so anything that affects the process as a whole affects all of its threads.

Does exit in a multithreaded program terminate the program completely?

I wrote a program in C. I have 3 threads working concurrently (and I use a semaphore to protect the critical section). My program exits in only one situation (a termination condition checked by an if statement), which occurs in thread number 2, with the call: exit(-1)
When I run my program on Linux and it reaches this condition, it exits completely. But I am still not sure whether all the other threads exit or not, and whether they remain in memory. Someone told me they remain as zombies and could harm the system, but when I looked at the processes (with the ps command) I saw nothing. Now I need some help with how all the threads are ended and with how to look for zombies on my system.
exit terminates the whole program; no threads are running afterwards. This might not be what you want, depending on how your program is designed: the other threads get no chance to clean up and are terminated wherever they happen to be at that moment.

Is it a bad practice to call exit() inside a parallel region in OpenMP?

I have a program using both MPI and OpenMP.
The master spawns several slaves.
Each slave is multithreaded with OpenMP and one thread is dedicated to communications (MPI_THREAD_FUNNELED).
When the communication thread receives a message from the master indicating that the process has to stop, I don't want to wait for all threads inside the parallel region to finish.
So for now I call the exit() function inside the parallel region, but I'm wondering if it's a bad practice and if there is a more elegant way to exit a process inside a parallel region?
Summary
This is valid OpenMP but incorrect MPI.
OpenMP
From page 3 of OpenMP 4.0 (definition of structured block):
For C/C++, an executable statement, possibly compound, with a single
entry at the top and a single exit at the bottom, or an OpenMP
construct.
...
Calls to exit() are allowed in a structured block.
MPI
From page 357 of MPI 3.1 (definition of MPI_Finalize):
This routine cleans up all MPI state. If an MPI program terminates normally (i.e., not due to a call to MPI_ABORT or an unrecoverable error) then each process must call MPI_FINALIZE before it exits.
Practical consequences
In practice, there are usually minimal adverse consequences to violating this part of the MPI standard. However, it is possible that not calling MPI_Finalize will cause resource leaks in some implementations, which may eventually accumulate to make nodes in your system unusable until they are rebooted.
Because MPI_Finalize is collective, it cannot be used like exit, although you can - in theory - use MPI_Abort to exit locally. However, this may tear down the entire MPI environment, since many implementations are not rigorous about localizing failures, even if MPI_Abort is called as MPI_Abort(MPI_COMM_SELF,0).
It's safe from the OS's point of view. The OS closes all handles, terminates threads and frees all associated memory when you exit a process. Modern operating systems have to do that because processes can exit inadvertently, and that must not affect system stability.
But from your app's point of view, it all depends. Can your app exit in a dirty state? If you miss a disk write, would it corrupt your data files? If you don't send a packet, would the transaction still be clean, and would everything still stay in sync? It's all up to what your app is doing.
