I've noticed that if a running process's executable is deleted, fork() appears to fail: the child process never executes.
For example, consider the code below:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
    sleep(5);
    pid_t forkResult;
    forkResult = fork();
    printf("after fork %d \n", forkResult);
    return 0;
}
If I compile this and delete the resulting executable before fork is called, I never see fork return a pid of 0, meaning the child process never starts. I only have a Mac running Big Sur, so I'm not sure whether this reproduces on other OSes.
Does anyone know why this would be? My understanding is an executable should work just fine even if it's deleted while still running.
The expectation that the process should continue even after its binary is deleted is correct, but not entirely so on macOS. The example is tripping on a side effect of the System Integrity Protection (SIP) mechanism in the macOS kernel. Before explaining exactly what is going on, though, let's run several experiments that will help us understand the whole scenario.
Modified example to better demonstrate the issue
To demonstrate what is going on, I modified the example to count to 9 and then fork. After the fork, the child waits 1 second, prints "I am done", and then prints its fork() result (0) before exiting. The parent continues counting to 14 and then prints the child's PID. The code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
    for (int i = 0; i < 10; i++)
    {
        sleep(1);
        printf("%i ", i);
    }
    pid_t forkResult;
    forkResult = fork();
    if (forkResult != 0) {
        for (int i = 10; i < 15; i++) {
            sleep(1);
            printf("%i ", i);
        }
    } else {
        sleep(1);
        printf("I am done ");
    }
    printf("after fork %d \n", forkResult);
    return 0;
}
After compiling it, I ran the normal scenario:
╰> ./a.out
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 4385
So, the normal scenario works as expected. We see the count from 0 to 9 twice because printf("%i ", i) does not end with a newline, so the digits are still sitting in the stdout buffer when fork() is called, and the child gets its own copy of that unflushed buffer.
Tracing the failing example
Now it is time for the negative scenario: we wait 5 seconds after the start and then remove the binary.
╰> ./a.out & (sleep 5 && rm a.out)
[4] 8555
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 8677
[4] 8555 done ./a.out
The output comes only from the parent. The parent counted to 14 and printed a valid PID for the child, yet the child never printed anything. So the child must have failed after fork() had already created it; otherwise fork() would have returned an error instead of a valid PID. Traces from ktrace show that the child was created under that PID and was woken up:
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.3 MACH_DISPATCH 1bc 0 84 4 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.2 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 41 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0(0.0) TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_thread_qos_and_relprio 88775d 20000 20200 6 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_update_thread 88775d 811200 140000100 1f 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.8) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(1.1) imp_thread_qos_and_relprio 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_thread_qos_workq_override 88775d 30000 20200 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.1) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(0.2) imp_thread_qos_workq_override 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623857 +04 1.3 TURNSTILE_turnstile_added_to_thread_heap 88775d 9931ba6049ddcc77 0 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623858 +04 1.0 MACH_MKRUNNABLE 88775d 25 0 5 888065 2 a.out(8677)
So the child process was dispatched (MACH_DISPATCH) and made runnable (MACH_MKRUNNABLE), which is consistent with the parent receiving a valid PID from fork().
Furthermore, the ktrace for the normal scenario shows the child issuing BSC_exit followed by an imp_task_terminated event, which is the normal way for a process to exit. In the second scenario, where we deleted the file, the trace shows no BSC_exit. This means the child was terminated by the kernel rather than exiting normally. And we know the termination happened after the child was created properly, since the parent received a valid PID and the child was made runnable.
This brings us closer to understanding what is going on. But before drawing a conclusion, let's look at an even more "twisted" example.
Even more strange example
What if we replace the binary on the filesystem after we started the process?
Here is the test to answer this question: we start the process, remove the binary, and create an empty file with the same name in its place using touch.
╰> ./a.out & (sleep 5 && rm a.out; touch a.out)
[1] 6264
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 6851
[1] + 6722 done ./a.out
Wait a minute, this works!? What is going on here!?!?
This strange example gives us an important clue to what is going on.
The root-cause of the issue
The reason the third example works while the second one fails reveals a lot. As mentioned at the beginning, we are tripping on a side effect of SIP, more precisely of its runtime protection mechanism.
To protect system integrity, SIP examines running processes for system protection and special entitlements. From the Apple documentation: ...When a process is started, the kernel checks to see whether the main executable is protected on disk or is signed with an special system entitlement. If either is true, then a flag is set to denote that it is protected against modification. Any attempt to attach to a protected process is denied by the kernel...
When we removed the binary from the filesystem, the protection mechanism could determine neither the child's process type nor its special system entitlements, because the binary was missing from disk. This caused the mechanism to treat the process as an intruder and terminate it, which is why we never saw BSC_exit for the child process.
In the third example, when we created a dummy entry on the filesystem with touch, SIP was able to determine that this is neither a special process nor one with special entitlements, and allowed the process to continue. This is a strong indication that we were tripping on the SIP runtime protection mechanism.
To prove that this is the case, I disabled SIP (which requires restarting in Recovery mode) and ran the test again:
╰> csrutil status
System Integrity Protection status: disabled.
╰> ./a.out & (sleep 5 && rm a.out)
[1] 1504
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 1626
Conclusion
So, the whole issue was caused by System Integrity Protection. More details can be found in the documentation.
All SIP needed was a file on the filesystem with the process name, so that the mechanism could run its verification and allow the child to continue execution. This shows that we are observing a side effect rather than designed behavior, since the empty file was not even a valid executable, yet execution proceeded.
When I run the following code and then run ps to see the running processes, I only get 4 ./a.out processes even though there are 5 forks. Why is that? How can I see the fifth one? Also, if multiple people on my computer are running the same program a.out, how can I terminate only my own processes, using only Linux commands? Please help.
PID TTY TIME CMD
32941 ttys000 0:00.10 -bash
34098 ttys000 0:00.08 less
33651 ttys000 0:00.00 ./a.out
33652 ttys000 0:00.00 ./a.out
33654 ttys000 0:00.00 ./a.out
33655 ttys000 0:00.00 ./a.out
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#define N 5 /* How many children to make. */
#define D 1200 /* Sleep 1200 seconds = 20 minutes */
int main(void)
{
    int i;
    pid_t p;
    for (i = 0; i < N; i++) {
        p = fork();
        if (p == 0) {
            close(0);
            close(1);
            close(2);
            if (i == 2) {
                setsid();
            }
            sleep(D);
            return 0;
        }
    }
    return 0;
}
All five processes are running. However, plain ps only shows processes associated with your session, and the setsid() call made when i == 2 puts that child in a new session, so the default ps options won't show it.
$ ps
PID TTY TIME CMD
3048 pts/0 00:00:00 bash
8288 pts/0 00:00:00 a.out
8289 pts/0 00:00:00 a.out
8291 pts/0 00:00:00 a.out
8292 pts/0 00:00:00 a.out
8301 pts/0 00:00:00 ps
$ ps -fe | grep a.out
steve 8288 1 0 13:39 pts/0 00:00:00 ./a.out
steve 8289 1 0 13:39 pts/0 00:00:00 ./a.out
steve 8290 1 0 13:39 ? 00:00:00 ./a.out <-- missing one
steve 8291 1 0 13:39 pts/0 00:00:00 ./a.out
steve 8292 1 0 13:39 pts/0 00:00:00 ./a.out
steve 8303 3048 0 13:40 pts/0 00:00:00 grep --color=auto a.out
$
By default, ps only displays processes that have the same effective user ID as itself and are associated with the same terminal. The child that called setsid() started a new session with no controlling terminal (note the ? in the TTY column), which disqualified it from being listed. ps x will include all of your processes, including those without a terminal.
By default, ps only lists processes that belong to you. You can confirm this with ps ux, which includes the owner of each process in the output.
You can use the kill utility to kill these processes. Unless you are root, you can't kill processes owned by other users even if you try.
I have built a bash script that runs fine when executed from the command line but does not work when run as a batch job (with at). At first I thought it was because of the environment, but while debugging I came to think the problem is with the arrays I need to create. When run from the command line, the log is created and its content is what I expected, but when run with at, no log is created. Any idea what is causing this issue?
Below is a short script with the piece of code I suspect is not running:
#!/bin/bash
fsol=`date +%Y%m%d`
for dia in 0 1 2
do
    var=$(date -d "$fsol +$dia days" +'%Y-%m-%d')
    orto=`awk -v j=$var 'BEGIN { FS=","} $2 == j { print $3}' hora-sol.dat`
    h_orto=${orto:0:2}
    m_orto=${orto:2:2}
    a_orto+=($h_orto $m_orto)
    echo "dia $dia" $var $h_orto $m_orto >> log1.txt
done
echo ${a_orto[@]} >> log2.txt
Data in hora-sol.dat
32,2016-02-01,0711,1216,1722,10.1885659530428
33,2016-02-02,0710,1216,1723,10.2235441870822
34,2016-02-03,0709,1216,1724,10.2589836910036
35,2016-02-04,0708,1216,1725,10.2948670333624
36,2016-02-05,0707,1216,1727,10.3311771153741
37,2016-02-06,0706,1217,1728,10.3678971831004
38,2016-02-07,0705,1217,1729,10.4050108377139
39,2016-02-08,0704,1217,1730,10.4425020444393
40,2016-02-09,0703,1217,1731,10.4803551390436
41,2016-02-10,0701,1217,1733,10.5185548339287
42,2016-02-11,0700,1217,1734,10.5570862213108
43,2016-02-12,0659,1217,1735,10.5959347763989
44,2016-02-13,0658,1217,1736,10.6350863580571
45,2016-02-14,0657,1217,1737,10.6745272092687
46,2016-02-15,0655,1217,1738,10.7142439549499
47,2016-02-16,0654,1217,1740,10.7542236006922
48,2016-02-17,0653,1217,1741,10.7944535282585
49,2016-02-18,0652,1216,1742,10.8349214920733
50,2016-02-19,0650,1216,1743,10.8756156133281
51,2016-02-20,0649,1216,1744,10.9165243743526
52,2016-02-21,0648,1216,1745,10.9576366115941
53,2016-02-22,0646,1216,1746,10.9989415078031
54,2016-02-23,0645,1216,1747,11.0404285846154
55,2016-02-24,0644,1216,1749,11.0820876932144
56,2016-02-25,0642,1216,1750,11.123909005324
57,2016-02-26,0641,1215,1751,11.1658830035395
58,2016-02-27,0639,1215,1752,11.2080004711946
59,2016-02-28,0638,1215,1753,11.2502524821626
60,2016-02-29,0636,1215,1754,11.2926303895977
Running manually, it generated:
# cat log.txt
dia 0 2016-02-12 0659 1217 1735
dia 1 2016-02-13 0658 1217 1736
dia 2 2016-02-14 0657 1217 1737
06
59
06
58
06
57
Scheduling with at:
# echo "/tmp/horasol/script.sh" | at now +1 minute
warning: commands will be executed using /bin/sh
job 1 at Fri Feb 12 12:11:00 2016
It generated exactly the same:
# cat log.txt
dia 0 2016-02-12 0659 1217 1735
dia 1 2016-02-13 0658 1217 1736
dia 2 2016-02-14 0657 1217 1737
06
59
06
58
06
57
Note the warning informing you that 'at' uses /bin/sh:
warning: commands will be executed using /bin/sh
Tell us how you concluded that it "does not work when run as batch job (with at)".
Tell us more about your "when debugging" moment.
Perhaps I'm reproducing this with a different procedure from yours, and that difference is why it works for me.
This C program calls METIS to partition a mesh.
Edit: New version of the C program taking into account comments of WeatherVane and PaulOgilvie.
On my GNU/Linux I get the results:
objval: 14
epart: 0 0 0 0 0 1 2 2 1 0 0 1 2 2 1 2 2 1
npart: 0 0 0 2 0 0 1 1 2 2 2 1 2 2 1 1
8
while on my OSX I get:
objval: 17
epart: 0 1 1 0 1 0 2 2 0 1 1 1 2 2 1 2 2 0
npart: 0 1 1 1 0 1 0 1 2 2 2 0 2 2 0 0
8
What causes the results to be different?
How can I fix it, i.e. always get the same results whatever the OS/architecture/compiler?
Note: idx_t is int64_t, which is long on my GNU/Linux, but long long on my OSX.
My GNU/Linux
$ cat /etc/issue
Ubuntu 12.04.4 LTS \n \l
$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ uname -s -r -v -p -i -o
Linux 3.5.0-45-generic #68~precise1-Ubuntu SMP Wed Dec 4 16:18:46 UTC 2013 x86_64 x86_64 GNU/Linux
My OSX
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.9.5
BuildVersion: 13F34
$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
$ uname -m -p -r -s -v
Darwin 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64 i386
METIS installation
METIS version is 5.1.0
I have installed METIS with miniconda.
The packages are here
(files linux-64/metis-5.1.0-0.tar.bz2 and osx-64/metis-5.1.0-2.tar.bz2).
These packages have been built with this recipe.
METIS makes use of pseudo-random numbers.
The pseudo-random numbers are generated by GKlib functions (GKlib is embedded inside the METIS tarballs).
By default, GKlib uses the rand function from the C standard library, which may generate different numbers on different platforms (see: Consistent pseudo-random numbers across platforms).
But GKlib can also be compiled with the flag -DUSE_GKRAND. Instead of the rand function, it then uses its own generator, which always gives the same random numbers on different platforms.
With -DUSE_GKRAND, the C program gives the same results on my GNU/Linux and on my OSX:
objval: 18
epart: 0 0 0 2 1 1 2 2 1 0 0 1 0 1 1 2 2 1
npart: 0 0 0 0 2 0 1 1 2 1 2 1 2 2 1 1
8
I've used this conda recipe to build METIS.
Background
Code to support a 'panic_action' was recently added to the FreeRADIUS v3.0.x, v2.0.x and master branches.
When radiusd (the main FreeRADIUS process) receives a fatal signal (SIGFPE, SIGABRT, SIGSEGV, etc.), the signal handler executes a predefined 'panic_action', a snippet of shell code passed to system(). The signal handler performs basic substitution, replacing %e with the current binary name and %p with the current PID.
In theory this should allow a debugger like gdb or lldb to attach to the process (panic_action = lldb -f %e -p %p), either to debug interactively or to automate collection of a backtrace. This actually works well on my system (OSX 10.9.2 with lldb), but only for SIGABRT.
Problem
This doesn't seem to work for other signals like SIGSEGV. The mini backtrace from execinfo is valid, but when lldb or gdb attaches to the process, it only gets the backtrace for the signal handler.
There doesn't seem to be a way in lldb to switch to an arbitrary frame address.
Does anyone know if there's any way of forcing the signal handler to execute on the same stack as the thread that received the signal? Or why the backtraces don't show the full stack when lldb attaches?
The actual output looks like:
FATAL SIGNAL: Segmentation fault: 11
Backtrace of last 12 frames:
0 libfreeradius-radius.dylib 0x000000010cf1f00f fr_fault + 127
1 libsystem_platform.dylib 0x00007fff8b03e5aa _sigtramp + 26
2 radiusd 0x000000010ce7617f do_compile_modsingle + 3103
3 libfreeradius-server.dylib 0x000000010cef3780 fr_condition_walk + 48
4 radiusd 0x000000010ce7710f modcall_pass2 + 191
5 radiusd 0x000000010ce7713f modcall_pass2 + 239
6 radiusd 0x000000010ce7078d virtual_servers_load + 685
7 radiusd 0x000000010ce71df1 setup_modules + 1633
8 radiusd 0x000000010ce6daae read_mainconfig + 2526
9 radiusd 0x000000010ce78fe6 main + 1798
10 libdyld.dylib 0x00007fff8580a5fd start + 1
11 ??? 0x0000000000000002 0x0 + 2
Calling: lldb -f /usr/local/freeradius/sbin/radiusd -p 1397
Current executable set to '/usr/local/freeradius/sbin/radiusd' (x86_64).
Attaching to process with:
process attach -p 1397
Process 1397 stopped
(lldb) bt
error: libfreeradius-radius.dylib debug map object file '/Users/arr2036/Documents/Repositories/freeradius-server-fork/build/objs//Users/arr2036/Documents/Repositories/freeradius-server-master/src/lib/debug.o' has changed (actual time is 0x530f3d21, debug map time is 0x530f37a5) since this executable was linked, file will be ignored
* thread #1: tid = 0x8d824, 0x00007fff867fee38 libsystem_kernel.dylib`wait4 + 8, queue = 'com.apple.main-thread, stop reason = signal SIGSTOP
frame #0: 0x00007fff867fee38 libsystem_kernel.dylib`wait4 + 8
frame #1: 0x00007fff82869090 libsystem_c.dylib`system + 425
frame #2: 0x000000010cf1f2e1 libfreeradius-radius.dylib`fr_fault + 849
frame #3: 0x00007fff8b03e5aa libsystem_platform.dylib`_sigtramp + 26
(lldb)
Code
The relevant code for fr_fault() is here: https://github.com/FreeRADIUS/freeradius-server/blob/b7ec8c37c7204accbce4be4de5013397ab662ea3/src/lib/debug.c#L227
and fr_set_signal() the function used to setup signal handlers is here: https://github.com/FreeRADIUS/freeradius-server/blob/0cf0e88704228e8eac2948086e2ba2f4d17a5171/src/lib/misc.c#L61
As the links contain commit hashes, the code they point to should not change.
EDIT
Finally, with lldb-330.0.48 on OSX 10.10.4, lldb can now go past _sigtramp.
frame #2: 0x000000010b96c5f7 libfreeradius-radius.dylib`fr_fault(sig=11) + 983 at debug.c:735
732 FR_FAULT_LOG("Temporarily setting PR_DUMPABLE to 1");
733 }
734
-> 735 code = system(cmd);
736
737 /*
738 * We only want to error out here, if dumpable was originally disabled
(lldb)
frame #3: 0x00007fff8df77f1a libsystem_platform.dylib`_sigtramp + 26
libsystem_platform.dylib`_sigtramp:
0x7fff8df77f1a <+26>: decl -0x16f33a50(%rip)
0x7fff8df77f20 <+32>: movq %rbx, %rdi
0x7fff8df77f23 <+35>: movl $0x1e, %esi
0x7fff8df77f28 <+40>: callq 0x7fff8df794d8 ; symbol stub for: __sigreturn
(lldb)
frame #4: 0x000000010bccb027 rlm_json.dylib`_json_map_proc_get_value(ctx=0x00007ffefa62dbe0, out=0x00007fff543534b8, request=0x00007ffefa62da30, map=0x00007ffefa62aaf0, uctx=0x00007fff54353688) + 391 at rlm_json.c:191
188 }
189 vp = map->op;
190
-> 191 if (value_data_steal(vp, &vp->data, vp->da->type, value) < 0) {
192 REDEBUG("Copying data to attribute failed: %s", fr_strerror());
193 talloc_free(vp);
194 goto error;
This is a bug in lldb related to backtracing through _sigtramp, the asynchronous signal handler in user processes. Unfortunately I can't suggest a workaround for this problem. It has been fixed in the top of tree sources for lldb at http://lldb.llvm.org/ if you're willing to build from source (see the "Source" and "Build" sidebars). But Xcode 5.0 and the next dot release are going to have real problems backtracing past _sigtramp.