Running fork, dup2 and exec from a C program in background

Running fork, dup2 and exec from a C program in background - c

I'm developing a embedded C application (gather.out) for a controller that runs Debian 8 (Jessie). This application calls another program called gather. The gather program prints some data on screen when I run it alone on the terminal. I decided to use fork and dup2 to redirect the output to a output file that contains the gathered data, as follows:
pid_t pid=fork();
/* Child process */
if(pid==0)
{
// Open gather output file
int gather_file_fd=open(actual_filename, O_WRONLY | O_CREAT, 0777);
if(gather_file_fd==-1)
{
return 2;
}
// Make stdout of gather program as input of output gather file
int gather_file_fd_dup=dup2(gather_file_fd, STDOUT_FILENO);
// Run the gather program
int err;
err=execl("/usr/bin/gather","gather","-w",NULL);
if(err==-1)
{
return 2;
}
}
Everything goes fine when I run it from the terminal, manually. When I set the controller to run it as startup (I don't know exactly how the controller start the program on boot, but it does) my output file gets corrupted.
I don't know how to debug this problem but I was trying to see the TTY of the processes to have some idea about what's happening using ps ux | grep gather. When running manually:
# ps ux | grep gather
root 18014 49.4 0.3 219380 6240 pts/0 RLl+ 14:50 0:51 ./gather.out
root 18022 39.2 0.2 219304 5988 pts/0 RLl+ 14:52 0:01 gather -w
root 18026 0.0 0.0 2076 568 pts/22 S+ 14:52 0:00 grep gather
And running at startup:
# ps ux | grep gather
root 17859 80.5 0.3 219380 6240 ? RLl 14:42 0:33 /var/ftp/usrflash/Project/C Language/Background Programs/gather.out
root 17978 45.0 0.2 219304 5984 ? RLl 14:43 0:01 gather -w
root 17982 0.0 0.0 2076 532 pts/0 S+ 14:43 0:00 grep gather
From there I noticed that there is no TTY attached to the processes running the program at startup. Does it affect the way I redirect the output with fork, dup2 and execl?
Edit 1
The correct output file must be like this (produced by gather application, runing gather -w > /var/ftp/gather/test.txt on the terminal and start the data acquisition on controller's IDE):
Waiting to gather
0 0 0
0 0 0
0 0 0
...
Where each line 0 0 0 is the sampled signals from the controller and could be any value, the ... indicates that the file is very large. When I run the gather program on the terminal gather -w > /var/ftp/gather/test.txt &, start data acquisition on controller's IDE, the gather program suddenly stops when I hit enter on terminal:
# gather -w > /var/ftp/gather/test.txt &
[2] 22232
#
[2]+ Stopped gather -w > /var/ftp/gather/test.txt
#
And the output file prints just the header:
Waiting to gather
Starting the gather program in background using gather -w < /dev/null >& /var/ftp/gather/test.txt & as one of the comments' advices and starting the data acquisition on controller's IDE I got the following file output:
Waiting to gather
Waiting to gather
Waiting to gather
Waiting to gather
...
The number of lines on the file is much greater than the number of samples on sample counter watched by IDE. My guess lies on a bug in the gather program. Unfortunatly I don't have gather's source code because it's part of natively programs of the controller.

Related

How do I make my parent wait, until a Intermediate parent, child and child of Intermediate parent finish the process using fork, wait, and waitpid?

I am creating a parent process, than an intermediate process and another child, and a child of intermediate process using fork() in C. Now I and trying to print the processes using ps -f --ppid ..,.. but some of the child processes finish hence it won't be printed when ps is ran using system(). How do I use wait and waitpid() so my parent will wait until child finishes it's process.
Current O/p:
UID PID PPID C STIME TTY TIME CMD
pc 24400 306 0 15:48 pts/2 00:00:00 ./a
pc 24401 24400 0 15:48 pts/2 00:00:00 [a]
pc 24404 24400 0 15:48 pts/2 00:00:00 sh -c ps -f --ppid 306,24400,24401,24402
Expected O/p:
UID PID PPID C STIME TTY TIME CMD
pc 24400 306 0 15:48 pts/2 00:00:00 ./a
pc 24401 24400 0 15:48 pts/2 00:00:00 ./a
pc 24402 24400 0 15:48 pts/2 00:00:00 ./a
pc 24403 24401 0 15:48 pts/2 00:00:00 ./a
pc 24404 24400 0 15:48 pts/2 00:00:00 sh -c ps -f --ppid 306,24400,24401,24402.
Thanks in advance!

How do I use wait and waitpid() so my parent will wait until child finishes it's process.
If your intermediate process waits for its child to terminate before it terminates itself, then your initial process need only wait for each of its children.
However, a process can successfully wait() only for its own children, not their children, so if your intermediate process does not wait for its child then the wait()-family functions do not provide any mechanism by which the initial process can suspend execution until its grandchild terminates. You would need something else for that case, or to be resilient against the possibility that the intermediate process dies before it collects its child.
A relatively simple "something else" would be for the initial process to create a pipe, then launch its children, then close the write end of the pipe and try to read from the read end. The children don't need to do anything except avoid closing or writing to the write end of the pipe before they terminate -- they don't even need to be aware of the pipe. The parent's read() will block as long as any process has the write end of the pipe open.

Save one line of `top`, `htop` or `intel_gpu_top` outputs into a Bash array

I want to save 1 line from the output of top into a Bash array to later access its components:
$ timeout 1 top -d 2 | awk 'NR==8'
2436 USER 20 0 1040580 155268 91100 S 6.2 1.0 56:38.94 Xorg
Terminated
I tried:
$ gpu=($(timeout 1s top -d 2 | awk 'NR==8'))
$ mapfile -t gpu < <($(timeout 1s top -d 2 | awk 'NR==8'))
and, departing from the array requisite, even:
$ read -r gpu < <(timeout 1s top -d 2 | awk 'NR==8')
all returned a blank for either ${gpu[#]} (first two) or $gpu (last).
Edit:
As pointed out by #Cyrus and others gpu=($(top -n 1 -d 2 | awk 'NR==8')) is the obvious solution. However I want to build the cmd dynamically so top -d 2 may be replaced by other cmds such as htop -d 20 or intel_gpu_top -s 1. Only top can limit its maximum number of iterations, so that is not an option in general, and for that reason I resort to timeout 1s to kill the process in all shown attempts...
End edit
Using a shell other than Bash is not an option. Why did the above attempts fail and how can I achieve that ?

Why did the above attempts fail
Because redirection to pipe does not have terminal capabilities, top process receives SIGTTOU signal when it tries to write the terminal and take the terminal "back" from the shell. The signal causes top to terminate.
how can I achieve that ?
Use top -n 1. Generally, use the tool specific options to disable using terminal utilities by that tool.
However I want to build the cmd dynamically so top -d 2 may be replaced by other cmds such as htop -d 20 or intel_gpu_top -s 1
Write your own terminal emulation and extract the first line from the buffer of the first stuff the command displays. See GNU screen and tmux source code for inspiration.

I dont think you need the timeout there if its intended to quit top. You can instead use the -n and -b flags but feel free to add it if you need it
#!/bin/bash
arr=()
arr[0]=$(top -n 1 -b -d 2 | awk 'NR==8')
arr[1]=random-value
arr[2]=$(top -n 1 -b -d 2 |awk 'NR==8')
echo ${arr[0]}
echo ${arr[1]}
echo ${arr[2]}
output:
1 root 20 0 99868 10412 7980 S 0.0 0.5 0:00.99 systemd
random-value
1 root 20 0 99868 10412 7980 S 0.0 0.5 0:00.99 systemd
from top man page:
-b :Batch-mode operation
Starts top in Batch mode, which could be useful for sending output from top to other programs or to a
file. In this mode, top will not accept input and runs until the iterations limit you've set with the
`-n' command-line option or until killed.
-n :Number-of-iterations limit as: -n number
Specifies the maximum number of iterations, or frames, top should produce before ending.
-d :Delay-time interval as: -d ss.t (secs.tenths)
Specifies the delay between screen updates, and overrides the corresponding value in one's personal
configuration file or the startup default. Later this can be changed with the `d' or `s' interactive
commands.

Implementing my own ps command

I'm trying to implement my own ps command, called psmod.
I can use linux system call and all utilities of the /proc directory.
I discovered that all directory in /proc directory with a number as their name are the processes in the system. My question is: how can I select only those processes which are active when psmod is called?
I know that in /proc/<pid>/stat there's a letter representing the current status of the process; anyway, for every process in /proc, this letter is S, that is sleeping.
I also tried to send a signal 0 to every process, from 0 to the maximumnumberofprocesses (in my case, 32768), but in this way it discovers far more processes than the ones present in /proc.
So, my question is, how does ps work? The source is a little too complicated for me, so if someone can explain me, I would be grateful.

how does ps work?
The way of learning standard utils - is to check their source code. There are several implementations of ps: procps and busybox; and busybox is smaller and it will be easier to begin with it. There is sources for ps from busybox: http://code.metager.de/source/xref/busybox/procps/. Main loop from ps.c:
632 p = NULL;
633 while ((p = procps_scan(p, need_flags)) != NULL) {
634 format_process(p);
635 }
Implementation of procps_scan is in procps.c (ignore code from inside ENABLE_FEATURE_SHOW_THREADS ifdefs for first time). First call to it will open /proc dir using alloc_procps_scan():
290 sp = alloc_procps_scan();
100 sp->dir = xopendir("/proc");
Then procps_scan will read next entry from /proc directory:
292 for (;;) {
310 entry = readdir(sp->dir);
parse the pid from subdirectory name:
316 pid = bb_strtou(entry->d_name, NULL, 10);
and read /prod/pid/stat:
366 /* These are all retrieved from proc/NN/stat in one go: */
379 /* see proc(5) for some details on this */
380 strcpy(filename_tail, "stat");
381 n = read_to_buf(filename, buf);
Actual unconditional printing is in format_process, ps.c.
So, busybox's simple ps will read data for all processes, and will print all processes (or all processes and all threads if there will be -T option).
how can I select only those processes which are active when psmod is called?
What is "active"? If you want find all processes that exists, do readdir of /proc. If you want to find only non-sleeping, do full read of /proc, check states of every process and print only non-sleeping. The /proc fs is virtual and is it rather fast.
PS: for example, normal ps program prints only processes from current terminal, usually two:
$ ps
PID TTY TIME CMD
7925 pts/13 00:00:00 bash
7940 pts/13 00:00:00 ps
but we can strace it with strace -ttt -o ps.log ps and I see that ps does read every process directory, files stat and status. And the time needed for this (option -tt of strace gives us timestamps of every syscall): XX.719011 - XX.870349 or just 120 ms under strace (which slows all syscalls). It takes only 20 ms in real life according to time ps (I have 250 processes in total):
$ time ps
PID TTY TIME CMD
7925 pts/13 00:00:00 bash
7971 pts/13 00:00:00 ps
real 0m0.021s
user 0m0.006s
sys 0m0.014s

"My question is: how can I select only those processes which are active when psmod is called?"
I hope this command will help you:
top -n 1 | awk "NR > 7" | awk {'print $1,$8,$12'} | grep R
I am on ubuntu 12.

Application crashes only when launched from inittab on busybox

I'm writing an application for an embedded busybox system that allows TCP connections, then sends out messages to all connected clients. It works perfectly when I telnet to the box and run the application from a shell prompt, but I have problems when it is launched from the inittab. It will launch and I can connect to the application with one client. It successfully sends one message out to that client, then crashes. It will also crash if I connect a second client before any messages are sent out. Again, everything works perfectly if I launch it from a shell prompt instead.
The following errors are what comes up in the log:
<11>Jan 1 00:02:49 tmmpd.bin: ERROR: recvMessage failed, recv IO error
<11>Jan 1 00:02:49 tmmpd.bin: Some other LTK TCP error 103. Closing connection 10
<11>Jan 1 00:02:49 tmmpd.bin: ERROR: recvMessage failed, recv IO error
<11>Jan 1 00:02:49 tmmpd.bin: Some other LTK TCP error 103. Closing connection 10
Any suggestions would be greatly appreciated!

I was testing a bit in arm-qemu and busybox, and I was able to start a script as user test to run in background.
I have created a new user "test":
buildroot-dir> cat etc/passwd
test:x:1000:1000:Linux User,,,:/home/test:/bin/sh
Created a simple testscript.sh:
target_system> cat /home/test/testscript.sh
#!/bin/sh
while :
do
echo "still executing in bg"
sleep 10
done
To my /etc/init.d/rcS I added a startup command for it:
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
/sbin/mdev -s
/bin/su test -c /home/test/testscript.sh& # < Added this
Now when I start the system, the script will run in the background, and when I grep for the process it has been started as user test (default root user is just 0):
target_system> ps aux | grep testscript
496 test 0:00 sh -c home/test/testscript.sh
507 test 0:00 {testscript.sh} /bin/sh home/test/testscript.sh

how should I use strace to snif the serial port?

I am writing an application in linux and need to access the serial port.
For debugging purposes I need to snif what comes and/or goes through the serial port.
I looked around and found out I can use strace to do that. So I tried the following:
-I print the file_descriptor of the serial device that I use.
(after restarting my application a few times, I reassured myself that the file_descriptor number my application gets from kernel is "4"
-if i start my application as strace -e write=4 ./myapp , I would expect to get messages in the terminal, from file_descriptor "4" only. instead I get looots of output:
read(5, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\300Q\254C4\0\0\0"..., 52
fstat64(5, {st_mode=S_IFREG|0644, st_size=1448930, ...}) = 0
mmap2(0x43ab8000, 153816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0)0
mprotect(0x43ad6000, 28672, PROT_NONE) = 0
mmap2(0x43add000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRI0
close(5) = 0
munmap(0x2ab4c000, 38409) = 0
exit(0)
from several different file_descriptors.
If I run my app with strace -e trace=write -e write=4 ./myapp
I'll get far less message, even though they will still be many, and or file_descriptor "1".
write(1, "GPIO data bank:0x8, data: 0x80 a"..., 52GPIO data bank:0x8, data: 0x81) = 52
write(1, "\n", 1) = 1
write(1, "--> Version: 0677 <--\n", 22--> Version: 0677 <-- ) = 22
serial fd = 4
what you see above are some printf statements.
The extremely weird part is that the line serial fd = 4 is also a printf statement, but for some reason it is not wrapped around write(fd, ....) statement in strace output.
Can someone explain that, too?
thank you for your help.

Try it out with someting simple.
strace -e write=1 echo foo
This will write all syscalls, and in addition to these, the data written to fd 1.
strace -e trace=none -e write=1 echo foo
This will generate no output except for the output from the program itself. It seems you have to trace write if you want to see its data.
strace -e trace=write -e write=1 echo foo
This will print all write syscalls, for any file descriptor. In addition to that, it will print a dump of the data sent to descriptor 1. The output will look like this:
write(1, "foo\n", 4foo
) = 4
| 00000 66 6f 6f 0a foo. |
+++ exited with 0 +++
The syscall starts in the first line. After the list of arguments, the syscall is actually executed, and prints foo followed by a newline. Then the syscall return value is printed by strace. After that, we have the data dump.
I'd suggest using -e trace=write -e write=4 -o write4.txt followed by grep '^ |' write4.txt or something like that. If you want to see data in real time, you can use a bash redirection like this:
strace -e trace=write -e write=4 -o >(grep '^ |') ./myapp
This will send output from strace to grep, where you can strip the write syscalls and concentrate on the data dumps.
The extremely weird part is that the line serial fd = 4 is also a printf statement, but for some reason it is not wrapped around write(fd, ....) statement in strace output. Can someone explain that, too?
I'd say that line is output not from strace, but from some application. That's the reason it is not wrapped. The fact that no wrapped version of this appears in addition to that unwrapped one (like in my foo example output above) suggests that the output might originate in a child process lainced by myapp. Perhaps you want to add -f so you follow child process creation?
Notice that a child might decide to rename its file descriptors, e.g. redirect its standard output to that serial port opened by the parent. If that happens, write=4 won't be appropriate any more. To be on the safe side, I'd write the whole -f -e trace=write output to a file, and look at that to see where the data actually gets written. Then adjust things to home in on that data.