Unix system calls : read/write and the buffer - c

I am writing a pretty simple program.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(){
int pipefd[2];
pid_t c;
int value[2];
c = fork();
if(c<0){
perror("in fork");
exit(1);
}
if(c==0){
printf("i am the child\n");
int buf[2];
buf[0]=3;
buf[1]=0;
write(pipefd[1], buf, 4);
write(pipefd[1],buf+1,4);
close(pipefd[1]);
exit(0);
}
if (pipe(pipefd) == -1) { /*UPDATE */
perror("pipe");
exit(EXIT_FAILURE);
}
read(pipefd[0], value, 4);
read(pipefd[0], value+1, 4);
close(pipefd[0]);
printf("%d %d\n", value[0], value[1]);
exit(0);
}
What I intend to do is to achieve:
value[0] = buf[0];
value[1] = buf[1];
( and print those of course).
But all I get as a result is :
-1299582208 32766
i am the child
Because I have ints, I assumed that each one holds 4 bytes, and that each element of an int array holds 4 bytes as well. But clearly I am missing something. Any help?

As I mentioned in my top comment: where is the pipe syscall?
Without it, the write and read calls will probably fail because pipefd holds indeterminate values.
So the parent's value array is never filled in.
Because value is uninitialized and lives on the stack, it contains whatever happened to be there, which is what you're seeing.
Reading it is UB [undefined behavior].
Different systems/compilers may manipulate the stack differently, which is why you see different [yet still random] results on different configurations.
To fix, add the following above your fork call:
pipe(pipefd);
I downloaded, built, and ran your program. Before I added the fix, I got random values. After applying the fix, I get 3 0 as the output, which is what you expected/wanted.
Note: As others have mentioned, you should check the return values of read and write. Had you done so, they would probably have returned -1 and put an error code in errno, which would have helped you debug the issue.
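A minimal sketch of the fix described above, with the return values checked. The helper name receive_pair_from_child is made up for illustration, and the layout (child writes, parent reads) simply mirrors the question; it is one way to arrange this, not the only one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a forked child sends {3, 0} to the parent through a
   pipe. Returns 0 and fills value[] on success, -1 on any failure. */
int receive_pair_from_child(int value[2])
{
    int pipefd[2];
    if (pipe(pipefd) == -1) {           /* create the pipe BEFORE fork() */
        perror("pipe");
        return -1;
    }

    pid_t c = fork();
    if (c < 0) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (c == 0) {                        /* child: writer */
        int buf[2] = {3, 0};
        close(pipefd[0]);                /* child does not read */
        ssize_t n = write(pipefd[1], buf, sizeof buf);
        _exit(n == (ssize_t)sizeof buf ? 0 : 1);
    }

    /* parent: reader */
    close(pipefd[1]);                    /* parent does not write */
    size_t want = 2 * sizeof(int);
    size_t got = 0;
    char *dst = (char *)value;
    while (got < want) {
        ssize_t n = read(pipefd[0], dst + got, want - got);
        if (n < 0) {
            perror("read");
            break;
        }
        if (n == 0)                      /* EOF before all bytes arrived */
            break;
        got += (size_t)n;
    }
    close(pipefd[0]);
    waitpid(c, NULL, 0);                 /* reap the child */
    return got == want ? 0 : -1;
}
```

Called as `int v[2]; receive_pair_from_child(v);`, this should leave 3 and 0 in v. Using sizeof instead of a literal 4 avoids depending on the size of int.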

A very simple fix would be to put a sleep(1) call right above your read() calls - obviously this isn't a great solution.
An important early lesson in multi process programming and communications is "race conditions". Your fork'd child is executing before the parent, it seems. I bet if you ran this 20 times, you might get X number of times where it does what you want!
You cannot guarantee the order of execution. So a sleep(1) will suffice until you learn more advanced techniques on resource locking (mutexes, semaphores).

Related

Can a fork child determine whether it is a fork or a vfork?

Within the child process, is there any way it can determine whether it was launched as a fork with overlay memory, or a vfork with shared memory?
Basically, our logging engine needs to be much more careful (and not log some classes of activity) in vfork. In fork it needs to cooperate with the parent process in ways that it doesn't in vfork. We know how to do those two things, but not how to decide.
I know I could probably intercept the fork/vfork/clone calls, and store the fork/vfork/mapping status as a flag, but it would make life somewhat simpler if there was an API call the child could make to determine its own state.
Extra marks: Ideally I also need to pick up any places in libraries that have done a fork or vfork and then called back into our code. How can that happen? At least one of the libraries we use offers a popen-like API where a client callback is called from the fork child before the exec. Clearly the utility of that callback is significantly restricted in vfork.
All code not specifically designed to work under vfork() doesn't work under vfork().
Technically, you can check if you're in a vfork() child by calling mmap() and checking if the memory mapping was inherited by the parent process under /proc. Do not write this code. It's a really bad idea and nobody should be using it. Really, the best way to tell if you're in a vfork() child or not is to be passed that information. But here comes the punchline. What are you going to do with it?
The things you can't do as a vfork() child include calling fprintf(), puts(), fopen(), or any other standard I/O function, and malloc() for that matter. Unless the code is very carefully designed, you're best off not calling into your logging framework at all, and if it is carefully designed you don't need to know. A better design would most likely be to log your intent before calling vfork() in the first place.
You ask in comments about a library calling fork() and then back into your code. That's already kind of bad. But no library should ever ever call vfork() and back into your code without being explicitly documented as doing so. vfork() is a constrained environment and calling things not expected to be in that environment really should not happen.
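The "log your intent before calling vfork()" design could look roughly like the sketch below. The helper name spawn_logged and the use of stderr as the "log" are assumptions for illustration, not part of any real logging framework. The vfork() child does nothing except execv() or _exit().

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: logs in the parent, then runs argv[0] through
   vfork() + execv(). Returns the child's exit status, or -1 on failure. */
int spawn_logged(char *const argv[])
{
    /* All logging happens here, in the parent, before vfork(). */
    fprintf(stderr, "spawning %s\n", argv[0]);
    fflush(stderr);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* vfork child: only exec or _exit, no stdio and no malloc. */
        execv(argv[0], argv);
        _exit(127);                      /* reached only if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    /* Back in the parent, logging is safe again. */
    fprintf(stderr, "%s exited with %d\n", argv[0], WEXITSTATUS(status));
    return WEXITSTATUS(status);
}
```

With this shape the child never needs to know whether it came from fork() or vfork(), because it never touches the logger.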
A simple solution could use pthread_atfork(). The callbacks registered with this service are triggered only upon fork(). So, the 3rd parameter of the function, which is called in the child process right after the fork, could update a global variable. The child can check the variable and if it is modified, then it has been forked:
/*
Simple program which demonstrates a solution to
make the child process know if it has been forked or vforked
*/
#include <pthread.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
pid_t forked;
void child_hdl(void)
{
forked = getpid();
}
int main(void)
{
pid_t pid;
pthread_atfork(0, 0, child_hdl);
pid = fork();
if (pid == 0) {
if (forked != 0) {
printf("1. It is a fork()\n");
}
exit(0);
}
// Father continues here
wait(NULL);
pid = vfork();
if (pid == 0) {
if (forked != 0) {
printf("2. It is a fork()\n");
}
_exit(0);
}
// Father continues here
wait(NULL);
return 0;
}
Build/execution:
$ gcc fork_or_vfork.c
$ ./a.out
1. It is a fork()
I came across kcmp today, which looks like it can answer the basic question - i.e. do two tids or pids share the same VM. If you know they represent forked parent/child pids, this can perhaps tell you if they are vfork()ed.
Of course, if they are tids in the same thread group (that is, threads of one process), then they will by definition share VM.
https://man7.org/linux/man-pages/man2/kcmp.2.html
int syscall(SYS_kcmp, pid_t pid1, pid_t pid2, int type,
unsigned long idx1, unsigned long idx2);
KCMP_VM
Check whether the processes share the same address space.
The arguments idx1 and idx2 are ignored. See the
discussion of the CLONE_VM flag in clone(2).
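A rough, Linux-only sketch of how the KCMP_VM check quoted above might be wrapped. The helper name shares_address_space is hypothetical. It assumes the kernel was built with kcmp support and that the caller is permitted to inspect both processes; otherwise the call fails and the helper returns -1.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the two processes share an address
   space, 0 if they do not, and -1 if kcmp() itself failed (errno is set). */
int shares_address_space(pid_t a, pid_t b)
{
    /* idx1 and idx2 are ignored for KCMP_VM, so pass zeros. */
    long r = syscall(SYS_kcmp, a, b, KCMP_VM, 0UL, 0UL);
    if (r < 0)
        return -1;
    return r == 0;                       /* 0 means "same address space" */
}
```

A child could call `shares_address_space(getpid(), getppid())` right after being created; a result of 1 would suggest vfork() (or clone() with CLONE_VM), a result of 0 an ordinary fork().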
If you were created by vfork, your parent will be waiting for you to terminate. Otherwise, it's still running. Here's some very ugly code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
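void doCheck(void)
{
char buf[512];
/* wchan names the kernel function the parent is sleeping in, if any */
snprintf(buf, sizeof buf, "/proc/%d/wchan", (int) getppid());
int j = open(buf, O_RDONLY);
if (j < 0) { printf("No open!\n"); return; }
int k = read(j, buf, 500);
close(j);
/* bail out before indexing buf with a zero or negative length */
if (k <= 0) { printf("k=%d\n", k); return; }
buf[k] = 0;
char *ptr = strstr(buf, "vfork");
if (ptr != NULL)
printf("I am the vfork child!\n");
else
printf("I am the fork child!\n");
}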
int main()
{
if (fork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
if (vfork() == 0)
{
doCheck();
_exit(0);
}
sleep(1);
}
This is not perfect: the parent might instead be waiting for a different, subsequent vfork call to complete.

Make processes run at the same time using fork

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char **argv) {
FILE *file;
file = fopen(argv[1], "r");
char buf[600];
char *pos;
pid_t parent = fork();
if(parent == 0) {
while (fgets(buf, sizeof(buf), file)) {
pid_t child = fork();
if(child == 0) {
/* is there a function I can put here so that it waits
once the parent is exited to then run?*/
printf("%s\n", buf);
return(0);
}
}
return(0);
}
wait(NULL);
return(0);
}
The goal here is to print out the lines of a file all at the same time, in parallel.
For example:
Given a file
a
b
c
$ gcc -Wall above.c
$ ./a.out file
a
c
b
$ ./a.out file
b
c
a
As in, the processes ran at the exact same time. I think I can get this to work if there were a call that makes each child wait for the parent to exit before it starts running, as shown in the comment above. Once the parent exits, all the processes would start at the print statement, as wanted.
If you had:
int i = 10;
while (i > 0)
{
pid_t child = fork();
if(child == 0) {
printf("i: %d\n", i);
exit(0);
}
/* decrement in the parent: a decrement in the child would only change the child's copy */
i--;
}
then the child processes are running concurrently. And depending on the number of cores and your OS scheduler, they might even run literally at the same time. However, printf is buffered, so the order in which the lines appear on screen cannot be determined and will vary between executions of your program. And because printf is buffered, you will most likely not see lines overlapping each other. However, if you were using write directly to stdout, then the outputs might overlap.
In your scenario however, the children die so fast and because you are reading
from a file (which might take a while to return), by the time the next fork is executed,
the previous child is already dead. But that doesn't change the fact, that if
the children would run long enough, they would be running concurrently and the
order of the lines on screen cannot be determined.
edit
As Barmar points out in the comments, write is atomic. I looked it up in my
man page, and in the BUGS section it says this:
man 2 write
According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"):
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: ...
Among the APIs subsequently listed are write() and writev(2). And among the effects that should be atomic across threads (and processes) are updates of the file offset. However, on Linux before version 3.14, this was not the case: if two processes that share an open file description (see open(2)) perform a write() (or writev(2)) at the same time, then the I/O operations were not atomic with respect updating the file offset, with the result that the blocks of data output by the two processes might (incorrectly) overlap. This problem was fixed in Linux 3.14.
Several years ago I observed this behaviour of write on stdout with concurrent
children printing stuff; that's why I wrote that with write, the lines may
overlap.
I am not sure why you have the outer fork. You could rewrite it as follows. Once you create the child processes, they can run in any order. So in one run you might see the output in "order", but in another run you might see a different order. It depends on how your OS schedules the processes, and for your purpose it is all running in "parallel". So you really don't need to make sure the parent process is dead.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
int main(int argc, char **argv)
{
if (argc != 2) {
printf("Incorrect args\n");
exit(1);
}
char buf[1024];
FILE *file = fopen(argv[1], "r");
if (file == NULL) {
perror("fopen");
exit(1);
}
while (fgets(buf, sizeof buf, file)) {
pid_t child = fork();
if(child == 0) {
write(STDOUT_FILENO, buf, strlen(buf));
_exit(0);
}
}
/* Wait for all child processes. */
while (wait(NULL) != -1);
}

Can a single pipe be used for 2 way communication between parent and a child?

Suppose I use pipefdn[2] and call pipe() on it. Can bidirectional communication be implemented using a single pipe, or do you need 2 pipes?
Although this can appear to work in some cases, it is not recommended, especially in production code. pipe() provides no synchronization mechanism by default, and read() can block indefinitely if there is no data, for example when it is called before the other process has called write().
The recommended way is to always use two pipes, pipe1[2] and pipe2[2], for two-way communication.
For more info please refer the following video description.
https://www.youtube.com/watch?v=8Q9CPWuRC6o&list=PLfqABt5AS4FkW5mOn2Tn9ZZLLDwA3kZUY&index=11
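A minimal sketch of the two-pipe arrangement recommended above: one pipe carries data from parent to child, the other carries the reply back. The helper name ask_child_to_double and the "double the number" protocol are invented for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends x to a child over one pipe and reads 2*x back
   over a second pipe. Returns 0 on success, -1 on failure. */
int ask_child_to_double(int x, int *result)
{
    int to_child[2], to_parent[2];
    if (pipe(to_child) == -1)
        return -1;
    if (pipe(to_parent) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* child: reads from to_child, writes to to_parent */
        close(to_child[1]);
        close(to_parent[0]);
        int v;
        if (read(to_child[0], &v, sizeof v) != (ssize_t)sizeof v)
            _exit(1);
        v *= 2;
        if (write(to_parent[1], &v, sizeof v) != (ssize_t)sizeof v)
            _exit(1);
        _exit(0);
    }

    /* parent: writes to to_child, reads from to_parent */
    close(to_child[0]);
    close(to_parent[1]);
    int ok = write(to_child[1], &x, sizeof x) == (ssize_t)sizeof x
          && read(to_parent[0], result, sizeof *result) == (ssize_t)sizeof *result;
    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

Because each direction has its own pipe, neither process can accidentally read back its own message.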
No sorry. Linux pipe() is unidirectional. See the man page, and also pipe(7) & fifo(7). Consider also AF_UNIX sockets, see unix(7).
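For the AF_UNIX alternative mentioned above, socketpair() gives two connected descriptors that are each readable and writable, so a single pair covers both directions. The sketch below is only an illustration; the helper name increment_via_socketpair and the "add one" protocol are made up.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends x to a child over a socket pair and reads
   x + 1 back over the same pair. Returns 0 on success, -1 on failure. */
int increment_via_socketpair(int x, int *result)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* child uses sv[1] for both reading and writing */
        close(sv[0]);
        int v;
        if (read(sv[1], &v, sizeof v) != (ssize_t)sizeof v)
            _exit(1);
        v += 1;
        if (write(sv[1], &v, sizeof v) != (ssize_t)sizeof v)
            _exit(1);
        _exit(0);
    }

    /* parent uses sv[0] for both reading and writing */
    close(sv[1]);
    int ok = write(sv[0], &x, sizeof x) == (ssize_t)sizeof x
          && read(sv[0], result, sizeof *result) == (ssize_t)sizeof *result;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```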
Correct me if I am wrong, but I think you can. The problem is that you probably don't want to do that. First of all, create a simple program:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
int pd[2];
int num = 2;
int main(){
pid_t t = fork();
/* create a child process */
if(t<0){
printf("error in fork");
exit(1);
}
/* create a pipe */
if(pipe(pd)==-1){
printf("error in pipe");
exit(3);
}
else if(t==0){
//close(pd[1]); // child close writing end
int r = read(pd[0], &num, sizeof(num));
if(r<0){
printf("error while reading");
exit(2);
}
printf("i am the child and i read %d\n",num);
// close(pd[0]);
exit(0);
}
/* parent process */
//close(pd[0]); /* parents closes its reading end
if(write(pd[1],&num,sizeof(num))<0){
printf("error in writing");
exit(4);
}
//close(pd[1]);
/*parent wait for your child to terminate;*/
int status;
wait(&status);
printf("my child ended with status: %d\n",status);
return 0;
}
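Try playing with the close() calls by commenting them out or putting them back in. You will find that the only close() this program really needs in order to terminate is the child's close(pd[1]) before its read(). I found an answer here on Stack Overflow saying "Because the write-end is open the system waits because a potential write could occur". When I ran the program without that close(), it did not terminate. The other close() calls are good practice but don't influence the execution. The reason is that this program calls pipe() after fork(), so the parent and the child each create their own, separate pipe. The child's read() therefore waits on a pipe whose only write end is held by the child itself; once the child closes that write end, read() returns 0 (end of file) and num keeps its initial value 2. For the parent's write to reach the child, pipe() has to be called before fork().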
Now let's examine what you asked. I can see some problems here:
If two processes write to the same channel, you may have race conditions, because they write to the same file descriptor at the same time.
What if one process reads its own writes instead of those of the process
it is trying to communicate with? How will you know where you should read?
What if one process writes "above" the writes of the other?
Yes it can, I've done that before. I had a parent and child send each other different messages using the same 2 pipes and receive them correctly. Just make sure you're always reading from the first file descriptor and writing to the second.

How and why can fork() fail?

I'm currently studying the fork() function in C. I understand what it does (I think). Why do we check it in the following program?
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
int main()
{
int pid;
pid=fork();
if(pid<0) /* Why is this here? */
{
fprintf(stderr, "Fork failed");
exit(-1);
}
else if (pid == 0)
{
printf("Printed from the child process\n");
}
else
{
printf("Printed from the parent process\n");
wait(NULL);
}
}
In this program we check if the PID returned is < 0, which would indicate a failure. Why can fork() fail?
From the man page:
Fork() will fail and no child process will be created if:
[EAGAIN] The system-imposed limit on the total number of processes under execution would be exceeded. This limit is configuration-dependent.
[EAGAIN] The system-imposed limit MAXUPRC (<sys/param.h>) on the total number of processes under execution by a single user would be exceeded.
[ENOMEM] There is insufficient swap space for the new process.
(This is from the OS X man page, but the reasons on other systems are similar.)
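As a sketch of how a program might tell these cases apart, the check below saves errno right after a failed fork() and reports it with strerror(). The helper name fork_and_report and the wording of the messages are assumptions for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits immediately.
   Returns 0 if the fork worked, -1 (after reporting why) if it failed. */
int fork_and_report(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;                 /* save it before other calls overwrite it */
        fprintf(stderr, "fork failed: %s%s\n", strerror(err),
                err == EAGAIN ? " (process limit reached)" :
                err == ENOMEM ? " (not enough memory)" : "");
        return -1;
    }
    if (pid == 0)
        _exit(0);                        /* child: nothing to do */

    if (waitpid(pid, NULL, 0) < 0)       /* parent: reap the child */
        return -1;
    return 0;
}
```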
fork can fail because you live in the real world, not some infinitely-recursive mathematical fantasy-land, and thus resources are finite. In particular, sizeof(pid_t) is finite, and this puts a hard upper bound of 256^sizeof(pid_t) on the number of times fork could possibly succeed (without any of the processes terminating). Aside from that, you also have other resources to worry about like memory.
There is not enough memory available to make the new process perhaps.
If the kernel fails to allocate memory for example, that's pretty bad and would cause fork() to fail.
Have a look at the error codes here:
http://linux.die.net/man/2/fork
Apparently it can fail (not really fail but hang infinitely) due to the following things coming together:
trying to profile some code
many threads
much memory allocation
See also:
clone() syscall infinitely restarts because of SIGPROF signals #97
Hanging in ARCH_FORK with CPUPROFILE #704
SIGPROF keeps a large task from ever completing a fork(). Bug 645528
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main()
{
size_t sz = 32*(size_t)(1024*1024*1024);
char *p = (char*)malloc(sz);
memset(p, 0, sz);
fork();
return 0;
}
Build:
gcc -pg tmp.c
Run:
./a.out

Understanding forks in C

I am having some trouble understanding the following simple C code:
int main(int argc, char *argv[]) {
int n=0;
fork();
n++;
printf("hello: %d\n", n);
}
My current understanding of a fork is that from that line of code on, it will split the rest of the code in 2, that will run in parallel until there is "no more code" to execute.
From that prism, the code after the fork would be:
a)
n++; //sets n = 1
printf("hello: %d\n", n); //prints "hello: 1"
b)
n++; //sets n = 2
printf("hello: %d\n", n); //prints "hello: 2"
What happens, though, is that both print
hello: 1
Why is that?
EDIT: Only now did it occur to me that, unlike threads, processes don't share the same memory. Is that right? If yes, then that'd be the reason.
After fork() you have two processes, each with its own "n" variable.
fork() starts a new process, sharing no variables/memory locations.
It is very similar to what happens if you execute ./yourprogram twice in a shell, assuming the first thing the program does is forking.
At the end of the fork() call, both processes might still refer to the same physical copy of n (copy-on-write). But at n++, each gets its own copy holding n=0, so after n++ the value of n is 1 in both processes. The printf statement outputs this value.
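The separate-copies behaviour can be checked with a small sketch like the one below. The helper name compare_copies is hypothetical, and the child reports its value through its exit status, which only works here because the value fits in 8 bits.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: after fork(), the child adds 10 to its n and the
   parent adds 1 to its own n. Returns 0 on success, -1 on failure. */
int compare_copies(int *parent_n, int *child_n)
{
    int n = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        n += 10;                         /* changes only the child's copy */
        _exit(n);                        /* report it via the exit status */
    }

    n++;                                 /* changes only the parent's copy */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    *parent_n = n;
    *child_n = WEXITSTATUS(status);
    return 0;
}
```

After the call, the parent's value is 1 and the child's is 10: neither process saw the other's update.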
Actually you spawn a new process of the same program. It is not the closure kind of thing. You could use pipes to exchange data between parent and child.
You did indeed answer your own question in your edit.
Examine this code and everything should be clearer (see the man pages if you don't know what a certain function does):
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
int count = 1;
int main(int argc, char *argv[]) {
// set the "startvalue" to create the random numbers
srand(time(NULL));
// -1 means "not forked yet"; nothing is printed until the fork at count == 9
int pid = -1;
// as long as count is <= 50
for (; count<=50; count++) {
// create new process if count == 9
if (count==9) {
pid = fork();
// reset start value for generating the random numbers
srand(time(NULL)+pid);
}
if (count<=25) {
// sleep for 300 ms
usleep(3*100000);
} else {
// create a random number between 1 and 5
int r = ( rand() % 5 ) + 1;
// sleep for r * 100 ms
usleep(r*100000);
}
if (pid==0) {
printf("Child: count:%d pid:%d\n", count, pid);
} else if (pid>0) {
printf("Father: count:%d pid:%d\n", count, pid);
}
}
return 0;
}
happy coding ;-)
The system call forks more than the execution thread: also forked is the data space. You have two n variables at that point.
There are a few interesting things that follow from all this:
A program that fork()s must consider unwritten output buffers. They can be flushed before the fork, or cleared after the fork, or the program can _exit() instead of exit() to at least avoid automatic buffer flushing on exit.
Fork is often implemented with copy-on-write in order to avoid unnecessarily duplicating a large data memory that won't be used in the child.
Finally, an alternate call vfork() has been revived in most current Unix versions, after vanishing for a period of time following its introduction in 3.0BSD. Vfork() does not pretend to duplicate the data space, and so the implementation can be even faster than a copy-on-write fork(). (Its implementation in Linux may be due less to speed reasons than because a few programs actually depend on the vfork() semantics.)
