I read from man pages of execve that if a process(A) call execve, the already opened file descriptors are copied to the new process(B).
Two possiblities arise here :-
1) Does it mean that a new file descriptor table is created for process B, the entries to which are copied from older file descriptor table of process A
2) Or process B gets the file descriptor table of process A as after execve process A will cease to exist and the already opened files could only be closed from process B, if it gets the file descriptor table of process A.
Which one is correct?
execve does not create a new process. It replaces the calling process's program image, memory space, etc. with new ones based on an executable file from the filesystem. The file descriptor table is modified by closing any descriptors with close-on-exec flag set; the rest remain open and in the same state they were in (current position, locks, etc.) prior to execve.
You're probably confusing this with what happens on fork since execve is usually preceded by fork. When a process forks, the child process has a new file descriptor table referring to the same open file descriptions as the parent process's file descriptor table.
Which one is correct?
#2
Though what you ask is more of OS implementation details and that is rarely if ever important to applications, completely transparent to the applications and depends on the OS.
It is normally said that the new process inherits file descriptors. Except those having FD_CLOEXEC flag set, obviously.
Even in case of #1, if we would presume that for some short time both process A and B are in memory (not really, that's fork() territory) copying of the fd table would be OK. Since process A would be terminated (by the exec()) all its file descriptors would be close()d. And that would have no effect on the already-copied file descriptors in the process B. File descriptors are like pointers to the corresponding kernel structure containing actual information about what the file descriptor actually points to. Copying the fd table doesn't make copies of the underlying structures - it copies only the pointers. The kernel structure contains reference counter (required to implement the fork()) which is incremented when copy is made and thus knows how many processes are using it. Calling close() on file descriptor first of all does decrement of the reference counter. And only if the counter goes to zero (no more processes are using the structure) only then OS actually closes the underlying file/socket/pipe/etc. (But obviously even if inside of the kernel two processes are present for some short time simultaneously, user space applications can't see that since the new process after exec() also inherits the PID of the original process.)
Related
I am comparatively new to the linux programming. I wonder whether the exec() function called after fork() can cause data loss in the parent process.
After a successful call to fork, a new process is created which is a duplicate of the calling process. One thing that gets duplicated are the file descriptors, so it's possible for the new process to read/write the same file descriptors as the original process. These may be files, sockets, pipes, etc.
The exec function replaces the currently running program in the current process with a new program, overwriting the memory of the old program in that process. So any data stored in the memory of the old program is lost. This does not however affect the parent process that forked this process.
When a new program is executed via exec, any open file descriptors that do not have the FD_CLOEXEC (close-on-exec) flag set (see the fcntl man page) are again preserved. So now you have two processes, each possibly running a different program, which may both write to the same file descriptor. If that happens, and the processes don't properly synchronize, data written by one process to the file may overwritten by the other process.
So data loss can occur with regard to writing to file descriptors that the child process inherited from the parent process.
My program does the following in chronological order
The program is started with root permissions.
Among other tasks, A file only readable with root permissions is open()ed.
Root privileges are dropped.
Child processes are spawned with clone() and the CLONE_FILES | CLONE_FS | CLONE_IO flags set, which means that while they use separate regions of virtual memory, they share the same file descriptor table (and other IO stuff).
All child processes execve() their own programs (the FD_CLOEXEC flag is not used).
The original program terminates.
Now I want every spawned program to read the contents of the aforementioned file, but after they all have read the file, I want it to be closed (for security reasons).
One possible solution I'm considering now is having a step 3a where the fd of the file is dup()licated once for every child process, and each child gets its own fd (as an argv). Then every child program would simply close() their fd, so that after all fds pointing to the file are close()d the "actual file" is closed.
But does it work that way? And is it safe to do this (i.e. is the file really closed)? If not, is there another/better method?
While using dup() as I suggested above is probably just fine, I've now --a day after asking this SO question-- realized that there is a nicer way to do this, at least from the point of view of thread safety.
All dup()licated file descriptors point to the same same file position indicator, which of course means you run into trouble when multiple threads/processes might simultaneously try to change the file position during read operations (even if your own code does so in a thread safe way, the same doesn't necessarily go for libraries you depend on).
So wait, why not just call open() multiple times (once for every child) on the needed file before dropping root? From the manual of open():
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
Could be used like this:
int fds[CHILD_C];
for (int i = 0; i < CHILD_C; i++) {
fds[i] = open("/foo/bar", O_RDONLY);
// check for errors here
}
drop_privileges();
// etc
Then every child gets a reference to one of those fds through argv and does something like:
FILE *stream = fdopen(atoi(argv[FD_STRING_I]), "r")
read whatever needed from the stream
fclose(stream) (this also closes the underlying file descriptor)
Disclaimer: According to a bunch of tests I've run this is indeed safe and sound. I have however only tested open()ing with O_RDONLY. Using O_RDWR or O_WRONLY may or may not be safe.
In "Advanced Programming in the Unix Environment", 2nd edition, By W. Richard Stevens. Section 8.3 fork function.
Here's the description:
It is important that the parent and the child share the same file offset.
Consider a process that forks a child, then waits for the child to complete. Assume that both processes write to standard output as part of their normal processing. If the parent has its standard output redirected (by a shell, perhaps) it is essential that the parent's file offset be updated by the child when the child writes to standard output.
My responses:
{1} What does it mean? if parent's std output is redirected to a 'file1' for example, then what should child update after child writes? parent's original std output offset or redirected ouput(i.e file1) offset? Can't be the later, right?
{2} How is the update done? by child explicitly, by OS implicitly, by files descriptor itself? After fork, I thought parent and child went their own ways and has their own COPY of file descriptor. So how does child update offset to parent side?
In this case, the child can write to standard output while the parent is waiting for it; on completion of the child, the parent can continue writing to standard output, knowing that its output will be appended to whatever the child wrote. If the parent and the child did not share the same file offset, this type of interaction would be more difficult to accomplish and would require explicit actions by the parent.
If both parent and child write to the same descriptor, without any form of synchronization, such as having the parent wait for the child, their output will be intermixed (assuming it's a descriptor that was open before the fork). Although this is possible, it's not the normal mode of operation.
There are two normal cases for handling the descriptors after a fork.
The parent waits for the child to complete. In this case, the parent does not need to do anything with its descriptors. When the child terminates, any of the shared descriptors that the child read from or wrote to will have their file offsets updated accordingly.
Both the parent and the child go their own ways. Here, after the fork, the parent closes the descriptors that it doesn't need, and the child does the same thing. This way, neither interferes with the other's open descriptors. This scenario is often the case with network servers.
My response:
{3} When fork() is invoked, all i understand is that child get a COPY of what parent has, file descriptor in this case, and does its thing. If any offset changes to file descriptor that parent and child share, it can only be because the descriptor remember the offset itself. Am I right?
I am kind of new to the concepts.
It's important to distinguish between the file descriptor, which is a small integer that the process uses in its read and write calls to identify the file, and the file description, which is a structure in the kernel. The file offset is part of the file description. It lives in the kernel.
As an example, let's use this program:
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>
int main(void)
{
int fd;
fd = open("output", O_CREAT|O_TRUNC|O_WRONLY, 0666);
if(!fork()) {
/* child */
write(fd, "hello ", 6);
_exit(0);
} else {
/* parent */
int status;
wait(&status);
write(fd, "world\n", 6);
}
}
(All error checking has been omitted)
If we compile this program, call it hello, and run it like this:
./hello
here's what happens:
The program opens the output file, creating it if it didn't exist already or truncating it to zero size if it did exist. The kernel creates a file description (in the Linux kernel this is a struct file) and associates it with a file descriptor for the calling process (the lowest non-negative integer not already in use in that process's file descriptor table). The file descriptor is returned and assigned to fd in the program. For the sake of argument suppose that fd is 3.
The program does a fork(). The new child process gets a copy of its parent's file descriptor table, but the file description is not copied. Entry number 3 in both processes' file tables points to the same struct file.
The parent process waits while the child process writes. The child's write causes the first half of "hello world\n" to be stored in the file, and advances the file offset by 6. The file offset is in the struct file!
The child exits, the parent's wait() finishes, and the parent writes, using fd 3 which is still associated with the same file description that had its file offset updated by the child's write(). So the second half of the message is stored after the first part, not overwriting it as it would have done if the parent had a file offset of zero, which would be the case if the file description was not shared.
Finally the parent exits, and the kernel sees that the struct file is no longer in use and frees it.
In the same section, of book, there is a diagram showing three tables which are there when a file is opened.
The user filedescriptor table(part of the process table entry), filetable and inode table (v-node table).
Now a filedescriptor (which is an index to the file descriptor table) entry points to a file table entry,which points to a inode table entry.
Now the file offset (the position from where the next read/write happens) is there in the File table.
So say you have a file opened in parent, that means it is having a descriptor, a file
table entry and an inode reference too.
Now when you are creating a child the file descriptor table is copied for the child.
So the reference count in the file table entry (for that open descriptor) is increased that means now there are two references for the same file table entry.
This descriptor is now available in both parent and child, pointing to the same File Table entry, hence sharing the offset.
Now having this background let us see your questions,
What does it mean? if parent's std output is redirected to a 'file1' for example, then what should child update after child writes? parent's original std output offset or
redirected ouput(i.e file1) offset? Can't be the later, right?]
The child explicitly need not update anything. The author of the book is trying to
tell that, suppose parents's standard out put is redirected to a file and the a fork call is made. After that the parent is wating.So the descriptor is now duplicated, that is the file offset is also shared. Now whenever now the child writes anything to standard out, the written data is saved in the redirected file.
The offset automatically is incremented by the write call.
Now say the child exits.
So the parent come out of waiting and writes something on the standard out(which is
redirected). Now where the parent 's write call's output will be placed -> after the data, which was written by the child. Why -> since the offset's current value is now changed after the child has written.
Parent ( )
{
open a file for writing, that is get the
descriptor( say fd);
close(1);//Closing stdout
dup(fd); //Now writing to stdout means writing to the file
close(fd)
//Create a child that is do a fork call.
ret = fork();
if ( 0 == ret )
{
write(1, "Child", strlen("Child");
exit ..
}
wait(); //Parent waits till child exit.
write(1, "Parent", strlen("Parent");
exit ..
}
Pl. see the above pseudo code, the final data that the opened file contains will be ChildParent. So you see that the file offset got changed when the child has written and this was avilable to the parent's write call, since the offest is shared.
2.How is the update done? by child explicitly, by OS implicitly, by files descriptor itself? After fork, i thought parent and child went their own ways and
has their own COPY of file descriptor. So how does child update offset to parent side?]
Now I think the answer is clear-> by the system call that is by the OS.
[3. When fork() is invoked, all i understand is that child get a COPY of what parent has,
file descriptor in this case, and does its thing. If any offset changes to file descriptor that parent and child share, it can only be because the descriptor remember the offset itself. Am i right?]
This should also be clear. The entry of the user file table points to the file table
table entry (which contains the offset).
In other words, the system calls can fetch the offset from the descriptor.
Unix kernel represents open files using three data structures: Descriptor table, File table, and v-node table.
When a process opens a file twice, it gets two different descriptors in the descriptor table, two entries in the file table(so that they have different positions in the same file), and they both point to one entry in the v-node table.
And child process inherits parent process's descriptor table, so kernel maintains one descriptor table for each process respectively. But two descriptor from different processes point to the same entry in open file table.
So
When child process does some read on the file, would the offset of the same file change in parent process?
If 1 is true, for two processes, is there a convenient way that I can get the same effect of fork on same file? That means two processes share a position(offset) information on the same file.
Is there a way to fork so that both processes have totally unrelated tables, like two unrelated processes only that they opened same files.
When child process does some read on the file, would the offset of the same file change in parent process?
Yes, since the offset is stored system-wide file table. You could get a similar effect using dup or dup2.
If 1 is true, for two processes, is there a convenient way that I can get the same effect of fork on same file? That means two processes share a position(offset) information on the same file.
There is a technique called "passing the file descriptor" using Unix domain sockets. Look for "ancillary" data in sendmsg.
Is there a way to fork so that both processes have totally unrelated tables, like two unrelated processes only that they opened same files.
You have to open the file again to achieve this. Although it doesn't do what you want, you should also look for the FD_CLOEXEC flag.
I'm working on a multi-process program which basically perform fuzzification on each layer of a RVB file. (1 process -> 1 layer). Each child process is delivering a temp file by using the function: tmpfile(). After each child process finishes its job, the main process has to read each temp file created and assemble the data. The problem is that I don't know how to read each temp file inside the main process since I can't access to child's process memory so I can't know what's the temporary pointer to the temp file created!
Any idea?
Don't hesitate to ask for clarifications if needed.
The tmpfile() function returns you a FILE pointer to a file with no determinate name - indeed, even the child process cannot readily determine a name for the file, let alone the parent (and on many Unix systems, the file has no name; it has been unlinked before tmpfile() returns to the caller).
extern FILE *tmpfile(void);
So, you are using the wrong temporary file creation primitive if you must convey file names around.
You have a number of options:
Have the parent process create the file streams with tmpfile() so that both the parent and children share the files. There are some minor coordination issues to handle - the parent will need to seek back to the start before reading what the children wrote, and it should only do that after the child has exited.
Use one of the filename generating primitives instead - mkstemp() is good, and if you need a FILE pointer instead of a file descriptor, you can use fdopen() to create one. You are still faced with the problem of getting file names from children to parent; again, the parent could open the files, or you can use a pipe for each child, or some shared memory, or ... take your pick of IPC mechanisms.
Have the parent open a pipe for each child before forking. The child process closes the read end of the pipe and writes to the the write end; the parent closes the write end of the pipe and arranges to read from the the read end. The issue here with multiple children is that the capacity of any given pipe is finite (and quite small - typically about 5 KiB). Consequently, you need to ensure the parent reads all the pipes completely, bearing in mind that the children won't be able to exit until all the data has been read (strictly, all but the last buffer full has been read).
Consider using threads - but be aware of the coordination issues using threads.
Decide that you do not need to use multiple threads of control - whether processes or threads - but simply have the main program do the work. This eliminates coordination and IPC issues - it does mean you won't benefit from the multi-core processor on the machine.
Of these, assuming parallel execution is of paramount importance, I'd probably use pipes to get the file names from the children (option 2); it has the fewest coordination issues. But for simplicity, I'd go with 'main program does it all' (option 5).
If you call tmpfile() in parent process, child will inherit all open descriptors and will be able to write to the file, and opened file will be accessible for parent as well.
You could create a tempfile in the parent process and then fork, then have the child process use that.
The child process can send back the filedescriptor to the parent process.
EDIT: example code in APUE site (src.tar.gz/apue.2e/lib, recvfd.c, sendfd.c)
Use threads instead of subprocesses? Put the names of the temporary files in another file? Don't use random names for the temp files, but (for example) names based on the pid of the parent process (to allow several instances of your program to run simultaneously) plus a sequential number?