how can I do file operations on linux? - c

I want a parent and a child process to each read the contents of the file a.txt byte by byte. The parent process should write those bytes to b.txt, and the child process should write them to c.txt.
Parent and child are working on reading from the same file and writing to the same file
#include<stdio.h>
#include<fcntl.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/wait.h>
int main(int argc, char *argv[])
{
int fdrd,fdwt2,fdwt3;
char c;
char parent='P';
char child='C';
int pid;
unsigned long i;
if(argc !=4) exit(1);
if((fdrd =open(argv[1],O_RDONLY))==-1)
exit(1);
if((fdwt2=creat(argv[2],0666))==-1)
exit(1);
if((fdwt3=creat(argv[3],0666))==-1)
exit(1);
printf("Parent:creating a child process\n");
pid=fork();
if(pid==0){
printf("Child process starts,id= %d\n",getpid());
for(;;)
{
if(read(fdrd,&c,1)!=1) break;
for(i=0;i<50000;i++); /* crude delay loop */
write(1,&child,1);
write(fdwt2,&c,1);
}
exit(0);
}
else
{
printf("Parent starts,id= %d\n",getpid());
for(;;)
{
if(read(fdrd,&c,1)!=1) break;
for(i=0;i<50000;i++); /* crude delay loop */
write(1,&parent,1);
write(fdwt3,&c,1);
}
wait(0);
}
}

When you open or creat a file, the system creates a kernel "file object" that tracks the open file (tracking where in the file you are reading/writing, among other things) and returns a "file descriptor" that refers to it. That file descriptor is just a handle, and you can have multiple file descriptors that refer to the same file object.
When you fork a child, the child gets a copy of all the parent's file descriptors, but the file objects are not copied -- the file descriptors in the child refer to the same file objects. So when you open the read file and then fork, the parent and child share a file object for that file, which means that when either one reads from the file, it advances the read location in the file, so when the other one reads from the file, it will get bytes from that advanced point rather than the start.
The net effect is that the bytes in the source file will be "split" between your parent and child -- each process will get some but not all of the bytes. Which bytes go to which depends on timing and is hard to predict.
To avoid this, you want to open the source file after calling fork. This will result in opening the file twice (once in the parent and once in the child), thus creating two file objects to track the read location, and allowing parent and child to read the file independently.
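As a rough sketch of the "open after fork" approach, the following assumes two hypothetical helpers, `copy_bytes` and `copy_in_both` (neither name comes from the question). Each process calls `open()` itself after the `fork()`, so each gets its own file object and its own read position:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst one byte at a time.
   Returns the number of bytes copied, or -1 on error. */
static long copy_bytes(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out == -1) {
        close(in);
        return -1;
    }
    long n = 0;
    char c;
    while (read(in, &c, 1) == 1) {
        if (write(out, &c, 1) != 1) {
            n = -1;
            break;
        }
        n++;
    }
    close(in);
    close(out);
    return n;
}

/* Hypothetical helper: fork first, then let parent and child each open
   src on their own. Returns 0 if both copies succeeded, -1 otherwise. */
int copy_in_both(const char *src, const char *parent_dst, const char *child_dst)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: this open() creates a file object private to the child. */
        _exit(copy_bytes(src, child_dst) < 0 ? 1 : 0);
    }
    /* Parent: a second, independent file object with its own offset. */
    long copied = copy_bytes(src, parent_dst);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (copied < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

Calling `copy_in_both("a.txt", "b.txt", "c.txt")` should leave both output files with the full contents of a.txt, since neither process advances the other's read position.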

Even if you found a way of reading and writing to the same file at the same time, it is probably unwise and may lead to data corruption. If you really need two processes and really need to read and write to a.txt from both, you can do it by opening the file, reading or writing, and closing it again each time you read or write data. This would be safe, but the performance wouldn't be good.
If you're only reading from parent and child you can just open after the fork without issue so it is opened for read twice.
You can write to b.txt from the parent only and write to c.txt from the child only without issue by opening those files after the fork().
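To communicate between the parent and child process efficiently, you can use a pipe. A pipe only buffers a limited amount of data at a time (64 KiB by default on Linux), so the writer blocks when the pipe is full until the reader drains it, but the total amount you can send through it is not limited. Using a file for this purpose when a pipe would do the trick is probably bad practice.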

Related

How do pipes work in C?

So I have 2 questions about pipes in c :
1:
When I fork a process after creating a pipe, so that the parent writes to the pipe and the child reads from it, how is this synchronized? Does the parent always send its data before the child attempts to read it, even though they are running concurrently?
Why don't I ever hit the case where the child starts reading before the parent has sent its data?
2:
This is about the pipe's file descriptors. Referring to the code below, how can the parent process close the pipe's output (read-end) file descriptor when the child hasn't accessed it yet, assuming the parent starts first?
Any help will be appreciated, Thank you !
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#define BUFFER_SIZE 256
int main(int argc , char*argv[])
{
pid_t worker_pid ;
int descriptor[2];
unsigned char bufferR[256] , bufferW[256];
/***************************** create pipe ********************/
puts("pipe creation");
if (pipe(descriptor) !=0)
{
fprintf(stderr,"error in the pipe");
exit(1);
}
/***************************** create processes ********************/
puts("now fork processes");
worker_pid = fork(); // fork process
if (worker_pid == -1 ) // error forking processes
{
fprintf(stderr,"error in fork");
exit(2);
}
/*********************** communicate processes ********************/
if (worker_pid == 0) // in the child process : the reader
{
close(descriptor[1]); // close input file descriptor , ok because parent finished writing
read(descriptor[0],bufferR,BUFFER_SIZE);
printf("i'm the child process and i read : %s \n",bufferR);
}
if (worker_pid !=0)
{
// is there any case where child attempts to read before parent writting ?
close(descriptor[0]);// close file descriptor of child before it reads ?
sprintf(bufferW,"i'm the parent process my id is %d , and i wrote this to my child",getpid());
write(descriptor[1],bufferW,BUFFER_SIZE);
wait(NULL);
}
return 0;
}
For question 1, I expected that in some cases the output would be:
i'm the child process and i read :
because the parent hasn't written its message yet.
For question 2, I expected an error saying:
invalid file descriptor in the child process, because the parent had already closed it (assuming the parent always runs first).
But the actual output is always:
i'm the child process and i read: i'm the parent process my id is 7589, and i wrote this to my child
When I fork a process after creating a pipe, so that the parent writes to the pipe and the child reads from it, how is this synchronized? Does the parent always send its data before the child attempts to read it, even though they are running concurrently? Why don't I ever hit the case where the child starts reading before the parent has sent its data?
Typically, it doesn't need to be synchronized, and in fact it can itself serve as a synchronization mechanism. If you perform blocking reads (the default) then they will not complete until the corresponding data have been sent, regardless of the relative order of the initial read and write calls.
The two processes do, however, need to implement an appropriate mechanism to demarcate and recognize message boundaries, so that the reader can recognize and respond appropriately to short reads. That may be as simple as ending each message with a newline, and reading messages with fgets() (through a stream obtained via fdopen()).
This is about the pipe's file descriptors. Referring to the code below, how can the parent process close the pipe's output (read-end) file descriptor when the child hasn't accessed it yet, assuming the parent starts first?
Not an issue. Once the child is forked, it has access to the underlying open file descriptions it inherited from its parent, through the same file descriptor numbers that the parent could use, but these are separate from the parent for the purpose of determining how many times each open file description is referenced. The situation is similar to that resulting from the operation of the dup() syscall. Only when all processes close all their file descriptors for a given open file description is that open file description invalidated.
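To illustrate, here is a small sketch under the assumption of a hypothetical function `child_reads_after_parent_close`. The parent closes its copy of the read end right after the fork, before writing anything, and the child can still read through its own copy:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child could read the byte even
   though the parent closed its own read descriptor first, 0 if not,
   and -1 on a setup error. */
int child_reads_after_parent_close(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[1]);                     /* child does not write */
        char c = 0;
        ssize_t n = read(fds[0], &c, 1);   /* blocks until the parent writes */
        _exit(n == 1 && c == 'x' ? 0 : 1);
    }

    close(fds[0]);   /* drops only the parent's reference to the read end */
    ssize_t w = write(fds[1], "x", 1);
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return w == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The order in which the two processes run does not matter here: the child's descriptor is a separate reference to the same open file description.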
This is done internally in the kernel. When you try to read from a file descriptor (be it a pipe or a socket) and the "remote side" has not sent any data, your process stalls (the call to read does not return at all) until the other side has pushed something into the internal buffers of the kernel (in your example, written to the pipe).
Here you can see the internal implementation of the pipes in linux:
https://github.com/torvalds/linux/blob/master/fs/pipe.c#L272
Look for variable do_wakeup.

how to use a file descriptor in a child process after execvp?

I am trying to open a child process using fork(), and then execvp()-ing into another program.
I also want the parent process and the child process to communicate with each other using a pipe.
here is the parent process -
int pipefds[2];
pipe(pipefds); // In original code I check for errors...
int readerfd = pipefds[0];
int writerfd = pipefds[1];
if(pid == 0){
// Child
close(readerfd);
execvp("./proc2",NULL);
}
in the program 'proc2' I am trying to access the writerfd in the following way-
write(writerfd, msg, msg_len);
but instead, I get a compilation time error -
"error: ‘writerfd’ undeclared (first use in this function);"
Why is that? I read here on Stack Overflow that "Open file descriptors are preserved across a call to exec." link. Shouldn't I be able to reach writerfd if that is so?
How can I write to that file descriptor in the child process after using execvp? What is the correct way to do this, and where can I read about the answer (I looked but didn't find it)?
Thanks!
Open file descriptors are preserved when you call an exec function. What is not preserved are the names of any variables used to store them.
You need to duplicate the file descriptor to a known file descriptor number that the other program can reference. Since the child process is writing, you should copy the write end of the pipe to file descriptor 1, which is stdout:
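int pipefds[2];
pipe(pipefds);
int readerfd = pipefds[0];
int writerfd = pipefds[1];
pid_t pid = fork();
if(pid == 0){
// Child
close(readerfd);
dup2(writerfd, 1); // stdout now refers to the write end of the pipe
close(writerfd); // the original descriptor is no longer needed
char *args[] = {"./proc2", NULL}; // argv must be a NULL-terminated array
execvp("./proc2", args);
}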
Then proc2 can write to file descriptor 1:
write(1, msg, msg_len);
Or, if the message is a string, just use printf
printf("%s", msg);
fflush(stdout);
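Putting both sides together, a hedged sketch of the whole round trip could look like this. The function name `capture_stdout` is hypothetical, and the usage example below assumes an `echo` program is available on the PATH as a stand-in for proc2. The child redirects its stdout into the pipe with `dup2()` before calling `execvp()`, and the parent reads whatever the executed program prints:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmd (a NULL-terminated argv array) with its
   stdout connected to a pipe, and collect up to size-1 bytes of its output
   in buf as a NUL-terminated string.
   Returns the number of bytes captured, or -1 on error. */
long capture_stdout(char *const cmd[], char *buf, size_t size)
{
    int fds[2];
    if (size == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);  /* fd 1 now refers to the pipe */
        close(fds[1]);
        execvp(cmd[0], cmd);          /* fd 1 survives the exec */
        _exit(127);                   /* reached only if exec failed */
    }

    close(fds[1]);                    /* so read() can see EOF when the child exits */
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (long)total;
}
```

For example, with `char *cmd[] = {"echo", "hello", NULL};`, a call to `capture_stdout(cmd, out, sizeof out)` should fill `out` with `"hello\n"`. The executed program needs no knowledge of the pipe; it just writes to file descriptor 1.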

Writing to closed file descriptor doesn't raise error

In the following code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
printf("before child\n");
int pid = fork();
if(pid == 0)
{
exit(0);
}
int status;
wait(&status);
if(4 != printf("abc\n"))
perror("printing to stdout\n");
return 0;
}
Produces the output:
before child
abc
The call to exit() in the child should close all of its file descriptors, including the stdout fd.
Then how can the parent process still write to stdout after it has been closed?
Think of file descriptors as pointers to reference-counted file objects.
When you fork, the child process gets new references to the same streams as the parent process. Both the parent and child's descriptors point to the same stream object.
When your child process exits, all of the file descriptors of the child process are closed. But since the parent also has file descriptors to the stream objects, the streams don't go away.
Files and streams are only torn down once no one refers to them anymore. In this case, the parent process refers to them.
(For additional fun, check out the dup family of functions, which duplicate file descriptors in a similar way. With them, you can have, in a single process, two file descriptors for the same file.)
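The reference-counting behaviour can be seen within a single process. The sketch below assumes a hypothetical function `write_after_closing_original`; it uses a pipe rather than stdout so that the result is easy to check:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a duplicate descriptor keeps working
   after the original has been closed, 0 if not, -1 on a setup error. */
int write_after_closing_original(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    int copy = dup(fds[1]);   /* second descriptor, same underlying object */
    if (copy == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);            /* one reference gone; the object survives */
    ssize_t w = write(copy, "abc", 3);
    close(copy);              /* last writer gone; now the write end is torn down */

    char buf[8];
    ssize_t r1 = read(fds[0], buf, sizeof buf);  /* the 3 buffered bytes */
    ssize_t r2 = read(fds[0], buf, sizeof buf);  /* 0: EOF, no writers remain */
    close(fds[0]);

    return w == 3 && r1 == 3 && r2 == 0;
}
```

A child's exit() plays the role of the first close() here: it drops the child's references, while the parent's descriptors keep the underlying objects alive.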

Can not understand the pipe() in my own shell

This is the code I found for my own shell. It works fine, but the part I can't understand is the pipe section of the code.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <sys/wait.h>
char* cmndtkn[256];
char buffer[256];
char* path=NULL;
char pwd[128];
int main(){
//setting path variable
char *env;
env=getenv("PATH");
putenv(env);
system("clear");
printf("\t MY OWN SHELL !!!!!!!!!!\n ");
printf("_______________________________________\n\n");
while(1){
fflush(stdin);
getcwd(pwd,128);
printf("[MOSH~%s]$",pwd);
fgets(buffer,sizeof(buffer),stdin);
buffer[sizeof(buffer)-1] = '\0';
//tokenize the input command line
char* tkn = strtok(buffer," \t\n");
int i=0;
int indictr=0;
// loop for every part of the command
while(tkn!=NULL)
{
if(strcoll(tkn,"exit")==0 ){
exit(0);
}
else if(strcoll(buffer,"cd")==0){
path = buffer;
chdir(path+=3);
}
else if(strcoll(tkn,"|")==0){
indictr=i;
}
cmndtkn[i++] = tkn;
tkn = strtok(NULL," \t\n");
}
cmndtkn[i]=NULL;
// execute when command has pipe. when | command is found indictr is greater than 0.
if(indictr>0){
char* leftcmnd[indictr+1];
char* rightcmnd[i-indictr];
int a,b;
for(b=0;b<indictr;b++)
leftcmnd[b]=cmndtkn[b];
leftcmnd[indictr]=NULL;
for(a=0;a<i-indictr-1;a++)
rightcmnd[a]=cmndtkn[a+indictr+1];
rightcmnd[i-indictr-1]=NULL;
if(!fork())
{
fflush(stdout);
int pfds[2];
pipe(pfds);
if(!fork()){
close(1);
dup(pfds[1]);
close(pfds[0]);
execvp(leftcmnd[0],leftcmnd);
}
else{
close(0);
dup(pfds[0]);
close(pfds[1]);
execvp(rightcmnd[0],rightcmnd);
}
}else
wait(NULL);
//command not include pipe
}else{
if(!fork()){
fflush(stdout);
execvp(cmndtkn[0],cmndtkn);
}else
wait(NULL);
}
}
}
What is the purpose of the calls to close() with arguments 0 and 1, and what does the call to dup() do?
On Unix, the dup() call uses the lowest numbered unused file descriptor. So, the close(1) before the call to dup() is to coerce dup() to use file descriptor 1. Similarly for close(0).
So, the aliasing is to get the process to use the write end of the pipe for stdout (file descriptor 1 is used for console output), and the read end of the pipe for stdin (file descriptor 0 is used for console input).
The code may have been more clearly expressed with dup2() instead.
dup2(pfds[1], 1); /* alias pfds[1] to 1 */
From your question about how ls | sort works, your question is not limited to why the dup() system call is being made. Your question is actually how pipes in Unix work, and how a shell command pipeline works.
A pipe in Unix is a pair of file descriptors that are related in that writing data on the writable descriptor allows that data to be read from the readable descriptor. The pipe() call returns this pair in an array, where the first array element is readable, and the second array element is writable.
In Unix, a fork() followed by some kind of exec() is the traditional way to produce a new process running a different program (library calls such as system() or popen() also create processes, but they typically fork() and exec() under the hood). A fork() produces a child process. The child process sees a return value of 0 from the call, while the parent sees a non-zero return value that is either the PID of the child process, or -1 indicating that an error has occurred.
The child process is a duplicate of the parent. This means that when a child modifies a variable, it is modifying a copy of the variable that resides in its own process. The parent does not see the modification occur, as the parent has the original copy. However, a duplicated pair of file descriptors that form a pipe can be used to allow a child process and its parent to communicate with each other.
So, ls | sort means that there are two processes being spawned, and the output written by ls is being read as input by sort. Two processes means two calls to fork() to create two child processes. One child process will exec() the ls command, the other child process will exec() the sort command. A pipe is used between them to allow the processes to talk to each other. The ls process writes to the writable end of the pipe, the sort process reads from the readable end of the pipe.
The ls process is coerced into writing into the writable end of the pipe with the dup() call after issuing close(1). The sort process is coerced into reading the readable end of the pipe with the dup() call after close(0).
In addition, the close() calls that close the pipe file descriptors are used to make sure that the ls process is the only process to have an open reference to the writable fd, and the sort process is the only process to have an open reference to the readable fd. That step is important because after ls exits, its writable end of the pipe is closed, and the sort process will expect to see an EOF as a result. However, this will not occur if some other process still has the writable fd open.
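The same two-process pipeline can be sketched with dup2(), under some assumptions: `run_pipeline` is a hypothetical name, and unlike the shell in the question it forks both commands from one parent instead of nesting the forks. Note how every process closes the pipe descriptors it does not need, including the parent:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `left | right`, where both arguments are
   NULL-terminated argv arrays. Returns the exit status of the right-hand
   command, or -1 on error. */
int run_pipeline(char *const left[], char *const right[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t lpid = fork();
    if (lpid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);  /* stdout -> write end of the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);                   /* reached only if exec failed */
    }

    pid_t rpid = fork();
    if (rpid == -1) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);   /* stdin <- read end of the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends; otherwise the right-hand command
       never sees EOF, because a writer would still be open. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `char *left[] = {"ls", NULL};` and `char *right[] = {"sort", NULL};`, this should behave like `ls | sort` typed at a shell prompt, assuming both programs are on the PATH.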
http://en.wikipedia.org/wiki/Standard_streams#Standard_input_.28stdin.29
stdin is file descriptor 0.
stdout is file descriptor 1.
In the !fork section, the process closes stdout then calls dup on pfds[1] which according to:
http://linux.die.net/man/2/dup
Creates a duplicate of the specified file descriptor at the lowest available position, which will be 1, since it was just closed (and stdin hasn't been closed yet). This means everything sent to stdout will really go to pfds[1].
So, basically, it's setting up the two new processes to talk to each other. The !fork section is for the new child, which will send data to stdout (file descriptor 1). The parent (the else block) closes stdin and duplicates pfds[0] in its place, so it really reads from pfds[0] when it tries to read from stdin.
Each process has to close the file descriptor in pfds it's not using, as there are two open handles to the file now that the process has forked. Each process now execs to left/right-cmnd, but the new stdin and stdout mappings remain for the new processes.
Forking twice is explained here: Why fork() twice

After fork, do the parent and child process share the file descriptor created by pipe?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main()
{
int data_processed;
int file_pipes[2];
const char some_data[] = "123";
char buffer[BUFSIZ + 1];
pid_t fork_result;
memset(buffer, '\0', sizeof(buffer));
if (pipe(file_pipes) == 0) {
fork_result = fork();
if (fork_result == -1) {
fprintf(stderr, "Fork failure");
exit(EXIT_FAILURE);
}
// We've made sure the fork worked, so if fork_result equals zero, we're in the child process.
if (fork_result == 0) {
data_processed = read(file_pipes[0], buffer, BUFSIZ);
printf("Read %d bytes: %s\n", data_processed, buffer);
exit(EXIT_SUCCESS);
}
// Otherwise, we must be the parent process.
else {
data_processed = write(file_pipes[1], some_data,
strlen(some_data));
printf("Wrote %d bytes\n", data_processed);
}
}
exit(EXIT_SUCCESS);
}
Based on my understanding, the child process created by fork doesn't share variables with its parent process. Then why can the parent here write to one file descriptor, and the child process get the data by reading from another file descriptor? Is this because they are controlled somehow by the pipe function internally?
File descriptors, including pipes, are duplicated on fork -- the child process ends up with the same file descriptor table, including stdin/out/err and the pipes, as the parent had immediately before the fork.
Based on my understanding, the child process created by fork doesn't share variables with its parent process.
This isn't entirely true -- changes to variables are not shared with the parent, but the values that the parent had immediately prior to the fork are all visible to the child afterwards.
In any case, pipes exist within the operating system, not within the process. As such, data written to one end of the pipe becomes visible to any other process holding a FD for the other end. (If more than one process tries to read the data, the first process to try to read() data gets it, and any other processes miss out.)
The variables are not shared: for example, if you write file_pipes[0] = 999 in the child, it will not be reflected in the parent. The file descriptors are shared (FD number x in the child refers to the same thing as FD number x in the parent). This is why (for example) you can redirect the output of a shell script which executes other commands (because they share the same standard output file descriptor).
You're right - ordinary variables aren't shared between the parent and the child.
However, pipes are not variables. They're a pseudo-file specifically designed to connect two independent processes together. When you write to a pipe, you're not changing a variable in the current process - you're sending data off to the operating system and asking it to make that data available to the next process to read from the pipe.
It's just like when you write to a real, on-disk file - except that the data isn't written to disk, it's just made available at the other end of the pipe.
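The difference between the two can be seen side by side in a short sketch, assuming a hypothetical function `variables_vs_pipe`. The child changes a local variable and also writes to the pipe; only the pipe data reaches the parent:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child assigns to `value` and sends "123" through
   the pipe. The parent stores what it read in buf (NUL-terminated) and
   returns its own, unchanged `value`. Returns -1 on a setup error. */
int variables_vs_pipe(char *buf, size_t size)
{
    int value = 1;
    int fds[2];
    if (size == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        value = 999;   /* changes only the child's private copy */
        ssize_t w = write(fds[1], "123", 3);
        _exit(w == 3 ? 0 : 1);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], buf, size - 1);  /* data travels through the kernel */
    buf[n > 0 ? n : 0] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return value;      /* still 1 in the parent */
}
```

After the call, the buffer should hold "123" while the returned value is still 1, which matches the description above: the pipe lives in the operating system, the variable lives in each process.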

Resources