How to read file using different processes?


I want to read a file using different processes, but when I try, the first child created reads the whole file, so the other processes cannot read any of it. For example, I create 3 processes with process IDs 101, 102, and 103, and I get:
a read from = 101.
b read from = 101.
c read from = 101.
d read from = 101.
But I wanted to read like that
a read from = 101.
b read from = 103.
c read from = 102.
d read from = 103.
I tried to solve it using semaphore and mutex but I couldn't do that. Could you help me, please?
int i=0, pid;
char buffer[100];
for(i=0; i<3; i++){
pid = fork();
if(pid == 0){
sem_wait(&mutex); // sem_t mutex is global.
while(read(fd,&buffer[j],1) == 1){
printf("%c read from = %d\n",buffer[j],getpid());
j++;
}
sem_post(&mutex);
exit(0);
}
else{
wait(NULL);
}
}

The problem is that even though each process has its own file descriptor, those file descriptors all share the same open file description ('descriptor' != 'description'), and the read position is stored in the file description, not the file descriptors. Consequently, when any of the children reads the file, it moves the file pointer for all the children.
For more information about this, see the POSIX specifications for:
open()
dup2()
fork()
No mutex or other similar gadget is going to fix this problem for you — at least, not on its own. The easiest fix is to reopen the file in the child processes so that each child has a separate open file description as well as its own file descriptor. Alternatively, each child will have to use a mutex, rewind the file, read the data, and release the mutex when done. It's simpler to (re)open the file in the child processes.
Note that the mutex must be shared between processes for it to be relevant. See the POSIX specification for pthread_mutexattr_setpshared(). That is not set with the default mutex attribute values.

You have two problems that prevent what you appear to want and result in the entire file being read by (just) the first child:
the parent process waits for each child immediately after creating it, before forking any more children. So after creating the first child, it waits for that child to exit before looping and creating a second child. To fix that, you need two loops in the parent: the first just creates the children and the second waits for them:
for (...) {
    if (fork() == 0) {
        // run the child
        exit(0);
    }
}
for (...)
    wait(NULL); // wait for a child
your reading loop is inside the sem_wait/sem_post. So the first child will get the mutex, then proceed to read the entire file before releasing the mutex. Subsequent children will not get the mutex until the file is fully read, so they'll see they're at the EOF and exit. To fix this you need to move the sem_wait/sem_post calls inside the while loop:
while (!done) {
    sem_wait(&mutex);
    if (read(...) == 1) { ...
    } else {
        done = true;
    }
    sem_post(&mutex);
}
You might not even need the semaphore at all -- the kernel will synchronize reads between the different processes, so each byte will be read by exactly one child. This would allow the children to proceed in parallel (processing the bytes) rather than only allowing one child at a time to run.
Of course, even with the above, one child may process many bytes before another child starts to run and process them, so if the processing is fast and there are few bytes, they might still all be consumed by the first child.

Related

How do pipes work in C?

So I have 2 questions about pipes in c :
1:
When i fork a process after creating a pipe in order to make the parent write to the pipe and the child read from it, How it's synchronized ? : the parent always send data before the child attempts to read that data although they are running concurrently?
why i don't fall on the case where the child start reading before the parent try to send its data ?
2:
This is about file descriptors of the pipe, referring to the code below how can parent-process close the pipe-output file-descriptor while the child doesn't access to that file yet ? , assuming the parent starts first.
Any help will be appreciated, Thank you !
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
#define BUFFER_SIZE 256
int main(int argc , char*argv[])
{
pid_t worker_pid ;
int descriptor[2];
unsigned char bufferR[256] , bufferW[256];
/***************************** create pipe ********************/
puts("pipe creation");
if (pipe(descriptor) !=0)
{
fprintf(stderr,"error in the pipe");
exit(1);
}
/***************************** create processes ********************/
puts("now fork processes");
worker_pid = fork(); // fork process
if (worker_pid == -1 ) // error forking processes
{
fprintf(stderr,"error in fork");
exit(2);
}
/*********************** communicate processes ********************/
if (worker_pid == 0) // in the child process : the reader
{
close(descriptor[1]); // close input file descriptor , ok because parent finished writing
read(descriptor[0],bufferR,BUFFER_SIZE);
printf("i'm the child process and i read : %s \n",bufferR);
}
if (worker_pid !=0)
{
// is there any case where child attempts to read before parent writting ?
close(descriptor[0]);// close file descriptor of child before it reads ?
sprintf(bufferW,"i'm the parent process my id is %d , and i wrote this to my child",getpid());
write(descriptor[1],bufferW,BUFFER_SIZE);
wait(NULL);
}
return 0;
}
I expected there will be some cases for question 1 where the output is :
i'm the child process and i read :
because the parent doesn't wrote it's message yet
for question 2 i expected an error saying :
invalid file descriptor in the child process because the parent already closed it (assuming the parent runs always the first)
but the actual output is always :
i'm the child process and i read: i'm the parent process my id is 7589, and i wrote this to my child

When i fork a process after creating a pipe in order to make the
parent write to the pipe and the child read from it, How it's
synchronized ? : the parent always send data before the child attempts
to read that data although they are running concurrently? why i
don't fall on the case where the child start reading before the parent
try to send its data ?
Typically, it doesn't need to be synchronized, and in fact it can itself serve as a synchronization mechanism. If you perform blocking reads (the default) then they will not complete until the corresponding data have been sent, regardless of the relative order of the initial read and write calls.
The two processes do, however, need to implement an appropriate mechanism to demarcate and recognize message boundaries, so that the reader can recognize and respond appropriately to short reads. That may be as simple as ending each message with a newline, and reading messages with fgets() (through a stream obtained via fdopen()).
This is about file descriptors of the pipe, referring to the code below how can parent-process close the pipe-output file-descriptor while the child doesn't access to that file yet ? , assuming the parent starts first.
Not an issue. Once the child is forked, it has access to the underlying open file descriptions it inherited from its parent, through the same file descriptor numbers that the parent could use, but these are separate from the parent for the purpose of determining how many times each open file description is referenced. The situation is similar to that resulting from the operation of the dup() syscall. Only when all processes close all their file descriptors for a given open file description is that open file description invalidated.

This is done internally in the kernel. When you try to read from a file descriptor (be it a pipe or a socket) and the "remote side" has not sent any data, your process stalls (the call to read does not return at all) until the other side has pushed something into the internal buffers of the kernel (in your example, written to the pipe).
Here you can see the internal implementation of the pipes in linux:
https://github.com/torvalds/linux/blob/master/fs/pipe.c#L272
Look for variable do_wakeup.

Parent child interprocess communication - is keeping pipes open OK?

I have to implement a program in which a process sends data it has received from its parent process to its child process, waits until the child sends the processed data back, and then returns the processed data to its parent process (so e.g. in case of 4 processes the data flow would look like this: P1->P2->P3->P4->P3->P2->P1). For interprocess communication I need to use pipes. Here's the approach I planned to take:
./child
// Assert argv contains 2 pipe descriptors - for reading
// from parent and for writing to parent, both of type char[]
// I'm not handling system errors currently
int main(int argc, char *argv[]) {
int read_dsc, write_dsc;
read_dsc = atoi(argv[1]);
write_dsc = atoi(argv[2]);
char data[DATA_SIZE];
read (read_dsc, data, DATA_SIZE - 1);
close (read_dsc);
// Process data...
(...)
// Pass processed data further
int pipeRead[2]; // Child process will read from this pipe
int pipeWrite[2]; // Child process will write into this pipe
pipe(pipeRead);
pipe(pipeWrite);
switch(fork()) {
case 0:
close (pipeRead[1]);
close (pipeWrite[0]);
char pipeReadDsc[DSC_SIZE];
char pipeWriteDsc[DSC_SIZE];
sprintf (pipeReadDsc, "%d", pipeRead[0]);
sprintf (pipeWriteDsc, "%d", pipeWrite[1]);
execl ("./child", "child", pipeReadDsc, pipeWriteDsc, (char *) 0);
default:
close(pipeRead[0]);
close(pipeWrite[1]);
wait(0);
read (pipeWrite[0], data, DATA_SIZE - 1);
close (pipeWrite[0]);
// Pass data to parent process
write (write_dsc, data, DATA_SIZE - 1);
close (write_dsc);
}
}
High level description of my solution is as follows: make 2 pipes, one for writing to child process, one for reading from child process. Wait until child process finishes and then read from read pipe and pass data to parent.
The problem is I don't know whether this approach is correct. I've read somewhere that not closing unused pipes is an error as it clutters OS file descriptors and there shouldn't be many opened pipes at once. Here however we're keeping an unclosed pipe for reading from a child, and potentially if there are n processes, there are n opened pipes when process number n processes its data (all parent processes are waiting for data to come back). However I can't see any other way to solve this problem...
So - is my solution correct? If it isn't, how should I approach this problem?

Yes, your solution is correct. But there are problems in your code:
case 0 is the child, you will benefit in redirecting pipe ends onto standard input and output (use dup or dup2); passing descriptor ids to the child is weird.
default is the parent, so you need to write before reading.
"not closing unused pipes is an error" : it is not an error but may cause problems (detecting the end of a communication would be difficult or impossible), but it seems that you correctly close all non useful pipe ends in your code, so ok. In general the number of open pipes is not really an issue, as open files...

fork ( ) - C programming

I'm having issues working out where a good starting point for this is,
I have made dot points on what I exactly need to do but am unsure if this is entirely possible.
I have a file that I want to run multiple instances of
I want a new ID assigned to each process for the file
I need to assign a char eg. 'A' that was given through argv[1] to a process
If there is already a process with the char given, print to stderr
So far,
what I am thinking is, having something like the function below. But i'm really not too sure,
any help would be awesomeness.
int createProcess(char *argv[]){
//argv[1] is given 'A'
//fork()
//getPID()
//assign PID to 'A'
}

I think you are looking for a combination of fork and execl. You can fork to create multiple instances and then replace the program running in a forked process with another one by using exec (in your case it is the same program). Through execl you can pass command-line arguments. You may need to use sprintf in the original process to build them and sscanf in the exec'd process to parse them. I guess this is enough of a hint.

I have a file that I want to run multiple instances of
To do that you have two options :
1. You can use multiple fork() system call to duplicate new child processes and open the file in those processes.
2. You can have multiple threads in your program that open the same file.
But looking at the next three dots, fork() is the choice to go with.
I want a new ID assigned to each process for the file
When you duplicate processes using fork() each process gets its own unique process Id(pid).
I need to assign a char eg. 'A' that was given through argv[1] to a process
For this you need to use one of the many calls in the "exec" family. By using "exec"
you can also pass the command line parameters to the newly created processes.
This cannot be done by fork alone because fork only duplicates the current process, whereas if you want the process to run a different program with its own arguments you must use one of the exec calls.
Edit :
In order to get the command line parameters being passed to a process, you need to
know its process id and then you can look for a directory with its name same as the pid
inside the /proc file system( not mounted on actual device ). When you find the directory
you will get the parameters passed to it in a file named "cmdline".
For more detail you can read about "/proc" file system.

You will need to fork multiple times (preferably in a loop) and index your children. One way to do that is to let only the original parent loop and fork. The original parent loops k times, creating one child process per iteration. In each child, you do the work specific to that child, such as assigning an identifier (for example the loop counter) or performing exec, and then exit so that the child does not continue into the next iteration and fork a grandchild.
Please note that fork() is a syscall that causes the original process (now called the parent process) to create a duplicate (called the child process), and that it returns an int value in both processes.
One thing you need to observe is that the forked processes are nearly identical; the two differences that matter here are the value returned by fork() and the process pid (the child usually has a higher pid). The value returned in the parent is the child's PID. The value returned in the child is always zero. Checking the return value of fork() is the usual way to tell whether the process is the parent or the child.
I have a file that I want to run multiple instances of
You may need to use a combination of fork() and exec. It is not clear which type of file you want to run. Are you reading from a file, writing to a file, or executing a file?
I want a new ID assigned to each process for the file
The PID itself is a new unique ID at the time a new process is created. However, you can use a counter so that only the parent can create multiple child processes, each with a unique ID.
I need to assign a char eg. 'A' that was given through argv[1] to a process
argv[1] is a string (char array), not a char.
If there is already a process with the char given, print to stderr
It is possible that you can keep track of all identifier chars on the original parent.
Here is some sample C code where only the parent creates the forking:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main() {
for (int k = 1; k <= 16; k++) {
int r = fork();
if (r == 0) { // kth CHILD
printf("[%d] %d\n", getpid(), k);
exit(0);
}
else if (r > 0) {
int status;
wait(&status);
printf("[%d] P\n", getpid());
}
else return 1;
}
return 0;
}

If I understand what you want correctly is to "assign" different chars to different instances of the forked process.
You can do something like this:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]){
char chr = *argv[1];
pid_t res;
res = fork();
if (!res)
chr++;
printf("%c \n", chr);
return 0;
}

How do I chain stdout in one child process to stdin in another child in C?

I've been messing around in C trying to figure out how to do this. Let's say I have my main program, the parent process. The parent creates three child processes, each of which will eventually run programs (but that's not important right now). What I'd like to do is make it so that the first child's stdout will be received by the second child's stdin. The second child's stdout will then be received by the third child's stdin.
The parent process's stdin/stdout aren't messed with at all.
So far, what I've got is
pipe(procpipe);
parentPid = getpid();
for(i = 0; i < 3; i++)
{
if(getpid() == parentPid)
{
child[i] = fork();
if(child[i] == 0)
{
mynumber = i+1;
}
}
}
But from there I'm kind of stuck as to how to use dup2 to properly assign my pipes, and in which section of my code to insert it. There are lots of examples on Google and this website of how to pipe from a parent to a child, but I'm yet to see one that will tell me exactly how to connect a child's stdout to another child's stdin.
Edit:
Forgot to mention: assume all my variables are properly initialised. The int 'mynumber' is so a child process knows upon creation which number it is, so I can give it instructions via
if(mynumber == whatever)

So you have a loop that creates several child processes. Each of these child processes will be using two pipes: read from previous and write to the next. To set up a pipe for the reading end you need to close the write end of the pipe, and dup2 the read end into the stdin. Similar for the pipe where the process will be writing.
void set_read(int* lpipe)
{
    dup2(lpipe[0], STDIN_FILENO);
    close(lpipe[0]); // we have a copy already, so close it
    close(lpipe[1]); // not using this end
}
  
void set_write(int* rpipe)
{
    dup2(rpipe[1], STDOUT_FILENO);
    close(rpipe[0]); // not using this end
    close(rpipe[1]); // we have a copy already, so close it
}
When you fork each child you need to attach the pipes to it.
void fork_and_chain(int* lpipe, int* rpipe)
{
    if(!fork())
    {
        if(lpipe) // there's a pipe from the previous process
            set_read(lpipe);
        // else you may want to redirect input from somewhere else for the start
        if(rpipe) // there's a pipe to the next process
            set_write(rpipe);
        // else you may want to redirect out to somewhere else for the end
        // blah do your stuff
        // and make sure the child process terminates in here
        // so it won't continue running the chaining code
    }
}
With this in hand you can now write a loop that continuously forks, attaches the pipes, and then reuses the output pipe as the input pipe for the next one. Of course, once both ends of a pipe have been connected to child processes, the parent should not leave it open for itself.
// This assumes there are at least two processes to be chained :)
// two pipes: one from the previous in the chain, one to the next in the chain
int lpipe[2], rpipe[2];
// create the first output pipe
pipe(rpipe);
// first child takes input from somewhere else
fork_and_chain(NULL, rpipe);
// output pipe becomes input for the next process.
lpipe[0] = rpipe[0];
lpipe[1] = rpipe[1];
// chain all but the first and last children
for(i = 1; i < N - 1; i++)
{
    pipe(rpipe); // make the next output pipe
    fork_and_chain(lpipe, rpipe);
    close(lpipe[0]); // both ends are attached, close them on parent
    close(lpipe[1]);
    lpipe[0] = rpipe[0]; // output pipe becomes input pipe
    lpipe[1] = rpipe[1];
}
// fork the last one, its output goes somewhere else   
fork_and_chain(lpipe, NULL);
close(lpipe[0]);
close(lpipe[1]);
The closing bits are very important! When you fork with an open pipe, there will be four open file descriptors: two on the parent process, and two others on the child process. You have to close all of those you won't be using. That's why the code above always closes the irrelevant ends of the pipes in the child processes, and both ends on the parent.
Also note that I am giving special treatment to the first and the last processes, because I don't know where the input for the chain will come from, and where the output will go to.

After fork, do the parent and child process share the file descriptor created by pipe?

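#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main()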
{
int data_processed;
int file_pipes[2];
const char some_data[] = "123";
char buffer[BUFSIZ + 1];
pid_t fork_result;
memset(buffer, '\0', sizeof(buffer));
if (pipe(file_pipes) == 0) {
fork_result = fork();
if (fork_result == -1) {
fprintf(stderr, "Fork failure");
exit(EXIT_FAILURE);
}
// We've made sure the fork worked, so if fork_result equals zero, we're in the child process.
if (fork_result == 0) {
data_processed = read(file_pipes[0], buffer, BUFSIZ);
printf("Read %d bytes: %s\n", data_processed, buffer);
exit(EXIT_SUCCESS);
}
// Otherwise, we must be the parent process.
else {
data_processed = write(file_pipes[1], some_data,
strlen(some_data));
printf("Wrote %d bytes\n", data_processed);
}
}
exit(EXIT_SUCCESS);
}
Based on my understanding, the child process created by fork doesn't share variables with its parent process. Then why can the parent here write to one file descriptor and the child process get the data by reading from another file descriptor? Is this because they are controlled somehow by the pipe function internally?

File descriptors, including pipes, are duplicated on fork -- the child process ends up with the same file descriptor table, including stdin/out/err and the pipes, as the parent had immediately before the fork.
Based on my understanding, the child process created by fork doesn't share variables with its parent process.
This isn't entirely true -- changes to variables are not shared with the parent, but the values that the parent had immediately prior to the fork are all visible to the child afterwards.
In any case, pipes exist within the operating system, not within the process. As such, data written to one end of the pipe becomes visible to any other process holding a FD for the other end. (If more than one process tries to read the data, the first process to try to read() data gets it, and any other processes miss out.)

The variables are not shared e.g. if you write file_pipes[0] = 999 in the child, it will not be reflected in the parent. The file descriptors are shared (FD number x in the child refers to the same thing as FD number x in the parent). This is why (for example) you can redirect the output of a shell script which executes other commands (because they share the same standard output file descriptor).

You're right - ordinary variables aren't shared between the parent and the child.
However, pipes are not variables. They're a pseudo-file specifically designed to connect two independent processes together. When you write to a pipe, you're not changing a variable in the current process - you're sending data off to the operating system and asking it to make that data available to the next process to read from the pipe.
It's just like when you write to a real, on-disk file - except that the data isn't written to disk, it's just made available at the other end of the pipe.
