I have two files open in two different processes. There's a pipe connecting the two. Is it possible to write directly from one file to another? Especially if the process reading doesn't know the size of the file it's trying to read?
I was hoping to do something like this:
#include <stdio.h>
#include <unistd.h>

#define length 100

int main() {
    int frk = fork();
    int pip[2];
    pipe(pip);
    if (frk == 0) { // child
        FILE *fp = fopen("file1", "r");
        write(pip[1], fp, length);
    }
    else {
        FILE *fp = fopen("file2", "w");
        read(pip[0], fp, length);
    }
}
Is it possible to write directly from one file to another?
C does not provide any mechanism for that, and it seems like it would require specialized hardware support. The standard I/O paradigm is that data get read from their source into memory or written from memory to their destination. That pesky "memory" in the middle means copying from one file to another cannot be direct.
Of course, you can write a function or program that performs such a copy, hiding the details from you. This is what the cp command does, after all, but the C standard library does not contain a function for that purpose.
Especially if the process reading doesn't know the size of the file it's trying to read?
That bit isn't very important. One simply reads and then writes (only) what one has read, repeating until there is nothing more to read. "Nothing more to read" means that a read attempt indicates by its return value that the end of the file has been reached.
If you want one process to read one file and the other to write that data to another file, using a pipe to convey data between the two, then you need both processes to implement that pattern. One reads from the source file and writes to the pipe, and the other reads from the pipe and writes to the destination file.
Special note: for the process reading from the pipe to detect EOF on that pipe, every copy of the pipe's write end must be closed, in both processes. After the fork, each process can and should close the pipe end that it doesn't intend to use. The one using the write end then closes that end when it has nothing more to write.
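A minimal sketch of that pattern, assuming the usual <unistd.h> and <fcntl.h> includes and omitting error handling (the 4096-byte buffer size is an arbitrary choice):

int pip[2];
pipe(pip);                            /* create the pipe BEFORE forking */
if (fork() == 0) {                    /* child: source file -> pipe */
    close(pip[0]);                    /* close the read end we don't use */
    int src = open("file1", O_RDONLY);
    char buf[4096];
    ssize_t n;
    while ((n = read(src, buf, sizeof buf)) > 0)
        write(pip[1], buf, n);        /* write only what was read */
    close(pip[1]);                    /* lets the reader see EOF */
} else {                              /* parent: pipe -> destination file */
    close(pip[1]);                    /* close the write end we don't use */
    int dst = open("file2", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    char buf[4096];
    ssize_t n;
    while ((n = read(pip[0], buf, sizeof buf)) > 0)  /* 0 means EOF */
        write(dst, buf, n);
    close(pip[0]);
}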
In other Unix systems, like BSD, there's a call to connect two file descriptors directly and do what you want, but I don't know whether there's an equivalent system call in Linux (splice(2) and sendfile(2) come close). Anyway, this cannot be done with FILE * handles, as those are instances of the buffered file abstraction that the <stdio.h> library uses to represent a file. You can get the file descriptor (as the system knows it) behind a FILE * by calling fileno(3).
The semantics you are trying to get from the system are quite elaborate: you want the data to pass directly from one file descriptor to another, without any process intervening (i.e. directly in the kernel), and for that the kernel would need a pool of threads to do the work of copying from the read side to the write side.
The old way of doing this is to create a thread that reads from one file descriptor (not a FILE * pointer) and writes to the other.
Another thing worth noting: the pipe(2) system call gives you two connected descriptors, such that you can read(2) from one (index 0) whatever is write(2)-ten to the other (index 1). If you fork(2) first and then call pipe(2) in both processes, as the code above does, you end up with two unrelated pipes (two descriptors each), one in each process. Each process can then only communicate with itself; neither knows anything about the other process' pipe descriptors, so no communication between them is possible. Call pipe(2) before fork(2), so that both processes share the same pipe.
Next is a complete example of what you try to do:
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define length 100

#define FMT(fmt) "pid=%d:"__FILE__":%d:%s: " fmt, getpid(), __LINE__, __func__
#define ERR(fmt, ...) do {                      \
        fprintf(stderr,                         \
                FMT(fmt ": %s (errno = %d)\n"), \
                ##__VA_ARGS__,                  \
                strerror(errno), errno);        \
        exit(1);                                \
    } while(0)

void copy(int fdi, int fdo)
{
    unsigned char buffer[length];
    ssize_t res, nread;
    while ((nread = res = read(fdi, buffer, sizeof buffer)) > 0) {
        res = write(fdo, buffer, nread);
        if (res < 0) ERR("write");
    } /* while */
    if (res < 0) ERR("read");
} /* copy */

int main()
{
    int pip[2];
    int res;
    res = pipe(pip);
    if (res < 0) ERR("pipe");
    char *filename;
    switch (res = fork()) {
    case -1: /* error */
        ERR("fork");
    case 0: /* child */
        filename = "file1";
        res = open(filename, O_RDONLY);
        if (res < 0) ERR("open \"%s\"", filename);
        close(pip[0]);
        copy(res, pip[1]);
        break;
    default: /* parent, we got the child's pid in res */
        filename = "file2";
        res = open(filename, O_CREAT | O_TRUNC | O_WRONLY, 0666);
        if (res < 0) ERR("open \"%s\"", filename);
        close(pip[1]);
        copy(pip[0], res);
        int status;
        res = wait(&status); /* wait for the child to finish */
        if (res < 0) ERR("wait");
        fprintf(stderr,
                FMT("The child %d finished with exit code %d\n"),
                res,
                status);
        break;
    } /* switch */
    exit(0);
} /* main */
Related
I'm trying to trigger some concurrent conflicts by having several processes writing to the same file, but couldn't:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

void concurrent_write()
{
    int create_fd = open("bar.txt", O_CREAT | O_TRUNC, 0644);
    close(create_fd);
    int repeat = 20;
    int num = 4;
    for (int process = 0; process < num; process++)
    {
        int rc = fork();
        if (rc == 0)
        {
            // child
            int write_fd = open("bar.txt", O_WRONLY | O_APPEND, 0644);
            for (int idx = 0; idx < repeat; idx++)
            {
                sleep(1);
                write(write_fd, "child writing\n", strlen("child writing\n"));
            }
            close(write_fd);
            exit(0);
        }
    }
    for (int process = 0; process < num; process++)
    {
        wait(NULL);
        // wait for all children to exit
    }
    printf("write to `bar.txt`\n%d lines written by %d process\n", repeat * num, num);
    printf("wc:");
    if (fork() == 0)
    {
        // child
        char *args[3];
        args[0] = strdup("wc");
        args[1] = strdup("bar.txt");
        args[2] = NULL;
        execvp(args[0], args);
    }
}

int main(int argc, char *argv[])
{
    concurrent_write();
    return 0;
}
int main(int argc, char *argv[])
{
concurrent_write();
return 0;
}
This program forks num children and then has each of them write repeat lines to a file. But every time (however I change repeat and num) I get the same result: the length of bar.txt (the output file) matches the total number of lines written. Why are no concurrent conflicts triggered?
Writing to a file can be divided into a two-step process:
Locate where you want to write.
Write data into the file.
Opening the file with the O_APPEND flag ensures that this two-step process is atomic, so the file always ends up containing exactly as many lines as you wrote.
See the open(2) man page:
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
In essence, one of the major design features of O_APPEND is precisely to prevent the sort of "concurrent conflicts" you mention. The typical example would be a log file that several processes must write to. Using O_APPEND ensures their messages do not overwrite each other.
Moreover, all data written by a single write call is written atomically, so provided that your write("child writing\n") successfully writes all its bytes (which for a regular file it usually would), they will not be interleaved with the bytes of any other such message.
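Conversely, if you want to actually see the conflict the question was trying to trigger, make the two-step process explicit yourself: drop O_APPEND and seek to the end manually. A sketch of the question's child loop under that change:

/* Without O_APPEND the "locate" and "write" become two separate steps: */
int write_fd = open("bar.txt", O_WRONLY);
for (int idx = 0; idx < repeat; idx++) {
    sleep(1);
    lseek(write_fd, 0, SEEK_END);   /* step 1: locate the current end */
    /* another child may write here, making this offset stale... */
    write(write_fd, "child writing\n", strlen("child writing\n"));  /* step 2 */
}

With several children racing like this, some writes land on the same offset and overwrite each other, so wc reports fewer lines than expected.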
First, write() calls with the O_APPEND flag should be atomic. Per POSIX write():
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
But that alone is not enough when there are multiple threads or processes making parallel write() calls on the same file: it does not say whether those parallel write() calls are atomic with respect to each other.
POSIX does guarantee that parallel write() calls are also atomic:
All of the following functions shall be atomic with respect to each
other in the effects specified in POSIX.1-2017 when they operate on
regular files or symbolic links:
...
write()
...
See also Is file append atomic in UNIX?
Beware, though. Reading that question and its answers shows that Linux filesystems such as ext3 are not POSIX compliant once you get past a relatively small size operation, or possibly if you cross page and/or file system sector boundaries. I suspect XFS and ZFS will support write() atomicity much better given their origins.
And none of this applies to Windows.
I'm trying to understand what is behind this behaviour in my parent process.
Basically, I create a child process and connect its stdout to my pipe. The parent process continuously reads from the pipe and does some stuff.
I noticed that when I insert the while loop in the parent, output to stdout seems to be lost; nothing appears on the terminal. I thought the output of stdout was somehow going to the pipe (maybe an issue with dup2), but that doesn't seem to be the problem. If I don't continuously fflush(stdout) in the parent process, whatever I'm trying to print to the terminal just won't show. Without the while loop in the parent it works fine, but I'm really not sure why this happens or whether the rest of my implementation is somehow problematic.
Nothing past the read system call seems to be going to stdout in the parent process. Assuming the output of inotifywait in the pipe is small enough (under 30 bytes), what exactly is wrong with this program?
What I expect to happen is for the stdout of inotifywait to go to the pipe, then for the parent to read the message, run strtok, and print the file name (which only appears on stdout when I fflush).
Running the program with inotify installed and creating any file in the current directory of the program should be enough. Removing the while loop does print the created file's name (as expected).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <errno.h>

int main(void) {
    char b[100];
    int pipefd;
    if (mkfifo("fifo", 0666) == -1) {
        if (errno != EEXIST) {
            perror("mkfifo");
            exit(EXIT_FAILURE);
        }
    }
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if ((pipefd = open("fifo", O_RDWR)) < 0) {
        perror("open pipe");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        dup2(pipefd, 1);
        const char* dir = ".";
        const char* args[] = {"inotifywait", dir, "-m", "-e",
                              "create", "-e", "moved_to", NULL};
        execvp("inotifywait", (char**)args);
        perror("inotifywait");
    } else {
        while (1) {
            fflush(stdout); // the output only appears in stdout with this here
            if (read(pipefd, b, 30) < 0) {
                perror("problem # read");
                exit(1);
            }
            char filename[30];
            printf("anything");
            sscanf(b, "./ CREATE %s", filename);
            printf("%s", filename);
        }
    }
}
The streams used by the C standard library are designed in such a way that they are normally buffered (except for the standard error stream stderr).
The standard output stream is normally line buffered, unless the output device is not an interactive device, in which case it is normally fully buffered. Therefore, in your case, it is probably line buffered.
This means that the buffer will only be flushed
when it is full,
when an \n character is encountered,
when the stream is closed (e.g. during normal program termination),
when reading input from an unbuffered or line-buffered stream (in certain situations), or
when you explicitly call fflush.
This explains why you are not seeing the output, because none of the above are happening in your infinite loop (when you don't call fflush). Although you are reading input, you are not doing this from a C standard library FILE * stream. Instead, you are bypassing the C runtime library (e.g. glibc) by using the read system call directly (i.e. you are using a file descriptor instead of a stream).
The simplest solution to your problem would probably be to replace the line
printf("%s", filename);
with:
printf("%s\n", filename);
If stdout is line-buffered (which should be the case if it is connected to a terminal), then the output should automatically be flushed after every line, and an explicit call to fflush should no longer be necessary.
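Alternatively, you could disable buffering for stdout once at startup, at the cost of one system call per output operation (a sketch):

setvbuf(stdout, NULL, _IONBF, 0);  /* make stdout unbuffered */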
Throughout my years as a C programmer, I've always been confused about the standard stream file descriptors. Some places, like Wikipedia[1], say:
In the C programming language, the standard input, output, and error streams are attached to the existing Unix file descriptors 0, 1 and 2 respectively.
This is backed up by unistd.h:
/* Standard file descriptors. */
#define STDIN_FILENO 0 /* Standard input. */
#define STDOUT_FILENO 1 /* Standard output. */
#define STDERR_FILENO 2 /* Standard error output. */
However, this code (on any system):
write(0, "Hello, World!\n", 14);
Will print Hello, World! (and a newline) to STDOUT. This is odd, because STDOUT's file descriptor is supposed to be 1; write-ing to file descriptor 1 also prints to STDOUT.
Performing an ioctl on file descriptor 0 changes standard input[2], and on file descriptor 1 changes standard output. However, performing termios functions on either 0 or 1 changes standard input[3][4].
I'm very confused about the behavior of file descriptors 1 and 0. Does anyone know why:
write-ing to either 1 or 0 writes to standard output?
Performing ioctl on 1 modifies standard output and on 0 modifies standard input, but performing tcsetattr/tcgetattr on either 1 or 0 works for standard input?
I guess it is because, on my Linux system, both 0 and 1 are by default opened read/write on /dev/tty, which is the controlling terminal of the process. So it is indeed possible to even read from stdout.
However this breaks as soon as you pipe something in or out:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

int main() {
    errno = 0;
    write(0, "Hello world!\n", 13);  /* 13 bytes: exclude the terminating NUL */
    perror("write");
}
and run with
% ./a.out
Hello world!
write: Success
% echo | ./a.out
write: Bad file descriptor
termios functions always work on the actual underlying terminal object, so it doesn't matter whether 0 or 1 is used, as long as the descriptor refers to a tty.
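A quick way to see this for yourself (a small sketch; try running it with and without redirecting stdin or stdout):

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    struct termios t;
    /* Both calls succeed (return 0) while descriptors 0 and 1 refer to
       the terminal, and fail with ENOTTY once redirected elsewhere. */
    printf("tcgetattr(0): %d\n", tcgetattr(STDIN_FILENO, &t));
    printf("tcgetattr(1): %d\n", tcgetattr(STDOUT_FILENO, &t));
    return 0;
}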
Let's start by reviewing some of the key concepts involved:
File description
In the operating system kernel, each file, pipe endpoint, socket endpoint, open device node, and so on, has a file description. The kernel uses these to keep track of the position in the file, the flags (read, write, append, close-on-exec), record locks, and so on.
The file descriptions are internal to the kernel, and do not belong to any process in particular (in typical implementations).
File descriptor
From the process viewpoint, file descriptors are integers that identify open files, pipes, sockets, FIFOs, or devices.
The operating system kernel keeps a table of descriptors for each process. The file descriptor used by the process is simply an index to this table.
The entries in the file descriptor table refer to kernel file descriptions.
Whenever a process uses dup() or dup2() to duplicate a file descriptor, the kernel only duplicates the entry in the file descriptor table for that process; it does not duplicate the file description it keeps to itself.
When a process forks, the child process gets its own file descriptor table, but the entries still point to the exact same kernel file descriptions. (This is essentially a shallow copy, with all file descriptor table entries being references to file descriptions. The references are copied; the targets they refer to remain the same.)
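A small experiment makes the shared file offset visible (a sketch; the file name is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd1 = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0666);
    int fd2 = dup(fd1);     /* new descriptor, same file description */
    write(fd1, "abc", 3);   /* advances the shared offset to 3 */
    /* Prints 3, not 0: both descriptors share one offset. */
    printf("%ld\n", (long)lseek(fd2, 0, SEEK_CUR));
    return 0;
}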
When a process sends a file descriptor to another process via a Unix domain socket ancillary message, the kernel allocates a new descriptor in the receiving process, and that new descriptor refers to the same file description the transferred descriptor referred to.
It all works very well, although it is a bit confusing that "file descriptor" and "file description" are so similar.
What does all that have to do with the effects the OP is seeing?
Whenever new processes are created, it is common to open the target device, pipe, or socket, and dup2() the descriptor to standard input, standard output, and standard error. This leads to all three standard descriptors referring to the same file description, and thus whatever operation is valid using one file descriptor, is valid using the other file descriptors, too.
This is most common when running programs on the console, as then the three descriptors all definitely refer to the same file description; and that file description describes the slave end of a pseudoterminal character device.
Consider the following program, run.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>

static void wrerrp(const char *p, const char *q)
{
    while (p < q) {
        ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
            return;
    }
}

static inline void wrerr(const char *s)
{
    if (s)
        wrerrp(s, s + strlen(s));
}

int main(int argc, char *argv[])
{
    int fd;
    if (argc < 3) {
        wrerr("\nUsage: ");
        wrerr(argv[0]);
        wrerr(" FILE-OR-DEVICE COMMAND [ ARGS ... ]\n\n");
        return 127;
    }
    fd = open(argv[1], O_RDWR | O_CREAT, 0666);
    if (fd == -1) {
        const char *msg = strerror(errno);
        wrerr(argv[1]);
        wrerr(": Cannot open file: ");
        wrerr(msg);
        wrerr(".\n");
        return 127;
    }
    if (dup2(fd, STDIN_FILENO) != STDIN_FILENO ||
        dup2(fd, STDOUT_FILENO) != STDOUT_FILENO) {
        const char *msg = strerror(errno);
        wrerr("Cannot duplicate file descriptors: ");
        wrerr(msg);
        wrerr(".\n");
        return 126;
    }
    if (dup2(fd, STDERR_FILENO) != STDERR_FILENO) {
        /* We might not have standard error anymore.. */
        return 126;
    }
    /* Close fd, since it is no longer needed. */
    if (fd != STDIN_FILENO && fd != STDOUT_FILENO && fd != STDERR_FILENO)
        close(fd);
    /* Execute the command. */
    if (strchr(argv[2], '/'))
        execv(argv[2], argv + 2);  /* Command has /, so it is a path */
    else
        execvp(argv[2], argv + 2); /* Command has no /, so it is a filename */
    /* Whoops; failed. But we have no stderr left.. */
    return 125;
}
It takes two or more parameters. The first parameter is a file or device, and the second is the command, with the rest of the parameters supplied to the command. The command is run, with all three standard descriptors redirected to the file or device named in the first parameter. You can compile the above with gcc using e.g.
gcc -Wall -O2 run.c -o run
Let's write a small tester utility, report.c:
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    char buffer[16] = { "\n" };
    ssize_t result;
    FILE *out;
    if (argc != 2) {
        fprintf(stderr, "\nUsage: %s FILENAME\n\n", argv[0]);
        return EXIT_FAILURE;
    }
    out = fopen(argv[1], "w");
    if (!out)
        return EXIT_FAILURE;
    result = write(STDIN_FILENO, buffer, 1);
    if (result == -1) {
        const int err = errno;
        fprintf(out, "write(STDIN_FILENO, buffer, 1) = -1, errno = %d (%s).\n", err, strerror(err));
    } else {
        fprintf(out, "write(STDIN_FILENO, buffer, 1) = %zd%s\n", result, (result == 1) ? ", success" : "");
    }
    result = read(STDOUT_FILENO, buffer, 1);
    if (result == -1) {
        const int err = errno;
        fprintf(out, "read(STDOUT_FILENO, buffer, 1) = -1, errno = %d (%s).\n", err, strerror(err));
    } else {
        fprintf(out, "read(STDOUT_FILENO, buffer, 1) = %zd%s\n", result, (result == 1) ? ", success" : "");
    }
    result = read(STDERR_FILENO, buffer, 1);
    if (result == -1) {
        const int err = errno;
        fprintf(out, "read(STDERR_FILENO, buffer, 1) = -1, errno = %d (%s).\n", err, strerror(err));
    } else {
        fprintf(out, "read(STDERR_FILENO, buffer, 1) = %zd%s\n", result, (result == 1) ? ", success" : "");
    }
    if (ferror(out))
        return EXIT_FAILURE;
    if (fclose(out))
        return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
It takes exactly one parameter: a file or device to write its report to, saying whether writing to standard input, and reading from standard output and standard error, work. (We can normally use $(tty) in Bash and POSIX shells to refer to the actual terminal device, so that the report is visible on the terminal.) Compile this one using e.g.
gcc -Wall -O2 report.c -o report
Now, we can check some devices:
./run /dev/null ./report $(tty)
./run /dev/zero ./report $(tty)
./run /dev/urandom ./report $(tty)
or on whatever we wish. On my machine, when I run this on a file, say
./run some-file ./report $(tty)
writing to standard input, and reading from standard output and standard error all work -- which is as expected, as the file descriptors refer to the same, readable and writable, file description.
The conclusion, after playing with the above, is that there is no strange behaviour here at all. It all behaves exactly as one would expect, if file descriptors as used by processes are simply references to operating system internal file descriptions, and standard input, output, and error descriptors are duplicates of each other.
I'm having some trouble using dup2 when trying to redirect both stdout and stderr into the same output file.
I'm using this explanatory code sample: (gcc 4.8.2, Ubuntu 14.04)
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define USE2FILES

int main()
{
    int f1, f2, status;
    f1 = open("test.out", O_CREAT | O_WRONLY, 0644);
    if (f1 == -1) {
        perror("open(): ");
    }
    status = dup2(f1, STDOUT_FILENO);
    if (status == -1) {
        perror("dup2(): ");
    }
#ifdef USE2FILES
    close(f1);
    f2 = open("test.out", O_CREAT | O_WRONLY, 0644);
    if (f2 == -1) {
        perror("open(): ");
    }
#else
    f2 = f1;
#endif
    status = dup2(f2, STDERR_FILENO);
    if (status == -1) {
        perror("dup2(): ");
    }
    close(f2);
    fprintf(stderr, "test_stderr1\n");
    fprintf(stdout, "test_stdout1\n");
    fprintf(stderr, "test_stderr2\n");
    fprintf(stdout, "test_stdout2\n");
    fprintf(stderr, "test_stderr3\n");
    fprintf(stdout, "test_stdout3\n");
    fflush(stdout);
    fflush(stderr);
    return 0;
}
The USE2FILES macro switches between using two file descriptors (opened separately on the same file), which get dup2'ed to stdout and stderr respectively, and using one file descriptor that gets duplicated to both stdout and stderr.
I was under the impression that using two distinct file descriptors for the redirection should work. However, running this piece of code with USE2FILES defined produces the following output in test.out:
test_stdout1
test_stdout2
test_stdout3
If I then disable USE2FILES I get:
test_stderr1
test_stderr2
test_stderr3
test_stdout1
test_stdout2
test_stdout3
Seems like in the first case no output towards stderr gets through. Is this behavior to be expected (am I missing something)?
EDIT: After accepted Chris Dodd's answer:
That's indeed a poor example. Changing the fprintf sequence to something like this:
fprintf(stderr, "test_stderr+++++++++++++++++++++++++++++++++++++++++++++++++1\n");
fprintf(stdout, "test_stdout----------------------------------------1\n");
fprintf(stderr, "test_stderr++++++++++++++++++++++++++++++++++2\n");
fprintf(stdout, "test_stdout----------------2\n");
fprintf(stderr, "test_stderr++++++++++++++++++++++++++++3\n");
fprintf(stdout, "test_stdout----------------------3\n");
gets me this test.out output:
test_stdout----------------------------------------1
test_stdout----------------2
test_stdout----------------------3
err++++++++++++++++++++++++++++3
showing pretty clearly that stdout and stderr are competing with their writes over the same file.
If you do two open calls, you get two distinct kernel filehandles, each with its own I/O cursor (file offset), so writes through the two file descriptors will overwrite each other. If you use a single open call, you get a single filehandle that both file descriptors refer to, so each write (through either descriptor) advances the shared offset, and the next write (through the other descriptor) lands after it.
In your example, the strings written are exactly the same length, so each write to stdout exactly overwrites the preceding write to stderr. Note that the file write only occurs when the FILE object is flushed, not (necessarily) when fprintf is called.
You could also get the effect you seem to be after by opening the files in O_APPEND mode. This causes every write to reposition the write offset to the current end of the file just before actually writing.
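In this example that would mean opening both descriptors like this (a sketch of just the changed open calls):

/* Open both descriptors in append mode; each write then atomically
   repositions to the current end of file before writing: */
f1 = open("test.out", O_CREAT | O_WRONLY | O_APPEND, 0644);
f2 = open("test.out", O_CREAT | O_WRONLY | O_APPEND, 0644);

The stdout and stderr lines then still interleave, but no write overwrites another.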
Can I make an anonymous stream in C? I don't want to create a new file on the file system, just have a stream that one function can fwrite to while another can fread from it. Not C++, C.
Maybe you're looking for pipes.
Forward your stdout to the pipe.
Then the other application can read from the pipe.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

#define RDR 0
#define WTR 1

int main(void){
    char *argv[] = { "/bin/ps", "--version", NULL };
    int p[2];
    pid_t pid;
    FILE *readpipe;
    int pipein, pipeout;
    char buf;

    /* create the pipe */
    if (pipe(p) != 0) {
        fprintf(stderr, "error: could not open pipe\n");
        return 1;
    }
    pipein = p[RDR];
    pipeout = p[WTR];

    if ((pid = fork()) == (pid_t) 0) {
        /* child: send its stdout into the pipe */
        close(pipein);
        dup2(pipeout, 1);
        close(pipeout);
        if (execv(argv[0], argv) == -1) {
            fprintf(stderr, "error: failed to execute %s\n", argv[0]);
        }
        _exit(1);
    }

    /* parent: wrap the read end in a stdio stream and copy it out */
    close(pipeout);
    readpipe = fdopen(pipein, "r");
    while (1 == fread(&buf, sizeof(char), 1, readpipe)) {
        fprintf(stdout, "%c", buf);
    }
    return 0;
}
Yes, tmpfile() is one way to do it. However, I believe tmpfile() is frowned upon these days due to security concerns.
So you should use mkstemp on POSIX systems, or tmpfile_s on Windows, instead of tmpfile().
These will all still create files in the filesystem, though. tmpfile() arranges for the file to be deleted automatically when it is closed or the program exits; with mkstemp you have to unlink(2) the file yourself.
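For example, a common POSIX pattern is to create the file with mkstemp and immediately unlink it, so no name stays visible in the filesystem (a sketch; the template path is arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

FILE *anonymous_stream(void) {
    char name[] = "/tmp/anonXXXXXX";
    int fd = mkstemp(name);    /* creates and opens a unique file */
    if (fd == -1)
        return NULL;
    unlink(name);              /* drop the name; the file lives on until
                                  the last descriptor is closed */
    return fdopen(fd, "w+");   /* wrap it in a stdio stream */
}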
Another option, which doesn't create a physical file is mmap().
Oops, just found it... maybe. tmpfile() returns a temporary FILE *.
Is that the right way to do it?
If you're on Unix (or a similar OS), you want to read Beej's Guide to Unix Interprocess Communication (it's a good read no matter what your OS is).
Check it out at Beej's Guides.
At a quick glance I noticed a few things there that you could probably use with more or less work (some of them optionally creating a file/resource):
Pipes
FIFOs
Message Queues
Shared Memory Segments
Memory Mapped Files
Unix Sockets