How to properly fread & fwrite from & to a pipe - c

I have this code which acts as a pipe between two shell invocations.
It reads from a pipe, and writes into a different one.
#include <stdio.h>
#include <stdlib.h>
#define BUFF_SIZE (0xFFF)
/*
* $ cat /tmp/redirect.txt |less
*/
int main(void)
{
FILE *input;
FILE *output;
int c;
char buff[BUFF_SIZE];
size_t nmemb;
input = popen("cat /tmp/redirect.txt", "r");
output = popen("less", "w");
if (!input || !output)
exit(EXIT_FAILURE);
#if 01
while ((c = fgetc(input)) != EOF)
fputc(c, output);
#elif 01
do {
nmemb = fread(buff, 1, sizeof(buff), input);
fwrite(buff, 1, nmemb, output);
} while (nmemb);
#elif 01
while (feof(input) != EOF) {
nmemb = fread(buff, 1, sizeof(buff), input);
fwrite(buff, 1, nmemb, output);
}
/*
* EDIT: The previous implementation is incorrect:
* feof() return non-zero if EOF is set
* EDIT2: Forgot the !. This solved the problem.
*/
#elif 01
while (feof(input)) {
nmemb = fread(buff, 1, sizeof(buff), input);
fwrite(buff, 1, nmemb, output);
}
#endif
pclose(input);
pclose(output);
return 0;
}
I want it to be efficient, so I want to implement it with fread() and fwrite(). These are the three ways I tried.
The first one is implemented with fgetc()&fputc() so it will be very slow. However it works fine because it checks for EOF so it will wait until cat (or any shell invocation I use) finishes its job.
The second one is faster, but I'm concerned that I don't check for EOF so if there is any moment when the pipe is empty (but the shell invocation hasn't finished, so may not be empty in the future), it will close the pipe and end.
The third implementation is what I would like to do, and it mostly works (all the text is received by less), but for some reason it gets stuck and doesn't close the pipe (it seems it never gets the EOF).
EDIT: Third implementation is buggy. Fourth tries to solve the bug, but now less doesn't receive anything.
How should this be properly done?

First of all, I think your problem has more to do with buffering than with efficiency. That is a common problem when first dealing with the stdio package.
Second, the best (and simplest) implementation of a simple data copier from input to output is the following snippet (copied from K&R first ed.).
while((c = fgetc(input)) != EOF)
fputc(c, output);
(Well, not a literal copy: K&R use stdin and stdout as the FILE * streams, with the simpler getchar() and putchar(c) calls.) When you try to do better than this, you normally fall into false assumptions, such as believing that there is no buffering, or miscounting the number of system calls.
stdio uses full buffering when the standard output is a pipe (in fact, it always uses full buffering unless isatty(3) returns true for the file descriptor). So if you want to see the output as soon as it is available, you should at least disable output buffering (with something like setbuf(out, NULL);) or call fflush() on your output at some point, so that data does not sit in the output buffer while you are waiting on the input for more.
It seems that the output is not visible in less(1) because it is being held in the internal buffers of your program, and that is exactly what is happening. Despite handling individual characters, your program is fully buffered: stdio tries to fill its input buffer (up to BUFSIZ characters) before fgetc() returns the first one. Then the loop makes BUFSIZ fgetc() calls and BUFSIZ fputc() calls, which fills the output buffer. But that buffer is still not written, because it needs one more char to force a flush. So until you get into the second BUFSIZ chunk of data, nothing is written to less(1).
A simple and efficient way is to check after fputc(c, out); whether the char is a \n and, in that case, flush the output with fflush(out);, so that you write one line of output at a time.
fputc(c, out);
if (c == '\n') fflush(out);
If you don't do something like this, output is written in BUFSIZ chunks, and normally not before that much data has accumulated on the output side. Also remember to close your streams (with pclose() for streams opened by popen()). stdio does this at normal exit, but you can lose buffered output if your process gets interrupted.
IMHO the code you should use is:
while ((c = fgetc(input)) != EOF) {
fputc(c, output);
if (c == '\n') fflush(output);
}
pclose(input); /* streams opened with popen() are closed with pclose() */
pclose(output);
for the best performance, without holding output data in the buffer unnecessarily.
BTW, doing fread() and fwrite() of one char is a waste of time, complicates things a lot, and is error prone. fwrite() of one char does not bypass the buffers, so you won't get better performance than with fputc(c, output);.
BTW(bis) if you want to do your own buffering, don't call stdio functions, just use read(2) and write(2) calls on normal system file descriptors. A good approach is:
int input_fd = fileno(input); /* input is your old FILE * given by popen() */
int output_fd = fileno(output);
ssize_t n; /* read(2) returns ssize_t */
while ((n = read(input_fd, your_buffer, sizeof your_buffer)) > 0) {
write(output_fd, your_buffer, n);
}
switch (n) {
case 0: /* we got EOF */
...
break;
default: /* we got an error */
fprintf(stderr, "error: read(): %s\n", strerror(errno));
...
break;
} /* switch */
Note that read(2) on a pipe returns as soon as some data is available, which may be less than a full buffer, and that write(2) can likewise write fewer bytes than requested.
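The read(2)/write(2) loop above ignores the return value of write(). A slightly more defensive sketch retries short writes and interrupted calls. The names write_all and copy_fd are hypothetical, and the 4096-byte buffer size is an arbitrary choice.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly `len` bytes to `fd`, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd.
 * Returns 0 when read() reports end of file, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
}
```

For the streams in the question, the call would be copy_fd(fileno(input), fileno(output)), made before any stdio read or write on those streams so that no data is left behind in a stdio buffer.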
If you want to feed your data to less(1) as soon as you have one line for less, then you can disable completely the input buffer with:
setbuf(input, NULL);
int c; /* int, never char, see manual page */
while ((c = fgetc(input)) != EOF) {
putc(c, output);
if (c == '\n') fflush(output);
}
And you'll get less(1) working as soon as you have produced a single line of output text.
What exactly are you trying to do? (This would be nice to know, as you seem to be reinventing the cat(1) program, but with reduced functionality.)

Simplest solution:
while (1) {
nmemb = fread(buff, 1, sizeof buff, input);
if (nmemb < 1) break;
fwrite(buff, 1, nmemb, output);
}
Similarly, for the getc() case:
while (1) {
c = getc(input);
if (c == EOF) break;
putc(c, output);
}
Replacing fgetc() with getc() will give performance equivalent to the fread() case, because getc() is often a macro and avoids function-call overhead (just take a look at the generated assembly).
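The fread()/fwrite() loop can also report failures instead of silently stopping. This is a sketch only: copy_stream is a hypothetical name, and it relies on the fact that fread() returning 0 means either end of file or an error, which ferror() then tells apart.

```c
#include <stdio.h>

/* Copy `in` to `out` in blocks.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;          /* short write: output error */
    }
    if (ferror(in))
        return -1;              /* loop ended on error, not EOF */
    return fflush(out) == EOF ? -1 : 0;
}
```

A pipe that is momentarily empty does not end this loop: fread() blocks until data arrives and returns 0 only once the writer has closed its end or an error occurs.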

Related

C - Print lines from file with getline()

I am trying to write a simple C program that loads a text-file, prints the first line to screen, waits for the user to press enter and then prints the next line, and so on.
As only argument it accepts a text-file that is loaded as a stream "database". I use the getline()-function for this, according to this example. It compiles fine, successfully loads the text-file, but the program never enters the while-loop and then exits.
#include <stdio.h>
#include <stdlib.h>
FILE *database = NULL; // input file
int main(int argc, char *argv[])
{
/* assuming the user obeyed syntax and gave input-file as first argument*/
char *input = argv[1];
/* Initializing input/database file */
database = fopen(input, "r");
if(database == NULL)
{
fprintf(stderr, "Something went wrong with reading the database/input file. Does it exist?\n");
exit(EXIT_FAILURE);
}
printf("INFO: database file %s loaded.\n", input);
/* Crucial part printing line after line */
char *line = NULL;
size_t len = 0;
ssize_t read;
while((read = getline(&line, &len, database)) != -1)
{
printf("INFO: Retrieved line of length %zd :\n", read);
printf("%s \n", line);
char confirm; // wait for user keystroke to proceed
scanf("%c", &confirm);
// no need to do anything with "confirm"
}
/* tidy up */
free(line);
fclose(database);
exit(EXIT_SUCCESS);
}
I tried it with fgets() -- I can also post that code --, but same thing there: it never enters the while-loop.
It might be something very obvious; I am new to programming.
I use the gcc-compiler on Kali Linux.
Replace your scanf with getline, using stdin as the stream argument.
You should step through this in a debugger, to make sure your claim that it never enters the while loop is correct.
If it truly never enters the while loop, it is necessarily because getline() has returned -1. Either the file is truly empty, or you have an error reading the file.
man getline says:
On success, getline() and getdelim() return the number of characters read, including the delimiter character, but not including the terminating null byte ('\0'). This value can be used to handle embedded null bytes in the line read.
Both functions return -1 on failure to read a line (including end-of-file condition). In the event of an error, errno is set to indicate the cause.
Therefore, you should enhance your code to check for stream errors and deal with errno. You should do this even when your code works, because EOF is not the only reason for the function to return -1.
ssize_t nread = getline(&line, &len, database); /* len stays the size_t buffer size */
if (nread == -1 && ferror(database)) {
perror("Error reading database");
}
You can write more detailed code to deal with errno in more explicit ways.
Unfortunately handling this thoroughly can make your code a bit more verbose -- welcome to C!
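Put together, a getline() loop that tells end of file apart from a read error might look like the sketch below. The helper name count_lines is hypothetical, and the feature-test macro is an assumption about how the program is compiled on a glibc system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read `fp` line by line with getline().
 * Returns the number of lines read, or -1 if the loop ended
 * because of a read error rather than end of file. */
long count_lines(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t nread;
    long lines = 0;
    int failed;

    while ((nread = getline(&line, &cap, fp)) != -1)
        lines++;

    failed = ferror(fp);        /* -1 came from an error, not EOF */
    if (failed)
        perror("getline");
    free(line);
    return failed ? -1 : lines;
}
```

A final line without a trailing newline is still returned by getline() and therefore counted.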

Clearing Input Buffer in C

I have a large program, a text twist game with graphics in C. Somewhere in my code I use kbhit(), and I wrote this code to clear my input buffer:
while ((c = getchar()) != '\n' && c != EOF);
This code works, and waits for me to hit the Enter key to exit the loop. The problem is that the loop waits for Enter, and any other key is echoed on the screen (unreadable). My question: is there any other way to clear the input buffer without pressing anything, like using fflush to clear an output buffer?
Assuming you're on some kind of Unix variant....
There are two things you need to clear here:
The FILE * input buffer managed by the C library.
The OS input buffer that your program has not yet read.
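The first can be cleared with fflush, just like output streams, except that the data is simply discarded rather than written out. (ISO C leaves fflush on an input stream undefined, so this relies on implementation-specific behavior.)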
The second requires some low-level OS I/O calls. In general you should not mix these with the FILE * I/O functions, but they should be safe between fflush and any other read/get operation.
First, you'll need to use select to see if a read operation would block. This effectively checks to see if the OS buffer is clear. If the read would not block, then you do a one-character read, and repeat the select. The key point is that you have to check if there is data to read before you read and discard it, or else it will block until there is data to read.
The code might look something like this (untested):
fflush(stdin);
int stdin_fd = fileno(stdin);
while (1) {
fd_set readset;
FD_ZERO(&readset);
FD_SET(stdin_fd, &readset);
struct timeval timeout = {0, 0};
int result = select(stdin_fd+1, &readset, NULL, NULL, &timeout);
if (result == 1) {
char c;
read(stdin_fd, &c, 1);
} else if (result == 0
|| (result == -1 && errno != EINTR)) {
break;
} // else loop
}
It might be possible to use a larger read size when clearing the OS buffers, which would be more efficient, but I'm not sure about the portability of that, and anyway I'm assuming there won't be much data to clear.
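As a sketch of the larger-read variant, the helper below (hypothetical name drain_fd, arbitrary 256-byte buffer) polls with select() and discards whatever is pending. It assumes a POSIX system and works on the file descriptor only, so it does not touch anything already held in a FILE * buffer.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Discard whatever is currently readable on `fd` without blocking.
 * Returns the number of bytes discarded, or -1 on error. */
long drain_fd(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        fd_set readset;
        struct timeval timeout = {0, 0};    /* poll, do not wait */
        int ready;
        ssize_t n;

        FD_ZERO(&readset);
        FD_SET(fd, &readset);
        ready = select(fd + 1, &readset, NULL, NULL, &timeout);
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0)
            return total;                   /* nothing pending */

        n = read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return total;                   /* end of file */
        total += n;
    }
}
```

For keyboard input the call would be drain_fd(fileno(stdin)). On a terminal in canonical mode, only completed lines count as readable, so characters typed without Enter are not seen by this loop.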

Using fread() hangs until killed

This is the general structure of my code:
if (contentLength > 0)
{
// Send POST data
size_t sizeRead = 0;
char buffer[1024];
while ((sizeRead < contentLength) && (!feof(stream)))
{
size_t diff = contentLength - sizeRead;
if (diff > 1024)
diff = 1024;
// Debugging
fprintf(stderr, "sizeRead: %zu\n", sizeRead);
fprintf(stderr, "contentLength: %ul\n", contentLength);
fprintf(stderr, "diff: %zu\n", diff);
size_t read = fread(buffer, 1, diff, stream);
sizeRead += read;
exit(1);
// Write to pipe
fwrite(buffer, 1, read, cgiPipePost);
exit(1);
}
}
However, the program hangs when it hits the fread() line. If I add an exit() before that line, the program exits. If I add it after, the program hangs until I send a SIGINT signal.
Any help would be appreciated, I have been stuck on this for quite some time now.
Thanks
fread tries to fill an internal buffer. Depending on the implementation, you may be able to stop or limit it by setting the buffering mode (in particular, setting _IONBF, see setbuf, should work for all implementations). The general rule, though, is to avoid mixing counted I/O on sockets with stdio at all—to use raw read calls.
Also, while it's not biting you here, a !feof(stream) test is almost always wrong: people mean this to be predictive (EOF is about to occur), but feof is only "post-dictive": after a read operation fails (getc or fgetc returns EOF), the feof and ferror indicators allow you to discover why the previous failure occurred.
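A loop that follows this "post-dictive" rule tests the return value of fread() first and consults feof()/ferror() only after a short read. This is a sketch with a hypothetical helper name, read_up_to.

```c
#include <stdio.h>

/* Read up to `want` bytes from `stream` into `buf`.
 * Returns the number of bytes actually read. A count smaller than
 * `want` means EOF or an error; call feof()/ferror() afterwards to
 * find out which. */
size_t read_up_to(FILE *stream, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        size_t n = fread(buf + got, 1, want - got, stream);
        if (n == 0)
            break;      /* the read failed: now the indicators are meaningful */
        got += n;
    }
    return got;
}
```

On a socket or pipe this still blocks until `want` bytes have arrived or the peer closes the connection, so it does not by itself cure the hang described in the question.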

Buffering of standard I/O library

In the book Advanced Programming in the UNIX Environment (2nd edition), the author wrote in Section 5.5 (stream operations of the standard I/O library) that:
When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.
Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
I got confused about this. Could anyone explain a little about this? For example, in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program? I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear.
You aren't allowed to intersperse input and output operations. For example, you can't use formatted input to seek to a particular point in the file, then start writing bytes starting at that point. This allows the implementation to assume that at any time, the sole I/O buffer will only contain either data to be read (to you) or written (to the OS), without doing any safety checks.
f = fopen( "myfile", "r+" ); /* open for read and write */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" ); /* write some data - illegal */
This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.
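A compilable sketch of the permitted sequence, with a hypothetical helper name and the assumption that the header is a single line shorter than 128 characters:

```c
#include <stdio.h>

/* Skip the first line of an update-mode stream, then overwrite the
 * bytes that follow it with `data`.
 * Returns 0 on success, -1 on failure. */
int overwrite_after_header(FILE *f, const char *data)
{
    char header[128];

    if (fgets(header, sizeof header, f) == NULL)    /* input */
        return -1;
    if (fseek(f, 0L, SEEK_CUR) != 0)                /* change direction */
        return -1;
    if (fputs(data, f) == EOF)                      /* output */
        return -1;
    return fflush(f) == EOF ? -1 : 0;
}
```

The fseek() call does not move the position. It only gives the library the point at which it may switch the buffer from reading to writing.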
Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
It's not clear what you're asking.
Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.
Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"
Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and 4 in the C standard linked above.)
But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:
int main(int argc, char *argv[]) {
FILE *fp = fopen("foo", "r+");
fseek(fp, 0, SEEK_SET);
fwrite("foo", 1, 3, fp);
fseek(fp, 0, SEEK_SET);
fwrite("bar", 1, 3, fp);
char buf[4] = { 0 };
size_t ret = fread(buf, 1, 3, fp);
printf("%d %s\n", (int)ret, buf);
}
So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.
If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.
Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.
You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".
There are four ways you could implement something like stdio:
1. Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
2. Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
3. Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead-simple, and as efficient as possible.
4. Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, and almost as efficient, and a lot safer, but not really any better in any way other than safety.
So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.
You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end 1970s systems, and the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. But, most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy stuff like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.
If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.
For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.
reason 1
find the real file position to start.
Due to the buffer implementation of stdio, the stdio stream position may differ from the OS file position. When you read 1 byte, stdio marks the file position as 1. Because of buffering, stdio may have read 4096 bytes from the underlying file, so the OS records its file position as 4096. When you switch to output, you need to choose which position you want to use.
reason 2
find the right buffer cursor to start.
tl;dr,
if an underlying implementation only uses a single shared buffer for both read and write, you have to flush the buffer when changing IO direction.
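The gap between the stdio stream position and the OS file position described under reason 1 can be observed directly. This is a sketch assuming a POSIX system; report_positions is a hypothetical helper, and the exact OS offset depends on the buffer size the library chooses.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Report where stdio thinks the stream is (ftell) and where the
 * kernel's file offset for the same descriptor is (lseek). */
void report_positions(FILE *f, long *stdio_pos, off_t *os_pos)
{
    *stdio_pos = ftell(f);
    *os_pos = lseek(fileno(f), 0, SEEK_CUR);
}
```

After a single getc() on a large file, stdio_pos is 1 while os_pos is typically a full buffer further along (often 4096 bytes).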
Take this glibc, as used in Chromium OS, to demonstrate how fwrite, fseek, and fflush handle the single shared buffer.
fwrite fill buffer impl:
fill_buffer:
while (to_write > 0)
{
register size_t n = to_write;
if (n > buffer_space)
n = buffer_space;
buffer_space -= n;
written += n;
to_write -= n;
if (n < 20)
while (n-- > 0)
*stream->__bufp++ = *p++;
else
{
memcpy ((void *) stream->__bufp, (void *) p, n);
stream->__bufp += n;
p += n;
}
if (to_write == 0)
/* Done writing. */
break;
else if (buffer_space == 0)
{
/* We have filled the buffer, so flush it. */
if (fflush (stream) == EOF)
break;
From this code snippet, we can see that if the buffer is full, fwrite flushes it.
Let's take a look at fflush
int
fflush (stream)
register FILE *stream;
{
if (stream == NULL) {...}
if (!__validfp (stream) || !stream->__mode.__write)
{
__set_errno (EINVAL);
return EOF;
}
return __flshfp (stream, EOF);
}
it uses __flshfp
/* Flush the buffer for FP and also write C if FLUSH_ONLY is nonzero.
This is the function used by putc and fflush. */
int
__flshfp (fp, c)
register FILE *fp;
int c;
{
/* Make room in the buffer. */
(*fp->__room_funcs.__output) (fp, flush_only ? EOF : (unsigned char) c);
}
the __room_funcs.__output by default is using flushbuf
/* Write out the buffered data. */
wrote = (*fp->__io_funcs.__write) (fp->__cookie, fp->__buffer,
to_write);
Now we are close. What's __write? Trace the default settings aforementioned, it's __stdio_write
int
__stdio_write (cookie, buf, n)
void *cookie;
register const char *buf;
register size_t n;
{
const int fd = (int) cookie;
register size_t written = 0;
while (n > 0)
{
int count = __write (fd, buf, (int) n);
if (count > 0)
{
buf += count;
written += count;
n -= count;
}
else if (count < 0
#if defined (EINTR) && defined (EINTR_REPEAT)
&& errno != EINTR
#endif
)
/* Write error. */
return -1;
}
return (int) written;
}
__write is the write(2) system call.
As we can see, fwrite uses only a single buffer. If you change direction, the buffer can still hold the previously written contents. As the example above shows, you can call fflush to empty the buffer.
The same applies to fseek
/* Move the file position of STREAM to OFFSET
bytes from the beginning of the file if WHENCE
is SEEK_SET, the end of the file is it is SEEK_END,
or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
register FILE *stream;
long int offset;
int whence;
{
...
if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
return EOF;
...
/* O is now an absolute position, the new target. */
stream->__target = o;
/* Set bufp and both end pointers to the beginning of the buffer.
The next i/o will force a call to the input/output room function. */
stream->__bufp
= stream->__get_limit = stream->__put_limit = stream->__buffer;
...
}
It soft-flushes (resets) the buffer at the end, which means the read buffer is emptied after this call.
This agrees with the C99 rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.

fseek does not work in linux [duplicate]

Possible Duplicate:
Using fseek with a file pointer that points to stdin
I have a program that uses fseek to clear my input buffer. It works well in Windows, but fails in Linux. Please help me.
#include <stdio.h>
#define NO_USE_FSEEK 0
int main(int argc, char *argv[])
{
char ch = 'a';
int i = 1;
long int fpos = -1;
while(1)
{
printf("loop : %d\n", i);
fseek(stdin, 0L, SEEK_END); /*works in Windows with MinGW, fails in Linux*/
fpos = ftell(stdin);
if (-1 == fpos)
{
perror("ftell failure:"); /*perror tells it is Illegal Seek*/
printf("\n");
}
else
{
printf("position indicator:%ld\n", fpos);
}
scanf("%c", &ch);
printf("%d : %c\n", (int)ch, ch);
i++;
}
return 0;
}
Thanks in advance!
This is not the accepted way to "clear your input buffer" on either Windows or Linux.
On windows, using the MSVCRT version of the standard C functions, there is an extension allowing fflush(stdin) for this purpose. Note that on other systems this is undefined behavior.
On Linux, glibc has a function called __fpurge (declared in <stdio_ext.h>) with the same purpose; the BSDs call it fpurge.
However, I have to ask, why do you want to clear your input buffer? If it's the usual complaint people have with scanf not reading to the end of the line, it would be better to write code to actually read and discard the rest of the line (loop with getc until reading a '\n', for example, as in pmg's answer). Clearing the input buffer will tend to skip a large amount of data when used on a redirected file or pipe rather than the normal console/tty input.
I guess fseek will not work with stdin, because the size of stdin is not known.
Test the return value from fseek() (in fact, test the return value from all <stdio.h> input functions).
if (fseek(stdin, 0, SEEK_END) < 0) { perror("fseek"); exit(EXIT_FAILURE); }
Use the idiom
while ((ch = getchar()) != '\n' && ch != EOF) /* void */;
/* if (ch == EOF)
** call feof(stdin) or ferror(stdin) if needed; */
to ignore all characters in the input buffer up to the next ENTER (or end of file or input error).
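Wrapped in a function, the same idiom can be applied to any stream. This is a sketch, and discard_line is a hypothetical name.

```c
#include <stdio.h>

/* Discard the rest of the current line on `in`.
 * Returns '\n' if a newline was consumed, or EOF if the stream
 * ended (or failed) first. */
int discard_line(FILE *in)
{
    int ch;

    while ((ch = getc(in)) != '\n' && ch != EOF)
        ;   /* keep discarding */
    return ch;
}
```

A typical use is right after a scanf() that left the remainder of the line unread: call discard_line(stdin) before the next prompt.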

Resources