I am trying to read input from stdin with fread(). However, I am having a problem: the loop will not terminate and instead keeps reading.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "argument err");
        return -1;
    }
    FILE *in = fopen(argv[1], "w");
    if (in == NULL) {
        fprintf(stderr, "failed to open file");
        return -1;
    }
    char buffer[20];
    size_t ret;
    while ((ret = fread(buffer, 1, 20, stdin)) > 0) {
        if (fwrite(buffer, 1, ret, in) != ret) {
            if (ferror(in) != 0) {
                perror("write err:");
            }
        }
    }
    return 0;
}
How can I make this loop terminate when EOF is reached? I have tried using ctrl+D, but that just seems like a strange way to stop taking input.
I guess what I want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
How can I make this loop terminate when EOF is reached?
When do you think EOF is reached? Really. When you are providing input interactively, how is the system or the program supposed to know that you've entered all the data you want the program to consume?
I have tried using ctrl+D but that just seems like a strange way to stop taking input.
It is exactly the way to signal a soft EOF to a POSIX terminal. Since you want the loop to stop when EOF is encountered, it seems absolutely natural to me to use ctrl+D for the purpose when providing data interactively. That's not the only way you could signal the end of the input, but it has a lot going for it.
I guess what I want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
Again: how is the program supposed to know when it has consumed all the "multiple arbitrary amounts" of data that you decide to provide on a given run? An EOF signal is an eminently reasonable choice for multiple reasons, and the way to deliver that from a POSIX terminal interface is ctrl+D.
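To make the normal termination explicit, the question's loop can be followed by a check of why reading stopped. A minimal sketch, reusing the question's buffer, output stream and error style:

char buffer[20];
size_t ret;

while ((ret = fread(buffer, 1, sizeof buffer, stdin)) > 0) {
    if (fwrite(buffer, 1, ret, in) != ret) {
        perror("write err:");
        break;
    }
}
if (ferror(stdin)) {
    perror("read err:");      /* a genuine read error */
} else if (feof(stdin)) {
    /* normal end of input: Ctrl+D at a terminal, or the end of a
     * redirected file or pipe -- nothing special to do */
}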
As pointed out before, you are reading from a potentially endless stream; stdin does not naturally carry an EOF (or <= 0) value.
If you want your loop to terminate, you will have to add a termination condition of your own: a sentinel character, word or value, after which you break or return. You could also check whether your terminal emulator supports signalling an EOF on stdin, which is pretty common (but very platform dependent).
ADD: On my system, a typical Linux, CTRL+D signals EOF on stdin. It seems that you found this out yourself, and if you want your program to know where to stop you will need to use it.
You can also send a signal to your program, usually with a shortcut such as CTRL+C (SIGINT), CTRL+Z (SIGTSTP) or CTRL+\ (SIGQUIT). There are all sorts of signals, sent by your system and/or your terminal emulator, and you just have to implement the corresponding signal handler in your program.
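For completeness, here is a minimal sketch of the signal route, assuming POSIX sigaction: the handler just asks the question's copy loop to stop when CTRL+C arrives.

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t stop = 0;

static void on_sigint(int sig)
{
    (void)sig;    /* unused */
    stop = 1;     /* only set a flag; do no real work in a handler */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    /* no SA_RESTART, so a blocked fread() can return early */
    sigaction(SIGINT, &sa, NULL);

    char buffer[20];
    size_t ret;
    while (!stop && (ret = fread(buffer, 1, sizeof buffer, stdin)) > 0) {
        fwrite(buffer, 1, ret, stdout);
    }
    return 0;
}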
How can I make this loop terminate when EOF is reached? I have tried using ctrl+D but that just seems like a strange way to stop taking input.
fread and fwrite transfer data in records: they both take the size of a record and the number of records to transfer. If the remaining data doesn't fill a complete record, that record is not counted (the routines return the number of complete records transferred, and the C standard leaves the contents of a partial final record indeterminate).
All the calls in the stdio.h package are buffered, so the buffer holds the data that has been read from the system but not yet consumed by the user; this makes me wonder why you are using a buffer of your own to read data that is already buffered.
fread() never returns EOF. It returns the number of complete records it transferred, and it returns 0 once the stream can no longer supply a full record; at that point feof() and ferror() distinguish a true end of file from a read error. Note also that in your call, fread(buffer, 1, 20, stdin), the record size is 1 byte, so the return value is simply the number of bytes read (up to 20) and a partial record cannot occur.
As I've said above, fread() & fwrite() transfer full records (this is useful when your data is a struct with a fixed length, but normally not when you can have extra data at the end).
The way to terminate the loop is the comparison your program already has:

while ((ret = fread(buffer, 1, 20, stdin)) > 0) {
    if (fwrite(buffer, 1, ret, in) != ret) {
        if (ferror(in) != 0) {
            perror("write err:");
        }
    }
}
/* ret is a size_t, which is unsigned, so a condition like `>= 0`
 * would always be true and the loop would never end; `> 0` ends it
 * on the first call that delivers no data. Afterwards, feof(stdin)
 * and ferror(stdin) tell you whether that was end of file or an
 * error. */
So, with a record size of 1, nothing is left stranded: the last, short fread() returns however many bytes remained, the next call returns 0, and the loop is free to end. (Had the call been fread(buffer, 20, 1, stdin), a short tail would not be counted as a record, and the standard gives you no reliable way to recover its bytes from the array, so prefer a record size of 1 when the input length is arbitrary.)
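To make the difference between the two argument orders concrete, a small sketch (the file name and its 47-byte size are made up):

#include <stdio.h>

int main(void)
{
    char buf[20];
    FILE *f = fopen("data.bin", "rb");  /* hypothetical 47-byte file */
    if (f == NULL)
        return 1;

    /* size = 1, count = 20: counts bytes; successive calls would
     * return 20, 20, 7 and finally 0. */
    size_t nbytes = fread(buf, 1, sizeof buf, f);

    rewind(f);

    /* size = 20, count = 1: counts whole records; successive calls
     * would return 1, 1 and then 0 -- the 7-byte tail never counts. */
    size_t nrecs = fread(buf, sizeof buf, 1, f);

    printf("first call: %zu bytes vs %zu records\n", nbytes, nrecs);
    fclose(f);
    return 0;
}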
The cheapest and easiest way to solve this problem (copying from one stream to another) is described in K&R (in the first edition), and nobody has yet written better code to replace it:
int c;
while ((c = fgetc(in)) != EOF)
    fputc(c, out);
While it seems to read the characters one by one, the first fgetc() actually makes a call to read(2) that fills a whole buffer of data and returns just one character; subsequent characters are taken from the buffer, saving calls to read(). The same happens with fputc(): it fills its buffer until full, then flushes it in a single call to write().
Many people have tried to beat the code above, without any measurable gain in efficiency. So my hint is: keep it simple; the world is complicated enough without forcing you to go complex.
Related
I want to read all buffers from a pipe except for the last one. This is my current code:
while (read(server_to_client, serverString2, sizeof(serverString2))) {
    printf("Client : PID %d", getpid());
    printf("-Target>>%s<<", clientString2);
    printf(serverString2);
}
The problem with that is it reads everything from the buffer. How can I avoid reading the last buffer?
You can't. The question does not even make sense.
The question supposes that a "buffer" is a meaningful unit of measure for your data, but it is not. In particular, the third argument to read(2) is a maximum number of bytes to read, but the call may actually transfer fewer bytes for a large number of reasons, with reaching the end of the data being only one. Other reasons are in fact a lot more likely to manifest when the file descriptor being read is connected to a pipe, as you say yours is, than when it is connected to a file. Note that this means you must always capture read()'s return value if you intend to examine the data it reads, for otherwise you cannot know how much of the buffer contains valid data.
More generally, you cannot tell from an open file descriptor for a pipe how much data is available to be read from it. You need to include that information in your protocol (for example, HTTP's Content-Length header), or somehow communicate it out-of-band. That still doesn't tell you how much data is available to be read right now, but it can help you determine when to stop trying to read more.
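As an illustration of in-band framing, here is a sketch of a length-prefixed read; the 4-byte big-endian header is an invented convention, and read_fully() is the helper shown further down this answer:

#include <stdint.h>
#include <unistd.h>

ssize_t read_fully(int fd, void *buf, size_t count);  /* defined below */

/* Returns the payload length, or -1 on truncation or oversize. */
ssize_t read_message(int fd, char *payload, size_t max)
{
    unsigned char hdr[4];
    if (read_fully(fd, hdr, sizeof hdr) != (ssize_t)sizeof hdr)
        return -1;                               /* truncated header */

    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
                 | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
    if (len > max)
        return -1;                               /* doesn't fit the buffer */

    return read_fully(fd, payload, len);         /* exactly len bytes */
}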
Edited to add:
If you ask because you want to avoid dealing with partially-filled buffers, then you are flat out of luck. At minimum you need to be prepared for a partially-filled buffer when the data are prematurely truncated. Unless the total size of the data to be transferred is certain to be a multiple of the chosen buffer size, you will also have to be prepared to deal with a partial buffer at the end of your data. You can, however, avoid dealing with partial buffers in the middle of your data by repeatedly read()ing until you fill the buffer, perhaps via a wrapper function such as this:
ssize_t read_fully(int fd, void *buf, size_t count) {
    char *byte_buf = buf;
    ssize_t bytes_remaining = count;

    while (bytes_remaining > 0) {
        ssize_t nread = read(fd, byte_buf, bytes_remaining);
        if (nread <= 0) {
            break;   /* error or end of data */
        }
        byte_buf += nread;
        bytes_remaining -= nread;
    }

    return count - bytes_remaining;
}
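A possible way to use it (a sketch): request full chunks and treat a short return as the final, possibly partial, chunk.

char chunk[512];
ssize_t n;

while ((n = read_fully(server_to_client, chunk, sizeof chunk)) > 0) {
    /* process the n valid bytes in chunk */
    if ((size_t)n < sizeof chunk)
        break;   /* short read: the data ran out mid-chunk */
}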
Alternatively, you can approach the problem altogether differently. Instead of trying to avoid reading certain data, you may be able to read it but avoid processing it. Whether that could be sensible depends on the nature of your program.
Do you really need to avoid reading the last buffer? Or just avoid doing anything with it? Perhaps a different form of loop? Perhaps a check for eof() after reading each buffer? (Note that eof() is not POSIX; it exists as _eof() in <io.h> on some platforms.)
while (read(server_to_client, serverString2, sizeof(serverString2))) {
    if (!eof(server_to_client)) {
        printf("Client : PID %d", getpid());
        printf("-Target>>%s<<", clientString2);
        printf(serverString2);
    } else {
        // do special stuff for the last buffer here
    }
}
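If eof() isn't available on your platform, a portable alternative is one buffer of read-ahead: a chunk is only processed once the next read() proves it wasn't the last. A sketch reusing the question's descriptor (memcpy needs <string.h>):

char cur[512], next[512];
ssize_t curlen = read(server_to_client, cur, sizeof cur);

while (curlen > 0) {
    ssize_t nextlen = read(server_to_client, next, sizeof next);
    if (nextlen <= 0) {
        /* cur holds the last buffer: skip it, or do special stuff */
        break;
    }
    /* cur is provably not the last buffer: process it normally */
    fwrite(cur, 1, (size_t)curlen, stdout);
    memcpy(cur, next, (size_t)nextlen);
    curlen = nextlen;
}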
The mod_rewrite documentation states that it is a strict requirement to disable input/output buffering in a rewrite program.
Keeping that in mind I've written a simple program (I do know that it lacks the EOF check but this is not an issue and it saves one condition check per loop):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);

    int character;
    while (42) {
        character = getchar();
        if (character == '-') {
            character = '_';
        }
        putchar(character);
    }
    return 0;
}
After making some measurements I was shocked - it was over 9,000 times slower than the demo Perl script provided by the documentation:
#!/usr/bin/perl
$| = 1; # Turn off I/O buffering
while (<STDIN>) {
    s/-/_/g; # Replace dashes with underscores
    print $_;
}
Now I have two related questions:
Question 1. I believe that the streams may be line buffered since Apache sends a new line after each path. Am I correct? Switching my program to
setvbuf(stdin, NULL, _IOLBF, 4200);
setvbuf(stdout, NULL, _IOLBF, 4200);
makes it twice as fast as the Perl one. This should not hurt Apache's performance, should it?
Question 2. How can one write a C program that uses unbuffered streams (like the Perl one) and performs as fast as the Perl one?
Question 1: You would have to look at the code. It could be line buffered, it could be using fflush at the end of each request (or block of requests), or it could be using write calls with a larger buffer. In any case, it won't be doing per-character I/O which is what your program is doing.
Question 2: I suspect the main issue is on output. If you were to assemble the entire result in a buffer and write that out as one call, then you would be faster. However, that just means you are doing the line buffering instead of having the library take care of it for you. The key is that with no buffering, each output call results in a system call - that is very high overhead. In theory, the same concept holds true on input but I'm not sure the implementation wouldn't notice the available characters and buffer them in any case. Same workaround though - read a larger buffer and then take it apart yourself.
Personally, I'd avoid all the setvbuf stuff and just do an fflush at the end of each request.
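That suggestion might look like the sketch below: leave the default buffering alone, treat one line as one request, and fflush() after answering it (the 4096-byte line limit is an arbitrary choice):

#include <stdio.h>

int main(void)
{
    char line[4096];

    while (fgets(line, sizeof line, stdin) != NULL) {
        for (char *p = line; *p != '\0'; p++) {
            if (*p == '-')
                *p = '_';
        }
        fputs(line, stdout);
        fflush(stdout);   /* one flush per request instead of per character */
    }
    return 0;
}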
When writing to a terminal, stdout is flushed after every line. This way you can always see the output right away. When writing to a file or, as in your case, a pipe, this automatic flush is disabled. Usually in those cases performance is more important.
This causes problems when processes have to interact with each other. One program writes something. It's not sent instantly but stored in a buffer. Second program waits for that data. First program waits for more data from second program resulting in a deadlock.
To avoid this, you need to flush all output before waiting for additional input. A simple fflush(stdout) before every read operation should be enough; this is effectively what $| = 1 does in Perl. Nothing needs to be done with stdin.
If performance is critical, read and write data in big chunks using unbuffered read/write, even if you only operate on single bytes. For example:
#include <unistd.h>

int main(void) {
    char buf[1024];
    while (1) {
        ssize_t len = read(0, buf, sizeof(buf));
        if (len <= 0) {
            break;   /* end of input or read error */
        }
        for (ssize_t i = 0; i < len; i++) {
            if (buf[i] == '-') {
                buf[i] = '_';
            }
        }
        write(1, buf, len);
    }
    return 0;
}
I've included an example program using getchar() below, for reference (not that anyone probably needs it), and feel free to address concerns with it if you desire. But my question is:
What exactly is going on when the program calls getchar()?
Here is my understanding (please clarify or correct me):
1. When getchar is called, it checks the STDIN buffer to see if there is any input.
2. If there isn't any input, getchar sleeps.
3. Upon waking, getchar checks to see if there is any input, and if not, puts itself to sleep again.
4. Steps 2 and 3 repeat until there is input.
5. Once there is input (which by convention includes an 'EOF' at the end), getchar returns the first character of this input and does something to indicate that the next call to getchar should return the second character from the same buffer? I'm not really sure what that is.
6. When there are no more characters left other than EOF, does getchar flush the buffer?
The terms I used are probably not quite correct.
#include <stdio.h>

int getLine(char buffer[], int maxChars);

#define MAX_LINE_LENGTH 80

int main(void) {
    char line[MAX_LINE_LENGTH];
    int errorCode;
    errorCode = getLine(line, sizeof(line));
    if (errorCode == 1)
        printf("Input exceeded maximum line length of %d characters.\n", MAX_LINE_LENGTH);
    printf("%s\n", line);
    return 0;
}

int getLine(char buffer[], int maxChars) {
    int c, i = 0;
    while ((c = getchar()) != EOF && c != '\n' && i < maxChars - 1)
        buffer[i++] = c;
    buffer[i++] = '\0';
    if (i == maxChars)
        return 1;
    else
        return 0;
}
Steps 2-4 are slightly off.
If there is no input in the standard I/O buffer, getchar() calls a function to reload the buffer. On a Unix-like system, that normally ends up calling the read() system call, and the read() system call puts the process to sleep until there is input to be processed, or until the kernel knows there will be no input to be processed (EOF). When the read returns, the code adjusts the data structures so that getchar() knows how much data is available. Your description implies polling; the standard I/O system does not poll for input.
Step 5 uses the adjusted pointers to return the correct values.
There really isn't an EOF character; it is a state, not a character. Even though you type Control-D or Control-Z to indicate 'EOF', that character is not inserted into the input stream. In fact, those characters cause the system to flush any typed characters that are still waiting for 'line editing' operations (like backspace) to change them so that they are made available to the read() system call. If there are no such characters, then read() returns 0 as the number of available characters, which means EOF. Then getchar() returns the value EOF (usually -1 but guaranteed to be negative whereas valid characters are guaranteed to be non-negative (zero or positive)).
So basically, rather than polling, is it that hitting Return causes a certain I/O interrupt, and then when the OS receives this, it wakes up any processes that are sleeping for I/O?
Yes, hitting Return triggers interrupts and the OS kernel processes them and wakes up processes that are waiting for the data. The terminal driver is woken by the kernel when an interrupt occurs, and decides what to do with the character(s) that were just received. They may be stashed for further processing (canonical mode) or made available immediately (raw mode), etc. Assuming, of course, that the input is a terminal; if the input is from a disk file, it is simpler in many ways — or if it is a pipe, or …
Nominally, it isn't the terminal app that gets woken by the interrupt; it is the kernel that wakes first, then the shell running in the terminal app that is woken because there's data for it to read, and only when there's output does the terminal app get woken.
I say 'nominally' because there's an outside chance that in fact the terminal app does mediate the I/O via a pty (pseudo-tty), but I think it happens at the kernel level and the terminal application is involved fairly late in the process. There's a huge disconnect really between the keyboard where you type and the display where what you type appears.
See also Canonical vs non-canonical terminal input.
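For reference, switching a terminal into non-canonical mode looks roughly like this (a sketch, assuming POSIX termios):

#include <termios.h>
#include <unistd.h>

/* Disable line editing and echo so read() sees each keystroke at once. */
void set_noncanonical(int fd)
{
    struct termios t;
    tcgetattr(fd, &t);
    t.c_lflag &= ~(ICANON | ECHO);
    t.c_cc[VMIN] = 1;    /* read() returns after at least 1 byte */
    t.c_cc[VTIME] = 0;   /* no inter-byte timeout */
    tcsetattr(fd, TCSANOW, &t);
}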
Is it alright for multiple processes to access (write to) the same file at the same time? Using the following code, it seems to work, but I have my doubts.
The use case in this instance is an executable that gets called every time an email is received and logs its output to a central file.
if (freopen(console_logfile, "a+", stdout) == NULL || freopen(error_logfile, "a+", stderr) == NULL) {
    perror("freopen");
}
printf("Hello World!");
This is running on CentOS and compiled as C.
Using the C standard I/O facility introduces a new layer of complexity: the file is modified solely via the write(2) family of system calls (or memory mappings, but that's not used in this case), and the C standard I/O wrappers may postpone writing to the file for a while and may not submit complete requests in one system call.
The write(2) call itself should behave well:
[...] If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.

POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all file systems are POSIX conforming.
Thus your underlying write(2) calls will behave properly.
For the higher-level C standard IO streams, you'll also need to take care of the buffering. The setvbuf(3) function can be used to request unbuffered output, line-buffered output, or block-buffered output. The default behavior changes from stream to stream -- if standard output and standard error are writing to the terminal, then they are line-buffered and unbuffered by default. Otherwise, block-buffering is the default.
You might wish to manually select line-buffered if your data is naturally line-oriented, to prevent interleaved data. If your data is not line-oriented, you might wish to use un-buffered or leave it block-buffered but manually flush the data whenever you've accumulated a single "unit" of output.
If you are writing more than BUFSIZ bytes at a time, your writes might become interleaved. The setvbuf(3) function can help prevent the interleaving.
It might be premature to talk about performance, but line-buffering is going to be slower than block buffering. If you're logging near the speed of the disk, you might wish to take another approach entirely to ensure your writes aren't interleaved.
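Concretely, a line-buffered append-mode logger could be set up as in this sketch (the path and message are made up):

#include <stdio.h>

int main(void)
{
    /* "a" opens with O_APPEND underneath, so each flushed line is
     * appended atomically as described in the quote above. */
    FILE *log = fopen("/var/log/mail-handler.log", "a");  /* hypothetical path */
    if (log == NULL)
        return 1;

    setvbuf(log, NULL, _IOLBF, 4096);   /* flush on every '\n' */
    fprintf(log, "received message id=%d\n", 42);

    fclose(log);
    return 0;
}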
This answer was incorrect; it does work. My original reasoning follows:
So the race condition would be:
process 1 opens it for append, then
later process 2 opens it for append, then
later still 1 writes and closes, then
finally 2 writes and closes.
I'd be impressed if that 'worked', because it isn't clear to me what working should mean. I assume 'working' means all of the bytes written by the two processes are in the log file? I'd expect that they both write starting at the same byte offset, so one will replace the other's bytes. It will all be okay up to and including step 3, and only show up as a problem at step 4. Seems like an easy test to write: open, getchar, write, close.
Is it critical that they can have the file open simultaneously? A more obvious solution, if the write is quick, is to open the file exclusively.
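If brief exclusivity is acceptable, advisory locking serializes the writers. A sketch (the file name is made up; flock(2) is BSD/Linux rather than plain POSIX):

#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Append one message under an exclusive advisory lock. */
int log_line(const char *msg)
{
    int fd = open("central.log", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    flock(fd, LOCK_EX);            /* blocks until we own the lock */
    write(fd, msg, strlen(msg));
    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}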
For a quick check on your system, try:
/* write the first command line argument to a file called foo
 * stackoverflow topic 9880935
 */
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(int argc, const char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Error: need some text to write to the file Foo\n");
        exit(1);
    }
    FILE *fp = freopen("foo", "a+", stdout);
    if (fp == NULL) {
        perror("Error failed to open file\n");
        exit(1);
    }
    fprintf(stderr, "Press a key to continue\n");
    (void) getchar(); /* Yes, I really mean to ignore the character */
    if (printf("%s\n", argv[1]) < 0) {
        perror("Error failed to write to file: ");
        exit(1);
    }
    fclose(fp);
    return 0;
}
I'm using select() call to detect input presence in the main cycle of my program. This makes me use raw file descriptor (0) instead of stdin.
While working in this mode I've noticed that my software occasionally loses a chunk of input at the beginning. I suspect that stdin consumes some of it on the program start. Is there a way to prevent this behavior of stdin or otherwise get the whole input data?
The effect described can be reproduced only when there is already some data on standard input at the very moment the program starts. My executable is meant to run as an xinetd service, so it always has some input waiting at startup.
Standard input is read in the following way:
Error processInput() {
    struct timeval ktimeout;
    int fd = fileno(stdin);
    int maxFd = fd + 1;

    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    ktimeout.tv_sec = 0;
    ktimeout.tv_usec = 1;

    int selectRv = -1;
    while ((selectRv = select(maxFd, &fdset, NULL, NULL, &ktimeout)) > 0) {
        int left = MAX_BUFFER_SIZE - position - 1;
        assert(left > 0);
        int bytesCount = read(fd, buffer + position, left);
        // Input processing goes here
    }
}
Don't mix cooked and raw meat together. Try replacing the read() call with the equivalent fread() call.
It is very likely that fileno(stdin) is initializing the stdin object, causing it to read and buffer some input. Or perhaps you are already calling something that causes it to initialize (scanf(), getchar(), etc...).
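The other consistent choice is to stay entirely on the raw side and never let any stdio call touch descriptor 0. A sketch along the lines of the question's loop (MAX_BUFFER_SIZE is an assumed value; error handling trimmed):

#include <sys/select.h>
#include <unistd.h>

enum { MAX_BUFFER_SIZE = 4096 };   /* assumed value */

void processInputRaw(void)
{
    char buffer[MAX_BUFFER_SIZE];
    size_t position = 0;
    fd_set fdset;
    struct timeval ktimeout;

    for (;;) {
        FD_ZERO(&fdset);
        FD_SET(0, &fdset);          /* the raw descriptor; fileno(stdin) avoided */
        ktimeout.tv_sec = 0;
        ktimeout.tv_usec = 1;

        if (select(1, &fdset, NULL, NULL, &ktimeout) <= 0)
            break;

        ssize_t n = read(0, buffer + position, MAX_BUFFER_SIZE - position - 1);
        if (n <= 0)
            break;
        position += (size_t)n;
        /* ...input processing goes here... */
    }
}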