I am currently reading the book C Programming Language by Ritchie & Kernighan. And I am pretty confused about the usage of EOF in the getchar() function.
First, I want to know why the value of EOF is -1 and why the value of getchar()!=EOF is 0. Pardon me for my question but I really don't understand. I really tried but I can't.
Then I tried to run the example on the book that can count the number of characters using the code below but it seems that I never get out of the loop even if I press enter so I am wondering when would I reach the EOF?
main(){
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}
Then, I read the same problem at Problem with EOF in C. Most people advised that instead of using EOF, use the terminator \n or the null terminator '\0' which makes a lot of sense.
Does it mean that the example on the book serves another purpose?
EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.
The code isn't wrong[*], it just doesn't do what you seem to expect. It reads to the end of the input, but you seem to want to read only to the end of a line.
The value of EOF is -1 because it has to be different from any return value from getchar that is an actual character. So getchar returns any character value as an unsigned char, converted to int, which will therefore be non-negative.
If you're typing at the terminal and you want to provoke an end-of-file, use CTRL-D (unix-style systems) or CTRL-Z (Windows). Then after all the input has been read, getchar() will return EOF, and hence getchar() != EOF will be false, and the loop will terminate.
[*] well, it has undefined behavior if the input is more than LONG_MAX characters due to integer overflow, but we can probably forgive that in a simple example.
EOF is -1 because that's how it's defined. The name is provided by the standard library headers that you #include. They make it equal to -1 because it has to be something that can't be mistaken for an actual byte read by getchar(). getchar() reports the values of actual bytes using positive number (0 up to 255 inclusive), so -1 works fine for this.
The != operator means "not equal". 0 stands for false, and anything else stands for true. So what happens is, we call the getchar() function, and compare the result to -1 (EOF). If the result was not equal to EOF, then the result is true, because things that are not equal are not equal. If the result was equal to EOF, then the result is false, because things that are equal are not (not equal).
The call to getchar() returns EOF when you reach the "end of file". As far as C is concerned, the 'standard input' (the data you are giving to your program by typing in the command window) is just like a file. Of course, you can always type more, so you need an explicit way to say "I'm done". On Windows systems, this is control-Z. On Unix systems, this is control-D.
The example in the book is not "wrong". It depends on what you actually want to do. Reading until EOF means that you read everything, until the user says "I'm done", and then you can't read any more. Reading until '\n' means that you read a line of input. Reading until '\0' is a bad idea if you expect the user to type the input, because it is either hard or impossible to produce this byte with a keyboard at the command prompt :)
That's a lot of questions.
Why EOF is -1: usually -1 in POSIX system calls is returned on error, so i guess the idea is "EOF is kind of error"
any boolean operation (including !=) returns 1 in case it's TRUE, and 0 in case it's FALSE, so getchar() != EOF is 0 when it's FALSE, meaning getchar() returned EOF.
in order to emulate EOF when reading from stdin press Ctrl+D
Related
I am currently reading the book C Programming Language by Ritchie & Kernighan. And I am pretty confused about the usage of EOF in the getchar() function.
First, I want to know why the value of EOF is -1 and why the value of getchar()!=EOF is 0. Pardon me for my question but I really don't understand. I really tried but I can't.
Then I tried to run the example on the book that can count the number of characters using the code below but it seems that I never get out of the loop even if I press enter so I am wondering when would I reach the EOF?
main(){
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}
Then, I read the same problem at Problem with EOF in C. Most people advised that instead of using EOF, use the terminator \n or the null terminator '\0' which makes a lot of sense.
Does it mean that the example on the book serves another purpose?
EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.
The code isn't wrong[*], it just doesn't do what you seem to expect. It reads to the end of the input, but you seem to want to read only to the end of a line.
The value of EOF is -1 because it has to be different from any return value from getchar that is an actual character. So getchar returns any character value as an unsigned char, converted to int, which will therefore be non-negative.
If you're typing at the terminal and you want to provoke an end-of-file, use CTRL-D (unix-style systems) or CTRL-Z (Windows). Then after all the input has been read, getchar() will return EOF, and hence getchar() != EOF will be false, and the loop will terminate.
[*] well, it has undefined behavior if the input is more than LONG_MAX characters due to integer overflow, but we can probably forgive that in a simple example.
EOF is -1 because that's how it's defined. The name is provided by the standard library headers that you #include. They make it equal to -1 because it has to be something that can't be mistaken for an actual byte read by getchar(). getchar() reports the values of actual bytes using positive number (0 up to 255 inclusive), so -1 works fine for this.
The != operator means "not equal". 0 stands for false, and anything else stands for true. So what happens is, we call the getchar() function, and compare the result to -1 (EOF). If the result was not equal to EOF, then the result is true, because things that are not equal are not equal. If the result was equal to EOF, then the result is false, because things that are equal are not (not equal).
The call to getchar() returns EOF when you reach the "end of file". As far as C is concerned, the 'standard input' (the data you are giving to your program by typing in the command window) is just like a file. Of course, you can always type more, so you need an explicit way to say "I'm done". On Windows systems, this is control-Z. On Unix systems, this is control-D.
The example in the book is not "wrong". It depends on what you actually want to do. Reading until EOF means that you read everything, until the user says "I'm done", and then you can't read any more. Reading until '\n' means that you read a line of input. Reading until '\0' is a bad idea if you expect the user to type the input, because it is either hard or impossible to produce this byte with a keyboard at the command prompt :)
That's a lot of questions.
Why EOF is -1: usually -1 in POSIX system calls is returned on error, so i guess the idea is "EOF is kind of error"
any boolean operation (including !=) returns 1 in case it's TRUE, and 0 in case it's FALSE, so getchar() != EOF is 0 when it's FALSE, meaning getchar() returned EOF.
in order to emulate EOF when reading from stdin press Ctrl+D
Trying to understand the behavior of my code. I'm expecting Ctrl-D to lead to the program printing the array and exiting, however it takes 3 presses, and it enters the while loop after the second press.
#include <stdio.h>
#include <stdlib.h>
void unyon(int p, int q);
int connected(int p, int q);
int main(int argc, char *argv[]) {
int c, p, q, i, size, *ptr;
scanf("%d", &size);
ptr = malloc(size * sizeof(int));
while((c = getchar()) != EOF){
scanf("%d", &p);
scanf("%d", &q);
printf("p = %d, q = %d\n", p, q);
}
for(i = 0; i < size; ++i)
printf("%d\n", *ptr + i);
free(ptr);
return 0;
}
I read the post here, but I don't quite understand it.
How to end scanf by entering only one EOF
After reading that, I'm expecting the first Ctrl-D to clear the buffer, and then I'm expecting c = getchar() to pick up the second Ctrl-D and jump out. Instead the second Ctrl-D enters the loop and prints p and q, and it takes a third Ctrl-D to drop out.
This is made more confusing by the fact that the code below drops out on the first Ctrl-D-
#include <stdio.h>
main() {
int c, nl;
nl = 0;
while((c = getchar()) != EOF)
if (c == '\n')
++nl;
printf("%d\n", nl);
}
Let's just strip the program down to the calls which do input:
scanf("%d", &size); // Statement 1
while((c = getchar()) != EOF){ // 2
scanf("%d", &p); // 3
scanf("%d", &q); // 4
}
That is definitely not the way to go; we'll get to the correct usage in a bit. For now, let's just analyze what happens. It's important to understand precisely how scanf works. The %d format code causes it to first skip over any whitespace characters, and then read characters as long as the characters can be made into a decimal integer. Eventually some character will be read which is not part of a decimal integer; most likely a newline character. Because the format string is now finished, the unused character which has just been read will be reinserted into the stream.
So when the call to getchar is made, getchar will read and return the newline character which terminated the integer. Inside the loop, there are then two calls to scanf("%d"), each of which will behave as indicated above: skip whitespace if any, read a decimal integer, and reinsert the unused character back into the input stream.
Now, let's suppose that you run the program, and enter the number 42 followed by the enter key, and then Ctrl-D to close the input stream.
The 42 will be read by statement 1, and (as mentioned above) the newline will be read by statement 2. So when statement 3 is executed, there is no more data to be read. Because end-of-file is signaled before any digit is read, scanf will return EOF. However, the code does not test the return value of scanf; it goes on to statement 4.
What should happen at this point is that the scanf in statement 4 should immediately return EOF without attempting to read more input. That's what the C standard says should happen, and it is what Posix says should happen. Once end-of-file has been signaled on a stream, any input request should immediately return EOF until the end-of-file indicator is manually cleared. (See below for standards quotes.)
But glibc, for reasons we won't go into just yet, does not conform to the standard. It attempts another read. And so the user must enter another Ctrl-D, which will cause the scanf at statement 4 to return EOF. Again, the code does not check the return code, so it continues with the while loop and calls getchar again at statement 2. Because of the same bug, getchar does not immediately return EOF, but instead attempts to read a character from the terminal. So the user must now type a third Ctrl-D to cause getchar to return EOF. Finally, the code checks a return code, and the while loop terminates.
So that is the explanation of what is happening. Now, it is easy to see at least one mistake in the code: the return value of scanf is never checked. Not only does this mean that EOF is missed, it also means that input errors are ignored. (scanf would have returned 0 if the input could not be parsed as an integer.) That's serious, because if scanf cannot succesfully match the format code, the value of the corresponding argument is undefined and must not be used.
In short: Always check return values from *scanf. (And other I/O library functions.)
But there is a more subtle mistake as well, which makes little difference in this case but could, in general, be serious. The character read by getchar in statement 2 is simply discarded, regardless of what it was. Normally it will be whitespace, so it doesn't matter that it is discarded, but you don't actually know that because the character is discarded. Maybe it was a comma. Maybe it was a letter. Maybe it matters what it was.
It is bad style to rely on the assumption that whatever character is read by the getchar at statement 2 is unimportant. If you really need to peek at the next character, you should reinsert it into the input stream, just as scanf does:
while ((c = getchar()) != EOF) {
ungetc(c, stdin); /* Put c back into the input stream */
...
}
But actually, that test is not what you want at all. As we have already seen, it is extremely unlikely that getchar will return EOF at this point. (It's possible, but it's very unlikely). Much more more probable is that getchar will read a newline character, even though the next scanf will encounter the end-of-file. So there was absolutely no point peeking at the next character; the correct solution is to check the return code of scanf, as indicated above.
Putting that together, what you really want here is something more like:
/* No reason to use two scanf calls to read two consecutive numbers */
while ((count = scanf("%d%d", &p, &q)) == 2) {
/* Do something with p and q */
}
if (count != EOF) {
/* Invalid format. Issue an error message, at least */
}
/* Do whatever needs to be done at the end of input. */
Finally, let's examine glibc's behaviour. There is a very long-standing bug report linked to by an answer to the question cited in the OP. If you take the trouble to read through to the most recent post in the bugzilla thread, you'll find a link to a discussion on the glibc developer mailing list.
Let me give the TL;DR version, and save you the trouble of digital archaeology. Since C99, the standard has been clear that EOF is "sticky". §7.21.3/11 states that all input is performed as though successive bytes were read by fgetc:
...The byte input functions read characters from the stream as if by successive calls to the fgetc function.
And §7.21.7.1/3 states that fgetc returns EOF immediately if the stream's end-of-file indicator is set:
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function
returns EOF.
So once the end-of-file indicator is set, because either end of file was detected or some read error occurred, subsequent input operations must immediately return EOF without attempting to read from the stream. Various things can clear the end-of-file indicator, including clearerr, seek, and ungetc; once the end-of-file indicator has been cleared, the next input function call will again attempt to read from the stream.
However, it wasn't always like that. Before C99, the result of reading from a stream which had already returned EOF was unspecified. And different standard libraries chose to handle it in different ways.
So a decision was made to not change glibc to conform to the (then) new standard, but rather to maintain compatibility with certain other C libraries, notably Solaris. (A comment in the glibc source is quoted in the bug report.)
Although there is a compelling argument (at least, compelling to me) that fixing the bug is not likely to break anything important, there is still a certain reluctance to do anything about it. And so, here we are, ten years later, with a still-open bug report, and a non-conforming implementation.
If you run it through the debugger you will get a clearer picture. Here is the sequence of events.
scanf("%d", &size); is called.
A number is input followed by ENTER. The key here is that scanf does not consume the \n that results from the ENTER.
getchar is called. This consumes the \n.
scanf("%d", &p); is called. This consumes the first ctrl-D. If the return value were checked then it would be apparent that an error occured.
scanf("%d", &q); is called. This consumes the second ctrl-D.
Loop goes back to the top and calls getchar. The third ctrl-D then causes EOF to be returned by getchar and hence the loop breaks out at that point.
I'll leave it as an exercise for you to explain why the second program functions as expected.
There are different things messing here.
First of all, when you type Ctrl-D to the input terminal, the tty driver is processing your input, adding each character in a buffer and processing special characters. One of these special characters (Ctrl-D) means take up to the last char and make them all available to the system. This makes two things to happen: first, the Ctrl-D character is eliminated from the data stream and; second, all the characters typed up so far are made available to be read(2) by the process syscall. getchar() is a buffered library call that avoids making one read per character, allowing to store previously read characters in the buffer.
Other thing messing here is the way the system signals the end of file in posix systems (and all unix systems). When you make a read(2) system call, the return value is the actual number of characters read (or -1 in case of failure, but this has nothing to do with EOF, as will be explained soon). And the system marks the end of file condition by returning 0 characters. So, the operating system marks the end of file making read(2) return 0 bytes as a result (if you only hit the return key, that will make a \n to appear in the data stream).
The third thing messing up here is the type of return value from getchar(3) function. It doesn't return a char value. As all possible byte values are posible to be returned for getchar(3), there's no possibility to reserve a special value for signalling a EOF. The solution adopted a long, long, time ago (when getchar(3) was designed, that is in the first version of the C language, (see The C programming language by Brian Kernighan and Denis Ritchie, first ed.) was to use an int as return value to be able to return all the possible byte values (0..255) plus one extra value, called EOF. The exact value of EOF is implementation dependant, but normally defined as -1 (I think even the standard specifies now it must be defined as -1, but not sure)
So, making all things work together, EOF is an int constant defined to allow programers to write while ((c = getchar()) != EOF). You will never get -1 as a data value from the terminal. The system always marks the end of file condition by making read(2) to return 0. And the terminal driver on receiving Ctrl-D just eliminates it from the stream and makes data up to, but not including (as different from Ctrl-J or Ctrl-M, line feed and carry return, respectivelly, that are also interpreted and are input as \n in the data stream)
So, next the question is: Why there are needed normally two (or more) Ctrl-D chars to signal eof?
Right, as I've said, one only makes all thata up to the Ctrl-D (but not including it) available to the kernel, so the result from read(2) can be a number different than 0 for the first time. But what is sure is that if you enter the Ctrl-D char twice in sequence, after the first there were not be more chars in between the two chars, assuring a read() of zero chars. Normally, programs are in a loop, doing multiple reads
while ((n_read = read(fd, buffer, sizeof buffer)) > 0) {
/* NORMAL INPUT PROCESSING GOES HERE, for up to n_read bytes
* stored in buffer */
} /* while */
if (n_read < 0) {
/* ERROR PROCESSING GOES HERE */
} else {
/* EOF PROCESSING GOES HERE */
} /* if */
In the case of files, the behaviour is different, as Ctrl-D is not interpreted by any driver (it's stored in the disk file) so you'll get Ctrl-D as a normal character (it's value is \004)
When you read a file, normally this deals to reading a lot of complete buffers, then make a partial read (with less than the buffer size bytes input) and a final read of zero bytes, signalling that the file has ended.
Note
Depending on the configuration of the tty driver in some unices, the eof character can be changed and have different mean. Also happens to the return character and linefeed character. Se termios(3) manual page for a detailed documentation on this.
I am currently reading the book C Programming Language by Ritchie & Kernighan. And I am pretty confused about the usage of EOF in the getchar() function.
First, I want to know why the value of EOF is -1 and why the value of getchar()!=EOF is 0. Pardon me for my question but I really don't understand. I really tried but I can't.
Then I tried to run the example on the book that can count the number of characters using the code below but it seems that I never get out of the loop even if I press enter so I am wondering when would I reach the EOF?
main(){
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}
Then, I read the same problem at Problem with EOF in C. Most people advised that instead of using EOF, use the terminator \n or the null terminator '\0' which makes a lot of sense.
Does it mean that the example on the book serves another purpose?
EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.
The code isn't wrong[*], it just doesn't do what you seem to expect. It reads to the end of the input, but you seem to want to read only to the end of a line.
The value of EOF is -1 because it has to be different from any return value from getchar that is an actual character. So getchar returns any character value as an unsigned char, converted to int, which will therefore be non-negative.
If you're typing at the terminal and you want to provoke an end-of-file, use CTRL-D (unix-style systems) or CTRL-Z (Windows). Then after all the input has been read, getchar() will return EOF, and hence getchar() != EOF will be false, and the loop will terminate.
[*] well, it has undefined behavior if the input is more than LONG_MAX characters due to integer overflow, but we can probably forgive that in a simple example.
EOF is -1 because that's how it's defined. The name is provided by the standard library headers that you #include. They make it equal to -1 because it has to be something that can't be mistaken for an actual byte read by getchar(). getchar() reports the values of actual bytes using positive number (0 up to 255 inclusive), so -1 works fine for this.
The != operator means "not equal". 0 stands for false, and anything else stands for true. So what happens is, we call the getchar() function, and compare the result to -1 (EOF). If the result was not equal to EOF, then the result is true, because things that are not equal are not equal. If the result was equal to EOF, then the result is false, because things that are equal are not (not equal).
The call to getchar() returns EOF when you reach the "end of file". As far as C is concerned, the 'standard input' (the data you are giving to your program by typing in the command window) is just like a file. Of course, you can always type more, so you need an explicit way to say "I'm done". On Windows systems, this is control-Z. On Unix systems, this is control-D.
The example in the book is not "wrong". It depends on what you actually want to do. Reading until EOF means that you read everything, until the user says "I'm done", and then you can't read any more. Reading until '\n' means that you read a line of input. Reading until '\0' is a bad idea if you expect the user to type the input, because it is either hard or impossible to produce this byte with a keyboard at the command prompt :)
That's a lot of questions.
Why EOF is -1: usually -1 in POSIX system calls is returned on error, so i guess the idea is "EOF is kind of error"
any boolean operation (including !=) returns 1 in case it's TRUE, and 0 in case it's FALSE, so getchar() != EOF is 0 when it's FALSE, meaning getchar() returned EOF.
in order to emulate EOF when reading from stdin press Ctrl+D
char **query;
query = (char**) malloc ( sizeof(char*) );
int f=0;
int i=0,j=0,c;
while((c=getchar())!=EOF)
{
if(!isalpha(c))
continue;
if(f==1)
query=(char**) realloc(query,(i+1)*sizeof(char*));
query[i]=(char*) malloc(sizeof(char));
query[i][j]=c;
j++;
while( (c=getchar())!=EOF&&c!=' '&&c!='\t' )
{
query[i]=(char*) realloc(query[i],(j+1)*sizeof(char));
query[i][j]=c;
++j;
}
query[i][j]='\0';
printf("%s\n",query[i]);
if(c==EOF){
break;
}
++i;
f=1;
j=0;
}
I want the above code snippet to read a line of strings separated by spaces and tabs until ONE EOF but it requires 2 EOFs to end the loop. Also, strings can consist of only alphabetic characters.
I am struggling on about 2 days.
Please, give some feedback.
EDIT: Most probably the reason is I hit CTRL+D keys after I write last string not the enter key, but now I hit enter and then CTRL+D, it works as expected.
But, how can I change it to finish after I hit CTRL+D once following the last string?
On Unix-like systems (at least by default), an end-of-file condition is triggered by typing Ctrl-D at the beginning of a line or by typing Ctrl-D twice if you're not at the beginning of a line.
In the latter case, the last line you read will not have a '\n' at the end of it; you may need to allow for that.
This is specified (rather indirectly) by POSIX / The Open Group Base Specifications Issue 7, in section 11, specifically 11.1.9:
EOF
Special character on input, which is recognized if the ICANON flag is
set. When received, all the bytes waiting to be read are immediately
passed to the process without waiting for a <newline>, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication. If
ICANON is set, the EOF character shall be discarded when processed.
The POSIX read() function indicates an end-of-file (or error) condition to its caller by returning a byte count of zero, indicating that there are no more bytes of data to read. (C's <stdio> is, on POSIX systems, built on top of read() and other POSIX-specific functions.)
EOF (not to be confused with the C EOF macro) is mapped by default to Ctrl-D. Typing the EOF character at the beginning of a line (either at the very beginning of the input or immediately after a newline) triggers an immediate end-of-file condition. Typing the EOF character other than at the beginning of a line causes the previous data on that line to be returned immediately by the next read() call that asks for enough bytes; typing the EOF character again does the same thing, but in that case there are no remaining bytes to be read and an end-of-file condition is triggered. A single EOF character in the middle of a line is discarded (if ICANON is set, which it normally is).
In the off chance that someone sees this that needs the help I've been needing... I had been searching, trying to figure out why I was getting this strange behavior with my while(scanf). Well, it turns out that I had while (scanf("%s\n", string) > 0). The editor I am using (Atom) automatically put a "\n" in my scan, with out me noticing. This took me hours, and luckily someone pointed it out to me.
The Return key doesn't produce EOF that's why the condition getchar() != EOF doesn't recognize it. You can do it by Pressing CTRL+Z in Windows or CTRL+D in Unix.
#include <stdio.h>
/* copy input to output; 2nd version*/
main()
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
this is very confusing... since you never escape the while loop. I've learned that EOF is -1. i type -1 but it just reprints it. It's a never ending loop. Over time did the library get changed and differs from what the book intended it to be? when i say library i mean the putchar()/getchar() that's in the library... sorry.
How EOF(end of file ) works?
EOF is simply a macro designating a negative value. It doesn't have to be -1. You can signal end-of-file from the command line with a control character, usually a combination of ctrl+z for Windows systems or ctrl+d for POSIX systems.
when you write '-1', you do not write the VALUE -1, but the character string { '-', '1' }
take a look here.
EOF is not a character
This is important. Remember it.
When you try to get a character, several things may happen. The usual is to effectively read a character: your program receives that character and goes on its way ...
int ch = getchar();
but, what happens is the disk crashes? or there's a bad sector just at that point? or the network goes down? or there is no more data?
Well, then there must be a way to differentiate these conditions from plain characters. The way chosen by C is to return a value that can NEVER be interpreted as a real character. That value is EOF, which is a negative value (all real characters are returned with their value converted to unsigned char even if your implementation uses signed char for char). That is also why it is very important to use int for characters (not char as would be expected).
So EOF means that it was impossible to read a character. You can try to figure out why (network down? bad sector? end of file? ...) or simply assume it was end-of-file and go from there.
So, in
while ((c = getchar()) != EOF)
there are two things: an assignment and a comparison
c = getchar()
will put a value of 0 or more in c for real characters or EOF if there's an error, therefore
while ((c = getchar()) != EOF)
means "while there is no error getting characters into variable c"
Important thing to remember:
EOF is not a character. Define characters with int.
After entering text, hit the Enter key for a new line. Then hold down Ctrl and press the letter D.
I'm working on the same book. I think you should be able to just hit Ctrl+D to exit, but I had to hit Enter first (I think there is something wrong with my shell).
That's in the chapter text, by the way.