I am reading the K&R C book and came across the following code fragment:
char c;
while ((c = getchar()) != EOF) ...
It was mentioned that EOF (I think it is -1) is an "out of band" return value from getchar, distinct from all possible character values that getchar can return.
My questions are following:
I ran my program with char and it ran successfully. My understanding is that a signed char can store -127 to +127, so it can hold -1; how then is EOF "out of band"?
Can anyone provide a simple example where the above program fragment will fail if we use char c instead of int c?
Thanks!
You have a small mistake: getchar returns an int, not a char:
int c;
while ((c = getchar()) != EOF) ...
The valid values for ASCII chars are from 0 to 127; EOF is some other (int) value. If you keep using char, you might get into trouble (as I did).
Well, your question is answered in the C FAQ.
Two failure modes are possible if, as in the fragment above, getchar's
return value is assigned to a char.
If type char is signed, and if EOF is defined (as is usual) as -1,
the character with the decimal value 255 ('\377' or '\xff' in C) will
be sign-extended and will compare equal to EOF, prematurely
terminating the input.
If type char is unsigned, an actual EOF value will be truncated (by
having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
Whatever value EOF has depends on your platform. Take a look at stdio.h to see its actual definition.
As written in the book:
The problem is distinguishing the end of the input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file." We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.
#include <stdio.h>

main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}
I am not able to understand the actual reason for using int instead of char. What value does EOF have that cannot be stored in a char?
A char can hold 256 different values (say 0 to 255). If EOF were a char, its value would therefore have to be one of those 256 values, which would imply that there is one character you could never read. Therefore the value of EOF cannot be one of the 256 char values, which implies that it cannot fit into a char, which implies that its type must be larger than char, for example an int.
In other words, EOF is not a char and we don't want to store it in a char. Its only purpose is to enable a program to detect that it has attempted to read one char beyond the end of the file.
Or, to put it yet another way: let's suppose EOF were defined as 255 and therefore fit into a char. Now let's suppose getchar returns the value 255. What does that value represent? Is it EOF, or is it the character 255?
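To make the distinction concrete, here is a small sketch: stored in an int, every real byte stays in the range 0 to 255 and EOF stays outside that range, so nothing can be confused:

#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        printf("byte %d\n", c);   /* always 0..255, never EOF */
    printf("end of input (c == %d)\n", c);
    return 0;
}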
I'm learning C programming on a Raspberry Pi, but I found that my program never catches EOF successfully. I used char c = 0; printf("%d", c - 1); to test the char type, and found that the char type here ranges from 0 to 255, like an unsigned type. But the EOF defined in stdio.h is (-1). So is the wrong cc package installed on my Pi? How can I fix it? If I changed the EOF value in stdio.h manually, would there be further problems?
What worries me is that, when learning from the K&R book, there are examples which use code like while ((c = getchar()) != EOF). I followed that on my Ubuntu machine and it works fine. I just wonder whether this kind of syntax has been abandoned by modern C practice, or whether there is some conflict on my Raspberry Pi.
Here is my code:
#include <stdio.h>

int main(void)
{
    char c;
    int i = 0;
    while ((c = getchar()) != EOF && i < 50) {
        putchar(c);
        i++;
    }
    if (c == EOF)
        printf("\nEOF got.\n");
    while ((c = getchar()) != EOF && i < 500) {
        printf("%d", c);
        i++;
    }
}
Even when I redirect the input to a file, it keeps printing 255 on the screen and the program never terminates.
Finally I found that I was wrong: in the K&R book, c is defined as an int, not a char. Problem solved.
You need to store the character read by fgetc(), getchar(), etc. in an int so you can catch the EOF. This is well-known and has always been the case everywhere. EOF must be distinguishable from all proper characters, so it was decided that functions like fgetc() return valid characters as non-negative values (even if char is signed). An end-of-file condition is signalled by -1, which is negative and thus cannot collide with any valid character fgetc() could return.
Do not edit the system headers and especially do not change the value of constants defined there. If you do that, you break these headers. Notice that even if you change the value of EOF in the headers, this won't change the value functions like fgetc() return on end-of-file or error, it just makes EOF have the wrong value.
Why is EOF defined to be −1 when −1 cannot be represented in a char?
Because EOF isn't a character but a state.
If I changed the EOF value in stdio.h manually, will there be further
problems?
Absolutely, since you would be effectively breaking the header entirely. A header is not an actual function, just a set of prototypes and declarations for functions that are defined elsewhere. ABSOLUTELY DO NOT change system headers; you will never succeed in doing anything but breaking your code, your project, or worse.
On the subject of EOF: EOF is not a character, and thus cannot be represented in a character variable. To get around this, most programmers simply use an int value (by default signed) that can hold the -1 from EOF. The reason that EOF can never be a character is that otherwise there would be one character indistinguishable from the end-of-file indicator.
int versus char.
fgetc() returns an int, not char. The values returned are in the range of unsigned char plus EOF. This is typically 257 different values. So saving the result in a char, signed char, or unsigned char will lose some distinguishability.
Instead save the fgetc() return value in an int. After testing for an EOF result, the value can be saved as a char if needed.
// char c;
int c;
...
while ((c = getchar()) != EOF && i < 50) {
    char ch = c;   /* safe to narrow once c is known not to be EOF */
    ...
Detail: "Why is EOF defined to be −1 when −1 cannot be represented in a char?" misleads. On systems where char is signed and EOF == -1, a char can have the value of EOF. Yet on such systems, a char can have a value of -1 that represents a character too; they overlap. So a char cannot distinctly represent all char values and EOF. Best to use an int to save the return value of fgetc().
... the fgetc function obtains that character as an unsigned char converted to an int and ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, ... and the fgetc function returns EOF. ... C11 §7.21.7.1 2-3
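A small sketch of that wording in action (using a temporary file so the example is self-contained): a 0xFF byte comes back from fgetc as the unsigned char value 255 converted to int, not as -1:

#include <stdio.h>

int main(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 1;
    fputc('\xff', fp);   /* write the byte 0xFF */
    rewind(fp);
    int c = fgetc(fp);
    printf("fgetc returned %d, EOF is %d\n", c, EOF);   /* typically 255 and -1 */
    fclose(fp);
    return 0;
}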
I have created a file named "file.txt" in Unix and tried to read its contents from my C program, but I am not able to receive the EOF character. Doesn't Unix store an EOF character on file creation? If not, what is the alternative way to detect the end of a Unix-created file when reading it in C?
Here's the code sample
int main(){
    File *fp;
    int nl, c;
    nl = 0;
    fp = fopen("file.txt", "r");
    while(c = fgetc(fp) != EOF){
        if (c == '\n')
            nl++;
    }
    return 0;
}
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This can happen if the type of c is char (and char is unsigned in your compiler; you can check this by examining the value of CHAR_MIN in <limits.h>) and not int.
The value of EOF is negative according to the C standard.
So, implicitly converting EOF to unsigned char loses the true value of EOF, and the comparison will always fail.
UPDATE: There's a bigger problem that has to be addressed first. In the expression c = fgetc(fp) != EOF, fgetc(fp) != EOF is evaluated first (to 0 or 1) and then the value is assigned to c. If there's at least one character in the file, fgetc(fp) != EOF will evaluate to 0 and the body of the while loop will never execute. You need to add parentheses, like so: (c = fgetc(fp)) != EOF.
Missing parentheses. Should be:
while((c = fgetc(fp)) != EOF)
Remember: fgetc() returns an int, not a char. It has to return an int because its set of return values includes all possible valid characters plus a separate (negative) EOF indicator.
There are two possible traps if you use type char for c instead of int:
If the type char is signed with your compiler, you will detect a valid character as EOF. Often, the character ÿ (y-umlaut, officially known in Unicode as LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF, hex code 0xFF in the ISO 8859-1 aka Latin-1 code set) will be detected as equivalent to EOF, when it is in fact a valid character.
If the type char is unsigned, then the comparison will never be true.
Both problems are serious, and both are avoided by using the correct type:
FILE *fp = fopen("file.txt", "r");
if (fp != 0)
{
    int c;
    int nl = 0;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            nl++;
    printf("Number of lines: %d\n", nl);
}
Note that the type is FILE and not File. Note that you should check that the file was opened before trying to read via fp.
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This means that your compiler provides you with char as a signed type. It also means you will not be able to count lines accurately in files which contain ÿ.
Unlike CP/M and DOS, Unix does not use any character to indicate EOF; you reach EOF when there are no more characters to read. What confuses many people is that if you type a certain key combination at the terminal, programs detect EOF. What actually happens is that the terminal driver recognizes the character and sends any unread characters to the program. If there are no unread characters, the program gets 0 bytes returned, which is the same result you get when you've reached the end of a file. So the character combination (often, but not always, Ctrl-D) appears to 'send EOF' to the program. However, the character is not stored in a file if you are using cat >file; furthermore, if you read a file which contains a control-D, that is a perfectly fine character with byte value 0x04. If one program generates a control-D and sends it to another program, that does not indicate EOF to the receiving program. It is strictly a property of Unix terminals (tty and pty — teletype and pseudo-teletype — devices).
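As a sketch of that last point (the file name here is made up for illustration): a byte with value 4 inside a file reads back as an ordinary character, never as EOF:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("has_ctrl_d.bin", "rb");   /* hypothetical file containing a 0x04 byte */
    if (fp == NULL)
        return 1;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == 4)
            printf("read a literal Ctrl-D byte (value 4), not EOF\n");
    fclose(fp);
    return 0;
}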
You do not show how you declare the variable c; it should be of type int, not char.
In the C language I am using EOF. Why is EOF -1? Why not some other value?
From Wikipedia:
The actual value of EOF is system-dependent (but is commonly -1, such as in glibc) and is unequal to any valid character code.
It can't be any value from 0 to 255 because those are valid values for characters on most systems. For example, if EOF were 0 then you wouldn't be able to tell the difference between reading a 0 and reaching the end of the file.
-1 is the obvious remaining choice.
You may also want to consider using feof instead:
Since EOF is used to report both end of file and random errors, it's often better to use the feof function to check explicitly for end of file and ferror to check for errors.
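For instance, a minimal sketch of that pattern, checking the stream state once the read loop has ended:

#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
    if (ferror(stdin))
        fprintf(stderr, "read error\n");
    else if (feof(stdin))
        fprintf(stderr, "end of file\n");
    return 0;
}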
It isn't. It is defined to be an implementation-defined negative int constant. It must be negative so that one can distinguish it easily from an unsigned char. In most implementations it is indeed -1, but that is not required by the C standard.
The historic reason for choosing -1 was that the character classification functions (see <ctype.h>) can be implemented as simple array lookups. And it is the "nearest" value that doesn't fit into an unsigned char.
[Update:] Making the character classification functions efficient was probably not the main reason for choosing -1 in the first place. I don't know all the historical details, but it is the most obvious decision. It had to be negative, since there are machines whose char type doesn't have exactly 8 bits, so choosing a positive value would be difficult. It had to be large enough not to be a valid value for unsigned char, yet small enough to fit into an int. And when you have to choose a negative number, why take an arbitrarily large one? That leaves -1 as the best choice.
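Here is a sketch of the array-lookup trick alluded to above; the table and flag names are made up, not any real libc's internals. The table has one extra slot at the front so that EOF (-1) indexes slot 0:

#include <stdio.h>

#define MY_DIGIT 0x01

static const unsigned char my_ctype[1 + 256] = {
    ['0' + 1] = MY_DIGIT, ['1' + 1] = MY_DIGIT, ['2' + 1] = MY_DIGIT,
    ['3' + 1] = MY_DIGIT, ['4' + 1] = MY_DIGIT, ['5' + 1] = MY_DIGIT,
    ['6' + 1] = MY_DIGIT, ['7' + 1] = MY_DIGIT, ['8' + 1] = MY_DIGIT,
    ['9' + 1] = MY_DIGIT,
};

static int my_isdigit(int c)   /* c must be EOF or an unsigned char value */
{
    return my_ctype[c + 1] & MY_DIGIT;
}

int main(void)
{
    printf("%d %d %d\n", my_isdigit('7'), my_isdigit('x'), my_isdigit(EOF));
    return 0;
}

With EOF as -1, classification is a single array access, with no special case needed for EOF.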
Refer to details at http://en.wikipedia.org/wiki/Getchar#EOF_pitfall and http://en.wikipedia.org/wiki/End-of-file
You can easily change the EOF value. In a C program, EOF is defined by default as a macro with the value -1, so whenever you mention EOF in your program, the compiler substitutes -1. For example, just try this and look at the result:
#include <stdio.h>
#define EOF 22

main()
{
    int a;
    a = EOF;
    printf(" Value of a=%d\n", a);
}
Output:
Value of a=22
Reason: this time the value of the EOF macro has been changed. (Note that most compilers will warn about redefining EOF, and that this only changes the macro, not what library functions like getchar() actually return at end-of-file.)
An alternative is to test the stream's state explicitly instead of comparing the return value against EOF:

int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin)) {
    ...
    c = getchar();
}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Consider this loop, intended to scan all characters up to the next whitespace character received:
int c;

c = getchar();
while (!isspace(c)) {
    ...
    c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate, since EOF is not a whitespace character. A better way to write this would be:
int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin) && !isspace(c)) {
    ...
    c = getchar();
}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral constant with type int.
What do you type in to end the program? -1 doesn't work:
#include <stdio.h>

// copy input to output
main() {
    char c;
    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}
Macro: int EOF
This macro is an integer value that is returned by a number of functions to indicate an end-of-file condition, or some other error situation. With the GNU library, EOF is -1. In other libraries, its value may be some other negative number.
The documentation for getchar says that it returns the next character available, converted to an unsigned char and then returned as an int.
The reason for this, is to make sure that all valid characters are returned as positive values and won't ever compare as equal to EOF, a macro which evaluates to a negative integer value.
If you put the return value of getchar into a char, then depending on whether your implementation's char is signed or unsigned you may get spurious detection of EOF, or you may never detect EOF even when you should.
Signaling EOF to the C library typically happens automatically when the input of a program is redirected from a file or a piped process. To do it interactively depends on your terminal and shell, but typically on Unix it's achieved with Ctrl-D and on Windows with Ctrl-Z on a line by itself.
You should use int and not char.
I agree with all the other people in this thread in saying: use int c, not char.
To end the loop (at least on *nix like systems) you would press Ctrl-D to send EOF.
In addition, if you would like your characters echoed instantly, rewrite your code like this:
#include <stdio.h>

int main(void)
{
    int c;

    c = getchar();
    while (c != EOF)
    {
        putchar(c);
        fflush(stdout);   /* optional, for instant feedback */
        c = getchar();
    }
    return 0;
}
If the integer value returned by getchar() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
-- The Open Group POSIX standard
If char is unsigned by default for your compiler (or by whatever options are being used to invoke the compiler), it's likely that
(c == EOF)
can never be true. If sizeof(unsigned char) < sizeof(int), which is pretty much always true, then the promotion of the char to an int will never result in a negative value, and EOF must be a negative value.
That's one reason why all (or at least many if not all) the functions in the C standard that deal with or return characters specify int as the parameter or return type.
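A minimal sketch of that situation, assuming an 8-bit plain char that is unsigned and EOF defined as -1:

#include <stdio.h>

int main(void)
{
    unsigned char c = EOF;   /* -1 is converted to 255 on assignment */
    if (c == EOF)            /* c promotes to the int 255; EOF is -1 */
        printf("never printed\n");
    else
        printf("c == %d but EOF == %d: the test cannot succeed\n", c, EOF);
    return 0;
}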
EOF is not an actual character or a sequence of characters. EOF denotes the end of the input file or stream, i.e., the situation when getchar() tries to read a character beyond the last one.
On Unix, you can close an interactive input stream by typing CTRL-D. That situation causes getchar() to return EOF. But if a file contains a character whose ASCII code is 4 (i.e., CTRL-D), getchar() will return 4, not EOF.
It can still appear to work with the char data type, but the trick is that the comparison in the loop condition is performed with int values.
First, let's check it. If you write the following code:

printf("%d", getchar());

and then give the input A from the keyboard, you should see 65, which is the ASCII value of A; if you give CTRL-D instead, you will see -1.
So if you rely on that logic, the resulting code is:
#include <stdio.h>

int main()
{
    char c;   /* note: this relies on char being signed; see the other answers */
    while ((c = getchar()) != EOF) {
        putchar(c);
        // printf("%c", c);   /* this is another way to produce the output */
    }
    return 0;
}
Windows: Ctrl+Z
Unix: Ctrl+D
Reference: EOF
Hi, I think it's because in a stream "-1" is not one character but two, and the ASCII value of neither of them is -1 (or whatever value is used for EOF).