Possible Duplicate:
fgetc does not identify EOF
fgetc, checking EOF
I have created a file named "file.txt" on Unix and tried to read its contents from my C program, but I am not able to detect the EOF character. Does Unix not store an EOF character when a file is created? If so, what is the alternative way to detect EOF when reading a Unix-created file in C?
Here's the code sample
int main(){
    File *fp;
    int nl,c;
    nl =0;
    fp = fopen("file.txt", "r");
    while((c = fgetc(fp)) != EOF){
        if (c=='\n')
            nl++;
    }
    return 0;
}
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This can happen if the type of c is char rather than int, and char is unsigned with your compiler (you can check this by examining the value of CHAR_MIN in <limits.h>).
The value of EOF is negative according to the C standard.
So, implicitly converting EOF to unsigned char loses the true value of EOF, and the comparison with EOF will always fail.
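A minimal sketch of that check, using only <limits.h>; the function name plain_char_is_signed is illustrative, not a library call. CHAR_MIN is 0 when plain char is unsigned and equals SCHAR_MIN when it is signed:

```c
#include <limits.h>
#include <stdbool.h>

/* Illustrative helper: reports whether plain char is signed on this compiler.
   CHAR_MIN is 0 for an unsigned char type and SCHAR_MIN for a signed one. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

If this returns false, a char variable can never compare equal to EOF, so a loop such as while ((c = fgetc(fp)) != EOF) with char c never terminates.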
UPDATE: There's a bigger problem that has to be addressed first. In the expression c = fgetc(fp) != EOF, fgetc(fp) != EOF is evaluated first (to 0 or 1) and then that value is assigned to c. While there are characters left in the file, fgetc(fp) != EOF evaluates to 1, so the loop body runs with c == 1 instead of the character just read; c never equals '\n', and nl is never incremented. You need to add parentheses, like so: (c = fgetc(fp)) != EOF.
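A small sketch that reproduces the precedence problem on a stream; the helper name last_value_without_parens is hypothetical. It records the last value the loop body sees, which is the comparison result 1 rather than a character:

```c
#include <stdio.h>

/* Hypothetical demo of the missing-parentheses bug. Because != binds more
   tightly than =, c receives the result of the comparison (1 while data
   remains), never the character that fgetc read. */
int last_value_without_parens(FILE *fp, int *newlines)
{
    int c;
    int nl = 0;
    int last = -1;
    /* The outer parentheses only silence -Wparentheses; the condition is
       still c = (fgetc(fp) != EOF). */
    while ((c = fgetc(fp) != EOF)) {
        if (c == '\n')   /* never true: c is always 1 here */
            nl++;
        last = c;
    }
    *newlines = nl;
    return last;
}
```

Given a stream containing "a\nb\n", it returns 1 and leaves the newline count at 0, even though the data contains two newlines.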
Missing parentheses. Should be:
while((c = fgetc(fp)) != EOF)
Remember: fgetc() returns an int, not a char. It has to return an int because its set of return values includes all possible valid characters plus a separate (negative) EOF indicator.
There are two possible traps if you use type char for c instead of int:
If the type char is signed with your compiler, you will detect a valid character as EOF. Often, the character ÿ (y-umlaut, officially known in Unicode as LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF, hex code 0xFF in the ISO 8859-1 aka Latin 1 code set) will be detected as equivalent to EOF, when it is a valid character.
If the type char is unsigned, then the comparison will never be true.
Both problems are serious, and both are avoided by using the correct type:
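#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("file.txt", "r");
    if (fp != 0)
    {
        int c;          /* int, not char: must hold every byte value plus EOF */
        int nl = 0;
        while ((c = fgetc(fp)) != EOF)
            if (c == '\n')
                nl++;
        printf("Number of lines: %d\n", nl);
        fclose(fp);
    }
    return 0;
}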
Note that the type is FILE and not File. Note that you should check that the file was opened before trying to read via fp.
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This means that your compiler provides you with char as a signed type. It also means you will not be able to count lines accurately in files which contain ÿ.
Unlike CP/M and DOS, Unix does not use any character to indicate EOF; you reach EOF when there are no more characters to read. What confuses many people is that if you type a certain key combination at the terminal, programs detect EOF. What actually happens is that the terminal driver recognizes the character and sends any unread characters to the program. If there are no unread characters, the program gets 0 bytes returned, which is the same result you get when you've reached the end of file. So the character combination (often, but not always, Ctrl-D) appears to 'send EOF' to the program. However, the character is not stored in a file if you are using cat >file; further, if you read a file which contains a control-D, that is a perfectly fine character with byte value 0x04. If one program writes a control-D byte to another (through a pipe, for example), that does not indicate EOF to the reader. It is strictly a property of Unix terminals (tty and pty, i.e. teletype and pseudo-teletype, devices).
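To see that a control-D byte stored in a file is ordinary data, here is a sketch that writes byte 0x04 to a temporary file (via the standard tmpfile) and reads it back; the helper name read_ctrl_d_byte is illustrative:

```c
#include <stdio.h>

/* Illustrative helper: stores the byte 0x04 (the character Ctrl-D produces
   at a terminal) in a temporary file and reads it back. fgetc returns 4 for
   the byte itself; only the next call, with no bytes left, returns EOF. */
int read_ctrl_d_byte(int *next)
{
    *next = EOF;
    FILE *fp = tmpfile();
    if (fp == NULL)
        return EOF;
    fputc(0x04, fp);
    rewind(fp);
    int first = fgetc(fp);   /* 4: the stored byte */
    *next = fgetc(fp);       /* EOF: end of file reached */
    fclose(fp);
    return first;
}
```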
You do not show how you declare the variable c; it should be of type int, not char.
Can sizeof(int) ever be 1 on a hosted implementation?
When using fgetc to read the next character of a stream, you usually check that end-of-file has not been reached with
if ((c = fgetc (stream)) != EOF)
where c is of type int. Then either end-of-file has been reached and the condition fails, or c holds an unsigned char converted to int, which is expected to differ from EOF, since EOF is guaranteed to be negative. Fine... apparently.
But there is a small problem. Usually the char type has exactly 8 bits, while int must have at least 16 bits, so every unsigned char value is representable as an int. Nevertheless, if char had 16 or 32 bits (I know, this is never the case in practice...), there is no reason why one could not have sizeof(int) == 1, so it would be (theoretically!) possible for fgetc(stream) to return EOF (or another negative value) even though end-of-file has not been reached...
Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?...
EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question in my first search. Thanks for your help! :-)
You asked:
Is it something in the C standard that prevents fgetc to return EOF if end-of-file has not been attained?
On the contrary, the standard explicitly allows EOF to be returned when an error occurs.
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
In the footnotes, I see:
An end-of-file and a read error can be distinguished by use of the feof and ferror functions.
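The usual idiom based on that footnote is to loop until fgetc returns EOF and only then ask why. A sketch; count_bytes is a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream, then uses
   ferror to tell a read error apart from a genuine end of file.
   Returns the count, or -1 if the loop stopped because of an error. */
long count_bytes(FILE *fp)
{
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    if (ferror(fp))
        return -1;   /* EOF was returned because of a read error */
    /* Otherwise feof(fp) is true: the end of the file was reached. */
    return n;
}
```

This loop still assumes that no valid byte compares equal to EOF; on the theoretical sizeof(int) == 1 platform discussed here, each EOF return would need to be confirmed with feof/ferror before stopping.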
You also asked:
Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?
On the theoretical platform where CHAR_BIT is more than 8 and sizeof(int) == 1, that won't be a valid way to check that end-of-file has been reached. For that, you'll have to resort to feof and ferror.
c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
    // Got valid input in c.
}
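Applied to a whole loop, that check might look like the following sketch (copy_stream is a hypothetical name). It treats EOF as final only when feof or ferror confirms it, so it stays correct even where a valid byte can compare equal to EOF:

```c
#include <stdio.h>

/* Hypothetical helper: copies bytes from in to out. An EOF return is only
   trusted after feof/ferror confirms it, so a valid byte that happens to
   equal EOF (possible only if sizeof(int) == 1) is still copied.
   Returns 0 at end of file, -1 on a read error. */
int copy_stream(FILE *in, FILE *out)
{
    for (;;) {
        int c = fgetc(in);
        if (c == EOF && (feof(in) || ferror(in)))
            break;           /* genuine end of file or read error */
        fputc(c, out);       /* a real byte, even if it compares equal to EOF */
    }
    return ferror(in) ? -1 : 0;
}
```

On mainstream platforms the extra feof/ferror calls run only once, at the real end of the stream, so the portable form costs almost nothing.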
I think you need to rely on the stream's error indicator.
ch = fgetc(stream);
if (ferror(stream) && (ch == EOF)) /* read error */;
From the standard:
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Edit: a better version:
ch = fgetc(stream);
if (ch == EOF) {
    if (ferror(stream)) /* error reading */;
    else if (feof(stream)) /* end of file */;
    else /* read valid character with value equal to EOF */;
}
If you are reading a stream that contains only standard ASCII, there's no risk of receiving the char equivalent to EOF before the real end-of-file, because valid ASCII codes only go up to 127. But it could happen when reading a binary file: the byte would need to be 255 (unsigned) to correspond to a -1 signed char, and nothing prevents it from appearing in a binary file.
But about your specific question (whether there's something in the standard): not exactly, but notice that fgetc returns the character as an unsigned char converted to int, so in this case it won't ever be negative anyway. The only risk would be if you had explicitly or implicitly converted the return value down to signed char (for instance, if your c variable were a signed char).
NOTE: as @Ulfalizer mentioned in the comments, there's one rare case in which you may need to worry: if sizeof(int) == 1 and you're reading a file that contains non-ASCII characters, you may get a -1 return value that is not the real EOF. Environments in which this happens are quite rare: because an int must be at least 16 bits wide, sizeof(int) == 1 requires a char of at least 16 bits, which occurs on some word-addressed DSPs rather than on 8-bit microcontrollers such as the 8051 (whose compilers use an 8-bit char and a 16-bit int). In such a case, the safe option would be to test feof(), as @pmg suggested.
I agree with your reading.
C Standard says (C11, 7.21.7.1 The fgetc function p3):
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that prevents fgetc in a hosted implementation from returning a value equal to EOF that indicates neither an end-of-file nor an error condition.
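Whether that assumption holds can be checked for a given implementation. A sketch using only <limits.h>; the name eof_is_out_of_band is illustrative:

```c
#include <limits.h>
#include <stdbool.h>

/* Illustrative check: true when every unsigned char value fits in an int,
   so fgetc can never return a valid byte that equals the negative EOF.
   It is false only where UCHAR_MAX > INT_MAX, which in practice means
   sizeof(int) == 1 (a char of at least 16 bits). */
bool eof_is_out_of_band(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

Code that relies on the simple (c = fgetc(stream)) != EOF test could instead refuse to build on such a platform, using #if UCHAR_MAX > INT_MAX followed by an #error directive.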
I have the following code:
FILE *f = fopen('/path/to/some/file', 'rb');
char c;
while((c = fgetc(f)) != EOF)
{
    printf("next char: '%c', '%d'", c, c);
}
For some reason, when printing out the characters, at the end of the file, an un-renderable character gets printed out, along with the ASCII ordinal -1.
next char: '?', '-1'
What character is this supposed to be? I know it's not EOF because there's a check for that, and shortly after the character is printed, the program segfaults.
The trouble is that fgetc() and its relatives return an int, not a char:
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.
It has to return every possible valid character value and a distinct value, EOF (which is negative, and usually but not necessarily -1).
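A sketch showing this concretely: it writes the byte 0xFF to a temporary file and reads it back into an int. read_back_0xff is a hypothetical helper, and an 8-bit char is assumed:

```c
#include <stdio.h>

/* Hypothetical helper: stores the byte 0xFF in a temporary file and reads it
   back with fgetc. Kept in an int, the byte arrives as 255, distinct from EOF.
   Assigned to a char it would be 255 or, where plain char is signed,
   typically -1, which equals EOF wherever EOF is -1. */
int read_back_0xff(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return EOF;
    fputc(0xFF, fp);
    rewind(fp);
    int c = fgetc(fp);   /* 255 with an 8-bit char */
    fclose(fp);
    return c;
}
```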
When you read the value into a char instead of an int, one of two undesirable things happens:
If plain char is unsigned, then you never get a value equal to EOF, so the loop never terminates.
If plain char is signed, then a legitimate character, 0xFF (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), is treated the same as EOF, so you detect EOF prematurely.
Either way, it is not good.
The Fix
The fix is to use int c; instead of char c;.
Incidentally, the fopen() call should not compile:
FILE *f = fopen('/path/to/some/file', 'rb');
should be:
FILE *f = fopen("/path/to/some/file", "rb");
Always check the result of fopen(); of all the I/O functions, it is more prone to failure than almost any other (not through its own fault, but because the user or programmer makes a mistake with the file name).
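A minimal sketch of that check; open_checked is a hypothetical wrapper, and perror is the standard way to print the reason fopen failed:

```c
#include <stdio.h>

/* Hypothetical wrapper: opens a file and reports the reason on failure,
   instead of letting a NULL pointer reach fgetc. */
FILE *open_checked(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        perror(path);   /* prints "path: <reason>" to stderr */
    return fp;
}
```

Callers then test the returned pointer for NULL before the first call to fgetc.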
This is the culprit:
char c;
Please change it to:
int c;
The return type of fgetc is int, not char. Converting the result to char gives strange behavior on some platforms: EOF becomes undetectable if char is unsigned, and the byte 0xFF looks like EOF if char is signed.
I was reading K&R book and wanted to test out printf() and putchar() functions in ways that I never tried. I encountered several unexpected events and would like to hear from more experienced programmers why that happens.
char c;
while((c = getchar()) != EOF) {
    //putchar(c);
    printf("%d your character was.\n", c);
}
1. How would one get EOF (end of file) in the input stream (with the getchar() or scanf() functions)? Could an unexpected key that getchar()/scanf() doesn't recognize produce it?
2. My book says that c has to be an integer, because it needs to store EOF and the variable has to be big enough to hold EOF in addition to any possible char. This doesn't make sense to me, because EOF is a constant integer with a value of -1, which even a char can store. Can anyone clarify what was meant by this?
3. What happens when I send "hello" or 'hello' to the putchar() function? It expects an integer, but it produces weird output, such as EE or oo, when I send the string or the character sequence.
4. Why do I get two outputs when I use the printf() call written above? One is for the character I entered and the other is an integer which in ASCII is end of line. Does it produce the second output because I press Enter, which it treats as a second character?
Thanks.
1. On Linux, you can send it with Ctrl+D.
2. You need an int; otherwise you can't tell the difference between EOF and the character with value 0xFF (with a 16-bit int, 0xFFFF is not the same as 0x00FF).
3. putchar wants a character, not a string; if you give it a string, it will print part of the string's address.
4. The characters shown on screen as you type are the terminal's echo; the program itself prints one line per character it reads, so you get the value of the character you entered followed by 10 for the newline that pressing Enter sends.
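For point 4, a sketch that simulates typing A and pressing Enter by reading the same two characters, "A\n", from a temporary file; getchar() on a terminal sees the same sequence once Enter is pressed. The helper name read_codes is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: writes the text a user would type into a temporary
   file, then reads it back with fgetc, storing up to max character codes
   in vals. Returns how many codes were read. */
int read_codes(const char *typed, int vals[], int max)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;
    fputs(typed, fp);
    rewind(fp);
    int n = 0, c;
    while (n < max && (c = fgetc(fp)) != EOF)
        vals[n++] = c;
    fclose(fp);
    return n;
}
```

For "A\n" it reads two codes, 'A' and '\n' (65 and 10 in ASCII), which is why the printf loop in the question prints two lines for one key plus Enter.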
Edit: more details about 2.
You need an int because getchar can return both the byte 0xFF, as the int value 255 (0x00FF with a 16-bit int), and EOF, the int value -1 (0xFFFF), and they don't have the same meaning: the byte 0xFF is a valid character (for instance, it's ÿ in Latin-1), while the integer -1 is EOF in this context.
Here's a simple program that shows the difference:
#include <stdio.h>
int main(int argc, char ** argv) {
    {
        char c = 0xFF; /* this is a valid char */
        if (c == EOF) printf("wrong end of file detection\n");
    }
    {
        int c = 0xFF; /* this is a valid char */
        if (c == EOF) printf("wrong end of file detection\n");
    }
}
The first test succeeds where plain char is signed, because 0xFF stored in a signed char typically becomes -1, which equals EOF; the second test fails because 0x00FF != -1 for int.
I hope that makes it a bit clearer.
1. You must close the input stream to get EOF, usually with Ctrl-D on Unix, but check your tty configuration (stty -a).
2. Already answered.
3. Same.
4. Your tty echoes what you type by default. If you don't want this, put it in noecho mode (stty -echo). Be careful, as some shells set it back to echo; try with sh. You should also be aware that the tty buffers your input until Return is typed (see the stty manual for raw mode if you need it).
I am reading the K&R C book and came across the following code fragment:
char c;
while ((c = getchar()) != EOF) ...
It was mentioned that EOF (I think it is -1) is an "out of band" return value from getchar, distinct from all possible values that getchar can return.
My questions are following:
I ran my program with char and it ran successfully. My understanding is that a signed char can store -127 to +127, so it can hold -1; how is EOF "out of band"?
Can any one provide simple example where above program fragment will fail if we use char c instead of int c?
Thanks!
You have a small mistake, getchar returns an int, not a char:
int c;
while ((c = getchar()) != EOF) ...
Valid ASCII character codes are 0 to 127; EOF is some other (int) value.
If you keep using char, you might run into trouble (as I did).
Well, your question is answered in the C FAQ.
Two failure modes are possible if, as in the fragment above, getchar's return value is assigned to a char.
If type char is signed, and if EOF is defined (as is usual) as -1, the character with the decimal value 255 ('\377' or '\xff' in C) will be sign-extended and will compare equal to EOF, prematurely terminating the input.
If type char is unsigned, an actual EOF value will be truncated (by having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
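Both failure modes can be reproduced with explicit signed char and unsigned char, so the result does not depend on whether plain char is signed. A sketch; the function names are illustrative, and converting 0xFF to signed char is implementation-defined (commonly -1):

```c
#include <stdbool.h>
#include <stdio.h>

/* Failure mode 1 (illustrative): a valid byte 0xFF stored in a signed char
   typically becomes -1 and then compares equal to EOF wherever EOF is -1. */
bool signed_byte_looks_like_eof(void)
{
    signed char sc = (signed char)0xFF;   /* implementation-defined, commonly -1 */
    return sc == EOF;
}

/* Failure mode 2 (illustrative): EOF stored in an unsigned char is truncated
   (typically to 255); on mainstream platforms it promotes to a non-negative
   int, so it never compares equal to the negative EOF. */
bool unsigned_char_sees_eof(void)
{
    unsigned char uc = (unsigned char)EOF;
    return uc == EOF;
}
```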
The exact value of EOF depends on your platform (the standard only requires a negative int). Take a look at stdio.h to see its actual definition.
What do you type to end the program? Entering -1 doesn't work:
#include <stdio.h>
//copy input to output
main() {
    char c;
    c = getchar();
    while(c != EOF) {
        putchar(c);
        c = getchar();
    }
}
Macro: int EOF
This macro is an integer value that is returned by a number of functions to indicate an end-of-file condition, or some other error situation. With the GNU library, EOF is -1. In other libraries, its value may be some other negative number.
The documentation for getchar is that it returns the next character available, cast to an unsigned char and then returned in an int return value.
The reason for this is to make sure that all valid characters are returned as non-negative values and never compare equal to EOF, a macro which evaluates to a negative integer value.
If you put the return value of getchar into a char, then depending on whether your implementation's char is signed or unsigned you may get spurious detection of EOF, or you may never detect EOF even when you should.
Signaling EOF to the C library typically happens automatically when redirecting the input of a program from a file or a piped process. To do it interactively depends on your terminal and shell, but typically on unix it's achieved with Ctrl-D and on windows Ctrl-Z on a line by itself.
You should use int, not char.
I agree with all other people in this thread by saying use int c not char.
To end the loop (at least on *nix-like systems), you would press Ctrl-D to send EOF.
In addition, if you'd like your characters echoed back instantly, rewrite your code like this:
#include<stdio.h>
int
main(void)
{
    int c;
    c = getchar();
    while (c != EOF)
    {
        putchar(c);
        c = getchar();
        fflush(stdout); /* optional, instant feedback */
    }
    return 0;
}
If the integer value returned by getchar() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
-- opengroup POSIX standard
If char is unsigned by default for your compiler (or by whatever options are being used to invoke the compiler), it's likely that
(c == EOF)
can never be true. If sizeof(unsigned char) < sizeof(int), which is pretty much always true, then the promotion of the char to an int will never result in a negative value, and EOF must be a negative value.
That's one reason why all (or at least many if not all) the functions in the C standard that deal with or return characters specify int as the parameter or return type.
EOF is not an actual character or a sequence of characters. EOF denotes the end of the input file or stream, i.e., the situation when getchar() tries to read a character beyond the last one.
On Unix, you can close an interactive input stream by typing CTRL-D. That situation causes getchar() to return EOF. But if a file contains a character whose ASCII code is 4 (i.e., CTRL-D), getchar() will return 4, not EOF.
It can still work with the char data type where plain char is signed; the trick is that the condition in the loop is checked as an int value.
First, let's check it. Suppose you write the following code:
printf("%d",getchar());
If you then type A at the keyboard, you should see 65, which is the ASCII value of A; if you type Ctrl-D, you should see -1.
So, implementing this logic, the code is:
#include<stdio.h>
int main()
{
    char c;  /* relies on plain char being signed; a 0xFF byte ends the loop early */
    while ((c = getchar()) != EOF){
        putchar(c);
        //printf("%c",c); // this is another way for output
    }
return 0;
}
Windows: Ctrl+z
Unix: Ctrl+d
Reference: EOF
Hi, I think it's because in a stream "-1" is not one but two characters ('-' and '1'), and the ASCII code of neither of them is -1 or whatever value is used for EOF.
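A quick way to confirm this: feed the text -1 followed by Enter through a temporary file and count the characters fgetc returns before the real EOF. chars_before_eof is a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: writes the text a user would type into a temporary
   file and counts how many characters fgetc returns before the real end of
   file. Returns -1 if the temporary file cannot be created. */
int chars_before_eof(const char *typed)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(typed, fp);
    rewind(fp);
    int n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

For "-1\n" it returns 3: '-', '1' and the newline, none of which is EOF.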