in c language i am using EOF .....why EOF IS -1 ? why not other value?
From Wikipedia:
The actual value of EOF is system-dependent (but is commonly -1, such as in glibc) and is unequal to any valid character code.
It can't be any value in 0 - 255 because these are valid values for characters on most systems. For example if EOF were 0 then you wouldn't be able to tell the difference between reading a 0 and reaching the end of file.
-1 is the obvious remaining choice.
You may also want to consider using feof instead:
Since EOF is used to report both end of file and random errors, it's often better to use the feof function to check explicitly for end of file and ferror to check for errors.
It isn't. It is defined to be an implementation-defined negative int constant. It must be negative so that one can distinguish it easily from an unsigned char. in most implementations it is indeed -1, but that is not required by the C standard.
The historic reason for choosing -1 was that the character classification functions (see <ctype.h>) can be implemented as simple array lookups. And it is the "nearest" value that doesn't fit into an unsigned char.
[Update:] Making the character classification functions efficient was probably not the main reason for choosing -1 in the first place. I don't know all the historical details, but it is the most obvious decision. It had to be negative since there are machines whose char type didn't have exactly 8 bits, so choosing a positive value would be difficult. It had to be large enough so that it is not a valid value for unsigned char, yet small enough to fit into an int. And when you have to choose a negative number, why should you take an arbitrary large negative number? That leaves -1 as the best choice.
Refer to details at http://en.wikipedia.org/wiki/Getchar#EOF_pitfall and http://en.wikipedia.org/wiki/End-of-file
Easily you can change the EOF value.
In C program define the macro for EOF=-1 in default,
So you mention EOF in your program, that default c compiler assign value for -1;
for example;
Just you try and get the result
#include <stdio.h>
#define EOF 22
main()
{
int a;
a=EOF;
printf(" Value of a=%d\n",a);
}
Output:
Value of a=22
Reason:
That time EOF value is changed
int c;
c = getchar();
while(!feof(stdin) && !ferror(stdin)) {
...
c = getchar();
}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Consider this loop, intended to scan all characters up to the next whitespace character received:
int c;
c = getchar();
while(!isspace(c)) {
...
c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace character). A better way to write this would be:
int c;
c = getchar();
while(!feof(stdin) && !ferror(stdin) && !isspace(c)) {
...
c = getchar();
}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral constant with type int.
Related
I know this has been discussed before, but I want to make sure I understand correctly, what is happening in this program, and why. On page 20 of Dennis Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>
int main()
{
int c;
c = getchar();
while(c != EOF){
putchar(c);
c = getchar();
}
return 0;
}
When executed, the program reads each character keyed in and prints them out in the same order after the user hits enter. This process is repeated indefinitely unless the user manually exits out of the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passed to c is promoted to it's corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop. If the user only keys in one character before execution, then the program reads the <return> value as the next character and prints a new line and waits for the next input to be keyed in.
Is any of this remotely correct?
Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
(The K&R stdio functions are very old at this point, they don't know about Unicode and etc, and some of the underlying design rationales are if not murky, just not relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you still can do this way, but there are more canonical ways)
Well, it is correct in idea, but not in details, and that's where the devil is in.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read)
The return value is assigned into c, which is of type int (as it should, as if it were a char strange things could happen)
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them have a negative value (at least not in any systems that you'd meet a a novice), the condition of the while loop will always be true until the End-of-File condition happens or an error happens when reading standard input.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too, we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and the character constants are of type int - i.e. on systems where the execution character set is compatible with ASCII, writing ' ' will be the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
I'm learning the C programming on a raspberry pi, however I found that my program never catches the EOF successfully. I use char c=0; printf("%d",c-1); to test the char type, finding that the char type ranges from 0 to 255, as an unsigned short. but the EOF defined in stdio.h is (-1). So is the wrong cc package installed on my Pi? how can I fix it? If I changed the EOF value in stdio.h manually, will there be further problems?
what worries me is that ,when I learning from the K&R book, there are examples which use code like while ((c=getchar())!=EOF), I followed that on my Ubuntu machine and it works fine. I just wonder if such kind of syntax is abandoned by modern C practice or there is something conflict in my Raspberry Pi?
here is my code:
#include <stdio.h>
int main( void )
{
char c;
int i=0;
while ((c=getchar())!=EOF&&i<50) {
putchar(c);
i++;
}
if (c==EOF)
printf("\nEOF got.\n");
while ((c=getchar())!=EOF&&i<500) {
printf("%d",c);
i++;
}
}
even when I redirect the input to an file, it keeps printing 255 on the screen, never terminate this program.
Finally I found that I'm wrong,In the K&R book, it defined c as an int, not a char. Problem solved.
You need to store the character read by fgetc(), getchar(), etc. in an int so you can catch the EOF. This is well-known and has always been the case everywhere. EOF must be distinguishable from all proper characters, so it was decided that functions like fgetc() return valid characters as non-negative values (even if char is signed). An end-of-file condition is signalled by -1, which is negative and thus cannot collide with any valid character fgetc() could return.
Do not edit the system headers and especially do not change the value of constants defined there. If you do that, you break these headers. Notice that even if you change the value of EOF in the headers, this won't change the value functions like fgetc() return on end-of-file or error, it just makes EOF have the wrong value.
Why is EOF defined to be −1 when −1 cannot be represented in a char?
Because EOF isn't a character but a state.
If I changed the EOF value in stdio.h manually, will there be further
problems?
Absolutely, since you would be effectively breaking the header entirely. A header is not an actual function, just a set of prototypes and declarations for functions that are defined elsewhere ABSOLUTELY DO NOT change system headers, you will never succeed in doing anything but breaking your code, project and/or worse things.
On the subject of EOF: EOF is not a character, and thus cannot be represented in a character variable. To get around this, most programmers simple use an int value (by default signed) that can interpret the -1 from EOF. The reason that EOF can never be a character is because otherwise there would be one character indistinguishable from the end of file indicator.
int versus char.
fgetc() returns an int, not char. The values returned are in the range of unsigned char and EOF. This is typically 257 different values. So saving the result in char, signed char, unsigned char will lose some distinguishably.
Instead save the fgetc() return value in an int. After testing for an EOF result, the value can be saved as a char if needed.
// char c;
int c;
...
while ((c=getchar())!=EOF&&i<50) {
char ch = c;
...
Detail: "Why is EOF defined to be −1 when −1 cannot be represented in a char?" misleads. On systems where char is signed and EOF == -1, a char can have the value of EOF. Yet on such systems, a char can have a value of -1 that represents a character too - they overlap. So a char cannot distinctively represent all char and EOF. Best to use an int to save the return value of fgetc().
... the fgetc function obtains that character as an unsigned char converted to an int and ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, ... and the fgetc function returns EOF. ... C11 §7.21.7.1 2-3
I am reading K & R C language book, following code fragment:
char c;
while ((c = getchar()) != EOF) ...
It was mentioned that for EOF (i think it is -1) is an "out of band" return value from getchar, distinct from all possible
values that getchar can return.
My questions are following:
I ran my program with char and it ran successfully, and my
understanding is signed char can store -127 to +127 so it can check
for -1 how it is "out of band" ?
Can any one provide simple example where above program fragment will fail if we use char c instead of int c?
Thanks!
You have a small mistake, getchar returns an int, not a char:
int c;
while ((c = getchar()) != EOF) ...
The valid values for ascii chars are from 0 to 127, the EOF is some other (int) value.
If you keep using char, you might get into troubles (as I got into)
Well, your question is answered in the C FAQ.
Two failure modes are possible if, as in the fragment above, getchar's
return value is assigned to a char.
If type char is signed, and if EOF is defined (as is usual) as -1,
the character with the decimal value 255 ('\377' or '\xff' in C) will
be sign-extended and will compare equal to EOF, prematurely
terminating the input.
If type char is unsigned, an actual EOF value will be truncated (by
having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
Whatever value EOF has depends on your platform. Take a look at stdio.h too see its actual definition.
What do you put in to end the program, -1, doesn't work:
#include <stdio.h>
//copy input to output
main() {
char c;
c = getchar();
while(c != EOF) {
putchar(c);
c = getchar();
}
}
Macro: int EOF
This macro is an integer value that is returned by a number of functions to indicate an end-of-file condition, or some other error situation. With the GNU library, EOF is -1. In other libraries, its value may be some other negative number.
The documentation for getchar is that it returns the next character available, cast to an unsigned char and then returned in an int return value.
The reason for this, is to make sure that all valid characters are returned as positive values and won't ever compare as equal to EOF, a macro which evaluates to a negative integer value.
If you put the return value of getchar into a char, then depending on whether your implementation's char is signed or unsigned you may get spurious detection of EOF, or you may never detect EOF even when you should.
Signaling EOF to the C library typically happens automatically when redirecting the input of a program from a file or a piped process. To do it interactively depends on your terminal and shell, but typically on unix it's achieved with Ctrl-D and on windows Ctrl-Z on a line by itself.
you should use int and not char
I agree with all other people in this thread by saying use int c not char.
To end the loop (at least on *nix like systems) you would press Ctrl-D to send EOF.
In addition, if you like to get your characters echoed instantly rewrite your code like this:
#include<stdio.h>
int
main(void)
{
int c;
c = getchar();
while (c != EOF)
{
putchar(c);
c = getchar();
fflush(stdout); /* optional, instant feedback */
}
return 0;
}
If the integer value returned by getchar() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
-- opengroup POSIX standard
If char is unsigned by default for your compiler (or by whatever options are being used to invoke the compiler), it's likely that
(c == EOF)
can never be true. If sizeof(unsigned char) < sizeof( int), which is pretty much always true, then the promotion of the char to an int will never result in a negative value, and EOF must be a negative value.
That's one reason why all (or at least many if not all) the functions in the C standard that deal with or return characters specify int as the parameter or return type.
EOF is not an actual character or a sequence of characters. EOF denotes the end of the input file or stream, i.e., the situation when getchar() tries to read a character beyond the last one.
On Unix, you can close an interactive input stream by typing CTRL-D. That situation causes getchar() to return EOF. But if a file contains a character whose ASCII code is 4 (i.e., CTRL-D), getchar() will return 4, not EOF.
It Still Works with char data type. But the tricks are checking the condition in the loop with int value.
First: let's check it. if you write the following code like
printf("%d",getchar());
And then if you give the input from the keyboard A You should see 65 which is ASCII value of the A or if you give CTRL-D then see -1.
So that if you implement this logic then the solving code is
#include<stdio.h>
int main()
{
char c;
while ((c = getchar()) != EOF){
putchar(c);
//printf("%c",c); // this is another way for output
}
return 0;
}
Windows: Ctrl+z
Unix: Ctrl+d
reference:EOF
hi i think it's becoz in a stream -1 is not one but two characters and the ascii for neither of them is -1 or whatever is used for EOF
Is EOF always negative?
I'm thinking of writing a function that reads the next word in the input and returns the line number the word was found in or EOF if the end of the input has been reached. If EOF is not necessarily negative, the function would be incorrect.
EOF is always == EOF. Don't assume anything else.
On a second reading of the standard (and as per some other comments here) it seems EOF is always negative - and for the use specified in this question (line number or EOF) it would work. What I meant to warn against (and still do) is assuming characters are positive and EOF is negative.
Remember that it's possible for a standard conforming C implementation to have negative character values - this is even mentioned in 'The C programming language' (K&R). Printing characters are always positive, but on some architectures (probably all ancient), control characters are negative. The C standard does not specify whether the char type is signed or unsigned, and the only character constant guaranteed to have the same value across platforms, is '\0'.
Yes, EOF is always negative.
The Standard says:
7.19 Input/output
7.19.1 Introduction
3 The macros are [...] EOF which
expands to an integer constant
expression, with type int and a
negative value, that is returned by
several functions to indicate
end-of-file, that is, no more input
from a stream;
Note that there's no problem with "plain" char being signed. The <stdio.h> functions which deal with chars, specifically cast the characters to unsigned char and then to int, so that all valid characters have a positive value. For example:
int fgetc(FILE *stream)
7.19.7.1
... the fgetc function obtains that character as an unsigned char converted to an int ...
Have that function return
the line number the word was found in
or -1 in case the end of the input has been reached
Problem solved, without a need for relying on any EOF values. The caller can easily test for greater-or-equal-to-zero for a successful call, and assume EOF/IO-error otherwise.
From the online draft n1256, 17.9.1.3:
EOF
which expands to an integer constant expression, with type int and a negative value,
that is returned by several functions to indicate end-of-file, that is, no more input
from a stream;
EOF is always negative, though it may not always be -1.
For issues like this, I prefer separating error conditions from data by returning an error code (SUCCESS, END_OF_FILE, READ_ERROR, etc.) as the function's return value, and then writing the data of interest to separate parameters, such as
int getNextWord (FILE *stream, char *buffer, size_t bufferSize, int *lineNumber)
{
if (!fgets(buffer, bufferSize, stream))
{
if (feof(stream)) return END_OF_FILE; else return READ_ERROR;
}
else
{
// figure out the line number
*lineNumber = ...;
}
return SUCCESS;
}
EOF is a condition, rather than a value. The exact value of this sentinel is implementation defined. In a lot of cases, it is a negative number.
From wikipedia :
The actual value of EOF is a
system-dependent negative number,
commonly -1, which is guaranteed to be
unequal to any valid character code.
But no references ...
From Secure Coding : Detect and handle input and output errors
EOF is negative but only when sizeof(int) > sizeof(char).