Why are islower() and friends required to handle EOF?


Why are islower() and friends required to handle EOF, whereas putchar() and friends don't have to?
Why doesn't islower() treat its int argument as an unsigned char, as putchar() does? That would make total sense, because we have to check for EOF first anyway. See also: Why the argument type of putchar(), fputc(), and putc() is not char?

because we have to check for EOF first anyway.
We absolutely don't.
int c;
while(isspace(c=fgetc(fp)));
if (c==EOF) ...
This is totally legitimate code to skip whitespace. Checking each character for EOF separately is a waste of time.
The ctype functions are made to handle EOF specifically to enable code like this.
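For completeness, here is the same idea wrapped into a small function, as a minimal sketch. The name skip_whitespace is made up for illustration; it relies only on the guarantee that isspace() accepts EOF and returns 0 for it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: consume leading whitespace from fp and return the
 * first non-whitespace value read, which may be EOF. */
int skip_whitespace(FILE *fp)
{
    int c;  /* int, not char, so EOF stays distinguishable */

    /* isspace(EOF) is defined and returns 0, so the loop also stops at
     * end of file or on a read error. */
    while (isspace(c = fgetc(fp)))
        ;
    return c;
}
```

The caller only has to compare the returned value against EOF once, after the loop has finished.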
See also this question.

None of the character type functions are required to handle EOF, other than ignoring it (i.e. returning false). In fact, the EOF marker is not even mentioned in the <ctype.h> header documentation.
The most likely reason for character classification function signatures to use int in place of char, signed or unsigned, is to avoid implementation-defined behavior in loops like this:
int c;
while ((c = getchar()) != EOF) {
    if (islower(c)) {
        ...
    } else if (isdigit(c)) {
        ...
    }
}
This would compile and run with islower(char) instead of islower(int), but the result would be implementation defined, which is not desirable under such basic circumstances. Essentially, int in the signature of getchar became "contagious," getting into signatures of functions only marginally related to it.
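The flip side shows up when the characters come from a char array rather than from getchar(). A minimal sketch, assuming a hypothetical helper named count_lower: because plain char may be signed, the value is converted to unsigned char before it is handed to islower().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the lowercase letters in a C string. */
size_t count_lower(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Convert to unsigned char first: plain char may be signed, and a
         * negative value other than EOF is not a valid ctype argument. */
        if (islower((unsigned char)*s))
            n++;
    }
    return n;
}
```

With values that come straight from getchar() no cast is needed, since they are already either EOF or in the unsigned char range.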

Related

why getchar() function work different in a loop?

#include <stdio.h>
int main()
{
    char c;
    while ((c=getchar()) != EOF)
        putchar(c);
}
Why, when I input text such as "omar", does putchar print "omar" back? Isn't it supposed to print 'o' repeatedly, since getchar will take the first character only? I know that I am wrong about something, probably because I don't know exactly how getchar or putchar works, so can somebody please explain how they work? Another question: why do getchar and putchar work normally without a while loop, but behave differently inside a while loop?

why getchar() function work different in a loop?
I take you to be asking why getchar() works differently than you expect, as opposed to differently than in other contexts. If you in fact meant the latter then the answer would be "it doesn't."
But of course reading a character from a stream, whether via getchar() or some other I/O function, removes it from the stream. getchar() would not be very useful if it did not do that. Therefore, if you call it repeatedly, you read (and remove) each character in turn until and unless all available characters are consumed. You can test this by replacing the loop in your program with several getchar() calls in a row.
And, of course, your loop does call it repeatedly. The loop-control expression, (c=getchar()) != EOF, is evaluated before each iteration of the loop, and that involves calling getchar() (as opposed to using a value previously returned by that function).
On a completely separate note, do be aware that getchar() returns a result of type int, exactly so that it can return at least one value, EOF, that is outside the range of type unsigned char. If you convert the result to type char then either there is one real input value that you will mistake for EOF, or you will never detect EOF, depending on whether char is signed or unsigned. To reliably and portably detect the end of the file, you must handle that return value as an int, not a char.
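Putting both points together, a copy loop might look like the sketch below. The function name copy_stream and the use of getc/putc on arbitrary streams are choices made for illustration; the original program does the same thing with getchar/putchar on stdin and stdout.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from in to out, one character at a
 * time, and return how many characters were copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;          /* int so that EOF is distinct from every character */
    long n = 0;

    /* Each call to getc consumes one character from the stream. */
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_stream(stdin, stdout) reproduces the behaviour of the program in the question, with the return value kept as an int.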

Why is EOF defined to be −1 when −1 cannot be represented in a char?

I'm learning C programming on a Raspberry Pi, but I found that my program never catches EOF successfully. I used char c=0; printf("%d",c-1); to test the char type, finding that the char type ranges from 0 to 255, as an unsigned short, but the EOF defined in stdio.h is (-1). So is the wrong cc package installed on my Pi? How can I fix it? If I change the EOF value in stdio.h manually, will there be further problems?
What worries me is that, while learning from the K&R book, I saw examples that use code like while ((c=getchar())!=EOF). I followed that on my Ubuntu machine and it works fine. I just wonder whether that kind of syntax has been abandoned by modern C practice, or whether something conflicts on my Raspberry Pi.
Here is my code:
#include <stdio.h>
int main( void )
{
    char c;
    int i=0;
    while ((c=getchar())!=EOF&&i<50) {
        putchar(c);
        i++;
    }
    if (c==EOF)
        printf("\nEOF got.\n");
    while ((c=getchar())!=EOF&&i<500) {
        printf("%d",c);
        i++;
    }
}
Even when I redirect the input from a file, it keeps printing 255 on the screen and the program never terminates.
Finally I found that I was wrong: in the K&R book, c is defined as an int, not a char. Problem solved.

You need to store the character read by fgetc(), getchar(), etc. in an int so you can catch the EOF. This is well-known and has always been the case everywhere. EOF must be distinguishable from all proper characters, so it was decided that functions like fgetc() return valid characters as non-negative values (even if char is signed). An end-of-file condition is signalled by -1, which is negative and thus cannot collide with any valid character fgetc() could return.
Do not edit the system headers and especially do not change the value of constants defined there. If you do that, you break these headers. Notice that even if you change the value of EOF in the headers, this won't change the value functions like fgetc() return on end-of-file or error, it just makes EOF have the wrong value.

Why is EOF defined to be −1 when −1 cannot be represented in a char?
Because EOF isn't a character but a state.

If I changed the EOF value in stdio.h manually, will there be further
problems?
Absolutely, since you would be effectively breaking the header entirely. A header is not an actual function, just a set of prototypes and declarations for functions that are defined elsewhere. ABSOLUTELY DO NOT change system headers; you will never succeed in doing anything but breaking your code, your project, or worse.
On the subject of EOF: EOF is not a character, and thus cannot be represented in a character variable. To get around this, most programmers simply use an int value (signed by default) that can hold the -1 from EOF. The reason that EOF can never be a character is that otherwise there would be one character indistinguishable from the end-of-file indicator.

int versus char.
fgetc() returns an int, not char. The values returned are in the range of unsigned char, plus EOF. This is typically 257 different values. So saving the result in a char, signed char, or unsigned char will lose some distinguishability.
Instead save the fgetc() return value in an int. After testing for an EOF result, the value can be saved as a char if needed.
// char c;
int c;
...
while ((c=getchar())!=EOF&&i<50) {
char ch = c;
...
Detail: "Why is EOF defined to be −1 when −1 cannot be represented in a char?" misleads. On systems where char is signed and EOF == -1, a char can have the value of EOF. Yet on such systems, a char can have a value of -1 that represents a character too - they overlap. So a char cannot distinctively represent all char and EOF. Best to use an int to save the return value of fgetc().
... the fgetc function obtains that character as an unsigned char converted to an int and ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, ... and the fgetc function returns EOF. ... C11 §7.21.7.1 2-3
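To make the overlap concrete, here is a small sketch, assuming 8-bit bytes. The helper name count_ff_bytes is invented for illustration. Because the result of fgetc() is kept in an int, a byte with value 0xFF arrives as 255 and is counted as data instead of being mistaken for EOF.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in fp that have the value 0xFF.
 * With an int result, 0xFF arrives as 255 and never compares equal to EOF. */
long count_ff_bytes(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = fgetc(fp)) != EOF) {
        if (c == 0xFF)
            n++;
    }
    return n;
}
```

If c were declared as a signed char instead, the first 0xFF byte could end the loop early on typical implementations; with an unsigned char the loop could never end.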

why in c language EOF IS -1?

I am using EOF in C. Why is EOF -1? Why not some other value?

From Wikipedia:
The actual value of EOF is system-dependent (but is commonly -1, such as in glibc) and is unequal to any valid character code.
It can't be any value in 0 - 255 because these are valid values for characters on most systems. For example if EOF were 0 then you wouldn't be able to tell the difference between reading a 0 and reaching the end of file.
-1 is the obvious remaining choice.
You may also want to consider using feof instead:
Since EOF is used to report both end of file and random errors, it's often better to use the feof function to check explicitly for end of file and ferror to check for errors.

It isn't. It is defined to be an implementation-defined negative int constant. It must be negative so that one can distinguish it easily from an unsigned char. In most implementations it is indeed -1, but that is not required by the C standard.
The historic reason for choosing -1 was that the character classification functions (see <ctype.h>) can be implemented as simple array lookups. And it is the "nearest" value that doesn't fit into an unsigned char.
[Update:] Making the character classification functions efficient was probably not the main reason for choosing -1 in the first place. I don't know all the historical details, but it is the most obvious decision. It had to be negative since there are machines whose char type didn't have exactly 8 bits, so choosing a positive value would be difficult. It had to be large enough so that it is not a valid value for unsigned char, yet small enough to fit into an int. And when you have to choose a negative number, why should you take an arbitrary large negative number? That leaves -1 as the best choice.
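As an illustration of the array-lookup idea (a sketch only, not how any particular C library implements it), a table can reserve one extra slot in front for EOF. The names my_flags, my_ctype_init and my_isdigit are hypothetical, and the sketch assumes EOF == -1, which the standard does not guarantee.

```c
#include <limits.h>

/* Hypothetical flag table with one extra slot in front for EOF.
 * Assumes EOF == -1, so that index c + 1 is 0 for EOF and
 * 1..UCHAR_MAX + 1 for ordinary characters.
 * Only digits are marked, to keep the sketch short. */
static unsigned char my_flags[UCHAR_MAX + 2];

void my_ctype_init(void)
{
    for (int ch = '0'; ch <= '9'; ch++)
        my_flags[ch + 1] = 1;
}

int my_isdigit(int c)
{
    /* A single lookup handles both characters and EOF. */
    return my_flags[c + 1];
}
```

With EOF at -1 the offset is just one, so the lookup needs no range check and no special case for end of file.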

Refer to details at http://en.wikipedia.org/wiki/Getchar#EOF_pitfall and http://en.wikipedia.org/wiki/End-of-file

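You can easily change the EOF value within your own program.
By default, the EOF macro is defined as -1 (on most implementations), so wherever you mention EOF the compiler substitutes that value. If you redefine the macro, your program sees the new value. Note that this does not change what functions such as getchar() actually return at end of file.
For example, try this and look at the result:
#include <stdio.h>
#undef EOF
#define EOF 22
int main(void)
{
    int a;
    a = EOF;
    printf(" Value of a=%d\n", a);
    return 0;
}
Output:
Value of a=22
Reason:
The EOF macro has been redefined before it is used.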

int c;
c = getchar();
while(!feof(stdin) && !ferror(stdin)) {
...
c = getchar();
}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Consider this loop, intended to scan all characters up to the next whitespace character received:
int c;
c = getchar();
while(!isspace(c)) {
...
c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace character). A better way to write this would be:
int c;
c = getchar();
while(!feof(stdin) && !ferror(stdin) && !isspace(c)) {
...
c = getchar();
}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral constant with type int.
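An alternative arrangement, sketched here under the assumption that you only need to know afterwards why reading stopped, is to test the return value against EOF in the loop and consult feof() and ferror() once at the end. The helper name drain_stream is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read fp to the end and report why reading stopped.
 * Returns 0 on a clean end of file, -1 on a read error. The number of
 * characters read is stored in *count. */
int drain_stream(FILE *fp, long *count)
{
    int c;

    *count = 0;
    while ((c = getc(fp)) != EOF)
        (*count)++;

    /* EOF covers both cases, so ask the stream which one occurred. */
    if (ferror(fp))
        return -1;
    return feof(fp) ? 0 : -1;
}
```

This keeps the loop condition to a single comparison while still distinguishing end of file from an error.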

Is getchar() equivalent to scanf("%c") and putchar() equivalent to printf("%c")?

Is a = getchar() equivalent to scanf("%c",&a);?
Is putchar(a) equivalent to printf("%c",a); where a is a char variable?

Generally speaking yes they are the same.
But they are not in a few nitpicky ways. The function getchar is typed to return int and not char. This is done so that getchar can return both all possible char values and, additionally, error codes.
So while the following happily compiles in most compilers you are essentially truncating away an error message
char c = getchar();
The function scanf, though, allows you to use a char type directly and separates out the error code into the return value.
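A minimal sketch of that separation, using fscanf on an arbitrary stream so it can be tried on a file. The wrapper name read_char_scanf is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read one character with fscanf.
 * Returns 1 and stores the character in *out on success; returns 0 when
 * no character could be read (fscanf returned EOF). */
int read_char_scanf(FILE *fp, char *out)
{
    /* "%c" does not skip whitespace and stores directly into a char;
     * the status comes back separately as the return value. */
    return fscanf(fp, "%c", out) == 1;
}
```

Here the char variable never has to hold EOF, because success or failure is reported through the int that fscanf returns.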

They do the same thing here. However, if you know you are just doing characters then getchar and putchar will be more efficient, since the printf and scanf variants will have to parse the string each time to determine how to process your request. Plus, they may be called in a lower level library meaning you may not have to have the printf/scanf linked if they are not needed elsewhere.

EndOfFile in C - EOF

What do you type in to end this program? Entering -1 doesn't work:
#include <stdio.h>
//copy input to output
main() {
    char c;
    c = getchar();
    while(c != EOF) {
        putchar(c);
        c = getchar();
    }
}

Macro: int EOF
This macro is an integer value that is returned by a number of functions to indicate an end-of-file condition, or some other error situation. With the GNU library, EOF is -1. In other libraries, its value may be some other negative number.

The documentation for getchar is that it returns the next character available, cast to an unsigned char and then returned in an int return value.
The reason for this, is to make sure that all valid characters are returned as positive values and won't ever compare as equal to EOF, a macro which evaluates to a negative integer value.
If you put the return value of getchar into a char, then depending on whether your implementation's char is signed or unsigned you may get spurious detection of EOF, or you may never detect EOF even when you should.
Signaling EOF to the C library typically happens automatically when redirecting the input of a program from a file or a piped process. To do it interactively depends on your terminal and shell, but typically on unix it's achieved with Ctrl-D and on windows Ctrl-Z on a line by itself.

you should use int and not char

I agree with all other people in this thread by saying use int c not char.
To end the loop (at least on *nix like systems) you would press Ctrl-D to send EOF.
In addition, if you like to get your characters echoed instantly rewrite your code like this:
#include <stdio.h>
int
main(void)
{
    int c;
    c = getchar();
    while (c != EOF)
    {
        putchar(c);
        c = getchar();
        fflush(stdout); /* optional, instant feedback */
    }
    return 0;
}

If the integer value returned by getchar() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
-- opengroup POSIX standard

If char is unsigned by default for your compiler (or by whatever options are being used to invoke the compiler), it's likely that
(c == EOF)
can never be true. If sizeof(unsigned char) < sizeof( int), which is pretty much always true, then the promotion of the char to an int will never result in a negative value, and EOF must be a negative value.
That's one reason why all (or at least many if not all) the functions in the C standard that deal with or return characters specify int as the parameter or return type.
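The promotion argument can be checked with a few lines, as a sketch that assumes unsigned char is narrower than int (as it is on virtually every implementation). The function name survives_as_unsigned_char is invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: report whether a value stored in an unsigned char
 * can still compare equal to EOF after promotion back to int. */
int survives_as_unsigned_char(int value)
{
    unsigned char uc = (unsigned char)value;  /* EOF becomes UCHAR_MAX */
    int promoted = uc;                        /* always non-negative */

    return promoted == EOF;
}
```

Passing EOF itself yields 0, since the stored value comes back as a non-negative int and EOF is negative.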

EOF is not an actual character or a sequence of characters. EOF denotes the end of the input file or stream, i.e., the situation when getchar() tries to read a character beyond the last one.
On Unix, you can close an interactive input stream by typing CTRL-D. That situation causes getchar() to return EOF. But if a file contains a character whose ASCII code is 4 (i.e., CTRL-D), getchar() will return 4, not EOF.

It can still work with the char data type on implementations where plain char is signed, but the trick is that the loop condition has to be checked against an int value.
First, let's check it. Suppose you write the following code:
printf("%d",getchar());
If you then type A on the keyboard, you should see 65, which is the ASCII value of A; if you press CTRL-D instead, you see -1.
So, if you apply this logic, the resulting code is
#include<stdio.h>
int main()
{
char c;
while ((c = getchar()) != EOF){
putchar(c);
//printf("%c",c); // this is another way for output
}
return 0;
}

Windows: Ctrl+z
Unix: Ctrl+d
reference:EOF

Hi, I think it's because in a stream "-1" is not one character but two, and the ASCII code of neither of them is -1 or whatever value is used for EOF.
