I am writing a simple program to print the contents of a file to stdout.
When I use this:
while((c=fgetc(fp))!=EOF)putchar(c);
It works as it should, but I want to merge putchar and fgetc. So I wrote
while(putchar(fgetc(fp))!=EOF);
But it doesn't seem to work. So I checked the return value of putchar:
RETURN VALUE
fputc(), putc() and putchar() return the character written as an
unsigned char cast to an int or EOF on error.
So why doesn't it work?
getchar returns one of the following:
A character, represented as an unsigned char value (e.g. typically between 0 and 255, inclusive of those values), converted to an int. Thus, there are typically one of 256 (UCHAR_MAX+1, technically) values that fall into this category.
A non-character, EOF, which has a negative value, typically -1.
Thus, getchar may typically return one of 257 (not 256) values. If you attempt to convert that value straight to char or unsigned char (e.g. by calling putchar), you'll be losing the EOF information.
For this reason you need to store the return value of getchar into an int before you convert it to an unsigned char or char.
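Applied to the merged loop in the question: putchar (like fputc) converts its argument to unsigned char before writing, so when fgetc returns EOF the loop writes one more byte and gets that byte's value back, not EOF. A minimal sketch of a copy loop that tests the read result first, assuming EOF is -1 and bytes are 8 bits wide as on common platforms; the helper name copy_stream is hypothetical and not from the original posts:

```c
#include <stdio.h>

/* Hypothetical helper: copies every byte from `in` to `out`.
 * Returns the number of bytes copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    int c;           /* int, not char: must hold EOF as well as 0..UCHAR_MAX */
    long copied = 0;

    /* Check the read result for EOF before handing it to fputc.
     * fputc(EOF, out) would write (unsigned char)EOF, typically 0xFF,
     * and return that value, so comparing its result with EOF
     * cannot detect end of input. */
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1;
        ++copied;
    }
    return copied;
}
```

If a single loop condition is still wanted, `while ((c = fgetc(fp)) != EOF && putchar(c) != EOF);` keeps the EOF test on the value that was read.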
I'm trying to loop over all bytes of a file using a simple while loop, like so:
char c = fgetc(InputFile);
while (c != EOF)
{
doStuff(c);
c = fgetc(InputFile);
}
However, when working with non-text files, I've found that some of the bytes within the file (that aren't the last one) contain the value 255, and therefore register as EOF and the while loop ends prematurely.
How do I get around this and loop over all bytes?
As mentioned in the comments, you should assign the value returned by fgetc to an int variable, not a char. That way, you will be able to distinguish between a successfully read character that has the hex value 0xFF (fgetc will return 255) and an end-of-file condition (fgetc will return EOF, which is typically -1).
From the cppreference page for fgetc:
On success, returns the obtained character as an unsigned char
converted to an int. On failure, returns EOF.
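As a rough illustration of that advice, here is a loop over all bytes of a binary file that keeps the fgetc result in an int. It assumes int is wider than char, which holds on common platforms; count_byte is a made-up helper standing in for the question's doStuff:

```c
#include <stdio.h>

/* Hypothetical helper: counts how many bytes in `in` equal `target`. */
long count_byte(FILE *in, unsigned char target)
{
    int c;          /* holds either a byte value 0..UCHAR_MAX or EOF */
    long hits = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == target)    /* a 0xFF byte arrives as 255, never as EOF */
            ++hits;
    }
    return hits;
}
```

With `char c` instead of `int c`, the same loop would either stop at the first 0xFF byte (signed char) or never stop (unsigned char).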
As written in the book:
The problem is distinguishing the end of the input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF,for "end of file." We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.
main()
{
int c;
c = getchar();
while(c != EOF) {
putchar(c);
c = getchar();
}
}
I don't understand the actual reason for using int instead of char. What value does EOF have that cannot be stored in a char?
A char can typically hold 256 different values (0 to 255 if it is unsigned and 8 bits wide). If EOF were a char, the value of EOF would therefore be one of those 256 values, which would imply that there would be one character that you cannot read. Therefore the value of EOF cannot be one of the character values, which implies that it cannot fit into a char, which implies that its type must be larger than char, for example an int.
In other words, EOF is not a char and we don't want to store it in a char. Its only purpose is to let a program detect that a read was attempted beyond the end of the file.
Or in yet other words: let's suppose EOF is defined as 255 and therefore fits into a char. Now let's suppose getchar returns the value 255 (that is, EOF). What does that value represent? Is it an EOF or is it the character 255?
I'm learning C programming on a Raspberry Pi, but I found that my program never catches EOF successfully. I used char c=0; printf("%d",c-1); to test the char type and concluded that char ranges from 0 to 255, so it behaves as an unsigned type, but the EOF defined in stdio.h is (-1). So is the wrong cc package installed on my Pi? How can I fix it? If I change the EOF value in stdio.h manually, will there be further problems?
What worries me is that the K&R book has examples that use code like while ((c=getchar())!=EOF). I followed them on my Ubuntu machine and they work fine. I wonder whether this kind of syntax has been abandoned by modern C practice, or whether something conflicts on my Raspberry Pi.
Here is my code:
#include <stdio.h>
int main( void )
{
char c;
int i=0;
while ((c=getchar())!=EOF&&i<50) {
putchar(c);
i++;
}
if (c==EOF)
printf("\nEOF got.\n");
while ((c=getchar())!=EOF&&i<500) {
printf("%d",c);
i++;
}
}
Even when I redirect the input from a file, it keeps printing 255 on the screen and never terminates.
Finally I found that I was wrong: in the K&R book, c is defined as an int, not a char. Problem solved.
You need to store the character read by fgetc(), getchar(), etc. in an int so you can catch the EOF. This is well-known and has always been the case everywhere. EOF must be distinguishable from all proper characters, so it was decided that functions like fgetc() return valid characters as non-negative values (even if char is signed). An end-of-file condition is signalled by -1, which is negative and thus cannot collide with any valid character fgetc() could return.
Do not edit the system headers and especially do not change the value of constants defined there. If you do that, you break these headers. Notice that even if you change the value of EOF in the headers, this won't change the value functions like fgetc() return on end-of-file or error, it just makes EOF have the wrong value.
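Instead of touching the headers, you can ask the implementation whether plain char is signed. The sketch below uses CHAR_MIN from <limits.h>; the two function names are made up for illustration, and the second one assumes EOF fits in a signed char (true for the usual EOF == -1). GCC on ARM Linux makes plain char unsigned by default, which matches the behaviour described in the question.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if an EOF result stored in a plain char would still compare
 * equal to EOF. With an unsigned char this is never the case, because
 * the stored value promotes back to a non-negative int. */
int eof_survives_char(void)
{
    char c = (char)EOF;
    return c == EOF;
}
```

On a platform where char_is_signed() returns 0, a loop such as `while ((c = getchar()) != EOF)` with `char c` can never terminate, whatever value EOF has.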
Why is EOF defined to be −1 when −1 cannot be represented in a char?
Because EOF isn't a character but a state.
If I changed the EOF value in stdio.h manually, will there be further
problems?
Absolutely, since you would effectively be breaking the header entirely. A header is not an actual function, just a set of prototypes and declarations for functions that are defined elsewhere. ABSOLUTELY DO NOT change system headers; you will never succeed in doing anything but breaking your code, your project, or worse.
On the subject of EOF: EOF is not a character, and thus cannot be represented in a character variable. To get around this, most programmers simply use an int value (signed by default) that can hold the -1 from EOF. The reason that EOF can never be a character is that otherwise there would be one character indistinguishable from the end-of-file indicator.
int versus char.
fgetc() returns an int, not char. The values returned are in the range of unsigned char and EOF. This is typically 257 different values. So saving the result in char, signed char, or unsigned char will lose some distinguishability.
Instead save the fgetc() return value in an int. After testing for an EOF result, the value can be saved as a char if needed.
// char c;
int c;
...
while ((c=getchar())!=EOF&&i<50) {
char ch = c;
...
Detail: "Why is EOF defined to be −1 when −1 cannot be represented in a char?" misleads. On systems where char is signed and EOF == -1, a char can have the value of EOF. Yet on such systems, a char can have a value of -1 that represents a character too: they overlap. So a char cannot distinctly represent all char values and EOF. Best to use an int to save the return value of fgetc().
... the fgetc function obtains that character as an unsigned char converted to an int and ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, ... and the fgetc function returns EOF. ... C11 §7.21.7.1 2-3
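The "test as int, then narrow to char" pattern can be sketched as a small line reader. This is only an illustration under the usual assumption that int is wider than char; read_line is a hypothetical helper, not part of the standard library:

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in` into `buf` (`size` bytes,
 * NUL-terminated when size > 0), dropping the newline. Characters that
 * do not fit are discarded.
 * Returns the number of characters stored, or -1 if end of input (or an
 * error) occurred before any character was read. */
long read_line(FILE *in, char *buf, size_t size)
{
    int c;              /* raw fgetc result: byte value or EOF */
    size_t len = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    c = fgetc(in);
    if (c == EOF)
        return -1;

    while (c != EOF && c != '\n') {
        if (len + 1 < size) {
            char ch = (char)c;  /* c is a real byte here, not EOF */
            buf[len++] = ch;
        }
        c = fgetc(in);
    }
    buf[len] = '\0';
    return (long)len;
}
```

The int variable exists only to carry the extra EOF value; once that case is ruled out, storing the byte in a char loses nothing.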
Well, some months ago I read another well-known C book (in my language), and I never learned anything about this. The way K&R covers 3 chapters in 20 pages is simply amazing, and of course I can't expect lengthy explanations, but that also raises questions.
I have a question about this point 1.5.1
The book says (page 16):
main(){
int c;// <-- Here is the question
c=getchar();
while (c != EOF){
putchar(c);
c = getchar();
}
}
[...] The type char is specifically meant for storing such character
data, but any integer type can be used. We used int for a subtle but
important reason.
The problem is distinguishing the end of input from
valid data. The solution is that getchar returns a distinctive value
when there is no more input, a value that cannot be confused with any
real character. This value is called EOF, for "end of file". We must
declare c to be a type big enough to hold any value that getchar
returns. We can't use char since c must be big enough to hold EOF in
addition to any possible char. Therefore we use int.[...]
After searching google for another explanation:
EOF is a special macro representing End Of File (Linux: use CTRL+d on
the keyboard to create this, Windows command: use CTRL+z (may have to
be at beginning of new line, followed by RETURN)): Often EOF = -1, but
implementation dependent. Must be a value that is not a valid value
for any possible character. For this reason, c is of type int (not
char as one may have expected).
So I modified the source from int to char to see what the problem is with taking EOF values... but there is no problem. It works the same way.
I also don't understand how getchar takes every character I type and prints everything. The int type is 4 bytes long, so it can hold 4 characters in one variable.
But I can type any number of characters, and it reads and writes everything the same way.
And with char, the same happens...
What really happens? Where are the values stored when there are more than 1-4 characters?
So I modified the source from int to char to see what the problem is
with taking EOF values... but there is no problem. It works the same way.
It happens to work the same way. It all depends on the real type of char, i.e. whether it's signed or unsigned. There's also a C FAQ about this very subject. You're more likely to see the bug if your chars are unsigned.
The bug can go undetected for a long time, however, if chars are
signed and if the input is all 7-bit characters.
EDIT
The last question is: char type is one byte long, and int is 4bytes
long. So, char will only take one ascii character. But if i type
"stack overflow is over 1byte long", the output will be "stack
overflow is over 1byte long". Where is "tack overflow is over 1byte
long" stored, and how does putchar print an entire string?
Each character will be stored by c in turn. So the first time, getchar() will return s, and putchar will send it on its way. Then t will come along and so on. At no point will c store more than one character. So although you feed it a large string, it deals with it by eating one character at a time.
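The characters that have not been read yet are not kept in c at all; they wait in the input stream (its buffer, and the terminal's line buffer when typing) until a later read asks for them. A small sketch that makes this visible by taking one character with fgetc and the rest of the line with fgets; first_then_rest is a hypothetical name used only here:

```c
#include <stdio.h>

/* Hypothetical helper: takes one character from `in` with fgetc, then
 * reads whatever is left on the same line into `rest` with fgets.
 * Returns the first character, or EOF if nothing could be read. */
int first_then_rest(FILE *in, char *rest, int size)
{
    int first = fgetc(in);  /* consumes exactly one character */

    if (size > 0)
        rest[0] = '\0';
    if (first == EOF)
        return EOF;

    /* The remaining characters were never in `first`; they are still
     * waiting in the stream and are handed out by later reads. */
    if (size > 0 && fgets(rest, size, in) == NULL)
        rest[0] = '\0';
    return first;
}
```

For the input line "stack", the function returns 's' and leaves "tack" plus the newline in rest, which is the same hand-over the getchar/putchar loop performs one character at a time.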
Separating into two answers:
Why int and not char
Short and formal answer: if you want to be able to represent all real characters, and another non-real character (EOF), you can't use a datatype that's designed to hold only real characters.
Answer that can be understood but not entirely accurate: The function getchar() returns the ASCII code of the character it reads, or EOF.
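Because -1 converted to an 8-bit unsigned char equals 255 (and 255 converted to a signed char is typically -1), we can't distinguish between the 255-character and EOF once the value is in a char. That is,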
char a = 255;
char b = EOF;
a == b // Evaluates to TRUE
but,
int a = 255;
int b = EOF;
a == b // Evaluates to FALSE
So using char won't allow you to distinguish between a character whose ASCII code is 255 (which could happen when reading from a file), and EOF.
How come you can use putchar() with an int
The function putchar() looks at its parameter, sees a number, and goes to the ASCII table and draws the glyph it sees. When you pass it an int, it is implicitly converted to unsigned char. If the number in the int fits, all is good and nobody notices anything.
If you are using char to store the result of getchar(), there are two potential problems, which one you'll meet depend on the signedness of char.
if char is unsigned, c == EOF will never be true and you'll get an infinite loop.
if char is signed, c == EOF will be true when you input a certain character. Which one depends on the charset used; in a locale using ISO8859-1 or CP852 it is 'ÿ' if EOF is -1 (the most common value). Some charsets, for instance UTF-8, don't use the value (char)EOF in valid codes, but you can rarely guarantee that your program will stay on a signed-char implementation and only be used in non-problematic locales.
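Both failure modes can be reproduced on any machine by spelling out the signedness explicitly instead of relying on plain char. This is a sketch, assuming EOF is -1 and that out-of-range conversions to signed char wrap around as they do on common two's-complement platforms; the function names are invented for the example:

```c
#include <stdio.h>

/* Simulates `unsigned char c = fgetc(...); c == EOF`. */
int eof_test_unsigned(int result)
{
    unsigned char c = (unsigned char)result;
    return c == EOF;    /* c promotes to 0..UCHAR_MAX, EOF is negative */
}

/* Simulates `signed char c = fgetc(...); c == EOF`. */
int eof_test_signed(int result)
{
    /* Out-of-range values convert in an implementation-defined way;
     * on common two's-complement platforms 0xFF becomes -1. */
    signed char c = (signed char)result;
    return c == EOF;
}
```

eof_test_unsigned is false even for a genuine EOF (the infinite loop), while eof_test_signed is typically true for both EOF and the byte 0xFF (the premature stop).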
Is EOF always negative?
I'm thinking of writing a function that reads the next word in the input and returns the line number the word was found in or EOF if the end of the input has been reached. If EOF is not necessarily negative, the function would be incorrect.
EOF is always == EOF. Don't assume anything else.
On a second reading of the standard (and as per some other comments here) it seems EOF is always negative - and for the use specified in this question (line number or EOF) it would work. What I meant to warn against (and still do) is assuming characters are positive and EOF is negative.
Remember that it's possible for a standard conforming C implementation to have negative character values - this is even mentioned in 'The C programming language' (K&R). Printing characters are always positive, but on some architectures (probably all ancient), control characters are negative. The C standard does not specify whether the char type is signed or unsigned, and the only character constant guaranteed to have the same value across platforms, is '\0'.
Yes, EOF is always negative.
The Standard says:
7.19 Input/output
7.19.1 Introduction
3 The macros are [...] EOF which
expands to an integer constant
expression, with type int and a
negative value, that is returned by
several functions to indicate
end-of-file, that is, no more input
from a stream;
Note that there's no problem with "plain" char being signed. The <stdio.h> functions which deal with chars, specifically cast the characters to unsigned char and then to int, so that all valid characters have a positive value. For example:
int fgetc(FILE *stream)
7.19.7.1
... the fgetc function obtains that character as an unsigned char converted to an int ...
Have that function return
the line number the word was found in
or -1 in case the end of the input has been reached
Problem solved, without a need for relying on any EOF values. The caller can easily test for greater-or-equal-to-zero for a successful call, and assume EOF/IO-error otherwise.
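One possible shape for such a function is sketched below. The name next_word, its parameters, and the choice to pass the running line counter through a pointer are all assumptions made for this example, not something taken from the question:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next whitespace-delimited word from `in`
 * into `word` (at most size-1 characters, always NUL-terminated).
 * `*line` is the caller's running line counter; start it at 1.
 * Returns the line number on which the word starts, or -1 at end of
 * input or on a read error. */
int next_word(FILE *in, char *word, size_t size, int *line)
{
    int c;              /* int, so EOF stays distinct from every byte */
    size_t len = 0;
    int start_line;

    if (size == 0)
        return -1;

    /* Skip leading whitespace, counting newlines as we go. */
    while ((c = fgetc(in)) != EOF && isspace(c)) {
        if (c == '\n')
            ++*line;
    }
    if (c == EOF)
        return -1;

    start_line = *line;
    do {
        if (len + 1 < size)
            word[len++] = (char)c;
    } while ((c = fgetc(in)) != EOF && !isspace(c));
    word[len] = '\0';

    /* Put the delimiter back so its newline is counted by the next call. */
    if (c != EOF)
        ungetc(c, in);

    return start_line;
}
```

A caller keeps `int line = 1;` and loops while `next_word(fp, buf, sizeof buf, &line) >= 0`, so no comparison against EOF is needed on the caller's side.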
From the online draft n1256, 7.19.1 paragraph 3:
EOF
which expands to an integer constant expression, with type int and a negative value,
that is returned by several functions to indicate end-of-file, that is, no more input
from a stream;
EOF is always negative, though it may not always be -1.
For issues like this, I prefer separating error conditions from data by returning an error code (SUCCESS, END_OF_FILE, READ_ERROR, etc.) as the function's return value, and then writing the data of interest to separate parameters, such as
int getNextWord (FILE *stream, char *buffer, size_t bufferSize, int *lineNumber)
{
if (!fgets(buffer, bufferSize, stream))
{
if (feof(stream)) return END_OF_FILE; else return READ_ERROR;
}
else
{
// figure out the line number
*lineNumber = ...;
}
return SUCCESS;
}
EOF is a condition, rather than a value. The exact value of this sentinel is implementation-defined, but the C standard requires it to be a negative int.
From wikipedia :
The actual value of EOF is a
system-dependent negative number,
commonly -1, which is guaranteed to be
unequal to any valid character code.
But no references ...
From Secure Coding : Detect and handle input and output errors
EOF is always negative, but it is only guaranteed to be distinguishable from every character value when sizeof(int) > sizeof(char).