Clarification on the use of fgetc [duplicate] - c

This question already has answers here:
Difference between int and char in getchar/fgetc and putchar/fputc?
(2 answers)
Closed 5 years ago.
There is the following code from part of a function that I have been given:
char ch;
ch = fgetc(fp);
if (ch == EOF)
return -1;
Where fp is a pointer-to-FILE/stream passed as a parameter to the function.
However, having checked the usage of fgetc(), getc() and getchar(), it seems that they all return type int rather than type char, because EOF does not fit in the values 0-255 that are used in a char and so is usually < 0 (e.g. -1). This leads me to ask three questions:
If getchar() returns int, why is char c; c = getchar(); a valid usage of the function? Does C automatically type cast to char in this case, and in the case that getchar() is replaced with getc(fp) or fgetc(fp)?
What would happen in the program when fgetc() or the other two functions return EOF? Would it again try and cast to char like before but then fail? What gets stored in ch, if anything?
If EOF is not actually a character, how is ch == EOF a valid comparison, since EOF cannot be represented by a char variable?

If getchar() returns int, why is char c; c = getchar(); a valid usage of the function?
It's not. Just because you can write it, and the compiler (somehow) allows you to compile it, does not make the code valid.
I believe the above answers all the questions.
Just to add: in case EOF is returned, it cannot be stored in a char. The signedness of char is implementation-defined; thus, per C11 §6.3.1.3:
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
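To tie this back to the fragment in the question, here is a minimal sketch of how it could be written so that the EOF check is reliable; the function name read_one and the surrounding body are made up for the example, only the fgetc call and the check against EOF are the point:

#include <stdio.h>

/* Sketch: read one byte from fp, return -1 on end-of-file or error. */
int read_one(FILE *fp)
{
    int ch = fgetc(fp);   /* int, because EOF is a negative int */
    if (ch == EOF)
        return -1;
    return ch;            /* otherwise a value in 0..UCHAR_MAX */
}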

Related

Using int or char data types and doing exercises from the K&R - "The C programming language" book?

First of all, please excuse me for asking this question because there are already dozens of similar variations of it. However, I am not entirely sure If I can understand it correctly. So, please let me explain what I managed to understand and correct me if I am wrong.
This is an example from the K&R book:
#include <stdio.h>
main()
{
1) int c;
2) 4) while ((c = getchar()) != EOF)
3) putchar(c);
}
So, I understand the above program like this:
1) We declare c as int instead of char because using a char data type won't always work correctly and will eventually cause problems. The char data type has variations depending on the system: it could be by default either signed [−127, +127] or unsigned [0, 255]. Also, EOF is not always -1 by default, because it depends on the system and the compiler as well.
1.1) So, if we declare c as char and it's by default signed char on the system, then it will still work, but it will break if we enter a character equal to ASCII 128 and above? What will happen? Will getchar() return the maximum possible value for the selected data type, 127?
1.2) Conversely, if we declare c as char and it is by default unsigned char on the system, then getchar() will always be NOT equal to EOF no matter what, because we cannot store a negative value, right?
Because of all the variations above, is it correct to declare c as int to avoid a possible conflict?
2) We type some characters as input; c = getchar() grabs this input and converts it to an ASCII number, and after that it checks to make sure it's not equal to EOF.
3) If it is NOT equal to EOF, it displays the input characters as output.
4) It goes back in a state where we must input new characters to continue the loop.
Is all above correct?
[Additional question] Also, the expression getchar() != EOF will evaluate to 1 or 0. A value of 1 will mean that getchar() is NOT equal to EOF, and a value of 0 will show us that getchar() is actually equal to EOF, right?
[Additional question] I saw another question from another user here on Stack Overflow regarding getchar() and the char data type; however, I cannot understand Oliver Charlesworth's answer.
Your program doesn't work fine; it won't be able to distinguish
between EOF and 255.
What does that mean? Could you explain it to me? Also, I can't understand what this means either:
0 through 7 (# 255) and EOF can be represented as 1111....32 times..... (assuming a 4 byte int)? There will be no conflict here.
Link to Oliver Charlesworth's answer.
UPDATE
Thank you all! Regarding this:
0 through 7 (# 255) and EOF can be represented as 1111....32 times..... (assuming a 4 byte int)? There will be no conflict here.
If I understood it correctly from all the answers and explanations below: EOF with value -1 will be represented as 1111 1111, for example, and if the data type is char then the program will treat it as #255, because char is only 8 bits and the value is stored in memory exactly as 0xFF (#255) with no other indication (in a few words: information is lost, and instead of the value -1 it now means something entirely different). Is that correct? So, to avoid this confusion, we allocate 4 bytes by declaring c as int, to make sure no information is lost, and the EOF value -1 will be stored in 32 bits as 1111 1111 1111 1111 1111 1111 1111 1111, which as a signed int is still the negative value -1. Do I understand it correctly?
Thanks once again!
The crucial piece of information you are missing is this sentence, from the specification of fgetc (getchar is defined to be equivalent to fgetc(stdin)):
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function [returns] that character as an unsigned char converted to an int. [Otherwise, it returns EOF.]
Emphasis mine. What this means is, on the typical implementation where unsigned char can represent the values 0 through 255 inclusive, getchar will always return values in the range 0 through 255 inclusive, or EOF, even if char is signed.
EOF, meanwhile, is not guaranteed to be −1 (although it almost always is), but it is guaranteed to be negative, and to fit in an int.
So, when you do
int c = getchar();
you can be certain that none of the possible return values collide with each other: c will either be EOF, which is negative, or it will be one of the values representable by unsigned char (0 through 255), which are all nonnegative. If you convert c back to a char after you have checked that it is not EOF, that's safe; the conversion from unsigned char to char is at worst implementation-defined.
On the other hand, when you do any of these
char c = getchar(); // this is wrong
unsigned char d = getchar(); // also wrong
signed char e = getchar(); // also wrong
you lose the ability to distinguish EOF from some byte value that could have been in the file. The signedness of the variable is irrelevant, and so is the actual value of EOF; what matters is that char, unsigned char, and signed char can only represent 2^CHAR_BIT different values, all of those could have been in the file, and EOF is one more. It's the pigeonhole principle.
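As a concrete sketch of the safe pattern described above (the variable names are mine): the value stays in an int until it has been compared with EOF, and only then, if needed, is it narrowed to char.

#include <stdio.h>

int main(void)
{
    int c;                        /* holds EOF or 0..UCHAR_MAX */
    while ((c = getchar()) != EOF) {
        char byte = (char) c;     /* safe: c is known not to be EOF here */
        putchar(byte);
    }
    return 0;
}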
You should be aware that K&R is very old and no longer considered the best book to learn C from. (I don't know what the current best book is.)
1.1 So, if we declare c as char and it's by default signed char on the system, then it will still work, but it will break if we enter a character equal to ASCII 128 and above?
If char is signed, the bit patterns for characters 128 and above would be interpreted as negative signed values. The only true mix-up happens when character 255 (nbsp in extended ASCII) is entered, because stored in a signed char it becomes -1 and thus compares equal to EOF on systems where EOF is -1.
1.2 Conversely, if we declare c as char and it is by default unsigned char on the system, then getchar() will always be NOT equal to EOF no matter what, because we cannot store a negative value, right?
That's correct, it would never be equal to EOF. Any bit pattern inside unsigned char would end up in the range from 0..255, inclusive, when promoted to int for comparison with EOF. Hence, the comparison would be false even when getchar() actually returns EOF.
We type some characters as input; c = getchar() grabs this input and converts it to an ASCII number, and after that it checks to make sure it's not equal to EOF.
There is no ASCII conversion going on; the character starts as an ASCII character (assuming that the system uses ASCII) or a character in whatever encoding style that your system is using.
If it is NOT equal to EOF, it displays the input characters as output.
It goes back in a state where we must input new characters to continue the loop.
Correct on both 3 and 4.
On every normal system, char is 8 bits, so it can hold 256 distinct values. Functions like fgetc() need to be able to return all 256 byte values, plus -1 for EOF. So they return an int instead of a char, and we tend to pass single characters around as int rather than as char to handle EOF smoothly.
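As an illustration of that point, here is a minimal byte-copy sketch (the function name copy_stream is invented for the example): a single int variable can carry every possible byte value plus the EOF signal.

#include <stdio.h>

/* Copy src to dst byte by byte. */
void copy_stream(FILE *src, FILE *dst)
{
    int c;                        /* must be int, not char */
    while ((c = fgetc(src)) != EOF)
        fputc(c, dst);            /* fputc takes an int as well */
}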

Why EOF coincides with valid char value? [duplicate]

This question already has answers here:
What is EOF in the C programming language?
(10 answers)
Closed 6 years ago.
As said in comments to the answer to this question: Why gcc does not produce type mismatch warning for int and char?
both -1 and 255 are 0xFF as an 8-bit hex value on any current CPU.
But EOF is equal to -1. This is a contradiction, because the value of EOF must not coincide with any valid 8-bit character. This example demonstrates it:
#include <stdio.h>
int main(void)
{
    char c = 255;
    if (c == EOF) printf("oops\n");
    return 0;
}
On my machine it prints oops.
How can this contradiction be explained?
When you compare an int value to a char value, the char value is promoted to an int value. This promotion is automatic and part of the C language specification (see e.g. this "Usual arithmetic conversions" reference, especially point 4). Sure the compiler could give a warning about it, but why should it if it's a valid language construct?
There's also the problem with the signedness of char which is implementation defined. If char is unsigned, then your condition would be false.
Also if you read just about any reference for functions reading characters from files (for example this one for fgetc and getc) you will see that they return an int and not a char, precisely for the reasons mentioned above.
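For contrast, here is the question's test reworked so that the value is kept in an int, the way the character-reading functions themselves report it; this version never prints oops (the reworked snippet is mine, not part of the original question):

#include <stdio.h>
int main(void)
{
    int c = 255;                      /* what getchar() reports for the byte 0xFF */
    if (c == EOF) printf("oops\n");   /* never true: 255 is not negative */
    return 0;
}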

Difference between storing the value in int and char returned from getchar function in C [duplicate]

This question already has answers here:
While (( c = getc(file)) != EOF) loop won't stop executing
(2 answers)
Closed 7 years ago.
While going through the book by Dennis Ritchie, I found that it is better to store the value returned by the getchar() function in an integer type variable rather than a character type variable. The reason it stated was that a character type variable cannot store the value of EOF. While implementing it practically, there was no such difficulty in storing the return value in a char type variable. And what does the getchar() function actually return, the character or the ASCII value of the character?
EOF is end of file. You won't see the difference until you implement some file read/write operation/code.
The value of EOF is usually defined to be -1 (the standard only guarantees that it is a negative int).
That works well because all ASCII codes are positive, so it can't possibly clash with any real character's representation.
Unfortunately, C has a very strange feature that can cause trouble. It is not defined what the range of possible values for a char variable must be. On some systems it is -128 to +127, which is fine; but on other systems it is 0 to +255, which is fine for normal ASCII values, but not so hot for EOF's -1.
For one thing, the variable to hold getchar()'s return value must be an int. EOF is an out of band return value from getchar(): it is distinct from all possible char values which getchar() can return. (On modern systems, it does not reflect any actual end-of-file character stored in a file; it is a signal that no more characters are available.) getchar()'s return value must be stored in a variable larger than char so that it can hold all possible char values, and EOF.
Two failure modes are possible if, as in the fragment above, getchar()'s return value is assigned to a char.
If type char is signed, and if EOF is defined (as is usual) as -1, the character with the decimal value 255 ('\377' or '\xff' in C) will be sign-extended and will compare equal to EOF, prematurely terminating the input.
If type char is unsigned, an actual EOF value will be truncated (by having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
The bug can go undetected for a long time, however, if chars are signed and if the input is all 7-bit characters. (Whether plain char is signed or unsigned is implementation-defined.)
References:
K&R1 Sec. 1.5 p. 14
K&R2 Sec. 1.5.1 p. 16
ISO Sec. 6.1.2.5, Sec. 7.9.1, Sec. 7.9.7.5
H&S Sec. 5.1.3 p. 116, Sec. 15.1, Sec. 15.6
CT&P Sec. 5.1 p. 70
PCS Sec. 11 p. 157
Generally it is best to store getchar()'s result in an int guaranteeing that EOF is properly handled.
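A small sketch that makes both failure modes visible directly; it assumes the usual EOF value of -1, an 8-bit char, and two's complement, and the variable names are mine:

#include <stdio.h>

int main(void)
{
    signed char c = (signed char) 0xFF;     /* byte value 255 read from a file */
    if (c == EOF)                           /* promotes to -1, matches EOF */
        printf("failure mode 1: byte 255 mistaken for end of file\n");

    unsigned char d = (unsigned char) EOF;  /* EOF truncated to 255 */
    if (d != EOF)                           /* promotes to 255, never matches */
        printf("failure mode 2: a real EOF would not be recognized\n");
    return 0;
}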

Why would a type int variable be used in a string input in C?

I am working through methods of input and output in C, and I have been presented with a segment of code that has an element I cannot understand. The purpose of this code is to show how 'echoing' and 'buffered' input/output work, and in the code they have declared type 'int' for what are, as I understand it, characters:
#include <stdio.h>
int main(void){
    int ch; //This is what I do not get: why is this type 'int'?
    while((ch = getchar()) != '\n'){
        putchar(ch);
    }
    return 0;
}
I'm not on firm footing with type casting as it is, and this 'int' / 'char' discrepancy is undermining all notions that I have regarding data types and compatibility.
getchar() returns an int type because it is designed to be able to return a value that cannot be represented by char to indicate EOF. (C.11 §7.21.1 ¶3 and §7.21.7.6 ¶3)
Your looping code should take into account that getchar() might return EOF:
while((ch = getchar()) != EOF){
    if (ch != '\n') putchar(ch);
}
The getc, fgetc and getchar functions return int because they are capable of handling binary data, as well as providing an in-band signal of an error or end-of-data condition.
Except on certain embedded platforms which have an unusual byte size, the type int is capable of representing all of the byte values from 0 to UCHAR_MAX as positive values. In addition, it can represent negative values, such as the value of the constant EOF.
The type unsigned char would only be capable of representing the values 0 to UCHAR_MAX, and so the functions would not be able to use the return value as a way of indicating the inability to read another byte of data. The value EOF is convenient because it can be treated as if it were an input symbol; for instance it can be included in a switch statement which handles various characters.
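As a sketch of what treating EOF as an input symbol can look like in practice (a hypothetical dispatch fragment, not taken from any particular program):

#include <stdio.h>

int main(void)
{
    int c = getchar();
    switch (c) {
    case EOF:                      /* handled right alongside ordinary characters */
        puts("no more input");
        break;
    case '\n':
        puts("newline");
        break;
    default:
        printf("byte %d\n", c);
        break;
    }
    return 0;
}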
There is a little bit more to this because in the design of C, values of short and char type (signed or unsigned) undergo promotion when they are evaluated in expressions.
In classic C, before prototypes were introduced, when you pass a char to a function, it's actually an int value which is passed. Concretely:
int func(c)
char c;
{
/* ... */
}
This kind of old style definition does not introduce information about the parameter types.
When we call this as func(c), where c has type char, the expression c is subject to the usual promotion, and becomes a value of type int. This is exactly the type which is expected by the above function definition. A parameter of type char actually passes through as a value of type int. If we write an ISO C prototype declaration for the above function, it has to be, guess what:
int func(int); /* not int func(char) */
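A tiny sketch of that promotion at a call site; the names func and c follow the answer above, while the trivial body and the call are mine:

#include <stdio.h>

int func(int value)               /* the ISO prototype form of the definition above */
{
    return value;
}

int main(void)
{
    char c = 'A';
    printf("%d\n", func(c));      /* c is converted to int for the call, matching int func(int) */
    return 0;
}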
Another legacy is that character constants like 'A' actually have type int and not char. It is noteworthy that this changes in C++, because C++ has overloaded functions. Given the overloads:
void f(int);
void f(char);
we want f(3) to call the former, and f('A') to call the latter.
So the point is that the designers of C basically regarded char as being oriented toward representing a compact storage location, and the smallest addressable unit of memory. But as far as data manipulation in the processor was concerned, they were thinking of the values as being word-sized int values: that character processing is essentially data manipulation based on int.
This is one of the low-level facets of C. In machine languages on byte-addressable machines, we usually think of bytes as being units of storage, and when we load them into registers to work with them, they occupy a full register, and so become 32 bit values (or what have you). This is mirrored in the concept of promotion in C.
The return type of getchar() is int. It returns the ASCII code of the character it's just read. This is (and I know someone's gonna correct me on this) the same as the char representation, so you can freely compare them and so on.
Why is it this way? The getchar() function is ancient -- from the very earliest days of K&R C. putchar() similarly takes an int argument, when you'd think it might take a char.
Hope that helps!

Why putchar, toupper, tolower, etc. take a int instead of a char?

In C, strings are arrays of char (char *) and characters are usually stored in char. I noticed that some functions from the libC are taking as argument integers instead of a char.
For instance, let's take the functions toupper() and tolower() that both use int. The man page says:
If c is not an unsigned char value, or EOF, the behavior of these
functions is undefined.
My guess is that with an int, toupper and tolower are able to deal with unsigned char and EOF. But in fact EOF is in practice (is there any rule about its value?) a value that can be stored in a char, and since those functions won't transform EOF into something else, I'm wondering why toupper does not simply take a char as argument.
In any case why do we need to accept something that is not a character (such as EOF)? Could someone provide me a relevant use case?
This is similar with fputc or putchar, that also take a int that is converted into an unsigned char anyway.
I am looking for the precise motivation for that choice. I want to be convinced; I don't want to have to answer that I don't know if someone asks me one day.
C11 7.4
The header <ctype.h> declares several functions useful for classifying and mapping
characters. In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the
macro EOF. If the argument has any other value, the behavior is
undefined.
C11 7.21.1
EOF
which expands to an integer constant expression, with type int and a
negative value, ...
The C standard explicitly states that EOF is always an int with negative value. And furthermore, the signedness of the default char type is implementation-defined, so it may be unsigned and not able to store a negative value:
C11 6.2.5
If a member of the basic execution character set is stored in a char
object, its value is guaranteed to be nonnegative. If any other
character is stored in a char object, the resulting value is
implementation-defined but shall be within the range of values that
can be represented in that type.
Back in the day (BITD), one coding method was:
#include <stdio.h>
#include <ctype.h>

/* example */
int GetDecimal(void) {
    int sum = 0;
    int ch;
    while (isdigit(ch = getchar())) { /* isdigit(EOF) returns 0 */
        sum *= 10;
        sum += ch - '0';
    }
    ungetc(ch, stdin); /* If ch is EOF, the operation fails and the input stream is unchanged. */
    return sum;
}
ch with the value of EOF could then be passed to various functions like isalpha(), tolower().
This style caused problems with putchar(EOF), which I suspect did the same as putchar(255).
The method is discouraged today for various reasons. Various models like the following are preferred.
int GetDecimal(void) {
    int ch;
    while ((ch = getchar()) != EOF && isdigit(ch)) {
        ...
    }
    ...
}
If c is not an unsigned char value, or EOF, the behavior of these functions is undefined.
But EOF is a negative int in C and some platforms (hi ARM!) have char the same as unsigned char.
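One common way to satisfy that requirement when starting from a plain char (for example, a character taken out of a string) is to go through unsigned char first; this idiom is not spelled out above, but it follows directly from the quoted wording:

#include <ctype.h>

/* Sketch: uppercase one character taken from a string. */
char upper_char(char ch)
{
    return (char) toupper((unsigned char) ch);  /* never passes a negative non-EOF value */
}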
