Well, some months ago I read another "well known" C book (in my language), and I never learned anything about this. The way K&R covers three chapters' worth of material in 20 pages is simply amazing, and of course I can't expect huge explanations, but that also raises questions.
I have a question about point 1.5.1.
The book says (page 16):
#include <stdio.h>

main()
{
    int c;    // <-- Here is the question

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}
[...] The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason. The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int. [...]
After searching Google, I found another explanation:
EOF is a special macro representing End Of File (Linux: use CTRL+D on the keyboard to create this; Windows: use CTRL+Z, which may have to be at the beginning of a new line, followed by RETURN). Often EOF = -1, but this is implementation dependent. It must be a value that is not a valid value for any possible character. For this reason, c is of type int (not char as one may have expected).
So I modified the source from int to char to see what the problem is with taking the EOF value... but there is no problem; it works the same way.
I also didn't understand how getchar takes every character I write and prints everything. The int type is 4 bytes long, so it could hold 4 characters in one variable.
But I can type any number of characters, and it reads and writes everything the same way.
And with char, the same happens...
What really happens? Where are the values stored when there are more than 1-4 characters?
So I modified the source from int to char to see what the problem is with taking the EOF value... but there is no problem. It works the same way.
It happens to work the same way. It all depends on the real type of char, i.e. whether it's signed or unsigned. There's also a C FAQ about this very subject. You're more likely to see the bug if your chars are unsigned:
The bug can go undetected for a long time, however, if chars are
signed and if the input is all 7-bit characters.
EDIT
The last question is: the char type is one byte long, and int is 4 bytes long. So char will only take one ASCII character. But if I type "stack overflow is over 1byte long", the output will be "stack overflow is over 1byte long". Where is "tack overflow is over 1byte long" stored, and how does putchar put an entire string?
Each character will be stored in c in turn. So the first time, getchar() will return 's', and putchar will send it on its way. Then 't' will come along, and so on. At no point will c store more than one character. So although you feed it a large string, it deals with it by eating one character at a time.
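To make that visible, here is a small sketch (my addition, not part of the original answer) - the K&R loop with a position counter added:
#include <stdio.h>

int main(void)
{
    int c;
    int pos = 0;

    /* c holds exactly one character per iteration; the rest of the
       input line waits in the stdin buffer until getchar() asks for it */
    while ((c = getchar()) != EOF) {
        printf("position %d: '%c'\n", pos++, c);
    }
    return 0;
}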
Separating into two answers:
Why int and not char
Short and formal answer: if you want to be able to represent all real characters, and another non-real character (EOF), you can't use a datatype that's designed to hold only real characters.
An answer that can be understood but is not entirely accurate: the function getchar() returns the ASCII code of the character it reads, or EOF.
Because -1 converted to char has the same representation as the character 255, we can't distinguish between the 255 character and EOF. That is,
char a = 255;
char b = EOF;
a == b // Evaluates to TRUE
but,
int a = 255;
int b = EOF;
a == b // Evaluates to FALSE
So using char won't allow you to distinguish between a character whose ASCII code is 255 (which could happen when reading from a file), and EOF.
How come you can use putchar() with an int
The function putchar() looks at its parameter, sees a number, goes to the ASCII table, and draws the glyph it finds there. Its parameter is in fact declared as int; the value is converted to unsigned char internally before being written. If the number fits in a char, all is good and nobody notices anything.
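For instance (a trivial sketch, assuming an ASCII execution character set):
#include <stdio.h>

int main(void)
{
    putchar(65);      /* 65 is the ASCII code for 'A', so this prints A */
    putchar('\n');
    return 0;
}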
If you are using char to store the result of getchar(), there are two potential problems; which one you'll meet depends on the signedness of char.
If char is unsigned, c == EOF will never be true and you'll get an infinite loop.
If char is signed, c == EOF will be true when you input some character. Which one depends on the charset used; in locales using ISO8859-1 or CP852 it is 'ÿ' if EOF is -1 (the most common value). Some charsets, for instance UTF-8, don't use the value (char)EOF in valid codes, but you can rarely guarantee that your program will run only on signed-char implementations and only be used in non-problematic locales.
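A small sketch of both failure modes (my illustration, assuming 8-bit chars and the common EOF == -1; the signed conversion below is strictly implementation-defined):
#include <stdio.h>

int main(void)
{
    /* Signed case: the byte 0xFF ('ÿ' in ISO8859-1) converts to -1,
       which compares equal to EOF after promotion to int. */
    signed char s = (signed char)0xFF;
    printf("signed:   s == EOF is %d\n", s == EOF);    /* prints 1 */

    /* Unsigned case: (unsigned char)EOF wraps to 255, which promotes
       back to 255, never -1, so a loop testing u != EOF never stops. */
    unsigned char u = (unsigned char)EOF;
    printf("unsigned: u == EOF is %d\n", u == EOF);    /* prints 0 */
    return 0;
}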
Related
I know this has been discussed before, but I want to make sure I understand correctly what is happening in this program, and why. On page 20 of Kernighan and Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>

int main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
When executed, the program reads each character keyed in and prints them out in the same order after the user hits enter. This process is repeated indefinitely unless the user manually exits out of the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passes to c is promoted to its corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop. If the user only keys in one character before execution, then the program reads the <return> value as the next character and prints a new line and waits for the next input to be keyed in.
Is any of this remotely correct?
Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
(The K&R stdio functions are very old at this point; they don't know about Unicode and the like, and some of the underlying design rationales are, if not murky, simply no longer relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you still can do this way, but there are more canonical ways)
Well, it is correct in idea, but not in details, and that's where the devil is in.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read)
The return value is assigned into c, which is of type int (as it should, as if it were a char strange things could happen)
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them can be negative (at least not on any system you'd meet as a novice), the condition of the while loop will always be true until the end-of-file condition occurs or an error happens when reading standard input. (A small sketch after these steps illustrates the conversion.)
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
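A small illustration of that unsigned-char-to-int conversion (my sketch, assuming EOF == -1):
#include <stdio.h>

int main(void)
{
    unsigned char raw = 0xFF;   /* a byte read from the stream */
    int c = raw;                /* what getchar() effectively returns: 255 */

    /* c is 255, not -1, so a valid byte never compares equal to EOF */
    printf("c = %d, c == EOF is %d\n", c, c == EOF);
    return 0;
}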
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too; we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and character constants are of type int - i.e. on systems where the execution character set is compatible with ASCII, writing ' ' is the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
I read from a file, and the data is stored as hex values (say the first values are D4 C3). It is then stored in a buffer of char data type. But whenever I print the buffer I get values like buff[0]=ffffffD4, buff[1]=ffffffC3, and so on.
How can I store the actual value in the buffer without any added bytes?
Here is the snippet:
ic = (char *)malloc(1);
temp = ic;
int i = 0;
char buff[1000];

while ((c = fgetc(pFile)) != EOF)
{
    printf("%x", c);
    ic++;
    buff[i] = c;
    i++;
}
printf("\n\nNo. of bytes written: %d", i);
ic = temp;

int k;
printf("\nBuffer value is : ");
for (k = 0; k < i; k++)
{
    printf("%x", buff[k]);
}
The problem is a combination of two things:
First is that when you pass a smaller type to a variable argument function like printf it's converted to an int, which might include sign extension.
The second is that the format "%x" you are using expects the corresponding argument to be an unsigned int and treats it as such.
If you want to print a hexadecimal character, then use the prefix hh, as in "%hhx".
See e.g. this printf (and family) reference for more information.
Finally, if you only want to treat the data you read as binary data, then you should consider using int8_t (or possibly uint8_t) for the buffer. On any platform with an 8-bit char they are the same, but they give more information to the reader of the code (saying "this is binary data and not a string").
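A minimal sketch along those lines (hypothetical sample bytes standing in for the data read in the question):
#include <stdio.h>

int main(void)
{
    /* hypothetical sample data standing in for the bytes from the file */
    unsigned char buff[] = { 0xD4, 0xC3 };
    int i = 2;
    int k;

    for (k = 0; k < i; k++)
        printf("%02hhx", buff[k]);   /* hh: treat the value as unsigned char */
    printf("\n");                    /* prints d4c3, with no ffffff prefix */
    return 0;
}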
By default, char is signed on many platforms (the standard doesn't dictate its signedness). When passing to a variable argument list, standard promotions like char -> int are applied. If char is unsigned, 0xd3 remains the integer 0xd3. If char is signed, 0xd3 becomes 0xffffffd3 (for a 32-bit integer) because both represent the same integer value, -45.
NB: if you weren't aware of this, you should recheck the entire program, because such errors are very subtle. I once dealt with a tool which worked properly only with -funsigned-char forced into make's CFLAGS. OTOH this flag, if available to you, could be a quick-and-dirty solution to this issue (but I suggest avoiding it for anything longer term).
The approach I constantly use is passing to printf()-like functions not c itself, but 0xff & c; it's visually easy to understand and stable across versions. You can consider using the hh modifier (UPD: as @JoachimPileborg has already suggested), but I'm unsure it's supported in all real C flavors, including MS and embedded ones. (MSDN doesn't list it at all.)
You did store the actual values in the buffer without any added bytes. You're just outputting the signed numbers with more digits. It's as if you had "-1" in your buffer but printed it as "-01". The value is the same; you're just choosing to sign-extend it in the output code.
I'm learning C programming on a Raspberry Pi, but I found that my program never catches EOF successfully. I used char c=0; printf("%d",c-1); to test the char type, finding that char ranges from 0 to 255, as if it were unsigned. But the EOF defined in stdio.h is -1. So is the wrong cc package installed on my Pi? How can I fix it? If I changed the EOF value in stdio.h manually, would there be further problems?
What worries me is that when learning from the K&R book, there are examples which use code like while ((c=getchar())!=EOF). I followed that on my Ubuntu machine and it works fine. I just wonder whether this kind of syntax has been abandoned by modern C practice, or whether there is something conflicting on my Raspberry Pi.
Here is my code:
#include <stdio.h>

int main(void)
{
    char c;
    int i = 0;

    while ((c = getchar()) != EOF && i < 50) {
        putchar(c);
        i++;
    }
    if (c == EOF)
        printf("\nEOF got.\n");
    while ((c = getchar()) != EOF && i < 500) {
        printf("%d", c);
        i++;
    }
}
Even when I redirect the input from a file, it keeps printing 255 on the screen and the program never terminates.
Finally I found that I was wrong: in the K&R book, c is defined as an int, not a char. Problem solved.
You need to store the character read by fgetc(), getchar(), etc. in an int so you can catch the EOF. This is well-known and has always been the case everywhere. EOF must be distinguishable from all proper characters, so it was decided that functions like fgetc() return valid characters as non-negative values (even if char is signed). An end-of-file condition is signalled by -1, which is negative and thus cannot collide with any valid character fgetc() could return.
Do not edit the system headers and especially do not change the value of constants defined there. If you do that, you break these headers. Notice that even if you change the value of EOF in the headers, this won't change the value functions like fgetc() return on end-of-file or error, it just makes EOF have the wrong value.
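A corrected version of the question's first loop, with c declared as int - the same fix the asker found (a sketch only; the second loop would change the same way):
#include <stdio.h>

int main(void)
{
    int c;      /* int can hold every unsigned char value plus EOF */
    int i = 0;

    while ((c = getchar()) != EOF && i < 50) {
        putchar(c);
        i++;
    }
    if (c == EOF)
        printf("\nEOF got.\n");
    return 0;
}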
Why is EOF defined to be −1 when −1 cannot be represented in a char?
Because EOF isn't a character but a state.
If I changed the EOF value in stdio.h manually, will there be further problems?
Absolutely, since you would effectively be breaking the header entirely. A header is not an actual function, just a set of prototypes and declarations for functions that are defined elsewhere. ABSOLUTELY DO NOT change system headers; you will never succeed in doing anything but breaking your code, your project, or worse.
On the subject of EOF: EOF is not a character, and thus cannot be represented in a character variable. To get around this, most programmers simply use an int value (signed by default) that can hold the -1 from EOF. The reason EOF can never be a character is that otherwise there would be one character indistinguishable from the end-of-file indicator.
int versus char.
fgetc() returns an int, not char. The values returned are in the range of unsigned char plus EOF. This is typically 257 different values. So saving the result in a char, signed char, or unsigned char will lose some distinguishability.
Instead save the fgetc() return value in an int. After testing for an EOF result, the value can be saved as a char if needed.
// char c;
int c;
...
while ((c = getchar()) != EOF && i < 50) {
    char ch = c;
    ...
Detail: "Why is EOF defined to be −1 when −1 cannot be represented in a char?" misleads. On systems where char is signed and EOF == -1, a char can have the value of EOF. Yet on such systems, a char can have a value of -1 that represents a character too - they overlap. So a char cannot distinctively represent all char and EOF. Best to use an int to save the return value of fgetc().
... the fgetc function obtains that character as an unsigned char converted to an int and ...
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, ... and the fgetc function returns EOF. ... C11 §7.21.7.1 2-3
The following program is intended to make a copy of one .exe application file. But just one little thing determines whether it gives me a proper copy of the intended file RealPlayer.exe or a corrupted file.
What I do is read from the source file in binary mode and write to the new copy in the same mode. For this I use a variable ch. But if ch is of type char, I get a corrupted file of only a few bytes, while the original file is 26MB. If I change the type of ch to int, the program works fine and gives me an exact copy of RealPlayer.exe, sized 26MB. So let me ask two questions that arise from this premise. I would appreciate it if you could answer both parts:
1) Why does using type char for ch mess things up while int works? What is wrong with the char type? After all, shouldn't it read byte by byte from the original file (as char is one byte itself) and write it byte by byte to the new copy? Isn't that what the int type does, i.e., read 4 bytes from the original file and then write that to the copy? Why the difference between the two?
2) Why is the file so small compared to the original if we use char for ch? Let's forget for a moment that the copied file is corrupt to begin with and focus on the size. Why is the size so small if we copy character by character (or byte by byte), but the full original size when we copy "integer by integer" (or 4 bytes by 4 bytes)?
A friend suggested I simply stop asking questions and use int because it works while char doesn't!! But I need to understand what's going on here, as I see a serious lapse in my understanding of this matter. Your detailed answers are much sought. Thanks.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char ch;    //This is the cause of problem
    //int ch;   //This solves the problem
    FILE *fp, *tp;

    fp = fopen("D:\\RealPlayer.exe", "rb");
    tp = fopen("D:\\copy.exe", "wb");
    if (fp == NULL || tp == NULL)
    {
        printf("Error opening files");
        exit(-1);
    }
    while ((ch = getc(fp)) != EOF)
        putc(ch, tp);
    fclose(fp);
    fclose(tp);
}
The problem is in the termination condition for the loop. In particular, the type of the variable ch, combined with rules for implicit type conversions.
while ((ch = getc(fp)) != EOF)
getc() returns int - either a value in the range of unsigned char (0-255 on typical systems) or -1 (EOF).
You stuff the result into a char, then promote it back to int to do the comparison. Unpleasant things happen, such as sign extension.
Let's assume your compiler treats "char" as "signed char" (the standard gives it a choice).
You read a bit pattern of 0xff (255) from your binary file - that's -1, expressed as a char. That gets promoted to int, giving you 0xffffffff, and compared with EOF (also -1, i.e. 0xffffffff). They match, and your program thinks it found the end of file and obediently stops copying. Oops!
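A tiny sketch of that failure (assuming, as above, a signed 8-bit char and EOF == -1):
#include <stdio.h>

int main(void)
{
    char ch = (char)0xFF;   /* a 0xFF data byte read from the binary file */

    /* ch promotes to int as -1 (0xffffffff), which equals EOF */
    if (ch == EOF)
        printf("a 0xFF data byte looks exactly like EOF\n");
    return 0;
}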
One other note - you wrote:
After all isn't that what the int type does, i.e., read 4 bytes from the original file and then write that to the copy file?
That's incorrect. getc(fp) behaves the same regardless of what you do with the value returned - it reads exactly one byte from the file, if there's one available, and returns that value - as an int.
int getc ( FILE * stream );
Returns the character currently pointed to by the internal file position indicator of the specified stream.
On success, the character read is returned (promoted to an int value). If you have defined ch as int, everything works fine; but if ch is defined as char, the value returned from getc() is narrowed back to char.
The reasons above cause the corruption in the data and the loss in size.
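Putting it together, a sketch of the corrected copy loop (paths as in the question; only ch's type changes):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("D:\\RealPlayer.exe", "rb");
    FILE *tp = fopen("D:\\copy.exe", "wb");
    int ch;    /* int keeps EOF distinguishable from the data byte 0xFF */

    if (fp == NULL || tp == NULL)
    {
        printf("Error opening files");
        exit(-1);
    }
    while ((ch = getc(fp)) != EOF)   /* bytes come back as 0-255, EOF as -1 */
        putc(ch, tp);
    fclose(fp);
    fclose(tp);
    return 0;
}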
In the C language, why is EOF -1? Why not some other value?
From Wikipedia:
The actual value of EOF is system-dependent (but is commonly -1, such as in glibc) and is unequal to any valid character code.
It can't be any value in 0 - 255 because these are valid values for characters on most systems. For example if EOF were 0 then you wouldn't be able to tell the difference between reading a 0 and reaching the end of file.
-1 is the obvious remaining choice.
You may also want to consider using feof instead:
Since EOF is used to report both end of file and random errors, it's often better to use the feof function to check explicitly for end of file and ferror to check for errors.
It isn't. It is defined to be an implementation-defined negative int constant. It must be negative so that one can distinguish it easily from an unsigned char. in most implementations it is indeed -1, but that is not required by the C standard.
The historic reason for choosing -1 was that the character classification functions (see <ctype.h>) can be implemented as simple array lookups. And it is the "nearest" value that doesn't fit into an unsigned char.
[Update:] Making the character classification functions efficient was probably not the main reason for choosing -1 in the first place. I don't know all the historical details, but it is the most obvious decision. It had to be negative since there are machines whose char type didn't have exactly 8 bits, so choosing a positive value would be difficult. It had to be large enough so that it is not a valid value for unsigned char, yet small enough to fit into an int. And when you have to choose a negative number, why should you take an arbitrary large negative number? That leaves -1 as the best choice.
Refer to details at http://en.wikipedia.org/wiki/Getchar#EOF_pitfall and http://en.wikipedia.org/wiki/End-of-file
You can easily change the value of the EOF macro. In C, EOF is defined as -1 by default, so when you mention EOF in your program, the compiler substitutes -1. If you define the macro yourself, only its value within your program changes; the library functions still return their own end-of-file value. For example, just try this and see the result:
#include <stdio.h>
#define EOF 22

main()
{
    int a;
    a = EOF;
    printf(" Value of a=%d\n", a);
}
Output:
Value of a=22
Reason:
The macro's value has been changed within this program.
If you test feof and ferror instead of comparing with EOF, the loop looks like this:
int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin)) {
    ...
    c = getchar();
}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Consider this loop, intended to scan all characters up to the next whitespace character received:
int c;

c = getchar();
while (!isspace(c)) {
    ...
    c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace character). A better way to write this would be:
int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin) && !isspace(c)) {
    ...
    c = getchar();
}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral constant with type int.