File copying with getchar() and putchar()

I know this has been discussed before, but I want to make sure I understand correctly what is happening in this program, and why. On page 20 of Kernighan and Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>

int main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
When executed, the program reads each character keyed in and prints it back in the same order after the user hits Enter. This process is repeated indefinitely unless the user manually exits the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passed to c is promoted to its corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again, so it reads the next input character and passes its value back to the start of the while loop. If the user keys in only one character before hitting Enter, then the program reads the <return> as the next character, prints a newline, and waits for the next input to be keyed in.
Is any of this remotely correct?

Yes, you've basically got it. But it's even simpler: getchar already returns an int, and putchar already accepts one. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the range of any possible character value.
(The K&R stdio functions are very old at this point; they don't know about Unicode and so on, and some of the underlying design rationales are, if not murky, just not relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you can still do this way, but there are more canonical ways; see the sketch below.)
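For instance, a minimal sketch of a buffered copy (the file names are made up for illustration, and error handling is kept to a minimum):

#include <stdio.h>

int main(void)
{
    /* Hypothetical file names, for illustration only. */
    FILE *in = fopen("input.txt", "rb");
    FILE *out = fopen("output.txt", "wb");
    if (in == NULL || out == NULL)
        return 1;

    char buf[4096];
    size_t n;

    /* fread/fwrite move a whole block per call instead of one byte. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);

    fclose(in);
    fclose(out);
    return 0;
}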

Well, it is correct in idea, but not in the details, and that's where the devil is.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read).
The return value is assigned into c, which is of type int (as it should be; if it were a char, strange things could happen).
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them can have a negative value (at least not on any system that you'd meet as a novice). So the condition of the while loop will remain true until the end-of-file condition occurs or an error happens while reading standard input.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too; we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and character constants are of type int - i.e. on systems where the execution character set is compatible with ASCII, writing ' ' is the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
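A quick way to see this for yourself (assuming an ASCII-compatible system; the output differs on other execution character sets):

#include <stdio.h>

int main(void)
{
    /* Character constants have type int in C. */
    printf("%d\n", ' ');        /* prints 32 on ASCII-compatible systems */
    printf("%d\n", ' ' == 32);  /* prints 1 */
    return 0;
}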
Finally, C is very strict about the meaning of initialization: it is the setting of the initial value into a variable at the point where it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
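For what it's worth, the assignment and the test are often folded into one expression, an idiom the book itself introduces shortly after this program:

#include <stdio.h>

int main(void)
{
    int c;

    /* Assign and compare in one expression; the parentheses around the
       assignment matter, because != binds tighter than =. */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}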

Related

Why does the getchar() function work differently in a loop?

#include <stdio.h>

int main()
{
    char c;

    while ((c = getchar()) != EOF)
        putchar(c);
}
Why, when I input text such as "omar", does putchar print "omar" back? Isn't it supposed to print 'o' repeatedly, since getchar will take the first character only? I know that I am wrong about something, probably because I don't know exactly how getchar or putchar works, so can somebody please explain how they work? Another question: why do getchar and putchar work normally without a while loop, but behave differently inside a while loop?
Why does the getchar() function work differently in a loop?
I take you to be asking why getchar() works differently than you expect, as opposed to differently than in other contexts. If you in fact meant the latter then the answer would be "it doesn't."
But of course reading a character from a stream, whether via getchar() or some other I/O function, removes it from the stream. getchar() would not be very useful if it did not do that. Therefore, if you call it repeatedly, you read (and remove) each character in turn until and unless all available characters are consumed. You can test this by replacing the loop in your program with several getchar() calls in a row.
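A sketch of that test (assuming you type at least three characters before pressing Enter, so none of the calls returns EOF):

#include <stdio.h>

int main(void)
{
    /* Each call consumes one character from the stream. */
    int a = getchar();
    int b = getchar();
    int c = getchar();

    printf("%c %c %c\n", a, b, c);  /* typing "omar" prints "o m a" */
    return 0;
}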
And, of course, your loop does call it repeatedly. The loop-control expression, (c=getchar()) != EOF, is evaluated before each iteration of the loop, and that involves calling getchar() (as opposed to using a value previously returned by that function).
On a completely separate note, do be aware that getchar() returns a result of type int, exactly so that it can return at least one value, EOF, that is outside the range of type unsigned char. If you convert the result to type char then either there is one real input value that you will mistake for EOF, or you will never detect EOF, depending on whether char is signed or unsigned. To reliably and portably detect the end of the file, you must handle that return value as an int, not a char.
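A corrected version of your program, with c declared as int, would look like this:

#include <stdio.h>

int main(void)
{
    int c;  /* int, not char, so EOF stays distinguishable from real input */

    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}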

Getchar not stopping on comparison

#include <stdio.h>

main()
{
    long nc;

    nc = 0;
    while (getchar() != -1)
        ++nc;
    printf("%ld\n", nc);
}
Above is my code from the K&R C book for example 1.5.2, but I changed the code so getchar() is checking for -1. However, when I type in negative one (-1), the code does not stop reading in getchar(). How do I get the code to stop, and why is it not stopping?
The getchar() function reads a single byte from stdin, and returns the value of that byte. For example, on an ASCII based system, the byte value 32 represents a space, the value 33 represents an exclamation point (!), the value 34 represents a double quote ("), and so on. In particular, the characters - and 1 (which make up the string "-1") have the byte values 45 and 49 respectively.
The number -1 does not correspond to any actual character, but rather to the special value EOF (an acronym for end of file) that getchar() will return when there are no more bytes to be read from stdin. (Actually, the EOF value is not guaranteed by the C standard to be equal to -1, although on most systems it is. It is guaranteed to be less than zero, though.)
So your loop, as written, will continue to run until there's no more input to be read. If you're running your code from a terminal, that basically means it will keep running until you type Ctrl+D (on Unixish systems) or Ctrl+Z (on Windows). Alternatively, you could run your program with its input coming from a file (e.g. with my_program < some_file.txt), which would cause the loop to run until it has read the entire file byte by byte.
If you instead want to read a number from stdin, and loop until the number equals -1, you should use scanf() instead.
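For example, a sketch of a loop that reads integers until it sees -1:

#include <stdio.h>

int main(void)
{
    int n;

    /* scanf returns 1 here when it successfully converts one integer. */
    while (scanf("%d", &n) == 1 && n != -1)
        printf("read %d\n", n);
    return 0;
}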

What is EOF and what is its significance? How can it be noticed? [duplicate]

While studying the getchar() function in C, I came across this EOF being returned. I want to know how its existence can be noticed, and where it is stored.
Can we type the EOF character explicitly?
EOF is short for End of File. It's not an actual character, but more like a signal that indicates the end of the input stream.
Think about getchar(): it's used to get a character from stdin (the standard input). How could it tell when the stdin stream has come to an end? There must be a special return value that is different from any valid character. EOF plays this role.
To generate EOF for stdin, type Ctrl + D on Unix systems, or Ctrl + Z on Windows.
EOF is the named constant to apply for End Of File when reading from a stdio input stream. If you look at the getchar() prototype, the first strange thing you'll notice is that it returns not a char value, but an int. Normally, EOF translates to some negative integer value (historically -1), meaning it's impossible to generate that character from the keyboard/input file.
Definitely, the EOF constant is not a character; it is the int value getchar(3) returns on the end-of-file condition. This is also the reason getchar(3) returns an int instead of a char value, and the reason EOF always maps to a negative value.
getchar(3) returns one of 257 possible values: 0 up to 255, and EOF (which is normally -1). Viewed as integer values, -1 is not the same as 255. It's one of the oldest functions implemented in C (it has been there since the first edition of "The C Programming Language" by K&R).
EOF is the abbreviation for End-Of-File. It's the special value indicating that you have reached the end of the file stream you're reading.
Normally, people check whether they have reached the end of a file with:
while (!feof(fileStream)) {
    // read one line here or so
    ...
    // do your stuff here.
    ...
}

K&R C Programming Language 1.5.1 (File Copying) [duplicate]

Well, some months ago I read another "well known" C book (in my language), and I never learned anything about this. The way K&R cover 3 chapters in 20 pages is simply amazing, and of course I can't expect huge explanations, but it also raises questions.
I have a question about section 1.5.1.
The book says (page 16):
main()
{
    int c;    // <-- Here is the question

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}
[...] The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason.
The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int. [...]
After searching google for another explanation:
EOF is a special macro representing End Of File (Linux: use CTRL+D on the keyboard to create this; Windows: use CTRL+Z (may have to be at the beginning of a new line, followed by RETURN)). Often EOF = -1, but it is implementation dependent. It must be a value that is not a valid value for any possible character. For this reason, c is of type int (not char as one may have expected).
So I modified the source from int to char to see what the problem is with taking EOF values... but there is no problem. It works the same way.
I also didn't understand how getchar takes every character I write and prints everything. The int type is 4 bytes long, so it could hold 4 characters in one variable.
But I can put in any number of characters, and it will read and write everything the same way.
And with char, the same happens...
What really happens? Where are the values stored when there are more than 1-4 characters?
So I modified the source from int to char to see what the problem is with taking EOF values... but there is no problem. It works the same way.
It happens to work the same way. It all depends on the real type of char, i.e. whether it's signed or unsigned. There's also a C FAQ about this very subject. You're more likely to see the bug if your chars are unsigned.
The bug can go undetected for a long time, however, if chars are
signed and if the input is all 7-bit characters.
EDIT
The last question is: the char type is one byte long, and int is 4 bytes long. So char will only take one ASCII character. But if I type "stack overflow is over 1byte long", the output will be "stack overflow is over 1byte long". Where is "tack overflow is over 1byte long" stored, and how does putchar put an entire string?
Each character will be stored in c in turn. So the first time, getchar() will return 's', and putchar will send it on its way. Then 't' will come along, and so on. At no point will c store more than one character. So although you feed it a large string, it deals with it by eating one character at a time.
Separating into two answers:
Why int and not char
Short and formal answer: if you want to be able to represent all real characters, and another non-real character (EOF), you can't use a datatype that's designed to hold only real characters.
Answer that can be understood but not entirely accurate: The function getchar() returns the ASCII code of the character it reads, or EOF.
Because -1 cast to char equals 255, we can't distinguish between the 255 character and EOF. That is,
char a = 255;
char b = EOF;
a == b // Evaluates to TRUE
but,
int a = 255;
int b = EOF;
a == b // Evaluates to FALSE
So using char won't allow you to distinguish between a character whose ASCII code is 255 (which could happen when reading from a file), and EOF.
How come you can use putchar() with an int
The function putchar() looks at its parameter, sees a number, and goes to the ASCII table and draws the glyph it finds. When you pass it an int, it is implicitly converted to unsigned char. If the number in the int fits in the char, all is good and nobody notices anything.
If you are using char to store the result of getchar(), there are two potential problems; which one you'll meet depends on the signedness of char.
if char is unsigned, c == EOF will never be true and you'll get an infinite loop.
if char is signed, c == EOF will be true when you input some real character. Which one depends on the charset used; in locales using ISO8859-1 or CP852 it is 'ÿ' if EOF is -1 (the most common value). Some charsets, for instance UTF-8, don't use the value (char)EOF in valid codes, but you can rarely guarantee that your program will stay on signed-char implementations and only be used in non-problematic locales. (See the sketch below.)
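A small demonstration of the signed-char collision (this sketch assumes a platform where char is signed and EOF is -1):

#include <stdio.h>

int main(void)
{
    /* Byte value 255 is 'ÿ' in ISO 8859-1. Stored in a signed char it
       becomes -1, which then compares equal to EOF. */
    char c = (char)255;

    if (c == EOF)
        puts("byte 255 is indistinguishable from EOF in a char");
    return 0;
}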

Why in the C language is EOF -1?

In the C language I am using EOF... why is EOF -1? Why not some other value?
From Wikipedia:
The actual value of EOF is system-dependent (but is commonly -1, such as in glibc) and is unequal to any valid character code.
It can't be any value in 0 - 255 because these are valid values for characters on most systems. For example if EOF were 0 then you wouldn't be able to tell the difference between reading a 0 and reaching the end of file.
-1 is the obvious remaining choice.
You may also want to consider using feof instead:
Since EOF is used to report both end of file and random errors, it's often better to use the feof function to check explicitly for end of file and ferror to check for errors.
It isn't. It is defined to be an implementation-defined negative int constant. It must be negative so that one can distinguish it easily from an unsigned char. In most implementations it is indeed -1, but that is not required by the C standard.
The historic reason for choosing -1 was that the character classification functions (see <ctype.h>) can be implemented as simple array lookups. And it is the "nearest" value that doesn't fit into an unsigned char.
[Update:] Making the character classification functions efficient was probably not the main reason for choosing -1 in the first place. I don't know all the historical details, but it is the most obvious decision. It had to be negative, since there are machines whose char type didn't have exactly 8 bits, so choosing a positive value would be difficult. It had to be large enough that it is not a valid value for unsigned char, yet small enough to fit into an int. And when you have to choose a negative number, why should you take an arbitrarily large negative number? That leaves -1 as the best choice.
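Here is a hedged sketch of that table-lookup idea; my_isdigit and class_table are made-up names, not the real <ctype.h> implementation:

#include <stdio.h>

/* 257 entries: index 0 holds the classification of EOF (-1), and
   indices 1..256 hold the classifications of the byte values 0..255. */
static const unsigned char class_table[257] = {
    ['0' + 1] = 1, ['1' + 1] = 1, ['2' + 1] = 1, ['3' + 1] = 1,
    ['4' + 1] = 1, ['5' + 1] = 1, ['6' + 1] = 1, ['7' + 1] = 1,
    ['8' + 1] = 1, ['9' + 1] = 1,
};

/* Because EOF is -1, (c + 1) maps EOF and every byte value into 0..256,
   so a single lookup needs no range check. */
#define my_isdigit(c) (class_table[(c) + 1])

int main(void)
{
    printf("%d %d %d\n", my_isdigit('7'), my_isdigit('x'), my_isdigit(EOF));
    return 0;
}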
Refer to details at http://en.wikipedia.org/wiki/Getchar#EOF_pitfall and http://en.wikipedia.org/wiki/End-of-file
You can easily change the EOF value.
By default, the C headers define the EOF macro as -1, so wherever you mention EOF in your program, the compiler substitutes -1.
For example, just try this and look at the result:
#include <stdio.h>

#undef  EOF     /* redefining a standard macro: don't do this in real code */
#define EOF 22

main()
{
    int a;

    a = EOF;
    printf("Value of a=%d\n", a);
}
Output:
Value of a=22
Reason:
The EOF value has been changed to 22.
An alternative to comparing the return value with EOF is to test the stream state explicitly:

int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin)) {
    ...
    c = getchar();
}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Consider this loop, intended to scan all characters up to the next whitespace character received:
int c;

c = getchar();
while (!isspace(c)) {
    ...
    c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace character). A better way to write this would be:
int c;

c = getchar();
while (!feof(stdin) && !ferror(stdin) && !isspace(c)) {
    ...
    c = getchar();
}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral constant with type int.
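If you want to check those guarantees on your own implementation, here is a quick sketch (C11, for static_assert):

#include <assert.h>
#include <stdio.h>

int main(void)
{
    /* The standard only promises a negative int constant. */
    static_assert(EOF < 0, "EOF must be negative");
    printf("On this implementation, EOF == %d\n", EOF);
    return 0;
}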
