#include <stdio.h>
int main()
{
char c;
while ((c=getchar()) != EOF)
putchar(c);
}
Why, when I input text such as "omar", does putchar print "omar" back? Isn't it supposed to print 'o' repeatedly, since getchar will take only the first character? I know I am probably wrong about something because I don't know exactly how getchar or putchar works, so can somebody please explain how they work? Another question: why do getchar and putchar work normally without a while loop, but behave differently inside a while loop?
Why does the getchar() function work differently in a loop?
I take you to be asking why getchar() works differently than you expect, as opposed to differently than in other contexts. If you in fact meant the latter then the answer would be "it doesn't."
But of course reading a character from a stream, whether via getchar() or some other I/O function, removes it from the stream. getchar() would not be very useful if it did not do that. Therefore, if you call it repeatedly, you read (and remove) each character in turn until and unless all available characters are consumed. You can test this by replacing the loop in your program with several getchar() calls in a row.
And, of course, your loop does call it repeatedly. The loop-control expression, (c=getchar()) != EOF, is evaluated before each iteration of the loop, and that involves calling getchar() (as opposed to using a value previously returned by that function).
On a completely separate note, do be aware that getchar() returns a result of type int, exactly so that it can return at least one value, EOF, that is outside the range of type unsigned char. If you convert the result to type char then either there is one real input value that you will mistake for EOF, or you will never detect EOF, depending on whether char is signed or unsigned. To reliably and portably detect the end of the file, you must handle that return value as an int, not a char.
I know this has been discussed before, but I want to make sure I understand correctly, what is happening in this program, and why. On page 20 of Dennis Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>
int main()
{
int c;
c = getchar();
while(c != EOF){
putchar(c);
c = getchar();
}
return 0;
}
When executed, the program reads each character keyed in and prints them out in the same order after the user hits enter. This process is repeated indefinitely unless the user manually exits out of the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passed to c is promoted to its corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop. If the user only keys in one character before execution, then the program reads the <return> value as the next character and prints a new line and waits for the next input to be keyed in.
Is any of this remotely correct?
Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
(The K&R stdio functions are very old at this point. They don't know about Unicode and the like, and some of the underlying design rationales are, if not murky, just no longer relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you still can do this way, but there are more canonical ways.)
Well, it is correct in idea, but not in the details, and that's where the devil is.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read).
The return value is assigned to c, which is of type int (as it should be; if it were a char, strange things could happen).
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them has a negative value (at least not on any system you'd meet as a novice). The condition of the while loop will therefore be true until the end-of-file condition occurs or an error occurs when reading standard input.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too, we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and the character constants are of type int - i.e. on systems where the execution character set is compatible with ASCII, writing ' ' will be the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
Why are islower() and friends required to handle EOF, whereas putchar() and friends don't have to?
Why isn't islower() treating int as unsigned char, as it is the case in putchar()? This would make total sense, because we have to check for EOF first anyway. See also Why the argument type of putchar(), fputc(), and putc() is not char?
because we have to check for EOF first anyway.
We absolutely don't.
int c;
while(isspace(c=fgetc(fp)));
if (c==EOF) ...
This is totally legitimate code to skip whitespace. Checking each character for EOF separately is a waste of time.
The ctype functions are made to handle EOF specifically to enable code like this.
See also this question.
None of the character type functions are required to handle EOF, other than ignoring it (i.e. returning false). In fact, the EOF marker is not even mentioned in the <ctype.h> header documentation.
The most likely reason for character classification function signatures to use int in place of char, signed or unsigned, is to avoid implementation-defined behavior in loops like this:
int c;
while ((c =getchar()) != EOF) {
if (islower(c)) {
...
} else if (isdigit(c)) {
...
}
}
This would compile and run with islower(char) instead of islower(int), but the result would be implementation defined, which is not desirable under such basic circumstances. Essentially, int in the signature of getchar became "contagious," getting into signatures of functions only marginally related to it.
I have a small program I'm writing to practice programming in C.
I want it to use the getchar(); function to get input from the user.
I use the following function to prompt for user input, then loop using getchar() to store input in an array:
The function is passed a pointer referencing a struct's member.
getInput(p->firstName); //The function is passed an argument like this one
void getInput(char * array)
{
int c;
while((c=getchar()) != '\n')
*array++ = c;
*array = '\0'; //Null terminate
}
This function is called multiple times, as it is part of a function that creates a structure and populates its array members.
However, when the program executes, the first two calls to it work fine, but any subsequent calls to this function cause every other call to getchar() to not wait for keyboard input.
After some debugging I traced the bug to be that getchar(); was for some reason reading in the '\n' character instead of waiting for input, the while loop test fails, and the function returns essentially an empty string.
I have done some research and keep finding to use
while(getchar() != '\n');
at the end of the function in order to properly flush stdin. However, this produces undesirable results, as the program will prompt again for more input after I press ENTER. Pressing ENTER again continues the program, but every other subsequent call continues to read in this mysterious '\n' character right off the bat, causing the test to fail and resulting in empty strings whenever it comes time to print the contents of the structure.
Could anyone explain to me what is going on here? Why does getchar() keep fetching a '\n' even though I supposedly cleared the input buffer? I have tried just placing a getchar(); statement at the beginning and end of the function, tried 'do while' loops, and taken other jabs at it, but I can't seem to figure this out.
The code you have written has several drawbacks. I'll try to explain them as it is unclear where your code is failing (probably outside the function you posted)
First of all, you don't check for EOF in the value getchar() returns. getchar(3) doesn't return a char precisely so that it can return all possible char values plus one extra value, EOF, to mark the end of file (this can be generated from a terminal with Ctrl-D on Unix or Ctrl-Z on Windows machines). That case must be handled explicitly in your code, because you convert the result to a char and lose the extra information you received from the function. Read the getchar(3) man page to solve this issue.
Second, you don't check whether the input has more characters than the array can hold, so the array can overflow. You pass the function only a pointer to the beginning of the array, but nothing indicates how far it extends, so you can write past the end of its bounds, overwriting memory that was not reserved for input. This results in what the literature calls undefined behaviour (U.B.), and it is something you must guard against. It can be solved by passing a count of the valid positions in the array, decrementing it for each position filled, and not storing more input once the buffer is full.
On the other hand, there is a standard function that does exactly that: fgets(3) reads one line from an input stream and stores it in the buffer whose pointer (and size) you pass to it:
char *fgets(char *buffer, int buffer_size, FILE *stream);
You can use it as in:
char buffer[80], *line;
...
while ((line = fgets(buffer, sizeof buffer, stdin)) != NULL) {
/* process one full line of input, with the final \n included */
....
}
/* on EOF, fgets(3) returns NULL, so we shall be here after reading the
* full input file */
Well, some months ago I read another "well-known" C book (in my language), and I never learned anything about this. The way K&R covers three chapters in 20 pages is simply amazing, and of course I can't expect huge explanations, but that also raises questions.
I have a question about this point 1.5.1
The book says (page 16):
main(){
int c;// <-- Here is the question
c=getchar();
while (c != EOF){
putchar(c);
c = getchar();
}
}
[...] The type char is specifically meant for storing such character
data, but any integer type can be used. We used int for a subtle but
important reason.
The problem is distinguishing the end of input from
valid data. The solution is that getchar returns a distinctive value
when there is no more input, a value that cannot be confused with any
real character. This value is called EOF, for "end of file". We must
declare c to be a type big enough to hold any value that getchar
returns. We can't use char since c must be big enough to hold EOF in
addition to any possible char. Therefore we use int.[...]
After searching google for another explanation:
EOF is a special macro representing End Of File (Linux: use CTRL+d on
the keyboard to create this, Windows command: use CTRL+z (may have to
be at beginning of new line, followed by RETURN)): Often EOF = -1, but
implementation dependent. Must be a value that is not a valid value
for any possible character. For this reason, c is of type int (not
char as one may have expected).
So I changed the source from int to char to see what the problem with holding EOF values is... but there is no problem. It works the same way.
I also didn't understand how getchar takes every character I write and prints everything. The int type is 4 bytes long, so it can hold 4 characters in a variable.
But I can enter any number of characters, and it will read and write everything the same way.
And with char, the same thing happens...
What really happens? Where are the values stored when there are more than 1-4 characters?
So I changed the source from int to char to see what the problem
with holding EOF values is... but there is no problem. It works the same way.
It happens to work the same way. It all depends on the real type of char, i.e. whether it's signed or unsigned. There's also a C FAQ about this very subject. You're more likely to see the bug if your chars are unsigned.
The bug can go undetected for a long time, however, if chars are
signed and if the input is all 7-bit characters.
EDIT
The last question is: the char type is one byte long, and int is 4 bytes
long. So a char will only hold one ASCII character. But if I type
"stack overflow is over 1byte long", the output will be "stack
overflow is over 1byte long". Where is "tack overflow is over 1byte
long" stored, and how does putchar put out an entire string?
Each character will be stored by c in turn. So the first time, getchar() will return s, and putchar will send it on its way. Then t will come along and so on. At no point will c store more than one character. So although you feed it a large string, it deals with it by eating one character at a time.
Separating into two answers:
Why int and not char
Short and formal answer: if you want to be able to represent all real characters, and another non-real character (EOF), you can't use a datatype that's designed to hold only real characters.
Answer that can be understood but not entirely accurate: The function getchar() returns the ASCII code of the character it reads, or EOF.
Because 255 and -1 (the usual value of EOF) end up as the same value once converted to char, we can't distinguish between character 255 and EOF. That is,
char a = 255;
char b = EOF;
a == b // Evaluates to TRUE
but,
int a = 255;
int b = EOF;
a == b // Evaluates to FALSE
So using char won't allow you to distinguish between a character whose ASCII code is 255 (which could happen when reading from a file), and EOF.
How come you can use putchar() with an int
The function putchar() looks at its parameter, sees a number, and goes to the ASCII table and draws the glyph it sees. When you pass it an int, it is implicitly converted to unsigned char. If the number in the int fits, all is good and nobody notices anything.
If you are using char to store the result of getchar(), there are two potential problems; which one you'll meet depends on the signedness of char.
If char is unsigned, c == EOF will never be true and you'll get an infinite loop.
If char is signed, c == EOF will be true when you input a certain character. Which one depends on the charset used; in a locale using ISO8859-1 or CP852 it is 'ÿ' if EOF is -1 (the most common value). Some charsets, for instance UTF-8, don't use the value (char)EOF in valid codes, but you can rarely guarantee that your program will stay on signed-char implementations and only be used in non-problematic locales.
Is a = getchar() equivalent to scanf("%c",&a);?
Is putchar(a) equivalent to printf("%c",a); where a is a char variable?
Generally speaking, yes, they are the same.
But they differ in a few nitpicky ways. The function getchar is typed to return int and not char. This is done so that getchar can return both all possible char values and, additionally, error codes.
So while the following happily compiles in most compilers, you are essentially truncating away an error code:
char c = getchar();
The function scanf, though, allows you to use a char type directly and separates out the error code into the return value.
They do the same thing here. However, if you know you are just handling characters, then getchar and putchar will be more efficient, since the printf and scanf variants have to parse the format string each time to determine how to process your request. Plus, they may be implemented in a lower-level library, meaning you may not need printf/scanf linked in if they are not needed elsewhere.