Extra loop when using getchar(), and how getchar() actually behaves - c

While trying to figure out what getchar() actually does, I was confused by this small loop:
#include <stdio.h>
int main(void)
{
    int i;
    int c;
    for (i = 0; i < 100; i++) {
        c = getchar();
        printf("%d\n", c);
        printf("i is %d\n", i);
    }
    return 0;
}
The input and output are:
input: 1
output:
49
i is 0
10
i is 1
input: 12
output:
49
i is 2
50
i is 3
10
i is 4
I expected that if I entered one character, getchar() would extract it, printf() would print it, and then the program would move to the next iteration and wait for my next input. But the results show that the code does not work as I expected:
What do the output numbers mean?
There is always an extra iteration printing 10. What does this 10 mean? If it means EOF, why, after replacing c = getchar(); with c = (getchar() != EOF); in the loop, does the code always print 1, when I expected it to print 0 on the last iteration?
Thx very much!

Question
What do the output numbers mean?
The output numbers are the values of your characters in your character set, which is usually based on ASCII (some mainframes use EBCDIC instead).
C11 (n1570), § 5.2.1 Character sets
Two sets of characters and their associated collating sequences shall be defined: the set in
which source files are written (the source character set), and the set interpreted in the
execution environment (the execution character set). Each set is further divided into a
basic character set, whose contents are given by this subclause, and a set of zero or more
locale-specific members (which are not members of the basic character set) called
extended characters. The combined set is also called the extended character set. The
values of the members of the execution character set are implementation-defined.
Therefore, with this character encoding, 49 is the character '1' and 50 is the character '2'.
Question
There is always an extra loop printing 10, what does this 10 mean?
With ASCII charset, 10 is the linefeed character '\n'.
When you type the character '1' on your keyboard, the standard input stream stdin in fact receives two characters, '1' and '\n', since you press <Enter> to validate your input.
Therefore, you should clean the standard input stream once you have done your getchar call. One possible way to achieve this is to consume every character until you reach a newline character or EOF:
#include <stdio.h>
int c;
while ((c = getchar()) != '\n' && c != EOF)
    ; /* empty body: just discard the rest of the line */
On BSD, there is also the function fpurge and, on Solaris and GNU/Linux, __fpurge is available.
Question
If it means EOF, why after replacing c = getchar(); with c = (getchar() != EOF); within the loop, the code always print 1 which, as I supposed, should print a 0 in the last loop?
The value of EOF can't be 10, since EOF must have a negative value.
C11 (n1570), § 7.21.1 Introduction
EOF, which expands to an integer constant expression, with type int and a negative value [...].

What do the output numbers mean?
They are the character codes of the characters getchar() returns, since getchar() returns an int and you print it using the %d specifier. In this case, your character encoding seems to be ASCII (or UTF-8), so '1' is encoded as 49, '2' as 50, etc.
2: [... too long to quote...]
10 is the ASCII and Unicode character code for newline ('\n'). Since you press Enter (terminal input is usually line-buffered, so getchar() does not return until you do), you also get the newline character that Enter puts into the input.

Related

C: difference in behaviour of scanf and getchar

I wanted to write a function in C to read characters until a newline is encountered. I wrote the following code using scanf and getchar:
Code using scanf:
while(scanf("%c",&x)!=EOF&&x!='\n'){....}
Code using getchar:
while(((x=getchar())!=EOF)&&x!='\n'){....}
x is a local int variable declared inside the function. The getchar version stops after reading a word (e.g. "ADAM\n"), while the scanf version does not break out of the loop and keeps waiting.
Later I found that after scanf, x's value was (2^7-1)*(2^8) + the ASCII value of the character read (= 32522 for a newline), while the character constant '\n' is 10. So the comparison was failing.
My question is: why does scanf assign a value > 32000 to x after reading '\n', while getchar assigns 10 (which matches the character constant '\n')?
The key difference here is in the scanf behavior:
1) In general, scanf is used to read different data types (not only char). For example, scanf("%d",&num) reads an integer and skips leading whitespace characters (' ' (space), '\t' (tab) and '\n' (newline)). The %c conversion is an exception: it does not skip whitespace, so scanf("%c",&x) does read the '\n'.
2) scanf("%c",&x), like scanf("%d",&num) when a number was entered, returns 1, the number of items successfully read from stdin. Note: scanf("%d",&num) returns 0 if the input is not a number.
The main difference is that most scanf conversions skip whitespace in the input stream, while getchar returns every character. (%c is the exception noted above, so with %c you do see the newline.) Also, the return value of scanf is the number of successfully converted items, not the character itself.
You need to check for scanf(...) == 1 to see whether the variable contains a valid value. When scanf does not convert all the input variables, the unconverted variables keep whatever value they had before. In your case the strange value of x has a more specific cause: %c expects a pointer to char, but &x points to an int, so scanf writes only one byte of x (strictly, this mismatch is undefined behavior). The other bytes keep left-over data from whatever used that memory before, which is why x holds 32522 instead of 10 and never equals '\n'.

C - inputting correct code but receiving no output

When I run the following code and input a sentence I am not given any output. The cursor just goes to a new line.
I copied this straight out of the book and double-checked it for mistakes (The C Programming Language, 1st edition, by Kernighan & Ritchie).
#include <stdio.h>
int main()
{
    int c,i,nwhite,nother;
    int ndigit[10];
    nwhite=nother=0;
    for(i=0;i<10;++i)
        ndigit[i] = 0;
    while (( c=getchar()) != EOF)
        if(c>= '0' && c<= '9')
            ++ndigit[c-'0'];
        else if (c==' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;
    printf("digits =");
    for( i=0; i<10; ++i)
        printf("%d",ndigit[i]);
    printf(", white space = %d, other = %d\n", nwhite,nother);
    return 0;
}
Since you are testing a program copied from another source, I suppose that you don't want to change it, but to understand it.
getchar() obtains exactly one character from the standard input, which is the stream stdin declared in the standard header <stdio.h>.
The standard input, stdin, is considered a file.
Formally speaking, the end-of-file is a "mark" and not a "character".
However, in general, a specific "character" is used to mark the "end-of-file" of a text file.
In Windows the "end-of-file" mark is the character CTRL-Z (whose ASCII code is 26).
In Linux the mark is CTRL-D (whose ASCII code is 4).
On the other hand, the standard input commonly has the following behaviour:
It waits for the user to type characters until the Intro/Enter key is pressed.
If the user does not press Enter, then the standard input does not give back the control to the program. This happens even if you enter an "end-of-file" character (say, CTRL-Z).
However, other behaviours are possible.
For example, in the Ubuntu console I find that CTRL-D is recognized without waiting for the Enter key to be pressed.
In any case, you must explicitly type the end-of-file mark in the console of your system.
So you have to press CTRL-Z (perhaps followed by Enter) or CTRL-D yourself.
ABOUT ENTER and EOF
After Enter is pressed, your program tests for EOF, that is, the "end-of-file" mark in your system.
However, the Enter key does not produce "end-of-file" marks, but only "end-of-line" ones, which correspond to the standard newline character '\n'.
Thus, if you want the while() loop to terminate when Intro/Enter is pressed, the test must be against '\n', not EOF.
AN OBSERVATION
It can be observed that getchar() doesn't return the character CTRL-Z: the ASCII code for CTRL-Z is 26, but getchar() returns a negative value (in general -1).
This means that the character ASCII 26 is recognized as an end-of-file mark and converted to a value with meaning in C, provided by the macro EOF, which is not 26.
What I mean is that EOF is not CTRL-Z, so one cannot naively write EOF to a text file and assume that ASCII 26 (CTRL-Z) will be stored.
Summarizing, I think it is important to elucidate the abstract concept of "end-of-file", the role of EOF, and the difference between a "mark" and a "character".
(Another example: in Windows the "mark" for "end-of-line" is the pair of characters CTRL-M CTRL-J, which is not 1 character but 2.)
Quoted from the standard:
The getchar function returns the next character from the input stream pointed to by
stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and
getchar returns EOF. If a read error occurs, the error indicator for the stream is set and
getchar returns EOF.

printf vs putchar - different output

I have this simple code (trying to do an exercise in K&R):
#include <stdio.h>
int main(){
    int c = EOF;
    while(c=(getchar() != EOF)){
        printf("%d",c);
    }
    return 0;
}
When I run this and enter any single character, I get the output 11. If I enter multiple characters, for example 'bbb', I get the output 1111. I understand that I have explicitly added parentheses to give precedence to the condition check getchar() != EOF, which should result in either 1 or 0. But I don't understand why I am getting multiple 1's.
Another case is:
#include <stdio.h>
int main(){
    int c = EOF;
    while(c=(getchar() != EOF)){
        putchar(c);
    }
    return 0;
}
No matter which character I enter, I always get a square box with 1's and 0's in it as output (shown at the bottom of the screenshot below).
1) In the first case, why is the output printing more than 1 1's?
2) Why isn't the output of case 2 same as case 1?
Until you signal EOF, (getchar() != EOF) evaluates to 1, which is assigned to c. That's why you always get 11 as output: the first 1 is for the character you entered and the second 1 is for the '\n' put into the input buffer when you press the Enter key.
Similarly, in the putchar case it prints the character whose code is 1, which is non-printable (printable ASCII characters start at 32), so you get some weird output: one for the input character and another for '\n'.
Now change the parentheses in conditional expression to
while( (c=getchar()) != EOF ){...}
Now it will work as it should, but in the first case it will print two ASCII codes (one for '\n').
1) In the first case, why is the output printing more than 1 1's?
Because you are looking for EOF. To send EOF to your program from the keyboard, press Ctrl+Z (on Windows) or Ctrl+D (on Unix-like systems).
2) Why isn't the output of case 2 same as case 1?
Because %d produces a decimal representation of the character code, while putchar produces the character itself. For example, if you print 'A' using printf's %d format, you would see 65 - ASCII code of the uppercase character A. On the other hand, if you print it using putchar, you would see character A itself.

Representing EOF in C code?

The newline character is represented by "\n" in C code. Is there an equivalent for the end-of-file (EOF) character?
EOF is not a character (in most modern operating systems). It is simply a condition that applies to a file stream when the end of the stream is reached. The confusion arises because a user may signal EOF for console input by typing a special character (e.g Control-D in Unix, Linux, et al), but this character is not seen by the running program, it is caught by the operating system which in turn signals EOF to the process.
Note: in some very old operating systems EOF was a character, e.g. Control-Z in CP/M, but this was a crude hack to avoid the overhead of maintaining actual file lengths in file system directories.
EOF is not a character. It can't be: A (binary) file can contain any character. Assume you have a file with ever-increasing bytes, going 0 1 2 3 ... 255 and once again 0 1 ... 255, for a total of 512 bytes. Whichever one of those 256 possible bytes you deem EOF, the file will be cut short.
That's why getchar() et al. return an int. The possible return values are all the values an unsigned char can have (converted to int), plus a distinct negative int value, EOF (defined in stdio.h). That's also why converting the return value to a char before checking for EOF will not work.
Note that some protocols have "EOF" "characters." ASCII has "End of Text", "End of Transmission", "End of Transmission Block" and "End of Medium". Other answers have mentioned old OSes. I myself input ^D on Linux and ^Z on Windows consoles to stop giving programs input. (But files read via pipes can have ^D and ^Z characters anywhere and only signal EOF when they run out of bytes.) C strings are terminated with the '\0' character, but that also means they cannot contain the character '\0'. That's why all C non-string data functions work using a char array (to contain the data) and a size_t (to know where the data ends).
Edit: The C99 standard §7.19.1.3 states:
The macros are [...]
EOF
which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to
indicate end-of-file, that is, no more input from a stream;
No. EOF is not a character, but a state of the filehandle.
While there are control characters in the ASCII charset that represent the end of data, these are not used to signal the end of files in general. For example, EOT (^D) in some cases signals almost the same thing.
When the standard C library returns characters as an int and uses a negative value (EOF, usually -1), that value just signals that no character could be read, either because of end-of-file or because an error happened. I don't have the C standard available, but to quote SUSv3:
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgetc() shall return EOF. If a read error occurs, the error indicator for the stream shall be set, fgetc() shall return EOF, and shall set errno to indicate the error.
I've read all the comments. It's interesting to notice what happens when you print out this:
printf("\nInteger = %d\n", EOF);                  //OUTPUT = -1
printf("Decimal = %d\n", EOF);                    //OUTPUT = -1
printf("Octal = %o\n", (unsigned)EOF);            //OUTPUT = 37777777777 (32-bit unsigned)
printf("Hexadecimal = %x\n", (unsigned)EOF);      //OUTPUT = ffffffff
printf("Double and float = %f\n", (double)EOF);   //OUTPUT = -1.000000 (%f needs a double, hence the cast)
printf("Long double = %Lf\n", (long double)EOF);  //OUTPUT = -1.000000
printf("Character = %c\n", EOF);                  //writes the byte 0xFF: usually nothing visible
As we can see here, EOF is NOT a character (whatsoever).
The EOF character recognized by the command interpreter on Windows (and MSDOS, and CP/M) is 0x1a (decimal 26, aka Ctrl+Z aka SUB)
It can still be used today, for example, to mark the end of a human-readable header in a binary file: if the file begins with "Some description\x1a", the user can dump the file content to the console using the TYPE command and the dump will stop at the EOF character, i.e. print Some description and stop, instead of continuing with the garbage that follows.
This is system dependent, but often -1.
I think it may vary from system to system but one way of checking would be to just use printf
#include <stdio.h>
int main(void)
{
    printf("%d", EOF);
    return 0;
}
I did this on Windows and -1 was printed to the console. Hope this helps.
The value of EOF can't be confused with any real character.
If a= getchar(), then we must declare a big enough to hold any value that getchar() returns. We can't use char since a must be big enough to hold EOF in addition to characters.
The answer is NO, but...
You may be confused because of the behavior of fgets()
From http://www.cplusplus.com/reference/cstdio/fgets/ :
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
I have been researching the EOF signal a lot. In Kernighan and Ritchie's book The C Programming Language, it is first encountered when the getchar() and putchar() functions are introduced.
It basically marks the end of the character input.
For example, let us write a program that asks for two numbers and prints their sum. You'll notice that after each number you press Enter to signal that you have finished that input. But when working with character input, Enter is read as just another character ('\n', the newline character). To mark the end of input, you type ^Z (Ctrl+Z on the keyboard) at the start of a new line and then press Enter. That lets the rest of the program run.
#include <stdio.h>
int main()
{
    int c;      /* int, not char, so that EOF can be told apart from every real character */
    int i = 0;
    printf("INPUT:\t");
    c = getchar();
    while (c != EOF)
    {
        ++i;
        c = getchar();
    }
    printf("NUMBER OF CHARACTERS %d.", i);
    return 0;
}
The above code counts the number of characters, including '\n' (newline) and '\t' (tab) characters. If you don't want to count the newline characters, do this:
#include <stdio.h>
int main()
{
    int c;      /* int, not char, so that EOF can be told apart from every real character */
    int i = 0;
    printf("INPUT:\t");
    c = getchar();
    while (c != EOF)
    {
        if (c != '\n')
        {
            ++i;
        }
        c = getchar();
    }
    printf("NUMBER OF CHARACTERS %d.", i);
    return 0;
}
Now the main thing: how to give input. It's simple:
Type all the text you want, then go to a new line, type ^Z and press Enter again (this is for Windows; on Linux or macOS, press Ctrl+D instead).
There is the constant EOF of type int, found in stdio.h. There is no equivalent character literal specified by any standard.

What is EOF in the C programming language?

How do you get to see the last print? In other words what to put in for EOF? I checked the definitions and it says EOF is -1.
And if you enter Ctrl-D you won't see anything.
#include <stdio.h>
int main() {
    int c;
    while((c = getchar() != EOF)) {
        printf("%d\n", c);
    }
    printf("%d - at EOF\n", c);
}
On Linux systems and OS X, the character to input to cause an EOF is Ctrl-D. For Windows, it's Ctrl-Z.
Depending on the operating system, this character will only work if it's the first character on a line, i.e. the first character after an Enter. Since console input is often line-oriented, the system may also not recognize the EOF character until after you've followed it up with an Enter.
And yes, if that character is recognized as EOF, then your program never sees the actual character. Instead, getchar() returns EOF (typically -1).
You should change your parenthesis to
while((c = getchar()) != EOF)
Because the "=" operator has a lower precedence than the "!=" operator. Then you will get the expected results. Your expression is equal to
while (c = (getchar()!= EOF))
You are getting the two 1's as output because you are storing the result of the comparison getchar() != EOF in c. It is 1 for the character you entered and again for the '\n' that follows when you hit Return. Only in the last comparison, where getchar() really returns EOF, does it give you 0.
EDIT about EOF: EOF is typically -1, but this is not guaranteed by the standard. All the standard says about EOF, in section 7.19.1, is:
EOF which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;
It is reasonable to assume that EOF equals -1, but when using EOF you should not test against the specific value, but rather use the macro.
The value of EOF is a negative integer to distinguish it from character values, which getchar() returns in the range 0 to 255 (with 8-bit chars). It is typically -1, but it could be any other negative number according to the POSIX specs, so you should not assume it is -1.
The ^D character is what you type at a console stream on UNIX/Linux to tell it to logically end an input stream. But in other contexts (like when you are reading from a file) it is just another data character. Either way, the ^D character (meaning end of input) never makes it to application code.
As @Bastien says, EOF is also returned if getchar() fails. Strictly speaking, you should call ferror or feof to see whether the EOF represents an error or an end of stream. But in most cases your application will do the same thing in either case.
Couple of typos:
while((c = getchar())!= EOF)
in place of:
while((c = getchar() != EOF))
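Also, getchar() treats the Return key as valid input, so you need to consume that '\n' too. EOF is a marker to indicate the end of input. Generally it is an int with all bits set (-1).
#include <stdio.h>
int main()
{
    int c;
    while((c = getchar())!= EOF)
    {
        /* Discard the '\n' after each character; assumes one character per line. */
        if( getchar() == EOF )
            break;
        printf(" %d\n", c);
    }
    /* Cast to unsigned for %u and %x, which expect an unsigned int. */
    printf("%d %u %x- at EOF\n", c, (unsigned)c, (unsigned)c);
    return 0;
}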
prints:
49
50
-1 4294967295 ffffffff- at EOF
for input:
1
2
<ctrl-d>
EOF means end of file. It's a sign that the end of a file is reached, and that there will be no data anymore.
Edit:
I stand corrected. In this case it's not an end of file. As mentioned, it is returned when Ctrl+D (Linux) or Ctrl+Z (Windows) is typed.
Input from a terminal never really "ends" (unless the device is disconnected), but it is useful to enter more than one "file" into a terminal, so a key sequence is reserved to indicate end of input. In UNIX the translation of the keystroke to EOF is performed by the terminal driver, so a program does not need to distinguish terminals from other input files. By default, the driver converts a Control-D character at the start of a line into an end-of-file indicator. To insert an actual Control-D (ASCII 04) character into the input stream, the user precedes it with a "quote" command character (usually Control-V). AmigaDOS is similar but uses Control-\ instead of Control-D.
In Microsoft's DOS and Windows (and in CP/M and many DEC operating systems), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26. Some MS-DOS programs, including parts of the Microsoft MS-DOS shell (COMMAND.COM) and operating-system utility programs (such as EDLIN), treat a Control-Z in a text file as marking the end of meaningful data, and/or append a Control-Z to the end when writing a text file. This was done for two reasons:
1) Backward compatibility with CP/M. The CP/M file system only recorded the lengths of files in multiples of 128-byte "records", so by convention a Control-Z character was used to mark the end of meaningful data if it ended in the middle of a record. The MS-DOS filesystem has always recorded the exact byte-length of files, so this was never necessary on MS-DOS.
2) It allows programs to use the same code to read input from both a terminal and a text file.
#include <stdio.h>
int main() {
int c;
while((c = getchar()) != EOF) { //precedence of != is higher than =, so the parentheses are needed
printf("%d\n", c);
}
printf("%d - at EOF\n", c);
}
I think this is the right way to check the value of EOF.
And I checked the output.
For INPUT: abc and Enter I got OUTPUT: 97 98 99 10 (the ASCII values).
For INPUT Ctrl-D I got OUTPUT: -1 - at EOF.
So I think -1 is the value for EOF.
Try other inputs instead of Ctrl-D, like Ctrl-Z.
I think it varies from compiler to compiler.
To keep it simple: EOF is an int constant, typically -1. Therefore, we must use an int variable to test for EOF.
#include <stdio.h>
int main() {
int c;
while((c = getchar()) != EOF) {
putchar(c);
}
printf("%d at EOF\n", c);
}
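I modified the code above to make EOF clearer: press Ctrl+D to end the input. putchar is used to echo each character, avoiding printf inside the while loop.
int c;
while((c = getchar()) != '\n' && c != EOF)   /* '\n' rather than the magic number 10; also stop at EOF */
{
    if( getchar() == EOF )   /* note: this reads and discards the next character */
        break;
    printf(" %d\n", c);
}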
