Why EOF is recognized if it is first character in line?

Why EOF is recognized if it is first character in line? - c

I wrote this C program:
#include <stdio.h>
main()
{
int numspaces, numtabs, numnl, c;
while ((c = getchar()) != EOF){
if (c == ' ') ++numspaces;
if (c == '\t') ++numtabs;
if (c == '\n') ++ numnl;
}
printf("Spaces:\t\t%d\nTabs:\t\t%d\nNew lines:\t\t%d\n", numspaces, numtabs, numnl);
}
I think this while loop must finish when I press Ctrl+D and "return". It does if Ctrl+D is the first thing I type in a line. But if I start a line with other character (letter, space, tab) and then Ctrl+D and then "return" - the loop continues.
I test this in Mac OS X terminal, I see it echoing ^D and it still continues the loop. What am I missing here?

CTRL-D or more precisely the character set to the VEOF entry of the c_cc field of a termios control structure (stty is the easiest way to query and modify those settings from the command line) is to be interpreted as (quote from the Single Unix Specification V4):
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a , and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
Thus to send a EOF indication from a terminal you hit CTRL-D once when there is no bytes waiting to be returned (typically at the start of a line), and twice when there are some.

If you type something, hit the return key, and then hit Ctrl+D, it will stop.
Ctrl+D will end input to the program, but only if there is nothing left in the terminal's buffer.
FYI, EOF is not really a character. getchar() just returns an integer value EOF (typically -1) when there is nothing left to read.
Also, note that this question only applies to terminals. If you piped it in from a file, things would be behave just as you expect.

Ctrl+D terminates the current read() call.
If you already have typed something, this "something" will be read back, and the return value of read() is the length of this "something".
Only if you are at the start of a line, read() will return 0, what is used as an indicator to EOF.

Related

Why should I press ENTER before CTRL+D to indicate an EOF to stdin?

The following is my code, where I use fgetc to get input from stdin. (running the program from a UNIX shell)
What I don't understand is that, when I type some characters from the keyboard, and then I press ctrl + D and then press ENTER, the program does not stop. It seems to me that EOF has already been transmitted to the program, why wouldn't it stop?
I also found that if I press ENTER, then ctrl + D, the program does stop, but why?
#include "stdio.h"
int main()
{
char ch;
int i;
for (i = 0; i < 200; i++)
{
ch = fgetc(stdin);
if (ch == EOF)
break;
}
return 0;
}

Saying that Ctrl-D sends EOF is an educational lie-to-children. What it actually does is make any ongoing read() from the terminal return immediately with the contents of the current line buffer if any.
Synergy happens because the Unix convention is that a read() of zero bytes represents EOF.
This means that if you press Ctrl-D with an empty buffer, the read() will return with zero bytes, and a canonical program will interpret it as end-of-file. This is obviously just an illusion since you're still there to input more on the terminal, and a less canonical program could just keep reading if it wanted to.
If you instead press Ctrl-D after entering some data, then that data is just returned and a canonical program will keep reading to find a linefeed or whatever else it's looking for.
This is why EOF behavior is only triggered in canonical programs when Ctrl-D is pressed either after another Ctrl-D (the first flushes the buffer, the second returns a now-empty buffer) or after an Enter (for the same reason).

K&R Exercise 1.9 - program in infinite loop

I am trying to complete exercise 1-9 in K&R and I came up with this:
#include <stdio.h>
/* this program will trim each group of spaces down to a single space */
int main()
{
int c, lastCharblank;
lastCharblank = 0;
while ((c = getchar()) != EOF) {
if (c != ' ' || !lastCharblank)
putchar(c);
lastCharblank = (c == ' ');
}
putchar('\n');
return 0;
}
While running the program through the bash command line I am able to enter text("fix these spaces") and then I enter cntl-d to signal an EOF. The program returns the line with all the spaces corrected, but it does not exit. It seems to be in an infinite loop. Why doesn't it exit?

This is due to the way read system call is specified in POSIX standard.
http://pubs.opengroup.org/onlinepubs/9699919799/
When EOF is received, all the bytes waiting to be read are immediately
passed to the process without waiting for a newline, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication
getchar in the I/O library is implemented using read system call which is line oriented. That means read() delivers one line at a time to the caller. The I/O library stores the bytes returned by read() in a buffer and in case of getchar, delivers one byte at a time to the process.
The fact that read() is line oriented leads to different behavior of Cntl-D depending upon whether the input is at the beginning of a line or not. According to spec, Cntl-D at the beginning of a line signals EOF immediately.
However, Cntl-D in the middle of a line acts differently. It sends the remaining bytes to the process and EOF is discarded. That is why another Cntl-D is required to signal EOF to the process.

I seems you are using a platform on which CTRL-D is not the right key binding for sending an EOF. E.g. on Windows, it's usually CTRL-Z.
EDIT: And yes, as others have already stated, the EOF needs to be at the beginning of a line, so you need to send a newline first.

C - inputting correct code but receiving no output

When I run the following code and input a sentence I am not given any output. The cursor just goes to a new line.
I copied this straight off the book and double checked it for mistakes (1st edition C programming language by kernighan & ritchie)
#include <stdio.h>
int main()
{
int c,i,nwhite,nother;
int ndigit[10];
nwhite=nother=0;
for(i=0;i<10;++i)
ndigit[i] = 0;
while (( c=getchar()) != EOF)
if(c>= '0' && c<= '9')
++ndigit[c-'0'];
else if (c==' ' || c == '\n' || c == '\t')
++nwhite;
else
++nother;
printf("digits =");
for( i=0; i<10; ++i)
printf("%d",ndigit[i]);
printf(", white space = %d, other = %d\n", nwhite,nother);
return 0;
}

Since you are testing a program copied from another source, I suppose that you don't want to change it, but understanding it.
getchar() obtains exactly 1 character from the standard input, which is a file named stdin in the standard header <stdio.h>.
The standard input, stdin, is considered a file.
Formally speaking, the end-of-file is a "mark" and not a "character".
However, in general, a specific "character" is used to mark the "end-of-file" of a text file.
In Windows the "end-of-file" mark is the character CTRL-Z (whose ASCII code is 26).
In Linxu the mark is CTRL-D (whose ASCII code is 4).
On the other hand, the standard input commonly has the following behaviour:
Wait for user enter characters until an Intro/Enter key is pressed.
If the user does not press Enter, then the standard input does not give back the control to the program. This happens even if you enter an "end-of-file" character (say, CTRL-Z).
However, other behaviours are possible.
For example, in Ubuntu console I obtain that CTRL-D is recognized without waiting for the Enter key be pressed.
In any case, you must explicitely type the end-of-file mark in the console of your system.
So, CTRL-Z (perhaps followed by Enter) or CTRL-D have to be pressed for yourself.
ABOUT ENTER and EOF
After Enter is pressed, your program test for EOF, that is, the "end-of-file" mark in your system.
However, the Enter keyword does not prints "end-of-file" marks, but only "end-of-line" ones, which corresponds to the standard character newline '\n'.
Thus, if it is desired that the while() loop terminates after pressing Intro/Enter, the test must be done against '\n', and not EOF.
AN OBSERVATION
It can be observed that getchar() doesn't retrieve the character CTRL-Z, because the ASCII code for CTRL-Z is 26, but getchar() retrieves a negative value (in general -1).
This means that getchar() recognizes the character ASCII 26 as and end-of-file mark, and then it converts to a value with meaning in C, provided by the macro EOF, which is not 26.
What I mean is that EOF is not CTRL-Z, and then one cannot naively send EOF under the assumption that the ASCII 26 (CTRL-Z) will be sent to a text file.
Summarizing, I think that it is important to delucidate the abstract concept of "end-of-file", the role of EOF, and the difference between a "mark" and a "character".
(Another example: in Windows the "mark" for "end-of-line" is the couple of characters CTRL-M CTRL-J, which is not only 1 character, but 2).
Quoted from the standard:
The getchar function returns the next character from the input stream pointed to by
stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and
getchar returns EOF. If a read error occurs, the error indicator for the stream is set and
getchar returns EOF.

Why do I need to type Ctrl-D twice to mark end-of-file?

char **query;
query = (char**) malloc ( sizeof(char*) );
int f=0;
int i=0,j=0,c;
while((c=getchar())!=EOF)
{
if(!isalpha(c))
continue;
if(f==1)
query=(char**) realloc(query,(i+1)*sizeof(char*));
query[i]=(char*) malloc(sizeof(char));
query[i][j]=c;
j++;
while( (c=getchar())!=EOF&&c!=' '&&c!='\t' )
{
query[i]=(char*) realloc(query[i],(j+1)*sizeof(char));
query[i][j]=c;
++j;
}
query[i][j]='\0';
printf("%s\n",query[i]);
if(c==EOF){
break;
}
++i;
f=1;
j=0;
}
I want the above code snippet to read a line of strings separated by spaces and tabs until ONE EOF but it requires 2 EOFs to end the loop. Also, strings can consist of only alphabetic characters.
I am struggling on about 2 days.
Please, give some feedback.
EDIT: Most probably the reason is I hit CTRL+D keys after I write last string not the enter key, but now I hit enter and then CTRL+D, it works as expected.
But, how can I change it to finish after I hit CTRL+D once following the last string?

On Unix-like systems (at least by default), an end-of-file condition is triggered by typing Ctrl-D at the beginning of a line or by typing Ctrl-D twice if you're not at the beginning of a line.
In the latter case, the last line you read will not have a '\n' at the end of it; you may need to allow for that.
This is specified (rather indirectly) by POSIX / The Open Group Base Specifications Issue 7, in section 11, specifically 11.1.9:
EOF
Special character on input, which is recognized if the ICANON flag is
set. When received, all the bytes waiting to be read are immediately
passed to the process without waiting for a <newline>, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication. If
ICANON is set, the EOF character shall be discarded when processed.
The POSIX read() function indicates an end-of-file (or error) condition to its caller by returning a byte count of zero, indicating that there are no more bytes of data to read. (C's <stdio> is, on POSIX systems, built on top of read() and other POSIX-specific functions.)
EOF (not to be confused with the C EOF macro) is mapped by default to Ctrl-D. Typing the EOF character at the beginning of a line (either at the very beginning of the input or immediately after a newline) triggers an immediate end-of-file condition. Typing the EOF character other than at the beginning of a line causes the previous data on that line to be returned immediately by the next read() call that asks for enough bytes; typing the EOF character again does the same thing, but in that case there are no remaining bytes to be read and an end-of-file condition is triggered. A single EOF character in the middle of a line is discarded (if ICANON is set, which it normally is).

In the off chance that someone sees this that needs the help I've been needing... I had been searching, trying to figure out why I was getting this strange behavior with my while(scanf). Well, it turns out that I had while (scanf("%s\n", string) > 0). The editor I am using (Atom) automatically put a "\n" in my scan, with out me noticing. This took me hours, and luckily someone pointed it out to me.

The Return key doesn't produce EOF that's why the condition getchar() != EOF doesn't recognize it. You can do it by Pressing CTRL+Z in Windows or CTRL+D in Unix.

I'm trying to understand getchar() != EOF

I'm reading The C Programming Language and have understood everything so far.
However when I came across the getchar() and putchar(), I failed to understand what is their use, and more specifically, what the following code does.
main()
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
I understand the main() function, the declaration of the integer c and the while loop. Yet I'm confused about the condition inside of the while loop. What is the input in this C code, and what is the output.

This code can be written more clearly as:
main()
{
int c;
while (1) {
c = getchar(); // Get one character from the input
if (c == EOF) { break; } // Exit the loop if we receive EOF ("end of file")
putchar(c); // Put the character to the output
}
}
The EOF character is received when there is no more input. The name makes more sense in the case where the input is being read from a real file, rather than user input (which is a special case of a file).
[As an aside, generally the main function should be written as int main(void).]

getchar() is a function that reads a character from standard input. EOF is a special character used in C to state that the END OF FILE has been reached.
Usually you will get an EOF character returning from getchar() when your standard input is other than console (i.e., a file).
If you run your program in unix like this:
$ cat somefile | ./your_program
Then your getchar() will return every single character in somefile and EOF as soon as somefile ends.
If you run your program like this:
$ ./your_program
And send a EOF through the console (by hitting CTRL+D in Unix or CTRL+Z in Windows), then getchar() will also returns EOF and the execution will end.

The code written with current C standards should be
#include <stdio.h>
int main(void)
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
The loop could be rewritten as
int c;
while (1) {
c = getchar();
if (c != EOF)
putchar(c);
else
break;
}
this reads as
repeat forever
get the next character ("byte") of input from standard input and store it into c
if no exceptional condition occurred while reading the said character
then output the character stored into c into standard output
else
break the loop
Many programming languages handle exceptional conditions through raising exceptions that break the normal program flow. C does no such thing. Instead, functions that can fail have a return value and any exceptional conditions are signalled by a special return value, which you need to check from the documentation of the given function. In case of getchar, the documentation from the C11 standard says (C11 7.21.7.6p3):
The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.
It is stated elsewhere that EOF is an integer constant that is < 0, and any ordinary return value is >= 0 - the unsigned char zero-extended to an int.
The stream being at end-of-file means that all of the input has been consumed. For standard input it is possible to cause this from keyboard by typing Ctrl+D on Unix/Linux terminals and Ctrl+Z in Windows console windows. Another possibility would be for the program to receive the input from a file or a pipe instead of from keyboard - then end-of-file would be signalled whenever that input were fully consumed, i.e.
cat file | ./myprogram
or
./myprogram < file
As the above fragment says, there are actually two different conditions that can cause getchar to return EOF: either the end-of-file was reached, or an actual error occurred. This cannot be deduced from the return value alone. Instead you must use the functions feof and ferror. feof(stdin) would return a true value if end-of-file was reached on the standard input. ferror(stdin) would return true if an error occurred.
If an actual error occurred, the variable errno defined by <errno.h> would contain the error code; the function perror can be used to automatically display a human readable error message with a prefix. Thus we could expand the example to
#include <stdio.h>
#include <errno.h> // for the definition of errno
#include <stdlib.h> // for exit()
int main(void)
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
if (feof(stdin)) {
printf("end-of-file reached\n");
exit(0);
}
else if (ferror(stdin)) {
printf("An error occurred. errno set to %d\n", errno);
perror("Human readable explanation");
exit(1);
}
else {
printf("This should never happen...\n");
exit('?');
}
}
To trigger the end-of-file, one would use Ctrl+D (here displayed as ^D) on a new line on Linux:
% ./a.out
Hello world
Hello world
^D
end-of-file reached
(notice how the input here is line-buffered, so the input is not interleaved within the line with output).
Likewise, we can get the same effect by using a pipeline.
% echo Hello world | ./a.out
Hello world
end-of-file reached
To trigger an error is a bit more tricky. In bash and zsh shells the standard input can be closed so that it doesn't come from anywhere, by appending <&- to the command line:
% ./a.out <&-
An error occurred. errno set to 9
Human readable explanation: Bad file descriptor
Bad file descriptor, or EBADF means that the standard input - file descriptor number 0 was invalid, as it was not opened at all.
Another fun way to generate an error would be to read the standard input from a directory - this causes errno to be set to EISDIR on Linux:
% ./a.out < /
An error occurred. errno set to 21
Human readable explanation: Is a directory
Actually the return value of putchar should be checked too - it likewise
returns EOF on error, or the character written:
while ((c = getchar()) != EOF) {
if (putchar(c) == EOF) {
perror("putchar failed");
exit(1);
}
}
And now we can test this by redirecting the standard output to /dev/full - however there is a gotcha - since standard output is buffered we need to write enough to cause the buffer to flush right away and not at the end of the program. We get infinite zero bytes from /dev/zero:
% ./a.out < /dev/zero > /dev/full
putchar failed: No space left on device
P.S. it is very important to always use a variable of type int to store the return value of getchar(). Even though it reads a character, using signed/unsigned/plain char is always wrong.

Maybe you got confused by the fact that entering -1 on the command line does not end your program? Because getchar() reads this as two chars, - and 1. In the assignment to c, the character is converted to the ASCII numeric value. This numeric value is stored in some memory location, accessed by c.
Then putchar(c) retrieves this value, looks up the ASCII table and converts back to character, which is printed.
I guess finding the value -1 decimal in the ASCII table is impossible, because the table starts at 0. So getchar() has to account for the different solutions at different platforms. maybe there is a getchar() version for each platform?
I just find it strange that this EOF is not in the regular ascii. It could have been one of the first characters, which are not printable. For instance, End-of-line is in the ASCII.
What happens if you transfer your file from windows to linux? Will the EOF file character be automatically updated?

getchar() function reads a character from the keyboard (ie, stdin)
In the condition inside the given while loop, getchar() is called before each iteration and the received value is assigned to the integer c.
Now, it must be understood that in C, the standard input (stdin) is like a file. ie, the input is buffered. Input will stay in the buffer till it is actually consumed.
stdin is actually the standard input stream.
getchar() returns the the next available value in the input buffer.
The program essentially displays whatever that was read from the keyboard; including white space like \n (newline), space, etc.
ie, the input is the input that the user provides via the keyboard (stdin usually means keyboard).
And the output is whatever we provide as input.
The input that we provide is read character by character & treated as characters even if we give them as numbers.
getchar() will return EOF only if the end of file is reached. The ‘file’ that we are concerned with here is the stdin itself (standard input).
Imagine a file existing where the input that we provide via keyboard is being stored. That’s stdin.
This ‘file’ is like an infinite file. So no EOF.
If we provide more input than that getchar() can handle at a time (before giving it as input by pressing enter), the extra values will still be stored in the input buffer unconsumed.
The getchar() will read the first character from the input, store it in c and printcwithputchar(c)`.
During the next iteration of the while loop, the extra characters given during the previous iteration which are still in stdin are taken during while ((c = getchar()) != EOF) with the c=getchar() part.
Now the same process is repeated till there is nothing left in the input buffer.
This makes it look as if putchar() is returning a string instead of a single character at a time if more than one character is given as input during an iteration.
Eg: if input was
abcdefghijkl
the output would’ve been the same
abcdefghijkl
If you don’t want this behaviour, you can add fflush(stdin); right after the putchar(c);.
This will cause the loop to print only the first character in the input provided during each iteration.
Eg: if input was
adgbad
only a will be printed.
The input is sent to stdin only after you press enter.
putchar() is the opposite of getchar(). It writes the output to the standard output stream (stdout, usually the monitor).
EOF is not a character present in the file. It’s something returned by the function as an error code.
You probably won’t be able to exit from the give while loop normally though. The input buffer will emptied (for displaying to the output) as soon as something comes into it via keyboard and the stdin won't give EOF.
For manually exiting the loop, EOF can be sent using keyboard by pressing
ctrl+D in Linux and
ctrl+Z in Windows
eg:
while ((c = getchar()) != EOF)
{
putchar(c);
fflush(stdin);
}
printf("\nGot past!");
If you press the key combination to give EOF, the message Got past! will be displayed before exiting the program.
If stdin is not already empty, you will have to press this key combination twice. Once to clear this buffer and then to simuate EOF.
EDIT: The extra pair of parenthesis around c = getchar() in while ((c = getchar()) != EOF) is to make sure that the value returned by getchar() is first assigned to c before that value is compared with EOF.
If this extra parenthesis were not there, the expression would effectively have been while (c = (getchar() != EOF) ) which would've meant that c could have either of 2 values: 1 (for true) or 0 (for false) which is obviously not what is intended.

getchar()
gets a character from input.
c = getchar()
The value of this assignment is the value of the left side after the assignment, or the value of the character that's been read. Value of EOF is by default -1.
((c = getchar()) != EOF)
As long as the value stays something other than EOF or, in other words, as long as the condition stays true, the loop will continue to iterate. Once the value becomes EOF the value of the entire condition will be 0 and it will break the loop.
The additional parentheses around c = getchar() are for the compiler, to emphasize that we really wanted to do an assignment inside the condition, because it usually assumes you wanted to type == and warns you.
main() {
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
So the entire code actually echoes back what you input. It assigns the value of the characters to c inside the condition and then outputs it back in the body of the loop, ending only when the end of file is detected.

In a similar manner to the | pipe command above you can use redirection on your system to utilize the above code to display all the character contents of a file, till it reaches the end (EOF) represented by CTRL-Z or CTRL-D usually.
In console:
ProgramName < FileName1.txt
And to create a copy of what is read from FileName1 you can:
ProgramName < FileName1.txt > CopyOfInput.txt
This demonstrates your program in multiple ways to hopefully aid your understanding.
-Hope that helps.

main(){
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
Actually c=getchar() provides the character which user enters on the console and that value is checked with EOF which represents End Of File . EOF is encountered at last of file. (c = getchar()) != EOF is equivalent to c != EOF . Now i think this is much easier . If you any further query let me know.