This example is from the K&R book
#include <stdio.h>

int main(void)
{
    long nc;

    nc = 0;
    while (getchar() != EOF)
        ++nc;
    printf("%ld\n", nc);
    return 0;
}
Could you explain to me why it works that way? Thanks.
^Z^Z doesn't work either (unless it's at the beginning of a line).
The traditional UNIX interpretation of the tty EOF character is to make a blocking read return after reading whatever is buffered in the cooked tty line buffer. At the start of a new line, that means read returns 0 (zero bytes read), and a 0-sized read happens to be how the end-of-file condition is detected on ordinary files.
That's why the first EOF in the middle of a line just forces the beginning of the line to be read, without making the C runtime library detect an end of file. Two EOF characters in a row produce a 0-sized read, because the second one forces an empty buffer to be read by the application.
$ cat
foo[press ^D]foo <=== after ^D, input printed back before EOL, despite cooked mode. No EOF detected
foo[press ^D]foo[press ^D] <=== after first ^D, input printed back, and on second ^D, cat detects EOF
$ cat
Some first line<CR> <=== input
Some first line <=== the line is read and printed
[press ^D] <=== at line start, ^D forces 0-sized read to happen, cat detects EOF
I assume that your C runtime library imitates the semantics described above (there is no special handling of ^Z at the level of kernel32 calls, let alone system calls, on Windows). That's why it would probably detect EOF after ^Z^Z even in the middle of an input line.
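The 0-sized read described above can be observed directly with the POSIX read() call. This is only a minimal sketch, assuming a POSIX system; the helper name count_bytes_until_eof is hypothetical and not part of the original answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: counts the bytes read from fd until read() returns 0,
   which is how end of file is reported. Returns -1 on a read error. */
long count_bytes_until_eof(int fd)
{
    char buf[256];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;      /* data arrived, possibly a partial line after ^D */
        } else if (n == 0) {
            return total;    /* 0-sized read: treated as end of file */
        } else if (errno != EINTR) {
            return -1;       /* a real error, not an interrupted call */
        }
    }
}
```

Called with file descriptor 0 on a terminal in cooked mode, typing foo and pressing ^D should make one read() return 3 bytes, and a second ^D should make the next read() return 0.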
The program will read EOF only at the actual end of the input. If your terminal/OS/whatever only permits files to end at the start of a line, then that's where you'll find them. I believe this is a throwback to old-fashioned terminals where data was only transmitted a line at a time (for all I know it goes back to punched card readers).
Try reading your data from a file that you've prepared with an EOF mid-line. You may even find that some editors make this difficult! Your program should work fine with that as input.
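To try the suggestion above without a terminal, the counting loop can be moved into a function that takes any stream. This is a sketch only; count_chars is a hypothetical name, and the loop is the same as in the K&R program except that it uses getc() on a stream argument.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the K&R counting loop, reading from any stream so it
   can be tried on a file whose last line has no trailing newline. */
long count_chars(FILE *in)
{
    long nc = 0;

    assert(in != NULL);
    while (getc(in) != EOF)
        ++nc;
    return nc;
}
```

Calling count_chars(stdin) and running the program with input redirected from such a file should count every character, including those on the unterminated last line.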
EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.
Depending on the operating system, the EOF character will only work if it's the first character on a line, i.e. the first character after an Enter. Since console input is often line-oriented, the system may also not recognize the EOF character until after you've followed it up with an Enter.
I had the same question. To end the getchar() loop, I have to enter EOF twice, or press <ENTER> and then enter EOF once.
Here is a simpler answer I found to this question:
If characters have already been typed on the current line, EOF only ends that round of input and delivers it to the program, which then starts waiting for a new round of input. If nothing has been typed, in other words when getchar() is waiting for fresh input (for example right after you pressed <ENTER> or entered an EOF), the EOF you enter really means "end of file", and the program stops calling getchar().
PS: this question comes up when you use getchar(). I think this answer is easier to understand, though maybe not for you, since it is translated from Chinese...
int main(void){
    char cmdline[MAXLINE];
    while(1){
        printf("> ");
        fgets(cmdline, MAXLINE, stdin);
        if(feof(stdin)){
            exit(0);
        }
        eval(cmdline);
    }
}
This is the main part of the myShell program that my professor gave me.
But there is one thing I don't understand in the code.
It says if(feof(stdin)) exit(0);
What is the end of the standard input?
fgets accepts all characters until the Enter key is pressed. The end of a typical "file" (e.g. a .txt file) is intuitively understandable, but what does the end of standard input mean?
In what practical situations does feof(stdin) actually return true?
Even if you enter just a space and nothing else, the condition of the if statement is not met.
feof tests the stream’s end-of-file indicator and returns true (non-zero) iff the end-of-file indicator is set.
For regular files, attempting to read past the end of the file sets the end-of-file indicator. For terminals, a typical behavior is that when a program attempts to read from the terminal and gets no data, the end-of-file indicator is set. In Unix systems with default settings, a way to trigger this “no data, end-of-file behavior” is to press control-D at the beginning of a line or immediately after a prior control-D.
This works because control-D is used to mean “send pending data to the program immediately.” That is described further in this answer.
Thus, if you want to end input for a program, press control-D (and, if not at the beginning of a line, press it a second time).
For input from terminals, while this does cause an end-of-file indication, it does not actually end the input or close the stream. The program can clear the end-of-file indicator and keep reading. Even for regular files, the program could clear the end-of-file indicator, reset the file context to a different position, and continue reading.
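The last point can be illustrated with a regular file. The following is a minimal sketch, not taken from the answer; read_twice is a hypothetical helper that reads to end of file, clears the indicator with clearerr(), repositions the stream, and reads again.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads to end of file, then clears the end-of-file
   indicator and repositions to the start to read the stream a second time.
   Returns the total number of characters read over both passes, or -1 if
   the stream did not behave as expected. */
long read_twice(FILE *in)
{
    long total = 0;

    assert(in != NULL);
    while (getc(in) != EOF)
        ++total;
    if (!feof(in))
        return -1;           /* stopped because of an error, not end of file */

    clearerr(in);            /* the end-of-file indicator is now clear */
    if (feof(in) || fseek(in, 0L, SEEK_SET) != 0)
        return -1;

    while (getc(in) != EOF)
        ++total;
    return total;
}
```

On a terminal the same idea applies without the fseek(): after clearerr(stdin), the next read simply waits for more typed input.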
The confusion comes from assuming that stdin is always a terminal. That is not necessarily true.
What stdin is depends on how you run your program.
For example, assuming your executable is named a.out, if you run it like this:
echo "foo" | ./a.out
Stdin is the output of a different process. In this example that process simply outputs the word "foo", so stdin will contain "foo" and then EOF.
Another example is:
./a.out < file.txt
In this case, stdin is "file.txt". When the file is read to the end, stdin gets EOF.
Stdin can also be a special device, for example:
./a.out < /dev/random
In this specific case it is infinite.
Last, when you simply run your program and stdin is a terminal, you can generate EOF too: on Unix, press CTRL-D at the start of a line, which makes the terminal report end of file to your program.
P.S.
There are other ways to execute a process. Here I only gave examples of processes executed from the command line shell. But a process can be executed by a different process, not necessarily from the shell. In this case the creator of the process can decide what stdin will be: a terminal, pipe, socket, file or any other object.
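A program can check at run time which of these cases it is in. The following is a sketch assuming a POSIX system; describe_fd is a hypothetical helper built on the standard isatty() and fstat() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns a short description of what fd refers to. */
const char *describe_fd(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (isatty(fd))
        return "terminal";
    if (fstat(fd, &st) != 0)
        return "unknown";
    if (S_ISFIFO(st.st_mode))
        return "pipe";
    if (S_ISREG(st.st_mode))
        return "regular file";
    if (S_ISCHR(st.st_mode))
        return "character device";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    return "other";
}
```

Printing describe_fd(STDIN_FILENO) at startup and running the program with the three command lines above should show how the same program sees a different kind of stdin each time.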
I can't understand how the following code really works.
int main() {
    char ch;
    while((ch=getchar())!='\n')
    {
        printf("test\n");
    }
    return 0;
}
Let's say we give "aaa" as input. Then we get the word "test" as output on 3 separate lines.
Now my question is: for the first letter that we type, 'a', does the program go inside the while loop and remember that it has to print something when the '\n' character is entered? Does it store the characters somewhere and then traverse them and execute the body of the while loop? I'm lost.
There are many layers between the user writing input into a terminal, and your program receiving that input.
Typically the terminal itself has a buffer, which is flushed and sent to the operating system when the user presses the Enter key (together with a newline from the Enter key itself).
The operating system will have some internal buffers where the input is stored until the application reads it.
Then in your program the getchar function itself reads from stdin which is usually also buffered, and the characters returned by getchar are taken one by one from that stdin buffer.
And as mentioned in a comment to your question, note that getchar returns an int, which is really important if you ever want to compare what it returns against EOF (which is an int constant).
And you really should compare against EOF, otherwise you won't detect if there's an error or the user presses the "end-of-file" key sequence (Ctrl-D on POSIX systems like Linux or macOS, or Ctrl-Z on Windows).
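The two points above (store the result in an int, and compare against EOF) can be combined in the loop from the question. This is a minimal sketch; count_until_newline is a hypothetical helper that takes a stream argument so it can be tried on any input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts characters up to the first newline, and also
   stops at end of file or on a read error instead of looping forever. */
int count_until_newline(FILE *in)
{
    int ch;                  /* int, not char, so EOF stays distinguishable */
    int count = 0;

    assert(in != NULL);
    while ((ch = getc(in)) != EOF && ch != '\n')
        ++count;
    return count;
}
```

With the original condition, input that ends without a newline would never satisfy the test, because getchar() keeps returning EOF and EOF is never equal to '\n'.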
What you see is due to the I/O line buffering.
The getchar() function doesn't receive any input until you press Enter. This adds the \n that completes the line.
Only at this point does the OS start to feed characters to getchar(), and the loop prints the test message for each character other than \n.
That is why the output appears all at once after you press Enter.
You can change the C library's buffering with the function setvbuf(); setting the mode to _IONBF makes the stream unbuffered. Note that on a terminal this alone is usually not enough to get each character as it is pressed: in its default (canonical) mode the terminal driver still delivers input a line at a time, so the terminal mode has to be changed as well (for example with termios on a *nix system; MS is not so compliant).
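Making the stdio stream unbuffered only affects the C library's own buffer. On a POSIX terminal, the terminal driver also holds input until Enter while canonical mode is on. The following is a minimal sketch, assuming a POSIX system with termios; disable_line_buffering is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: switches the terminal on fd to non-canonical mode so
   that read() returns each byte as it is typed. The previous settings are
   stored in *saved so the caller can restore them later with tcsetattr().
   Returns 0 on success, -1 if fd is not a terminal or a call fails. */
int disable_line_buffering(int fd, struct termios *saved)
{
    struct termios raw;

    assert(saved != NULL);
    if (tcgetattr(fd, saved) != 0)
        return -1;

    raw = *saved;
    raw.c_lflag &= ~(tcflag_t)ICANON;   /* no line editing or line buffering */
    raw.c_cc[VMIN] = 1;                 /* read() waits for at least one byte */
    raw.c_cc[VTIME] = 0;                /* and uses no timeout */
    return tcsetattr(fd, TCSANOW, &raw);
}
```

A program would typically call this on STDIN_FILENO together with setvbuf(stdin, NULL, _IONBF, 0), and restore the saved settings with tcsetattr(STDIN_FILENO, TCSANOW, &saved) before exiting.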
For my program, I have a prompt to stdout
>
and then my program reads from stdin. The prompt loops if EOF has not been reached. I have noticed if I enter something, such as:
> bee
When I press CTRL-D once, nothing happens. When I press CTRL-D again, my prompt comes up again. And only when I press it a third time, does my program terminate due to EOF. Does this mean there is a problem in my code? Or is this normal behavior?
Here's a simplified version of my code:
(fopen used)
(print prompt)
while((fgets(tester, 1026, input)) != NULL) {
    if(there is a # in tester) {
        (print prompt)
        continue;
    }
}
In a unix terminal, CTRL-D does nothing more or less than immediately send all bytes pending in the terminal's input buffer.
Background:
Normally, when you enter stuff into your terminal, that stuff is line buffered, so you can keep editing a line until you are satisfied with it, and then send it to the running process by entering a newline (or CTRL-D, the difference is only that CTRL-D does not add a newline character at the end).
Now, processes detect the end of an input stream by checking whether the read() call returned anything. So, if you press CTRL-D on an empty input buffer, the read() call returns with nothing, and the process thinks "no more bytes coming out of this stream, I'd better not try again". Afaik, there is no other way to check for the end of an input stream, so all programs that recognize EOF on stdin do this, either directly or via the standard C library. The latter is what you did when you called fgets().
Your case:
The first CTRL-D simply sends the three characters "bee" to your process. The read() call within your fgets() call returns these three characters, and your fgets() implementation checks for a newline character. As it finds none, and as its own output buffer is not full yet, it immediately proceeds to fetch more characters with another read() call.
The second CTRL-D sends nothing, as you have not entered any other characters since your last CTRL-D. The read() call returns with no data, the fgets() sees that it received zero characters and calls it an EOF condition. So it returns the (already buffered) string "bee" to you.
Your program may check whether that string contains a # character. But its loop cannot terminate until a fgets() call returns NULL (there is no break statement to leave the loop prematurely).
The third CTRL-D again sends zero bytes to your process. This causes the first read() call of the second fgets() call to return zero bytes (the loop is about to be reentered after a successful first iteration). The fgets() implementation sees the empty result, and since it finds that it has not yet received any bytes, it returns NULL. Your loop condition sees the NULL and terminates the loop, which in turn causes your main() to return, exiting the process.
TL;DR:
Yes, this is totally expected behavior, even though it seems rather counter-intuitive. That's UNIX: It's KISS, not necessarily intuitive.
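The fgets() behavior described above (one successful call that returns "bee" without a newline, then a call that returns NULL) can be reproduced with a file that ends without a newline. This is a sketch only; count_fgets_calls is a hypothetical helper, and the buffer size 1026 is taken from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: calls fgets() until it returns NULL and reports how
   many calls succeeded. *last_had_newline is set to 1 if the last string
   returned ended in '\n', otherwise 0. */
int count_fgets_calls(FILE *in, int *last_had_newline)
{
    char line[1026];
    int calls = 0;

    assert(in != NULL && last_had_newline != NULL);
    *last_had_newline = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        *last_had_newline = (len > 0 && line[len - 1] == '\n');
        ++calls;
    }
    return calls;
}
```

For input consisting of just bee with no newline, the helper reports one successful call whose string has no trailing newline, which matches the second and third CTRL-D steps in the explanation.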
I have looked around the site regarding this K&R example and the answers seem to revolve around questions like 'why is this of type int?' or 'what is EOF?'. I believe that I understand those.
It's the results that I don't understand. I had expected this code to take a single character, print it and then wait for another character or EOF.
The results that I see are the input waiting until I press return, then everything that I typed showing up, and then more waiting for input.
Is the while loop just 'looping' until I end the text stream with the carriage return and then showing what putchar(c) has been hiding somewhere?
The code is:
#include <stdio.h>

/* copy input to output: 1st version */
int main(void)
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
Now, if I sneak a putchar(c) onto the line just before the while, I sort of get what I expected. I still must enter a text stream and press return. The result is the first character of the stream and the program exits.
Evidently there is a big picture gap for me going on.
Thank you for your help
By default, stdin and stdout are buffered. That means that they save up batches of characters and send them at once for efficiency. Typically, the batch is saved up until there's no more room in the buffer or until there's a newline or EOF in the stream.
When you call getchar(), you're asking for characters from stdin. Suppose you type A: that character is saved in the buffer and then the system waits for more input. If you type B, that character goes into the buffer next. Perhaps after that, you hit Enter, and a newline is put in the buffer. But the newline also interrupts the buffering process, so the original call to getchar() returns the first character in the buffer (A). On the next iteration, you call getchar() again, and it immediately returns the next character in the buffer (B). And so on.
So it's not that your while loop is running until you end the line, it's that the first call to getchar() (when the buffer is empty) is waiting until it has either a full buffer or it has seen a newline.
When you interleave output functions, like putchar(), most C runtime libraries will "flush" stdout when you do something that reads data from stdin. (The intent is to make sure the user sees a prompt before the program waits for input.) That's why you started seeing different behavior when you added the putchar() calls.
You can manually flush an output buffer using the fflush() function. You can also control the buffering mode and the size of the buffer used by the standard streams using setvbuf().
As Hans Passant pointed out in the comments, a newline doesn't "terminate the stream." To get an EOF on stdin, you have to type Ctrl+D (or, on some systems, Ctrl+Z). An EOF will also flush the buffer. If you've redirected a file or the output from another program to stdin, the EOF will happen once that input is exhausted.
While it's true that K&R C is very old, and even ANSI C isn't as common today as it was, everything about buffering with stdin and stdout is effectively the same in the current standards and even in C++. I think the only significant change is that the C standards now explicitly call out the desirability of having stdin and stdout cause the other to flush.
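When a program should not rely on the runtime library flushing output for it, the prompt can be flushed explicitly before reading. This is a minimal sketch; prompt_and_read is a hypothetical helper, and the "> " prompt text is only an example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a prompt, flushes it explicitly, then reads
   one line. Returns buf on success, or NULL on end of file or error. */
char *prompt_and_read(FILE *in, FILE *out, char *buf, int size)
{
    assert(in != NULL && out != NULL && buf != NULL && size > 0);
    fputs("> ", out);
    fflush(out);             /* make sure the prompt is visible before reading */
    return fgets(buf, size, in);
}
```

A typical call would be prompt_and_read(stdin, stdout, line, sizeof line) inside a loop that ends when the function returns NULL.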
I appreciate your answer, and the buffering as you describe is very helpful and interesting.
Evidently, I also must have mis-read/understood, K&R. They define a text stream as ". . . consists of zero or more characters followed by a new line character," which I took to mean the return/enter key; ending it, and then allowing output.
Also, I would like to thank all of you who offered helpful comments.
By the way, I clearly understood that I had to enter ^D to generate EOF, which terminates the program. I appreciate that you are all top level programmers, and thank you for your time. I guess that I will need to find another place to discuss what the text that K&R wrote regarding this exercise is all about.
if (fgets(string, 35, stdin)==NULL)
It only works when I press ctrl+D by itself.
It will print
"what I want"
and exit.
But if I type
'abc'
and then press ctrl+D without pressing Enter,
it doesn't work; the program keeps running until I press ctrl+D a second time.
Can you give me some idea why?
This is normal behavior. When you type ctrl-D after already typing some characters on the line, it allows the process to read the characters you've typed (you'll notice that you can't backspace past this point, and if you do it in e.g. cat, the characters you typed may be echoed immediately).
The relevant Unix standard can be found here:
EOF
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
Basically, ctrl-D in the middle of the line isn't "really" the end of the file, and there's no reason to expect to detect it. If you want to end standard input without a final newline, just press ctrl-D twice.