K&R Exercise 1.9 - program in infinite loop

K&R Exercise 1.9 - program in infinite loop - c

I am trying to complete exercise 1-9 in K&R and I came up with this:
#include <stdio.h>
/* this program will trim each group of spaces down to a single space */
int main()
{
int c, lastCharblank;
lastCharblank = 0;
while ((c = getchar()) != EOF) {
if (c != ' ' || !lastCharblank)
putchar(c);
lastCharblank = (c == ' ');
}
putchar('\n');
return 0;
}
While running the program through the bash command line I am able to enter text("fix these spaces") and then I enter cntl-d to signal an EOF. The program returns the line with all the spaces corrected, but it does not exit. It seems to be in an infinite loop. Why doesn't it exit?

This is due to the way read system call is specified in POSIX standard.
http://pubs.opengroup.org/onlinepubs/9699919799/
When EOF is received, all the bytes waiting to be read are immediately
passed to the process without waiting for a newline, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication
getchar in the I/O library is implemented using read system call which is line oriented. That means read() delivers one line at a time to the caller. The I/O library stores the bytes returned by read() in a buffer and in case of getchar, delivers one byte at a time to the process.
The fact that read() is line oriented leads to different behavior of Cntl-D depending upon whether the input is at the beginning of a line or not. According to spec, Cntl-D at the beginning of a line signals EOF immediately.
However, Cntl-D in the middle of a line acts differently. It sends the remaining bytes to the process and EOF is discarded. That is why another Cntl-D is required to signal EOF to the process.

I seems you are using a platform on which CTRL-D is not the right key binding for sending an EOF. E.g. on Windows, it's usually CTRL-Z.
EDIT: And yes, as others have already stated, the EOF needs to be at the beginning of a line, so you need to send a newline first.

Related

Why multiple EOF enters to end program?

Trying to understand the behavior of my code. I'm expecting Ctrl-D to lead to the program printing the array and exiting, however it takes 3 presses, and it enters the while loop after the second press.
#include <stdio.h>
#include <stdlib.h>
void unyon(int p, int q);
int connected(int p, int q);
int main(int argc, char *argv[]) {
int c, p, q, i, size, *ptr;
scanf("%d", &size);
ptr = malloc(size * sizeof(int));
while((c = getchar()) != EOF){
scanf("%d", &p);
scanf("%d", &q);
printf("p = %d, q = %d\n", p, q);
}
for(i = 0; i < size; ++i)
printf("%d\n", *ptr + i);
free(ptr);
return 0;
}
I read the post here, but I don't quite understand it.
How to end scanf by entering only one EOF
After reading that, I'm expecting the first Ctrl-D to clear the buffer, and then I'm expecting c = getchar() to pick up the second Ctrl-D and jump out. Instead the second Ctrl-D enters the loop and prints p and q, and it takes a third Ctrl-D to drop out.
This is made more confusing by the fact that the code below drops out on the first Ctrl-D-
#include <stdio.h>
main() {
int c, nl;
nl = 0;
while((c = getchar()) != EOF)
if (c == '\n')
++nl;
printf("%d\n", nl);
}

Let's just strip the program down to the calls which do input:
scanf("%d", &size); // Statement 1
while((c = getchar()) != EOF){ // 2
scanf("%d", &p); // 3
scanf("%d", &q); // 4
}
That is definitely not the way to go; we'll get to the correct usage in a bit. For now, let's just analyze what happens. It's important to understand precisely how scanf works. The %d format code causes it to first skip over any whitespace characters, and then read characters as long as the characters can be made into a decimal integer. Eventually some character will be read which is not part of a decimal integer; most likely a newline character. Because the format string is now finished, the unused character which has just been read will be reinserted into the stream.
So when the call to getchar is made, getchar will read and return the newline character which terminated the integer. Inside the loop, there are then two calls to scanf("%d"), each of which will behave as indicated above: skip whitespace if any, read a decimal integer, and reinsert the unused character back into the input stream.
Now, let's suppose that you run the program, and enter the number 42 followed by the enter key, and then Ctrl-D to close the input stream.
The 42 will be read by statement 1, and (as mentioned above) the newline will be read by statement 2. So when statement 3 is executed, there is no more data to be read. Because end-of-file is signaled before any digit is read, scanf will return EOF. However, the code does not test the return value of scanf; it goes on to statement 4.
What should happen at this point is that the scanf in statement 4 should immediately return EOF without attempting to read more input. That's what the C standard says should happen, and it is what Posix says should happen. Once end-of-file has been signaled on a stream, any input request should immediately return EOF until the end-of-file indicator is manually cleared. (See below for standards quotes.)
But glibc, for reasons we won't go into just yet, does not conform to the standard. It attempts another read. And so the user must enter another Ctrl-D, which will cause the scanf at statement 4 to return EOF. Again, the code does not check the return code, so it continues with the while loop and calls getchar again at statement 2. Because of the same bug, getchar does not immediately return EOF, but instead attempts to read a character from the terminal. So the user must now type a third Ctrl-D to cause getchar to return EOF. Finally, the code checks a return code, and the while loop terminates.
So that is the explanation of what is happening. Now, it is easy to see at least one mistake in the code: the return value of scanf is never checked. Not only does this mean that EOF is missed, it also means that input errors are ignored. (scanf would have returned 0 if the input could not be parsed as an integer.) That's serious, because if scanf cannot succesfully match the format code, the value of the corresponding argument is undefined and must not be used.
In short: Always check return values from *scanf. (And other I/O library functions.)
But there is a more subtle mistake as well, which makes little difference in this case but could, in general, be serious. The character read by getchar in statement 2 is simply discarded, regardless of what it was. Normally it will be whitespace, so it doesn't matter that it is discarded, but you don't actually know that because the character is discarded. Maybe it was a comma. Maybe it was a letter. Maybe it matters what it was.
It is bad style to rely on the assumption that whatever character is read by the getchar at statement 2 is unimportant. If you really need to peek at the next character, you should reinsert it into the input stream, just as scanf does:
while ((c = getchar()) != EOF) {
ungetc(c, stdin); /* Put c back into the input stream */
...
}
But actually, that test is not what you want at all. As we have already seen, it is extremely unlikely that getchar will return EOF at this point. (It's possible, but it's very unlikely). Much more more probable is that getchar will read a newline character, even though the next scanf will encounter the end-of-file. So there was absolutely no point peeking at the next character; the correct solution is to check the return code of scanf, as indicated above.
Putting that together, what you really want here is something more like:
/* No reason to use two scanf calls to read two consecutive numbers */
while ((count = scanf("%d%d", &p, &q)) == 2) {
/* Do something with p and q */
}
if (count != EOF) {
/* Invalid format. Issue an error message, at least */
}
/* Do whatever needs to be done at the end of input. */
Finally, let's examine glibc's behaviour. There is a very long-standing bug report linked to by an answer to the question cited in the OP. If you take the trouble to read through to the most recent post in the bugzilla thread, you'll find a link to a discussion on the glibc developer mailing list.
Let me give the TL;DR version, and save you the trouble of digital archaeology. Since C99, the standard has been clear that EOF is "sticky". §7.21.3/11 states that all input is performed as though successive bytes were read by fgetc:
...The byte input functions read characters from the stream as if by successive calls to the fgetc function.
And §7.21.7.1/3 states that fgetc returns EOF immediately if the stream's end-of-file indicator is set:
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function
returns EOF.
So once the end-of-file indicator is set, because either end of file was detected or some read error occurred, subsequent input operations must immediately return EOF without attempting to read from the stream. Various things can clear the end-of-file indicator, including clearerr, seek, and ungetc; once the end-of-file indicator has been cleared, the next input function call will again attempt to read from the stream.
However, it wasn't always like that. Before C99, the result of reading from a stream which had already returned EOF was unspecified. And different standard libraries chose to handle it in different ways.
So a decision was made to not change glibc to conform to the (then) new standard, but rather to maintain compatibility with certain other C libraries, notably Solaris. (A comment in the glibc source is quoted in the bug report.)
Although there is a compelling argument (at least, compelling to me) that fixing the bug is not likely to break anything important, there is still a certain reluctance to do anything about it. And so, here we are, ten years later, with a still-open bug report, and a non-conforming implementation.

If you run it through the debugger you will get a clearer picture. Here is the sequence of events.
scanf("%d", &size); is called.
A number is input followed by ENTER. The key here is that scanf does not consume the \n that results from the ENTER.
getchar is called. This consumes the \n.
scanf("%d", &p); is called. This consumes the first ctrl-D. If the return value were checked then it would be apparent that an error occured.
scanf("%d", &q); is called. This consumes the second ctrl-D.
Loop goes back to the top and calls getchar. The third ctrl-D then causes EOF to be returned by getchar and hence the loop breaks out at that point.
I'll leave it as an exercise for you to explain why the second program functions as expected.

There are different things messing here.
First of all, when you type Ctrl-D to the input terminal, the tty driver is processing your input, adding each character in a buffer and processing special characters. One of these special characters (Ctrl-D) means take up to the last char and make them all available to the system. This makes two things to happen: first, the Ctrl-D character is eliminated from the data stream and; second, all the characters typed up so far are made available to be read(2) by the process syscall. getchar() is a buffered library call that avoids making one read per character, allowing to store previously read characters in the buffer.
Other thing messing here is the way the system signals the end of file in posix systems (and all unix systems). When you make a read(2) system call, the return value is the actual number of characters read (or -1 in case of failure, but this has nothing to do with EOF, as will be explained soon). And the system marks the end of file condition by returning 0 characters. So, the operating system marks the end of file making read(2) return 0 bytes as a result (if you only hit the return key, that will make a \n to appear in the data stream).
The third thing messing up here is the type of return value from getchar(3) function. It doesn't return a char value. As all possible byte values are posible to be returned for getchar(3), there's no possibility to reserve a special value for signalling a EOF. The solution adopted a long, long, time ago (when getchar(3) was designed, that is in the first version of the C language, (see The C programming language by Brian Kernighan and Denis Ritchie, first ed.) was to use an int as return value to be able to return all the possible byte values (0..255) plus one extra value, called EOF. The exact value of EOF is implementation dependant, but normally defined as -1 (I think even the standard specifies now it must be defined as -1, but not sure)
So, making all things work together, EOF is an int constant defined to allow programers to write while ((c = getchar()) != EOF). You will never get -1 as a data value from the terminal. The system always marks the end of file condition by making read(2) to return 0. And the terminal driver on receiving Ctrl-D just eliminates it from the stream and makes data up to, but not including (as different from Ctrl-J or Ctrl-M, line feed and carry return, respectivelly, that are also interpreted and are input as \n in the data stream)
So, next the question is: Why there are needed normally two (or more) Ctrl-D chars to signal eof?
Right, as I've said, one only makes all thata up to the Ctrl-D (but not including it) available to the kernel, so the result from read(2) can be a number different than 0 for the first time. But what is sure is that if you enter the Ctrl-D char twice in sequence, after the first there were not be more chars in between the two chars, assuring a read() of zero chars. Normally, programs are in a loop, doing multiple reads
while ((n_read = read(fd, buffer, sizeof buffer)) > 0) {
/* NORMAL INPUT PROCESSING GOES HERE, for up to n_read bytes
* stored in buffer */
} /* while */
if (n_read < 0) {
/* ERROR PROCESSING GOES HERE */
} else {
/* EOF PROCESSING GOES HERE */
} /* if */
In the case of files, the behaviour is different, as Ctrl-D is not interpreted by any driver (it's stored in the disk file) so you'll get Ctrl-D as a normal character (it's value is \004)
When you read a file, normally this deals to reading a lot of complete buffers, then make a partial read (with less than the buffer size bytes input) and a final read of zero bytes, signalling that the file has ended.
Note
Depending on the configuration of the tty driver in some unices, the eof character can be changed and have different mean. Also happens to the return character and linefeed character. Se termios(3) manual page for a detailed documentation on this.

Why do I need to type Ctrl-D twice to mark end-of-file?

char **query;
query = (char**) malloc ( sizeof(char*) );
int f=0;
int i=0,j=0,c;
while((c=getchar())!=EOF)
{
if(!isalpha(c))
continue;
if(f==1)
query=(char**) realloc(query,(i+1)*sizeof(char*));
query[i]=(char*) malloc(sizeof(char));
query[i][j]=c;
j++;
while( (c=getchar())!=EOF&&c!=' '&&c!='\t' )
{
query[i]=(char*) realloc(query[i],(j+1)*sizeof(char));
query[i][j]=c;
++j;
}
query[i][j]='\0';
printf("%s\n",query[i]);
if(c==EOF){
break;
}
++i;
f=1;
j=0;
}
I want the above code snippet to read a line of strings separated by spaces and tabs until ONE EOF but it requires 2 EOFs to end the loop. Also, strings can consist of only alphabetic characters.
I am struggling on about 2 days.
Please, give some feedback.
EDIT: Most probably the reason is I hit CTRL+D keys after I write last string not the enter key, but now I hit enter and then CTRL+D, it works as expected.
But, how can I change it to finish after I hit CTRL+D once following the last string?

On Unix-like systems (at least by default), an end-of-file condition is triggered by typing Ctrl-D at the beginning of a line or by typing Ctrl-D twice if you're not at the beginning of a line.
In the latter case, the last line you read will not have a '\n' at the end of it; you may need to allow for that.
This is specified (rather indirectly) by POSIX / The Open Group Base Specifications Issue 7, in section 11, specifically 11.1.9:
EOF
Special character on input, which is recognized if the ICANON flag is
set. When received, all the bytes waiting to be read are immediately
passed to the process without waiting for a <newline>, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication. If
ICANON is set, the EOF character shall be discarded when processed.
The POSIX read() function indicates an end-of-file (or error) condition to its caller by returning a byte count of zero, indicating that there are no more bytes of data to read. (C's <stdio> is, on POSIX systems, built on top of read() and other POSIX-specific functions.)
EOF (not to be confused with the C EOF macro) is mapped by default to Ctrl-D. Typing the EOF character at the beginning of a line (either at the very beginning of the input or immediately after a newline) triggers an immediate end-of-file condition. Typing the EOF character other than at the beginning of a line causes the previous data on that line to be returned immediately by the next read() call that asks for enough bytes; typing the EOF character again does the same thing, but in that case there are no remaining bytes to be read and an end-of-file condition is triggered. A single EOF character in the middle of a line is discarded (if ICANON is set, which it normally is).

In the off chance that someone sees this that needs the help I've been needing... I had been searching, trying to figure out why I was getting this strange behavior with my while(scanf). Well, it turns out that I had while (scanf("%s\n", string) > 0). The editor I am using (Atom) automatically put a "\n" in my scan, with out me noticing. This took me hours, and luckily someone pointed it out to me.

The Return key doesn't produce EOF that's why the condition getchar() != EOF doesn't recognize it. You can do it by Pressing CTRL+Z in Windows or CTRL+D in Unix.

Why EOF is recognized if it is first character in line?

I wrote this C program:
#include <stdio.h>
main()
{
int numspaces, numtabs, numnl, c;
while ((c = getchar()) != EOF){
if (c == ' ') ++numspaces;
if (c == '\t') ++numtabs;
if (c == '\n') ++ numnl;
}
printf("Spaces:\t\t%d\nTabs:\t\t%d\nNew lines:\t\t%d\n", numspaces, numtabs, numnl);
}
I think this while loop must finish when I press Ctrl+D and "return". It does if Ctrl+D is the first thing I type in a line. But if I start a line with other character (letter, space, tab) and then Ctrl+D and then "return" - the loop continues.
I test this in Mac OS X terminal, I see it echoing ^D and it still continues the loop. What am I missing here?

CTRL-D or more precisely the character set to the VEOF entry of the c_cc field of a termios control structure (stty is the easiest way to query and modify those settings from the command line) is to be interpreted as (quote from the Single Unix Specification V4):
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a , and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
Thus to send a EOF indication from a terminal you hit CTRL-D once when there is no bytes waiting to be returned (typically at the start of a line), and twice when there are some.

If you type something, hit the return key, and then hit Ctrl+D, it will stop.
Ctrl+D will end input to the program, but only if there is nothing left in the terminal's buffer.
FYI, EOF is not really a character. getchar() just returns an integer value EOF (typically -1) when there is nothing left to read.
Also, note that this question only applies to terminals. If you piped it in from a file, things would be behave just as you expect.

Ctrl+D terminates the current read() call.
If you already have typed something, this "something" will be read back, and the return value of read() is the length of this "something".
Only if you are at the start of a line, read() will return 0, what is used as an indicator to EOF.

I'm trying to understand getchar() != EOF

I'm reading The C Programming Language and have understood everything so far.
However when I came across the getchar() and putchar(), I failed to understand what is their use, and more specifically, what the following code does.
main()
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
I understand the main() function, the declaration of the integer c and the while loop. Yet I'm confused about the condition inside of the while loop. What is the input in this C code, and what is the output.

This code can be written more clearly as:
main()
{
int c;
while (1) {
c = getchar(); // Get one character from the input
if (c == EOF) { break; } // Exit the loop if we receive EOF ("end of file")
putchar(c); // Put the character to the output
}
}
The EOF character is received when there is no more input. The name makes more sense in the case where the input is being read from a real file, rather than user input (which is a special case of a file).
[As an aside, generally the main function should be written as int main(void).]

getchar() is a function that reads a character from standard input. EOF is a special character used in C to state that the END OF FILE has been reached.
Usually you will get an EOF character returning from getchar() when your standard input is other than console (i.e., a file).
If you run your program in unix like this:
$ cat somefile | ./your_program
Then your getchar() will return every single character in somefile and EOF as soon as somefile ends.
If you run your program like this:
$ ./your_program
And send a EOF through the console (by hitting CTRL+D in Unix or CTRL+Z in Windows), then getchar() will also returns EOF and the execution will end.

The code written with current C standards should be
#include <stdio.h>
int main(void)
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
The loop could be rewritten as
int c;
while (1) {
c = getchar();
if (c != EOF)
putchar(c);
else
break;
}
this reads as
repeat forever
get the next character ("byte") of input from standard input and store it into c
if no exceptional condition occurred while reading the said character
then output the character stored into c into standard output
else
break the loop
Many programming languages handle exceptional conditions through raising exceptions that break the normal program flow. C does no such thing. Instead, functions that can fail have a return value and any exceptional conditions are signalled by a special return value, which you need to check from the documentation of the given function. In case of getchar, the documentation from the C11 standard says (C11 7.21.7.6p3):
The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.
It is stated elsewhere that EOF is an integer constant that is < 0, and any ordinary return value is >= 0 - the unsigned char zero-extended to an int.
The stream being at end-of-file means that all of the input has been consumed. For standard input it is possible to cause this from keyboard by typing Ctrl+D on Unix/Linux terminals and Ctrl+Z in Windows console windows. Another possibility would be for the program to receive the input from a file or a pipe instead of from keyboard - then end-of-file would be signalled whenever that input were fully consumed, i.e.
cat file | ./myprogram
or
./myprogram < file
As the above fragment says, there are actually two different conditions that can cause getchar to return EOF: either the end-of-file was reached, or an actual error occurred. This cannot be deduced from the return value alone. Instead you must use the functions feof and ferror. feof(stdin) would return a true value if end-of-file was reached on the standard input. ferror(stdin) would return true if an error occurred.
If an actual error occurred, the variable errno defined by <errno.h> would contain the error code; the function perror can be used to automatically display a human readable error message with a prefix. Thus we could expand the example to
#include <stdio.h>
#include <errno.h> // for the definition of errno
#include <stdlib.h> // for exit()
int main(void)
{
int c;
while ((c = getchar()) != EOF)
putchar(c);
if (feof(stdin)) {
printf("end-of-file reached\n");
exit(0);
}
else if (ferror(stdin)) {
printf("An error occurred. errno set to %d\n", errno);
perror("Human readable explanation");
exit(1);
}
else {
printf("This should never happen...\n");
exit('?');
}
}
To trigger the end-of-file, one would use Ctrl+D (here displayed as ^D) on a new line on Linux:
% ./a.out
Hello world
Hello world
^D
end-of-file reached
(notice how the input here is line-buffered, so the input is not interleaved within the line with output).
Likewise, we can get the same effect by using a pipeline.
% echo Hello world | ./a.out
Hello world
end-of-file reached
To trigger an error is a bit more tricky. In bash and zsh shells the standard input can be closed so that it doesn't come from anywhere, by appending <&- to the command line:
% ./a.out <&-
An error occurred. errno set to 9
Human readable explanation: Bad file descriptor
Bad file descriptor, or EBADF means that the standard input - file descriptor number 0 was invalid, as it was not opened at all.
Another fun way to generate an error would be to read the standard input from a directory - this causes errno to be set to EISDIR on Linux:
% ./a.out < /
An error occurred. errno set to 21
Human readable explanation: Is a directory
Actually the return value of putchar should be checked too - it likewise
returns EOF on error, or the character written:
while ((c = getchar()) != EOF) {
if (putchar(c) == EOF) {
perror("putchar failed");
exit(1);
}
}
And now we can test this by redirecting the standard output to /dev/full - however there is a gotcha - since standard output is buffered we need to write enough to cause the buffer to flush right away and not at the end of the program. We get infinite zero bytes from /dev/zero:
% ./a.out < /dev/zero > /dev/full
putchar failed: No space left on device
P.S. it is very important to always use a variable of type int to store the return value of getchar(). Even though it reads a character, using signed/unsigned/plain char is always wrong.

Maybe you got confused by the fact that entering -1 on the command line does not end your program? Because getchar() reads this as two chars, - and 1. In the assignment to c, the character is converted to the ASCII numeric value. This numeric value is stored in some memory location, accessed by c.
Then putchar(c) retrieves this value, looks up the ASCII table and converts back to character, which is printed.
I guess finding the value -1 decimal in the ASCII table is impossible, because the table starts at 0. So getchar() has to account for the different solutions at different platforms. maybe there is a getchar() version for each platform?
I just find it strange that this EOF is not in the regular ascii. It could have been one of the first characters, which are not printable. For instance, End-of-line is in the ASCII.
What happens if you transfer your file from windows to linux? Will the EOF file character be automatically updated?

getchar() function reads a character from the keyboard (ie, stdin)
In the condition inside the given while loop, getchar() is called before each iteration and the received value is assigned to the integer c.
Now, it must be understood that in C, the standard input (stdin) is like a file. ie, the input is buffered. Input will stay in the buffer till it is actually consumed.
stdin is actually the standard input stream.
getchar() returns the the next available value in the input buffer.
The program essentially displays whatever that was read from the keyboard; including white space like \n (newline), space, etc.
ie, the input is the input that the user provides via the keyboard (stdin usually means keyboard).
And the output is whatever we provide as input.
The input that we provide is read character by character & treated as characters even if we give them as numbers.
getchar() will return EOF only if the end of file is reached. The ‘file’ that we are concerned with here is the stdin itself (standard input).
Imagine a file existing where the input that we provide via keyboard is being stored. That’s stdin.
This ‘file’ is like an infinite file. So no EOF.
If we provide more input than that getchar() can handle at a time (before giving it as input by pressing enter), the extra values will still be stored in the input buffer unconsumed.
The getchar() will read the first character from the input, store it in c and printcwithputchar(c)`.
During the next iteration of the while loop, the extra characters given during the previous iteration which are still in stdin are taken during while ((c = getchar()) != EOF) with the c=getchar() part.
Now the same process is repeated till there is nothing left in the input buffer.
This makes it look as if putchar() is returning a string instead of a single character at a time if more than one character is given as input during an iteration.
Eg: if input was
abcdefghijkl
the output would’ve been the same
abcdefghijkl
If you don’t want this behaviour, you can add fflush(stdin); right after the putchar(c);.
This will cause the loop to print only the first character in the input provided during each iteration.
Eg: if input was
adgbad
only a will be printed.
The input is sent to stdin only after you press enter.
putchar() is the opposite of getchar(). It writes the output to the standard output stream (stdout, usually the monitor).
EOF is not a character present in the file. It’s something returned by the function as an error code.
You probably won’t be able to exit from the give while loop normally though. The input buffer will emptied (for displaying to the output) as soon as something comes into it via keyboard and the stdin won't give EOF.
For manually exiting the loop, EOF can be sent using keyboard by pressing
ctrl+D in Linux and
ctrl+Z in Windows
eg:
while ((c = getchar()) != EOF)
{
putchar(c);
fflush(stdin);
}
printf("\nGot past!");
If you press the key combination to give EOF, the message Got past! will be displayed before exiting the program.
If stdin is not already empty, you will have to press this key combination twice. Once to clear this buffer and then to simuate EOF.
EDIT: The extra pair of parenthesis around c = getchar() in while ((c = getchar()) != EOF) is to make sure that the value returned by getchar() is first assigned to c before that value is compared with EOF.
If this extra parenthesis were not there, the expression would effectively have been while (c = (getchar() != EOF) ) which would've meant that c could have either of 2 values: 1 (for true) or 0 (for false) which is obviously not what is intended.

getchar()
gets a character from input.
c = getchar()
The value of this assignment is the value of the left side after the assignment, or the value of the character that's been read. Value of EOF is by default -1.
((c = getchar()) != EOF)
As long as the value stays something other than EOF or, in other words, as long as the condition stays true, the loop will continue to iterate. Once the value becomes EOF the value of the entire condition will be 0 and it will break the loop.
The additional parentheses around c = getchar() are for the compiler, to emphasize that we really wanted to do an assignment inside the condition, because it usually assumes you wanted to type == and warns you.
main() {
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
So the entire code actually echoes back what you input. It assigns the value of the characters to c inside the condition and then outputs it back in the body of the loop, ending only when the end of file is detected.

In a similar manner to the | pipe command above you can use redirection on your system to utilize the above code to display all the character contents of a file, till it reaches the end (EOF) represented by CTRL-Z or CTRL-D usually.
In console:
ProgramName < FileName1.txt
And to create a copy of what is read from FileName1 you can:
ProgramName < FileName1.txt > CopyOfInput.txt
This demonstrates your program in multiple ways to hopefully aid your understanding.
-Hope that helps.

main(){
int c;
while ((c = getchar()) != EOF)
putchar(c);
}
Actually c=getchar() provides the character which user enters on the console and that value is checked with EOF which represents End Of File . EOF is encountered at last of file. (c = getchar()) != EOF is equivalent to c != EOF . Now i think this is much easier . If you any further query let me know.

Where does `getchar()` store the user input?

I've started reading "The C Programming Language" (K&R) and I have a doubt about the getchar() function.
For example this code:
#include <stdio.h>
main()
{
int c;
c = getchar();
putchar(c);
printf("\n");
}
Typing toomanychars + CTRL+D (EOF) prints just t. I think that's expected since it's the first character introduced.
But then this other piece of code:
#include <stdio.h>
main()
{
int c;
while((c = getchar()) != EOF)
putchar(c);
}
Typing toomanychars + CTRL+D (EOF) prints toomanychars.
My question is, why does this happens if I only have a single char variable? where are the rest of the characters stored?
EDIT:
Thanks to everyone for the answers, I start to get it now... only one catch:
The first program exits when given CTRL+D while the second prints the whole string and then waits for more user input. Why does it waits for another string and does not exit like the first?

getchar gets a single character from the standard input, which in this case is the keyboard buffer.
In the second example, the getchar function is in a while loop which continues until it encounters a EOF, so it will keep looping and retrieve a character (and print the character to screen) until the input becomes empty.
Successive calls to getchar will get successive characters which are coming from the input.
Oh, and don't feel bad for asking this question -- I was puzzled when I first encountered this issue as well.

It's treating the input stream like a file. It is as if you opened a file containing the text "toomanychars" and read or outputted it one character at a time.
In the first example, in the absence of a while loop, it's like you opened a file and read the first character, and then outputted it. However the second example will continue to read characters until it gets an end of file signal (ctrl+D in your case) just like if it were reading from a file on disk.
In reply to your updated question, what operating system are you using? I ran it on my Windows XP laptop and it worked fine. If I hit enter, it would print out what I had so far, make a new line, and then continue. (The getchar() function doesn't return until you press enter, which is when there is nothing in the input buffer when it's called). When I press CTRL+Z (EOF in Windows), the program terminates. Note that in Windows, the EOF must be on a line of its own to count as an EOF in the command prompt. I don't know if this behavior is mimicked in Linux, or whatever system you may be running.

Something here is buffered. e.g. the stdout FILE* which putchar writes to might be line.buffered. When the program ends(or encounters a newline) such a FILE* will be fflush()'ed and you'll see the output.
In some cases the actual terminal you're viewing might buffer the output until a newline, or until the terminal itself is instructed to flush it's buffer, which might be the case when the current foreground program exits sincei it wants to present a new prompt.
Now, what's likely to be the actual case here, is that's it's the input that is buffered(in addition to the output :-) ) When you press the keys it'll appear on your terminal window. However the terminal won't send those characters to your application, it will buffer it until you instruct it to be the end-of-input with Ctrl+D, and possibly a newline as well.
Here's another version to play around and ponder about:
int main() {
int c;
while((c = getchar()) != EOF) {
if(c != '\n')
putchar(c);
}
return 0;
}
Try feeding your program a sentence, and hit Enter. And do the same if you comment out
if(c != '\n') Maybe you can determine if your input, output or both are buffered in some way.
THis becomes more interesting if you run the above like:
./mytest | ./mytest
(As sidecomment, note that CTRD+D isn't a character, nor is it EOF. But on some systems it'll result closing the input stream which again will raise EOF to anyone attempting to read from the stream.)

Your first program only reads one character, prints it out, and exits. Your second program has a loop. It keeps reading characters one at a time and printing them out until it reads an EOF character. Only one character is stored at any given time.

You're only using the variable c to contain each character one at a time.
Once you've displayed the first char (t) using putchar(c), you forget about the value of c by assigning the next character (o) to the variable c, replacing the previous value (t).

the code is functionally equivalent to
main(){
int c;
c = getchar();
while(c != EOF) {
putchar(c);
c = getchar();
}
}
you might find this version easier to understand. the only reason to put the assignment in the conditional is to avoid having to type 'c=getchar()' twice.

For your updated question, in the first example, only one character is read. It never reaches the EOF. The program terminates because there is nothing for it to do after completing the printf instruction. It just reads one character. Prints it. Puts in a newline. And then terminates as it has nothing more to do. It doesn't read more than one character.
Whereas, in the second code, the getchar and putchar are present inside a while loop. In this, the program keeps on reading characters one by one (as it is made to do so by the loop) until reaches reaches the EOF character (^D). At that point, it matches c!=EOF and since the conditions is not satisfied, it comes out of the loop. Now there are no more statements to execute. So the program terminates at this point.
Hope this helps.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

K&R Exercise 1.9 - program in infinite loop - c

I seems you are using a platform on which CTRL-D is not the right key binding for sending an EOF. E.g. on Windows, it's usually CTRL-Z. EDIT: And yes, as others have already stated, the EOF needs to be at the beginning of a line, so you need to send a newline first.

Related

Why multiple EOF enters to end program?

Why do I need to type Ctrl-D twice to mark end-of-file?

Why EOF is recognized if it is first character in line?

I'm trying to understand getchar() != EOF

Where does `getchar()` store the user input?

Categories

Resources