Does this character counting program include the newline character? - c

#include <stdio.h>
/* count characters in input; 2nd version */
main()
{
double nc;
for (nc = 0; getchar() != EOF; ++nc)
;
printf("%.0f\n", nc);
}
When I compile and run the program and type in a character (example: abcd), hit enter, then hit the escape character CTRL+Z, it says 5. Is that because of the "hidden" newline character or does it count the EOF command? Because when I type the EOF command alone it stays as 0.

Yes.
getchar waits for you to enter "something"; and Enter is 'something' (that is, it has a defined character value; other keys, Shift for example, may not).
By that same token, the key combo Ctrl+Z would be "something" as well -- the value 26 on most systems -- but the standard input/output library you are using treats this particular code as a special command: EOF. On my OS, Mac OSX, that would be Ctrl+D (for reasons unknown to me, other than "historically, Ctrl+D is used to signal EOF on Unix-like systems").
The 'newline character' is by no means "hidden" or "invisible", it's just another number that gets read and stored into a variable or string if you instruct it to do so. The reason you cannot see it is because putchar and other text printing functions do something else than "display the associated character": it moves the cursor to the next line. That also is part of the standard functionality, and a good thing it is too. After all you wouldn't want to press Space to "move to the next line". (In fact it's such a common function of this code that most fonts don't even bother to have a displayable item for it.)
See also What does getchar() exactly do? for more background information.

Related

getchar() and conditioning a loop to break with a specific char

I am working on the infamous book "Prentice Hall Software Series" and trying out the Code they write and modifying it to learn more about C.
I am working with VIM on Fedora 25 in the console. The following code is a quote from the book, I know "int" is missing as well as argc and argv etc.
Kernighan and Ritchie - The C Programming Language: Page 20
#include <stdio.h>
/* copy input to output; 1st version */
main(){
int c;
c = getchar();
while (c != EOF) {
putchar(c);
c = getchar();
}
}
With this code I couldn't manage to get the "EOF" to work. I am not sure if "ctr + z" really is the real thing to do, since it quits any console program in console.
Well since I was unsure i changed the condition to
...
while (c != 'a') {
...
So normally if I enter 'a' the while condition should break and the programm should terminate. Well it does not when I try to run it and enter 'a'. What is the problem here?
Thank you guys!
There's nothing wrong with the code (except the archaic declaration of main).
Usually, on Unixes end of file is signalled to the program by ctrl-D. If you hit ctrl-D either straight away or after hitting new-line, your program will read EOF.
However, the above short explanation hides a lot of subtleties.
In Unix, terminal input can operate in one of two modes (called IIRC raw and cooked). In cooked mode - the default - the OS will buffer input from the terminal until it either reads a new line or a ctrl-D character. It then sends the buffered input to your program.
Ultimately, your program will use the read system call to read the input. read will return the number of characters read but normally will block until it has some characters to read. getchar then passes them one by one to its caller. So getchar will block until a whole line of text has been received before it processes any of the characters in that line. (This is why it still didn't work when you used a).
By convention, the read system call returns 0 when it gets end of file. This is how getchar knows to return EOF to the caller. ctrl-D has the effect of forcing read to read and empty buffer (if it is sent immediately after a new line) which makes it look to getchar like it's reached EOF even though nobody has closed the input stream. This is why ctrl-D works if it is pressed straight after new line but not if it is pressed after entering some characters.

C - inputting correct code but receiving no output

When I run the following code and input a sentence I am not given any output. The cursor just goes to a new line.
I copied this straight off the book and double checked it for mistakes (1st edition C programming language by kernighan & ritchie)
#include <stdio.h>
int main()
{
int c,i,nwhite,nother;
int ndigit[10];
nwhite=nother=0;
for(i=0;i<10;++i)
ndigit[i] = 0;
while (( c=getchar()) != EOF)
if(c>= '0' && c<= '9')
++ndigit[c-'0'];
else if (c==' ' || c == '\n' || c == '\t')
++nwhite;
else
++nother;
printf("digits =");
for( i=0; i<10; ++i)
printf("%d",ndigit[i]);
printf(", white space = %d, other = %d\n", nwhite,nother);
return 0;
}
Since you are testing a program copied from another source, I suppose that you don't want to change it, but understanding it.
getchar() obtains exactly 1 character from the standard input, which is a file named stdin in the standard header <stdio.h>.
The standard input, stdin, is considered a file.
Formally speaking, the end-of-file is a "mark" and not a "character".
However, in general, a specific "character" is used to mark the "end-of-file" of a text file.
In Windows the "end-of-file" mark is the character CTRL-Z (whose ASCII code is 26).
In Linxu the mark is CTRL-D (whose ASCII code is 4).
On the other hand, the standard input commonly has the following behaviour:
Wait for user enter characters until an Intro/Enter key is pressed.
If the user does not press Enter, then the standard input does not give back the control to the program. This happens even if you enter an "end-of-file" character (say, CTRL-Z).
However, other behaviours are possible.
For example, in Ubuntu console I obtain that CTRL-D is recognized without waiting for the Enter key be pressed.
In any case, you must explicitely type the end-of-file mark in the console of your system.
So, CTRL-Z (perhaps followed by Enter) or CTRL-D have to be pressed for yourself.
ABOUT ENTER and EOF
After Enter is pressed, your program test for EOF, that is, the "end-of-file" mark in your system.
However, the Enter keyword does not prints "end-of-file" marks, but only "end-of-line" ones, which corresponds to the standard character newline '\n'.
Thus, if it is desired that the while() loop terminates after pressing Intro/Enter, the test must be done against '\n', and not EOF.
AN OBSERVATION
It can be observed that getchar() doesn't retrieve the character CTRL-Z, because the ASCII code for CTRL-Z is 26, but getchar() retrieves a negative value (in general -1).
This means that getchar() recognizes the character ASCII 26 as and end-of-file mark, and then it converts to a value with meaning in C, provided by the macro EOF, which is not 26.
What I mean is that EOF is not CTRL-Z, and then one cannot naively send EOF under the assumption that the ASCII 26 (CTRL-Z) will be sent to a text file.
Summarizing, I think that it is important to delucidate the abstract concept of "end-of-file", the role of EOF, and the difference between a "mark" and a "character".
(Another example: in Windows the "mark" for "end-of-line" is the couple of characters CTRL-M CTRL-J, which is not only 1 character, but 2).
Quoted from the standard:
The getchar function returns the next character from the input stream pointed to by
stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and
getchar returns EOF. If a read error occurs, the error indicator for the stream is set and
getchar returns EOF.

Why do I need to type Ctrl-D twice to mark end-of-file?

char **query;
query = (char**) malloc ( sizeof(char*) );
int f=0;
int i=0,j=0,c;
while((c=getchar())!=EOF)
{
if(!isalpha(c))
continue;
if(f==1)
query=(char**) realloc(query,(i+1)*sizeof(char*));
query[i]=(char*) malloc(sizeof(char));
query[i][j]=c;
j++;
while( (c=getchar())!=EOF&&c!=' '&&c!='\t' )
{
query[i]=(char*) realloc(query[i],(j+1)*sizeof(char));
query[i][j]=c;
++j;
}
query[i][j]='\0';
printf("%s\n",query[i]);
if(c==EOF){
break;
}
++i;
f=1;
j=0;
}
I want the above code snippet to read a line of strings separated by spaces and tabs until ONE EOF but it requires 2 EOFs to end the loop. Also, strings can consist of only alphabetic characters.
I am struggling on about 2 days.
Please, give some feedback.
EDIT: Most probably the reason is I hit CTRL+D keys after I write last string not the enter key, but now I hit enter and then CTRL+D, it works as expected.
But, how can I change it to finish after I hit CTRL+D once following the last string?
On Unix-like systems (at least by default), an end-of-file condition is triggered by typing Ctrl-D at the beginning of a line or by typing Ctrl-D twice if you're not at the beginning of a line.
In the latter case, the last line you read will not have a '\n' at the end of it; you may need to allow for that.
This is specified (rather indirectly) by POSIX / The Open Group Base Specifications Issue 7, in section 11, specifically 11.1.9:
EOF
Special character on input, which is recognized if the ICANON flag is
set. When received, all the bytes waiting to be read are immediately
passed to the process without waiting for a <newline>, and the EOF is
discarded. Thus, if there are no bytes waiting (that is, the EOF
occurred at the beginning of a line), a byte count of zero shall be
returned from the read(), representing an end-of-file indication. If
ICANON is set, the EOF character shall be discarded when processed.
The POSIX read() function indicates an end-of-file (or error) condition to its caller by returning a byte count of zero, indicating that there are no more bytes of data to read. (C's <stdio> is, on POSIX systems, built on top of read() and other POSIX-specific functions.)
EOF (not to be confused with the C EOF macro) is mapped by default to Ctrl-D. Typing the EOF character at the beginning of a line (either at the very beginning of the input or immediately after a newline) triggers an immediate end-of-file condition. Typing the EOF character other than at the beginning of a line causes the previous data on that line to be returned immediately by the next read() call that asks for enough bytes; typing the EOF character again does the same thing, but in that case there are no remaining bytes to be read and an end-of-file condition is triggered. A single EOF character in the middle of a line is discarded (if ICANON is set, which it normally is).
In the off chance that someone sees this that needs the help I've been needing... I had been searching, trying to figure out why I was getting this strange behavior with my while(scanf). Well, it turns out that I had while (scanf("%s\n", string) > 0). The editor I am using (Atom) automatically put a "\n" in my scan, with out me noticing. This took me hours, and luckily someone pointed it out to me.
The Return key doesn't produce EOF that's why the condition getchar() != EOF doesn't recognize it. You can do it by Pressing CTRL+Z in Windows or CTRL+D in Unix.

Input for examples from Kernighan and Ritchie

In section 1.5.2 of the 2nd ed. K&R introduce getchar() and putchar() and give an example of character counting, then line counting, and others throughout the chapter.
Here is the character counting program
#include <stdio.h>
main() {
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n",nc);
}
where should the input come from? typing into the terminal command window and hitting enter worked for the file copying program but not for this. I am using XCode for Mac.
It seems like the easiest way would be to read a text file with pathway "pathway/folder/read.txt" but I am having trouble with that as well.
From the interactive command line, press ctrl-D after a newline, or ctrl-D twice not after newline, to terminate the input. Then the program will see EOF and show you the results.
To pass a file by path, and avoid the interactive part, use the < redirection operator of the shell, ./count_characters < path/to/file.txt.
Standard C input functions only start processing what you type in when you press the Enter key IOW.Every key you press adds a character to the system buffer (shell).Then when the line is complete (ie, you press Enter), these characters are moved to C standard buffer. getchar() reads the first character in the buffer, which also removes it from the buffer.Each successive call to getchar() reads and removes the next char, and so on. If you don't read every character that you had typed into the keyboard buffer, but instead enter another line of text, then the next call to getchar() after that will continue reading the characters left over from the previous line; you will usually witness this as the program blowing past your second input. BTW, the newline from the Enter key is also a character and is also stored in the keyboard buffer, so if you have new input to read in you first need to clear out the keyboard buffer.

Where does `getchar()` store the user input?

I've started reading "The C Programming Language" (K&R) and I have a doubt about the getchar() function.
For example this code:
#include <stdio.h>
main()
{
int c;
c = getchar();
putchar(c);
printf("\n");
}
Typing toomanychars + CTRL+D (EOF) prints just t. I think that's expected since it's the first character introduced.
But then this other piece of code:
#include <stdio.h>
main()
{
int c;
while((c = getchar()) != EOF)
putchar(c);
}
Typing toomanychars + CTRL+D (EOF) prints toomanychars.
My question is, why does this happens if I only have a single char variable? where are the rest of the characters stored?
EDIT:
Thanks to everyone for the answers, I start to get it now... only one catch:
The first program exits when given CTRL+D while the second prints the whole string and then waits for more user input. Why does it waits for another string and does not exit like the first?
getchar gets a single character from the standard input, which in this case is the keyboard buffer.
In the second example, the getchar function is in a while loop which continues until it encounters a EOF, so it will keep looping and retrieve a character (and print the character to screen) until the input becomes empty.
Successive calls to getchar will get successive characters which are coming from the input.
Oh, and don't feel bad for asking this question -- I was puzzled when I first encountered this issue as well.
It's treating the input stream like a file. It is as if you opened a file containing the text "toomanychars" and read or outputted it one character at a time.
In the first example, in the absence of a while loop, it's like you opened a file and read the first character, and then outputted it. However the second example will continue to read characters until it gets an end of file signal (ctrl+D in your case) just like if it were reading from a file on disk.
In reply to your updated question, what operating system are you using? I ran it on my Windows XP laptop and it worked fine. If I hit enter, it would print out what I had so far, make a new line, and then continue. (The getchar() function doesn't return until you press enter, which is when there is nothing in the input buffer when it's called). When I press CTRL+Z (EOF in Windows), the program terminates. Note that in Windows, the EOF must be on a line of its own to count as an EOF in the command prompt. I don't know if this behavior is mimicked in Linux, or whatever system you may be running.
Something here is buffered. e.g. the stdout FILE* which putchar writes to might be line.buffered. When the program ends(or encounters a newline) such a FILE* will be fflush()'ed and you'll see the output.
In some cases the actual terminal you're viewing might buffer the output until a newline, or until the terminal itself is instructed to flush it's buffer, which might be the case when the current foreground program exits sincei it wants to present a new prompt.
Now, what's likely to be the actual case here, is that's it's the input that is buffered(in addition to the output :-) ) When you press the keys it'll appear on your terminal window. However the terminal won't send those characters to your application, it will buffer it until you instruct it to be the end-of-input with Ctrl+D, and possibly a newline as well.
Here's another version to play around and ponder about:
int main() {
int c;
while((c = getchar()) != EOF) {
if(c != '\n')
putchar(c);
}
return 0;
}
Try feeding your program a sentence, and hit Enter. And do the same if you comment out
if(c != '\n') Maybe you can determine if your input, output or both are buffered in some way.
THis becomes more interesting if you run the above like:
./mytest | ./mytest
(As sidecomment, note that CTRD+D isn't a character, nor is it EOF. But on some systems it'll result closing the input stream which again will raise EOF to anyone attempting to read from the stream.)
Your first program only reads one character, prints it out, and exits. Your second program has a loop. It keeps reading characters one at a time and printing them out until it reads an EOF character. Only one character is stored at any given time.
You're only using the variable c to contain each character one at a time.
Once you've displayed the first char (t) using putchar(c), you forget about the value of c by assigning the next character (o) to the variable c, replacing the previous value (t).
the code is functionally equivalent to
main(){
int c;
c = getchar();
while(c != EOF) {
putchar(c);
c = getchar();
}
}
you might find this version easier to understand. the only reason to put the assignment in the conditional is to avoid having to type 'c=getchar()' twice.
For your updated question, in the first example, only one character is read. It never reaches the EOF. The program terminates because there is nothing for it to do after completing the printf instruction. It just reads one character. Prints it. Puts in a newline. And then terminates as it has nothing more to do. It doesn't read more than one character.
Whereas, in the second code, the getchar and putchar are present inside a while loop. In this, the program keeps on reading characters one by one (as it is made to do so by the loop) until reaches reaches the EOF character (^D). At that point, it matches c!=EOF and since the conditions is not satisfied, it comes out of the loop. Now there are no more statements to execute. So the program terminates at this point.
Hope this helps.

Resources