Understanding getchar() in the character counting program in C - c

This is a follow-up question of my previous question. There is already a similar question asked(question). But I don't get what I want to know from that answer.
From the previous question I come to know that if I type a lot of characters, then they are not made available to getchar(), until I press Enter. So at the very point when I press Enter, all the characters will be made available to getchar()s. Now consider the following program for character counting:
#include<stdio.h>
main()
{
long nc;
nc=0;
while(getchar()!=EOF)
++nc;
printf(" Number of chars are %ld ",nc);
}
If I input characters from the command line in the following sequence: {1,2,3,^Z,4,5,Enter}, then in the next line {^Z,Enter}. The output that I expect is: Number of chars are 6. But the output that I am getting is Number of chars are 4.
This answer explains that when we input1,2,3,^Z, then ^Z acts like Enter and 1,2,3 are sent to getchar()s. The while loop of the above written code runs three times. ^Z is not given to getchar(), so the program doesn't terminate yet. My input was {1,2,3,^Z,4,5,Enter}. After ^Z I had pressed 4,5 and then Enter. Now when I press Enter the characters 4,5 and Enter, should be given to getchar()s and the while loop should execute three times more. Then in the last line I input {^Z,Enter}, since there is no text behind ^Z, it is consider as a character and when I press Enter, this ^Z is given as the input to getchar() and the while loop terminates. In all this, the while loop has executed 6 times, so the variable nc should become 6.
Why am I getting 4 as the value of nc, rather than 6.

Adding some output will help you:
#include <stdio.h>
int
main (void)
{
int c, nc = 0;
while ((c = getchar ()) != EOF)
{
++nc;
printf ("Character read: %02x\n", (unsigned) c);
}
printf ("Number of chars: %d\n", nc);
}
The Windows console views the ^Z input as "send input before ^Z to stdin, discard remaining input on the line (including the end-of-line delimiter), and send ^Z" unless it is at the beginning of a line, in which case it sends EOF instead of ^Z:
123^Z45
Character read: 31
Character read: 32
Character read: 33
Character read: 1a
^Z12345
Number of chars: 4
Also, Windows always waits for the Enter/Return key, with the exception of very few key sequences like ^C or ^{Break}.

^Z, or Ctrl-Z, means end-of file for text files (old MS-DOS). getchar() is equivalent to fgetc(stdin) and is often a macro. "fgetc returns the character read as an int or returns EOF to indicate an error or end of file."
See also _set_fmode, however, I am not sure if that changes the behaviour right away or whether you have to close/reopen the file. Not sure either if you can close/reopen stdin (don't do much console programming anymore).

Related

C programming — loop

I am following the exercises in the C language book. I am in the first chapter where he introduces loops. In this code:
#include <stdio.h>
/* copy input to output; 1st version */
int main() {
int c, n1;
n1 = 0;
while ((c = getchar()) != EOF) {
if (c == '\n') {
++n1;
}
printf("%d\n", n1);
}
}
In here I am counting the number of lines. When I just hit enter without entering anything else I get the right number of lines but when I enter a character then hit enter key the loop runs twice without asking for an input the second time. I get two outputs.
this how the output looks like:
// I only hit enter
1
// I only hit enter
2
// I only hit enter
3
g // I put char 'g' then hit enter
3
4
3 and 4 print at the same time. why is 4 printing after the loop has been iterated already? I thought the loop would restart and ask me for input before printing 4.
The getchar function reads one character at a time. The number of lines will be printed for every character in the input read by getchar, whether that character is newline or not, but the counter will only be incremented when there is a newline character in the input.
When you enter g then the actual input that goes to the standard input is g\n, and getchar will read this input in two iterations and that's the reason it is printing number of lines twice.
If you put the print statement inside the if block then it will print only for newline characters. If you put the print statement outside the loop, then it will only print the count of the number of lines at the end of the input.
To be clear this is the terminal that you are dealing with.
By default, the terminal will not get input from the user \n is entered. Then the whole line is placed in the stdin.
Now as I said earlier here the program is not affected by the buffering of stdin. And then the characters will be taken as input and it is processed as you expect it to be. The only hitch was the terminals buffering - line buffering.
And here from standard you will see how getchar behaves:-
The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.
Now what are those characters - those charaacters include \n - the \n is what you put in the terminal and then to stdin via pressing the ENTER. Here earlier you were entering the characters earlier which were \n. This time you entered two characters. That's why the behavior you saw.

how putchar works with while loop?

I am new to c programming, so hope you guys can help me out with such questions.
1. I thought putchar() only print 1 char each time, while when I enter several char like 'hello' it print 'hello' before allow me to enter a next input? I thought that it should print only 'h' and then allow me to enter other input because getchar() only return one character each time.
2. how to make the loop stops? I know EOF has value of -1, but when I enter -1, the loop still runs.
#include <stdio.h>
main()
{
int c = getchar();
while(c != EOF){
putchar(c);
c = getchar();
}
}
After the first getchar() has completed reading one character, the next getchar(); is inside the while() loop, so as per the logic, it will keep reading the input one-by-one, until in encounters EOF.
Following the same logic, putchar(c); is under the while loop, so it will print all the characters [one character per loop basis] read by getchar() and stored in c.
In linux, EOF is produced by pressing CTRL+D. When waiting for input, if you press this key combination, the terminal driver will transform this to EOF and while loop will break.
I'm not very sure about windows, but the key combination should be CTRL+Z.
Note: even if it seems entering -1 should work in accordance with EOF, actually it won't. getchar() cannot read -1 all at a time. It will be read as - and 1, in two consecutive iterations. Also worthy to mention, a character 1 is not equal to an integer 1. A character 1, once read, will be encoded accordingly [mostly ASCII] and the corresponding value will be stored.
getchar() gets the input from the console. In a while loop, it will read all the characters from the input including the return key.
-1 is "-1". It's not a value but just another combination of characters. EOF occurs when there is no more char in the buffer. i.e. when you press Enter (or Ctrl-Z or Ctrl-D depending on your OS)

It seems I must first press enter before getchar() gets EOF

I'm starting to learn about EOF and I've written the following simple program :
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void main()
{
int i=0;
while(getchar()!=EOF)
{
i++;
}
printf("number of characters : %d \n",i);
}
The thing is, when I write a string, press enter and then press Ctrl+Z the output is the number of the characters I wrote plus 1 (for the EOF). However, if I write a string and, without changing line, press Ctrl+Z the while loop does not terminate. Why is that?
First things first, EOF is signalled only when Ctrl + Z is at the very beginning of a line. With that on mind:
On your first try with your input (10 characters) and then Enter, you actually push your input\n to the input stream, which gets read character by character through getchar, and there are 11 characters now with the addition of that new line at the end thanks to your Enter.
On the new line, you then use the Ctrl + Z combination to signal the EOF, and you indeed do that properly there; signal the EOF and get 11 as the result.
It's strange that you were expecting to see 10 here. What if you were to have an input of multiple lines? Would you like it to not count for new lines? Then you could use something like:
int onechar;
while ((onechar = getchar( )) != EOF)
{
if (onechar != '\n')
i++;
}
Or even more further, are you always expecting a single line of input? Then you might want to consider changing your loop condition into following:
while(getchar( ) != '\n')
{
i++;
}
Oooor, would you like it to be capable of getting multi-line input, as well as it to count the \n characters, and on top of all that, just want it to be able to stop at Ctrl + Z combinations that are not necessarily at the beginning of a line? Well then, here have this:
// 26 is the ASCII value of the Substitute character
for (int onechar = getchar( ); onechar != EOF && onechar != 26; onechar = getchar( ))
{
i++;
}
26, as commented, is the Substitute character, which, at least on my machine, is what the programme gets when I use Ctrl + Z inappropriately. Using this, if you were to input:
// loop terminated by the result of (onechar != 26) comparison
your input^Z
You would get 10 as the result and if you were to input:
// loop terminated by the result of (onechar != EOF) comparison
your input
^Z
You would get 11, counting that new-line which you did input along with all the other 10 characters before that. Here, ^Z has been used to display the Ctrl + Z key combination as an input.
Input uses buffers. The first getchar requests a system-level read. When you press enter or ctrl-z the read returns the buffer to the program. When you press enter the system also adds a newline character to the buffer before returning it. Eof is not an actual character but results from reading an empty buffer.
After the control is returned to the program, getchar sequentially reads each character in the returned buffer and when it's finished it requests another read.
In the first case, getchar reads the buffer including the newline character. Then since the buffer is empty getchar requests another read which is interrupt by pressing ctrl-z, returning an empty buffer and resulting in EOF.
In the second case, pressing ctrl-z simply returns the buffer and after getchar is finished reading it, it requests another read which isn't finished since you never press ctrl-z or enter again.
It's not your while loop that never finishes but merely the read call. Try to press ctrl-z twice in the second case.
I saw here topic related to this. What you need to do is to set pts/tty into non-cannonical mode and do it with somekind of TCSANOW(do attr changes immediately).
You do it using functions from termios.h , operating on struct termios;

Counting chars in C on Mac OS X

I am running an example from The C Programming Language by Kernighan and Ritchie.
#include <stdio.h>
main() {
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}
I'm on Mac OS X, so I compile it, run it, type in "12345", press enter for newline (I believe newline is the sixth character?) and then press ctrl-D to send an EOF.
It prints out "6D".
Why is the "D" there?
How do I write a program to just count the 5 chars in "12345" and not the newline?
Should I just subtract one at the end?
How do I get it to stop printing the "D"?
What happens is that the terminal actually echoes your control-D (and prints it out as ^D) when you type it, but then your program overwrites that line with a number and a line-feed. So your one-digit number overwrites the ^, but the D stays there.
If you enter more than 10 characters, or if change the code by adding a space at the end of your format string ("%ld \n"), then your program would overwrite the ^D (though it would still have been echoed by your terminal)
The program is working correctly: it's printing 6.
You see 6D because the terminal window printed ^D then went back to the beginning of the line, and your program prints 6 over the ^ leaving the following D. Try redirecting the output to a file, or giving it enough input that the answer has more than one digit, and you won't see the D.
The answer is 6 instead of 5 because of the newline, yes. If you don't want to count the newline at the end, subtract 1. If you don't want to count any newlines, subtract 1 whenever you see a newline.

How do I enter data in a program to count the entered characters?

I wrote two programs in C to count characters and print the value.
One uses the while loop and the other uses for. No errors are reported while compiling, but neither print anything.
Here's the code using while:
#include <stdio.h>
/* count characters and input using while */
main()
{
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%ld\n", nc);
}
And here's the code using for:
#include <stdio.h>
/* count characters and input using for */
main()
{
long nc;
for (nc = 0; getchar() != EOF; ++nc)
;
printf("%ld\n", nc);
}
Both compile and run.
When I type input and hit enter, a blank newline prints. When I hit enter without inputting anything, again a blank newline prints. I think it should at least print zero.
You need to terminate the input to your program (i.e. trigger the EOF test).
You can do this on most unix terminals with Ctrl-D or Ctrl-Z at the start of a new line on most windows command windows.
Alternately you can redirect stdin from a file, e.g.:
myprogram < test.txt
Also, you should give main a return type; implicit int is deprecated.
int main(void)
Your programs will only output after seeing an end of file (EOF). Generate this on UNIX with CTRL-D or on Windows with CTRL-Z.
You wait for an EOF character (end of file) here. This will only happen in two scenarios:
The user presses Ctrl+Break (seems to work on Windows here, but I wouldn't count on this).
The user enters a EOF character (can be done with Ctrl+Z for example).
A better way to do this would be to check for a newline instead.
for your program to work correctly. after you press the enter key, u have to ensure that uyou terminate your loop, so that the program flos correctly. this is done by typing the ctrl+Z from your keyboard. these are the keys that correspond to the EOF.
Pressing Enter on your keyboard, adds the character \n to your input. In order to and the program and have it print out the number of characters you would need to add an 'End of File' (EOF) character.
In order to inject an EOF character, you should press CTRL-D in Unix or CTRL-Z on Windows.
You do realize that newline is a normal character, and not an EOF indicator? EOF would be Ctrl-D or Ctrl-Z on most popular OSes.

Resources