EOF in the middle of an input - c

My question is about how EOF is interpreted in the middle of an input. Here is an example:
#include <stdio.h>
int main() {
    int a, b;
    printf("enter something >\n");
    scanf("%d", &a);
    while((b = getchar()) != EOF) {
        printf("%i\n", b);
    }
    return b;
}
I run the program and enter:
1hello^Z(control+z)abc
the output is:
104 (ascii number for h)
101 (for e)
108 (l)
108 (l)
111 (o)
26 (what is this?)
The digit 1 is read by scanf and the rest stays in the buffer. getchar() then reads all of it up to ^Z, which is what I expected, since Ctrl+Z closes stdin.
However, where does 26 come from? If the last thing getchar() reads is EOF, why isn't -1 the last value? Also, why doesn't the program leave the loop when it reads ^Z? Why do I need to send EOF once more with Ctrl+Z to terminate the loop? 26 is the ASCII code for SUB, and I don't know what to make of this.
Thank you.

The 26 is printed because you pressed Ctrl+Z in the middle of a line. There it is not treated as end-of-file; it is read as the SUB character instead. See http://en.wikipedia.org/wiki/Substitute_character
In the ASCII and Unicode character sets, this character(SUB) is encoded by the number 26 (1A hex). Standard keyboards transmit this code when the Ctrl and Z keys are pressed simultaneously (Ctrl+Z, by convention often described as ^Z).
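The value 26 follows from how control codes are formed: Ctrl keeps only the low five bits of the letter's ASCII code. A minimal sketch, assuming an ASCII character set; `ctrl_code` is a hypothetical helper name used only for illustration, not a library function:

```c
/* Hypothetical helper: the ASCII code produced by Ctrl+<letter>.
   Assumes an ASCII execution character set. */
int ctrl_code(char letter)
{
    return letter & 0x1f;   /* keep only the low five bits */
}
```

With this helper, `ctrl_code('Z')` evaluates to 26, which is the SUB value printed by the loop in the question, and it is distinct from EOF, which is a negative int.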

Related

Two EOF required to actually end reading from terminal

I have been trying to understand how EOF works. In my code (on Windows), sending EOF (Ctrl+Z and Enter) doesn't work the first time, and I have to send EOF twice before the program actually stops reading input. Also, the first EOF gets read as a garbage character, which is displayed when I print the input. (The garbage characters can be seen at the end of the output provided.)
This is my code:-
#include<stdio.h>
#define Max 1000
int main()
{
char c, text[Max];
int i = 0;
while((c = getchar()) != EOF)
{
text[i] = c;
i++;
}
printf("\nEntered Text: \n");
puts(text);
return 0;
}
My Output:
I have this doubt:-
Why are two EOFs being required? and how do I prevent the first one from being read (as some garbage) and stored as part of my input?
Control-Z is only recognized as EOF when at the start of a new line. Therefore, if you want to detect it in the middle of a line, you'll need to do so yourself.
So change this line:
while((c = getchar()) != EOF)
to this:
while((c = getchar()) != EOF && c != CTRL_Z)
and then add:
#define CTRL_Z ('Z' & 0x1f)
at the top of your program.
You may still need to type a return after the Ctrl-z to get the buffered input to be read by the program, but it should discard everything after the ^Z.
The following solution fixes the Ctrl+Z problem and the garbage output and also blocks a buffer overrun. I have commented the changes:
#include <stdio.h>
#define Max 1000
#define CTRL_Z 26 // Ctrl+Z is ASCII/ANSI 26
int main()
{
int c ; // getchar() returns int
char text[Max + 1] ; // +1 to accommodate terminating nul
int i = 0;
while( i < Max && // Bounds check
(c = getchar()) != EOF &&
c != CTRL_Z ) // Check for ^Z when not start of input buffer
{
text[i] = c;
i++;
}
text[i] = 0 ; // Terminate string after last added character
printf( "\nEntered Text:\n" );
puts( text );
return 0;
}
The reason for this behavior is somewhat arcane, but end-of-file is not the same as Ctrl-Z. The console generates an end-of-file, causing getchar() to return EOF (-1), if and only if the console input buffer is empty; otherwise it inserts the ASCII SUB (26) character into the stream. The use of SUB goes back to MS-DOS compatibility with the even earlier CP/M operating system. CP/M files were composed of fixed-length records, so a ^Z in the middle of a record was used to mark the end of valid data in files whose length was not an exact multiple of the record length. In the console, if the SUB is not at the start of the input buffer, it is readable rather than generating an EOF, and all characters after it are discarded. It is all a messy hangover from way back.
Try changing the type of c to int, because EOF is a negative number, commonly defined as -1, and a char might or might not be able to store -1. Also, do not forget to terminate the string with \0 before passing it to puts.
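To illustrate why a char cannot safely hold the result of getchar(), here is a small sketch. The function names are hypothetical, and the second function assumes EOF is -1 on a two's complement machine, which is common but not guaranteed:

```c
#include <stdio.h>

/* Returns 1 if a value stored in an unsigned char can still compare
   equal to EOF. EOF is negative and an unsigned char never is, so
   this returns 0: a loop written this way would never see EOF. */
int unsigned_char_can_match_eof(void)
{
    unsigned char c = (unsigned char)EOF;
    return c == EOF;
}

/* Returns 1 if the ordinary data byte 0xFF, once stored in a signed
   char, is mistaken for EOF. Assumes EOF == -1 and two's complement. */
int signed_char_confuses_0xff(void)
{
    signed char c = (signed char)0xFF;
    return c == EOF;
}
```

Either way the char loses information: with an unsigned char the loop cannot end, and with a signed char a valid byte can end it early. Declaring the variable as int avoids both cases.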
The logic that Windows terminals follow with regard to ^Z in keyboard input (at least in their default configuration) is as follows:
The Ctrl-Z combination itself does not cause the input line buffer to get pushed to the waiting application. This key combination simply generates ^Z character in the input buffer. You have to press Enter to finish that line buffer and send it to the application.
You can actually keep entering additional characters after ^Z and before pressing Enter.
If the input line does not begin with ^Z, but contains ^Z inside, then the application will receive that line up to and including the first ^Z character (read as \x1A character). The rest of the input is discarded.
E.g. if you type in
Hello^Z World^Z123
and press Enter your C program will actually read Hello\x1A sequence. EOF condition will not arise.
If the input line begins with ^Z, the whole line is discarded and EOF condition is set.
E.g. if you input
^ZHello World
and press Enter your program will read nothing and immediately detect EOF.
This is the behavior you observe in your experiments. Just keep in mind that the result of getchar() should be received into an int variable, not a char variable.
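The rules above can be modelled in a few lines. This is only a sketch of the behavior as described in this answer, not a Windows API; `model_console_line` and its parameters are hypothetical names chosen for illustration:

```c
#include <stddef.h>

#define CTRL_Z 0x1A

/* Models one typed line (without the trailing Enter) under the rules
   described above. Writes what the program would read into out as a
   NUL-terminated string. Returns 1 if the line produces EOF, 0 if it
   produces data, and -1 if out is too small to hold anything. */
int model_console_line(const char *typed, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz < 2)
        return -1;
    out[0] = '\0';

    if (typed[0] == CTRL_Z)          /* line begins with ^Z: EOF */
        return 1;

    for (; typed[n] != '\0' && n + 2 < outsz; n++) {
        out[n] = typed[n];
        if (typed[n] == CTRL_Z) {    /* deliver up to and including ^Z */
            out[n + 1] = '\0';
            return 0;                /* the rest of the line is dropped */
        }
    }
    out[n++] = '\n';                 /* no ^Z: Enter arrives as '\n' */
    out[n] = '\0';
    return 0;
}
```

Feeding it the two example lines from this answer gives Hello followed by \x1A with no EOF for the first, and an immediate EOF with no data for the second.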

Write a program that loops, reading characters from standard input and writing their decimal values to standard output one per line, until EOF occurs

I've tried to write a program that reads characters from the standard input inside a loop and write their decimal values to standard output one per line until EOF occurs.
My code is:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char c;
while (c != '\n' && c != EOF) {
scanf("%c",&c);
printf("%d\n",c);
}
}
When enter:
Hi!
Line 2
The output should be:
72
105
33
10
76
105
110
101
32
50
10
How can I get 2 lines as input and read Enter as a character?
How can I get 2 lines as input and read "Enter" as character?
Well, you are already reading the "Enter" (i.e. \n). In fact, you are also using it to terminate the while loop, so your program stops after reading the first line. In other words, if you want to be able to read multiple lines, remove the check for \n from the while condition.
Also notice that your program uses c before it is initialized, which is something you should never do.
BTW: I think getc is a better function for this than scanf.
Something like this should do:
#include <stdio.h>
int main(void)
{
int c;
while((c = getc(stdin)) != EOF)
{
printf("%d\n",c);
}
}

Understanding getchar() in the character counting program in C

This is a follow-up to my previous question. A similar question has already been asked, but its answer does not tell me what I want to know.
From the previous question I come to know that if I type a lot of characters, then they are not made available to getchar(), until I press Enter. So at the very point when I press Enter, all the characters will be made available to getchar()s. Now consider the following program for character counting:
#include<stdio.h>
int main(void)
{
long nc;
nc=0;
while(getchar()!=EOF)
++nc;
printf(" Number of chars are %ld ",nc);
}
If I input characters from the command line in the following sequence: {1,2,3,^Z,4,5,Enter}, then in the next line {^Z,Enter}. The output that I expect is: Number of chars are 6. But the output that I am getting is Number of chars are 4.
This answer explains that when we input 1,2,3,^Z, the ^Z acts like Enter and 1,2,3 are sent to getchar(). The while loop in the code above runs three times. The ^Z is not given to getchar(), so the program does not terminate yet. My input was {1,2,3,^Z,4,5,Enter}: after ^Z I pressed 4, 5 and then Enter. When I press Enter, the characters 4, 5 and Enter should be given to getchar(), and the while loop should execute three more times. Then on the last line I input {^Z,Enter}. Since there is no text before the ^Z, it is treated as EOF, so when I press Enter getchar() returns EOF and the while loop terminates. In all, the while loop has executed 6 times, so the variable nc should be 6.
Why am I getting 4 as the value of nc rather than 6?
Adding some output will help you:
#include <stdio.h>
int
main (void)
{
int c, nc = 0;
while ((c = getchar ()) != EOF)
{
++nc;
printf ("Character read: %02x\n", (unsigned) c);
}
printf ("Number of chars: %d\n", nc);
}
The Windows console views the ^Z input as "send input before ^Z to stdin, discard remaining input on the line (including the end-of-line delimiter), and send ^Z" unless it is at the beginning of a line, in which case it sends EOF instead of ^Z:
123^Z45
Character read: 31
Character read: 32
Character read: 33
Character read: 1a
^Z12345
Number of chars: 4
Also, Windows always waits for the Enter/Return key, with the exception of very few key sequences like ^C or ^{Break}.
^Z, or Ctrl-Z, means end-of-file for text files (a convention from old MS-DOS). getchar() is equivalent to fgetc(stdin) and is often a macro. "fgetc returns the character read as an int or returns EOF to indicate an error or end of file."
See also _set_fmode, however, I am not sure if that changes the behaviour right away or whether you have to close/reopen the file. Not sure either if you can close/reopen stdin (don't do much console programming anymore).

It seems I must first press enter before getchar() gets EOF

I'm starting to learn about EOF and I've written the following simple program :
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int main(void)
{
int i=0;
while(getchar()!=EOF)
{
i++;
}
printf("number of characters : %d \n",i);
}
The thing is, when I write a string, press enter and then press Ctrl+Z the output is the number of the characters I wrote plus 1 (for the EOF). However, if I write a string and, without changing line, press Ctrl+Z the while loop does not terminate. Why is that?
First things first: EOF is signalled only when Ctrl + Z is at the very beginning of a line. With that in mind:
On your first try with your input (10 characters) and then Enter, you actually push your input\n to the input stream, which gets read character by character through getchar, and there are 11 characters now with the addition of that new line at the end thanks to your Enter.
On the new line, you then use the Ctrl + Z combination to signal the EOF, and you indeed do that properly there; signal the EOF and get 11 as the result.
It's strange that you were expecting to see 10 here. What if you were to have an input of multiple lines? Would you like it to not count for new lines? Then you could use something like:
int onechar;
while ((onechar = getchar( )) != EOF)
{
if (onechar != '\n')
i++;
}
Or, going further, are you always expecting a single line of input? Then you might want to consider changing your loop condition to the following:
while(getchar( ) != '\n')
{
i++;
}
Or would you like it to accept multi-line input, count the \n characters as well, and on top of all that stop at Ctrl + Z combinations that are not necessarily at the beginning of a line? Well then, here you go:
// 26 is the ASCII value of the Substitute character
for (int onechar = getchar( ); onechar != EOF && onechar != 26; onechar = getchar( ))
{
i++;
}
26, as commented, is the Substitute character, which, at least on my machine, is what the programme gets when I use Ctrl + Z inappropriately. Using this, if you were to input:
// loop terminated by the result of (onechar != 26) comparison
your input^Z
You would get 10 as the result and if you were to input:
// loop terminated by the result of (onechar != EOF) comparison
your input
^Z
You would get 11, counting that new-line which you did input along with all the other 10 characters before that. Here, ^Z has been used to display the Ctrl + Z key combination as an input.
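For reference, the same loop can be wrapped in a function that reads from any stream, which makes the two cases above easy to check against a file. This is a sketch only; `count_until_eof_or_sub` is a hypothetical name, and treating 26 as a terminator assumes SUB never occurs as ordinary data in the input:

```c
#include <stdio.h>

#define SUB 26  /* ASCII Substitute, what a mid-line Ctrl+Z is read as */

/* Counts characters read from in until EOF or a SUB character,
   whichever comes first. The terminator itself is not counted. */
long count_until_eof_or_sub(FILE *in)
{
    long n = 0;
    int c;                      /* int, so EOF stays distinguishable */

    while ((c = getc(in)) != EOF && c != SUB)
        n++;
    return n;
}
```

Calling it as `count_until_eof_or_sub(stdin)` reproduces the for loop above: ten characters followed by SUB count as 10, and ten characters followed by a newline and then end-of-file count as 11.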
Input uses buffers. The first getchar requests a system-level read. When you press Enter or Ctrl-Z, the read returns the buffer to the program. When you press Enter, the system also adds a newline character to the buffer before returning it. EOF is not an actual character; it results from reading an empty buffer.
After control is returned to the program, getchar reads each character in the returned buffer in sequence, and when it has finished it requests another read.
In the first case, getchar reads the buffer including the newline character. Then, since the buffer is empty, getchar requests another read, which is interrupted by pressing Ctrl-Z; that returns an empty buffer and results in EOF.
In the second case, pressing Ctrl-Z simply returns the buffer, and after getchar has finished reading it, it requests another read, which does not complete because you never press Ctrl-Z or Enter again.
It is not your while loop that never finishes, merely the read call. Try pressing Ctrl-Z twice in the second case.
I have seen a related topic here. What you need to do is put the pts/tty into non-canonical mode and apply the change with TCSANOW (make attribute changes immediately).
You do this with the functions from termios.h, which operate on a struct termios.
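A rough sketch of that termios approach follows. It applies to POSIX terminals only, not to the Windows console discussed in the other answers. The function names are hypothetical; tcgetattr, tcsetattr, ICANON, VMIN, VTIME and TCSANOW come from termios.h:

```c
#include <termios.h>

/* Returns a copy of t with canonical (line-buffered) mode turned off,
   so a read can return after a single byte instead of waiting for
   Enter. All other flags are left unchanged. */
struct termios without_canonical_mode(struct termios t)
{
    t.c_lflag &= ~(tcflag_t)ICANON;
    t.c_cc[VMIN] = 1;    /* read returns once 1 byte is available */
    t.c_cc[VTIME] = 0;   /* no inter-byte timeout */
    return t;
}

/* Applies the change to fd immediately (TCSANOW). On success the
   previous settings are stored in *saved so the caller can restore
   them later with tcsetattr(). Returns 0 on success, -1 on error. */
int enter_noncanonical(int fd, struct termios *saved)
{
    struct termios raw;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    raw = without_canonical_mode(*saved);
    return tcsetattr(fd, TCSANOW, &raw);
}
```

A program would typically call `enter_noncanonical(STDIN_FILENO, &saved)` at startup and restore `saved` with tcsetattr before exiting, since the terminal otherwise stays in non-canonical mode after the program ends.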

printf vs putchar - different output

I have this simple code (trying to do an exercise in K&R):-
#include <stdio.h>
int main(){
int c = EOF;
while(c=(getchar() != EOF)){
printf("%d",c);
}
return 0;
}
When I run this and enter any single character, I get 11 as the output. If I enter multiple characters, for example 'bbb', I get 1111. I understand that I have explicitly added brackets to give precedence to the condition check getchar() != EOF, which should result in either 1 or 0. But I don't understand why I am getting multiple 1's.
Another case is:
#include <stdio.h>
int main(){
int c = EOF;
while(c=(getchar() != EOF)){
putchar(c);
}
return 0;
}
No matter which character I enter, I always get the output as a square box with 1's and 0's in it (shown at the bottom of the screenshot below)
1) In the first case, why is the output printing more than 1 1's?
2) Why isn't the output of case 2 same as case 1?
Until you signal EOF, (getchar() != EOF) evaluates to true, so 1 is assigned to c. That is why the output is always 11: the first 1 is for the character you entered and the second 1 is for the \n placed in the input buffer when you press the Enter key.
Similarly, in the putchar case it prints the character whose code is 1, which is non-printable (printable characters start at 32), so you get odd-looking output: one symbol for the input character and another for the \n.
Now change the parentheses in conditional expression to
while( (c=getchar()) != EOF ){...}
Now it will work as it should, but in the first case it will print two ASCII codes (one of them for the \n).
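The difference between the two parenthesizations can be seen side by side in this sketch. The function names are hypothetical, and getc on a FILE * stands in for getchar so the same code can be run against a file:

```c
#include <stdio.h>

/* Misplaced parentheses: c receives the result of the comparison,
   which is 1 for every character read and 0 at end of input. */
int read_with_wrong_parens(FILE *in)
{
    int c;

    c = (getc(in) != EOF);
    return c;
}

/* Intended form: c receives the character first, then it is compared
   with EOF. Returns the character code, or EOF at end of input. */
int read_with_right_parens(FILE *in)
{
    int c;

    if ((c = getc(in)) != EOF)
        return c;
    return EOF;
}
```

For an input of b, the first function yields 1 and the second yields 98, the ASCII code of b.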
1) In the first case, why is the output printing more than 1 1's?
Because you are looking for an EOF. In order to send your program EOF from the keyboard, press Ctrl+Z
2) Why isn't the output of case 2 same as case 1?
Because %d produces a decimal representation of the character code, while putchar produces the character itself. For example, if you print 'A' using printf's %d format, you would see 65 - ASCII code of the uppercase character A. On the other hand, if you print it using putchar, you would see character A itself.
Demo on ideone.

Resources