Understanding how EOF and Ctrl + D work [duplicate] - c

I am studying for an exam and I am confused as to how canonical vs. non-canonical input/output works in Unix (e.g., curses). I understand that there is a buffer to which "line disciplines" are applied for canonical input. Does this mean that the buffer is bypassed for non-canonical input, or does it simply mean that no line disciplines are applied? How does this process differ for input and output operations?
In the curses programs I have worked with that demonstrate canonical input, the input typed by a user is automatically entered either after a certain number of characters have been typed or a certain amount of time has passed. Are either of these things considered "line disciplines" or is this something else entirely?

For canonical input — think shell; actually, think good old-fashioned Bourne shell, since Bash and relatives have command-line editing. You type a line of input; if you make a mistake, you use the erase character (default is Backspace, usually; sometimes Delete) to erase the previous character. If you mess up completely, you can cancel the whole line with the line kill character (not completely standardized, often Control-X). On some systems, you get a word erase with Control-W. All this is canonical input. The entire line is gathered and edited up until the end of line character — Return — is pressed. Thereupon, the whole line is made available to waiting programs. Depending on the read() system calls that are outstanding, the whole line will be available to be read (by one or more calls to read()).
For non-canonical input — think vi or vim or whatever — you press a character, and it is immediately available to the program. You aren't held up until you hit return. The system does no editing of the characters; they are made available to the program as soon as they are typed. It is up to the program to interpret things appropriately. Now, vim does do a number of things that look a bit like canonical input. For example, backspace moves backwards, and in input mode erases what was there. But that's because vim chooses to make it behave like that.
Canonical and non-canonical output is a much less serious business. There are a few bits and pieces of difference, related to things like whether to echo carriage-return before line-feed, and whether to do delays (not necessary with electronics; important in the days when the output device might have been a 110-baud teletype). It can also do things like handle case-insensitive output devices — teletypes, again. Lower-case letters are output in caps, and upper-case letters as backslash and caps.
It used to be that if you typed all upper-case letters to the login prompt, then the login program would automatically convert to the mode where all caps were output with a backslash in front of each actual capital. I suspect that this is no longer done on electronic terminals.
In a comment, TitaniumDecoy asked:
So with non-canonical input, is the input buffer bypassed completely? Also, where do line disciplines come in?
With non-canonical input, the input buffer is still used; if there is no program with a read() call waiting for input from the terminal, the characters are held in the input buffer. What doesn't happen is any editing of the input buffer.
Line disciplines are things like the set of manipulations that the input editing does. So, one aspect of the line discipline is that the erase character erases a prior character in canonical input mode. If you have icase (input case-mapping) set, then upper-case characters are mapped to lower-case unless preceded by a backslash; that is a line discipline, I believe, or an aspect of a line discipline.
I forgot to mention that EOF processing (Control-D) is handled in canonical mode; it actually means 'make the accumulated input available to read()'; if there is no accumulated input (if you type Control-D at the beginning of a line), then the read() will return zero bytes, which is then interpreted as EOF by programs. Of course, you can merrily type more characters on the keyboard after that, and programs that ignore EOF (or run in non-canonical mode) will be quite happy.
Of course, in canonical mode, the characters typed at the keyboard are normally echoed to the screen; you can control whether that echoing occurs. However, this is somewhat tangential to canonical input; the normal editing occurs even when echo is off.
Similarly, the interrupt and quit signals are artefacts of canonical mode processing. So too are the job control signals such as Control-Z to suspend the current process and return to the shell. Likewise, flow control (Control-S, Control-Q to stop and start output) is provided by canonical mode.
Chapter 4 of Rochkind's Advanced Unix Programming, 2nd Edn covers terminal I/O and gives much of this information — and a whole lot more. Other UNIX programming books (at least, the good ones) will also cover it.

Related

How to write to input part in C?

I'm trying to write a mini shell in C for a school project, and what I want to do is doing a sort of command history (like in shell), when I press the UP key it writes the previous input into the input part, DOWN does the opposite, etc..., and you can edit it before pressing enter to send it to the program, like this (sorry for the bad english): [] represents the user cursor
my_shell$ some input wrote by me
my_shell$ []
my_shell$ some other input
my_shell$ []
and now if I press UP
my_shell$ some other input[]
If I press UP again
my_shell$ some input wrote by me[]
I'm allowed to use termcaps and some other functions isatty, ttyname, ttyslot, ioctl, getenv, tcsetattr, tcgetattr, tgetent, tgetflag, tgetnum, tgetstr, tgoto, tputs.
The problem is I can't understand the documentation of ioctl and tty functions, and I can't find well explained tutorials on these functions with examples, and I can't find the documentation for what I'm trying to do with them.
Can someone explain me those functions in an understandable way ? And how should I apply them for what I'm trying to do (I'm searching for a Linux-MacOs compatibility way)
Thanks for your help.
What you are asking is non trivial and will require a fair amount of work. Basically, what you need to do is use tcsetattr to put the terminal into non-canonical mode where the terminal's input line buffering is disabled and every keystroke will be returned immediately rather than waiting for a newline. You will then have to process every keystroke yourself, including backspace/delete and up/down arrows. Because of what you want to do with line editing, you'll probably also have to disable echoing and do the echo yourself.
You'll need to maintain the current line buffer yourself, and you'll also need a data structure that stores all the older lines of input (the history) and when an up-arrow is hit, you'll need to erase what you've currently buffered for the input, and erase it from the screen, then copy the history into your current buffer and echo it to the terminal.
Another complication is that keys like up-arraow and down-arrow are not part of ascii, so won't be read as a single byte -- instead they'll be multibyte escape sequences (likely beginning with an ESC ('\x1b') character). You can use tgetstr to quesry the terminal database to figure out what they are, or just hardcode your shell to use the ANSI sequences which are what almost all terminals you will see these days use.

Guarantee that getchar receives newline or EOF (eventually)?

I would like to read characters from stdin until one of the following occurs:
an end-of-line marker is encountered (the normal case, in my thinking),
the EOF condition occurs, or
an error occurs.
How can I guarantee that one of the above events will happen eventually? In other words, how do I guarantee that getchar will eventually return either \n or EOF, provided that no error (in terms of ferror(stdin)) occurs?
// (How) can we guarantee that the LABEL'ed statement will be reached?
int done = 0;
while (!0) if (
(c = getchar()) == EOF || ferror(stdin) || c == '\n') break;
LABEL: done = !0;
If stdin is connected to a device that always delivers some character other than '\n', none of the above conditions will occur. It seems like the answer will have to do with the properties of the device. Where can those details be found (in the doumentation for compiler, device firmware, or device hardware perhaps)?
In particular, I am interested to know if keyboard input is guaranteed to be terminated by an end-of-line marker or end-of-file condition. Similarly for files stored on disc / SSD.
Typical use case: user enters text on the keyboard. Program reads first few characters and discards all remaining characters, up to the end-of-line marker or end-of-file (because some buffer is full or after that everything is comments, etc.).
I am using C89, but I am curious if the answer depends on which C standard is used.
You can't.
Let's say I run your program, then I put a weight on my keyboard's "X" key and go on vacation to Hawaii. On the way there, I get struck by lightning and die.
There will never be any input other than 'x'.
Or, I may decide to type the complete story of Moby Dick, without pressing enter. It will probably take a few days. How long should your program wait before it decides that maybe I won't ever finish typing?
What do you want it to do?
Looking at all the discussion in the comments, it seems you are looking in the wrong place:
It is not a matter of keyboard drivers or wrapping stdin.
It is also not a matter of what programming language you are using.
It is a matter of the purpose of the input in your software.
Basically, it is up to you as a programmer to know how much input you want or need, and then decide when to stop reading input, even if valid input is still available.
Note, that not only are there devices that can send input forever without triggering EOF or end of line condition, but there are also programs that will happily read input forever.
This is by design.
Common examples can be found in POSIX style OS (like Linux) command line tools.
Here is a simple example:
cat /dev/urandom | hexdump
This will print random numbers for as long as your computer is running, or until you hit Ctrl+C
Though cat will stop working when there is nothing more to print (EOF or any read error), it does not expect such an end, so unless there is a bug in the implementation you are using it should happily run forever.
So the real question is:
When does your program need to stop reading characters and why?
If stdin is connected to a device that always delivers some character other than '\n', none of the above conditions will occur.
A device such as /dev/zero, for example. Yes, stdin can be connected to a device that never provides a newline or reaches EOF, and that is not expected ever to report an error condition.
It seems like the answer will have to do with the properties of the device.
Indeed so.
Where can those details be found (in the doumentation for compiler, device firmware, or device hardware perhaps)?
Generally, it's a question of the device driver. And in some cases (such as the /dev/zero example) that's all there is anyway. Generally drivers do things that are sensible for the underlying hardware, but in principle, they don't have to do.
In particular, I am interested to know if keyboard input is guaranteed to be terminated by an end-of-line marker or end-of-file condition.
No. Generally speaking, an end-of-line marker is sent by a terminal device if and only if the <enter> key is pressed. An end-of-file condition might be signaled if the terminal disconnects (but the program continues), or if the user explicitly causes one to be sent (by typing <-<D> on Linux or Mac, for example, or <-<Z> on Windows). Neither of those events need actually happen on any given run of a program, and it is very common for the latter not to do.
Similarly for files stored on disc / SSD.
You can generally rely on data read from an ordinary file to contain newlines where they are present in the file itself. If the file is open in text mode, then the system-specific text line terminator will also be translated to a newline, if it differs. It is not necessary for a file to contain any of those, so a program reading from a regular file might never see a newline.
You can rely on EOF being signaled when a read is attempted while the file position is at or past the and of the file's data.
Typical use case: user enters text on the keyboard. Program reads first few characters and discards all remaining characters, up to the end-of-line marker or end-of-file (because some buffer is full or after that everything is comments, etc.).
I think you're trying too hard.
Reading to end-of-line might be a reasonable thing to do in some cases. Expecting a newline to eventually be reached is reasonable if the program is intended to support interactive use. But trying to ensure that invalid data cannot be fed to your program is a losing cause. Your objective should be to accept the widest array of inputs you reasonably can, and to fail gracefully when other inputs are presented.
If you need to read input in a line-by-line mode then by all means do that, and document that you do it. If only the first n characters of each line are significant to the program then document that, too. Then, if your program never terminates when a user connects its input to /dev/zero that's on them, not on you.
On the other hand, try to avoid placing arbitrary constraints, especially on sizes of things. If there is not a natural limit on the size of something, then no artificial limit you introduce will ever be enough.

C programming getch(), getchar(), getche() [duplicate]

What is the exact difference between the getch and getchar functions?
getchar() is a standard function that gets a character from the stdin.
getch() is non-standard. It gets a character from the keyboard (which may be different from stdin) and does not echo it.
The Standard C function is is getchar(), declared in <stdio.h>. It has existed basically since the dawn of time. It reads one character from standard input (stdin), which is typically the user's keyboard, unless it has been redirected (for example via the shell input redirection character <, or a pipe).
getch() and getche() are old MS-DOS functions, declared in <conio.h>, and still popular on Windows systems. They are not Standard C functions; they do not exist on all systems. getch reads one keystroke from the keyboard immediately, without waiting for the user to hit the Return key, and without echoing the keystroke. getche is the same, except that it does echo. As far as I know, getch and getche always read from the keyboard; they are not affected by input redirection.
The question naturally arises, if getchar is the standard function, how do you use it to read one character without waiting for the Return key, or without echoing? And the answers to those questions are at least a little bit complicated. (In fact, they're complicated enough that I suspect they explain the enduring popularity of getch and getche, which if nothing else are very easy to use.)
And the answer is that getchar has no control over details like echoing and input buffering -- as far as C is concerned, those are lower-level, system-dependent issues.
But it is useful to understand the basic input model which getchar assumes. Confusingly, there are typically two different levels of buffering.
As the user types keys on the keyboard, they are read by the operating system's terminal driver. Typically, in its default mode, the terminal driver echoes keystrokes immediately as they are typed (so the user can see what they are typing). Typically, in its default mode, the terminal driver also supports some amount of line editing -- for example, the user can hit the Delete or Backspace key to delete an accidentally-typed character. In order to support line editing, the terminal driver is typically collecting characters in an input buffer. Only when the user hits Return are the contents of that buffer made available to the calling program. (This level of buffering is present only if standard input is in fact a keyboard or other serial device. If standard input has been redirected to a file or pipe, the terminal driver is not in effect and this level of buffering does not apply.)
The stdio package reads characters from the operating system into its own input buffer. getchar simply fetches the next character from that buffer. When the buffer is empty, the stdio package attempts to refill it by reading more characters from the operating system.
So, if we trace what happens starting when a program calls getchar for the first time: stdio discovers that its input buffer is empty, so it tries to read some characters from the operating system, but there aren't any characters available yet, so the read call blocks. Meanwhile, the user may be typing some characters, which are accumulating in the terminal driver's input buffer, but the user hasn't hit Return yet. Finally, the user hits Return, and the blocked read call returns, returning a whole line's worth of characters to stdio, which uses them to fill its input buffer, out of which it then returns the first one to that initial call to getchar, which has been patiently waiting all this time. (And then if the program calls getchar a second or third time, there probably are some more characters -- the next characters on the line the user typed -- available in stdio's input buffer for getchar to return immediately. For a bit more on this, see section 6.2 of these C course notes.)
But in all of this, as you can see, getchar and the stdio package have no control over details like echoing or input line editing, because those are handled earlier, at a lower level, in the terminal driver, in step 1.
So, at least under Unix-like operating systems, if you want to read a character without waiting for the Return key, or control whether characters are echoed or not, you do that by adjusting the behavior of the terminal driver. The details vary, but there's a way to turn echo on and off, and a way (actually a couple of ways) to turn input line editing on and off. (For at least some of those details, see this SO question, or question 19.1 in the old C FAQ list.)
When input line editing is turned off, the operating system can return characters immediately (without waiting for the Return key), because in that case it doesn't have to worry that the user might have typed a wrong keystroke that needs to be "taken back" with the Delete or Backspace key. (But by the same token, when a program turns off input line editing in the terminal driver, if it wants to let the user correct mistakes, it must implement its own editing, because it is going to see --- that is, successive calls to getchar are going to return -- both the user's wrong character(s) and the character code for the Delete or Backspace key.)
getch() it just gets an input but never display that as an output on the screen despite of us pressing an enter key.
getchar() it gets an input and display it on the screen when we press the enter key.
getchar is standard C, found in stdio.h. It reads one character from stdin(the standard input stream = console input on most systems). It is a blocking call, since it requires the user to type a character then press enter. It echoes user input to the screen.
getc(stdin) is 100% equivalent to getchar, except it can also be use for other input streams.
getch is non-standard, typically found in the old obsolete MS DOS header conio.h. It works just like getchar except it isn't blocking after the first keystroke, it allows the program to continue without the user pressing enter. It does not echo input to the screen.
getche is the same as getch, also non-standard, but it echoes input to the screen.

Why doesn't getchar() read characters such as backspace?

This is a very basic C question, coming from page 18 of Kernighan and Ritchie.
I've compiled this very simple code for counting characters input from the keyboard:
#include <stdio.h>
/* count characters in input; 1st version */
main()
{
long nc;
nc = 0;
while (getchar() != EOF)
++nc;
printf("%1d\n", nc);
}
This compiles fine, runs fine, and behaves pretty much as expected i.e. if I enter "Hello World", it returns a value of 11 when I press CTRLD to give the EOF character.
What is confusing me is if I make a mistake, I can use backspace to delete the characters and re-enter them, and it returns only the number of characters displayed by the terminal when I invoke EOF.
If the code is counting each character, including special characters, if I type four characters, delete two, and type another two, shouldn't that output as 8 characters (4 char + 2 del + 2 char), not 4?
I'm obviously misunderstanding how C handles backspace, and how/when the code is incrementing the variable nc?
Typically, your terminal session is running in "line mode", that is, it only passes data to your program when a line is complete (eg, you pressed Return, etc). So you only see the line as it is complete (with any editing having been done before your program ever sees anything). Typically this is a good thing, so every program doesn't need to deal with delete/etc.
On most systems (eg Unix-based systems, etc), it is possible to put the terminal into "raw" mode -- that is, each character is passed as received to the program. For example, screen-oriented text editors commonly do this.
It's not that getchar() doesn't count the "deletions" but it doesn't even see the input until it's passed to your program by the terminal driver.
When you input something, it doesn't reach your C program until you press \n or send EOF (or EOL). This is what POSIX defines as Canonical Mode Input Processing - which is typically the default mode.
Backspace characters are normally used to edit input in cooked tty mode (see canonical input mode in tty(4) in BSD and termios(3) in linux systems), so they are consumed at the tty driver, and don't get to the input the process gets after that. The same applies to Ctrl-D as the end of file or to Ctrl-K as the kill input character. There are several things the driver does behind the scenes that your process doesn't get finally. These are directed to make life easier to users and programmers, as you normally don't want erased input in your life (that's the reason of erasing it), or want line endings to be \n and not \r as the tty normally generates when you press the [RETURN] key. But if you read from a file that happens to have backspaces, you'll get them as normal input anyway, just create a file with backspaces and try to read redirecting input from it, and you'll see those characters in your input.
By the way, if you want to generate backspaces at the terminal, just prepend a Ctrl-V character before each (this is also managed at the tty driver and will not happen when reading from a file) and you'll see your backspace chars as normal input in your file (to send a Ctrl-V just double it)

What is the difference between getch() and getchar()?

What is the exact difference between the getch and getchar functions?
getchar() is a standard function that gets a character from the stdin.
getch() is non-standard. It gets a character from the keyboard (which may be different from stdin) and does not echo it.
The Standard C function is is getchar(), declared in <stdio.h>. It has existed basically since the dawn of time. It reads one character from standard input (stdin), which is typically the user's keyboard, unless it has been redirected (for example via the shell input redirection character <, or a pipe).
getch() and getche() are old MS-DOS functions, declared in <conio.h>, and still popular on Windows systems. They are not Standard C functions; they do not exist on all systems. getch reads one keystroke from the keyboard immediately, without waiting for the user to hit the Return key, and without echoing the keystroke. getche is the same, except that it does echo. As far as I know, getch and getche always read from the keyboard; they are not affected by input redirection.
The question naturally arises, if getchar is the standard function, how do you use it to read one character without waiting for the Return key, or without echoing? And the answers to those questions are at least a little bit complicated. (In fact, they're complicated enough that I suspect they explain the enduring popularity of getch and getche, which if nothing else are very easy to use.)
And the answer is that getchar has no control over details like echoing and input buffering -- as far as C is concerned, those are lower-level, system-dependent issues.
But it is useful to understand the basic input model which getchar assumes. Confusingly, there are typically two different levels of buffering.
As the user types keys on the keyboard, they are read by the operating system's terminal driver. Typically, in its default mode, the terminal driver echoes keystrokes immediately as they are typed (so the user can see what they are typing). Typically, in its default mode, the terminal driver also supports some amount of line editing -- for example, the user can hit the Delete or Backspace key to delete an accidentally-typed character. In order to support line editing, the terminal driver is typically collecting characters in an input buffer. Only when the user hits Return are the contents of that buffer made available to the calling program. (This level of buffering is present only if standard input is in fact a keyboard or other serial device. If standard input has been redirected to a file or pipe, the terminal driver is not in effect and this level of buffering does not apply.)
The stdio package reads characters from the operating system into its own input buffer. getchar simply fetches the next character from that buffer. When the buffer is empty, the stdio package attempts to refill it by reading more characters from the operating system.
So, if we trace what happens starting when a program calls getchar for the first time: stdio discovers that its input buffer is empty, so it tries to read some characters from the operating system, but there aren't any characters available yet, so the read call blocks. Meanwhile, the user may be typing some characters, which are accumulating in the terminal driver's input buffer, but the user hasn't hit Return yet. Finally, the user hits Return, and the blocked read call returns, returning a whole line's worth of characters to stdio, which uses them to fill its input buffer, out of which it then returns the first one to that initial call to getchar, which has been patiently waiting all this time. (And then if the program calls getchar a second or third time, there probably are some more characters -- the next characters on the line the user typed -- available in stdio's input buffer for getchar to return immediately. For a bit more on this, see section 6.2 of these C course notes.)
But in all of this, as you can see, getchar and the stdio package have no control over details like echoing or input line editing, because those are handled earlier, at a lower level, in the terminal driver, in step 1.
So, at least under Unix-like operating systems, if you want to read a character without waiting for the Return key, or control whether characters are echoed or not, you do that by adjusting the behavior of the terminal driver. The details vary, but there's a way to turn echo on and off, and a way (actually a couple of ways) to turn input line editing on and off. (For at least some of those details, see this SO question, or question 19.1 in the old C FAQ list.)
When input line editing is turned off, the operating system can return characters immediately (without waiting for the Return key), because in that case it doesn't have to worry that the user might have typed a wrong keystroke that needs to be "taken back" with the Delete or Backspace key. (But by the same token, when a program turns off input line editing in the terminal driver, if it wants to let the user correct mistakes, it must implement its own editing, because it is going to see --- that is, successive calls to getchar are going to return -- both the user's wrong character(s) and the character code for the Delete or Backspace key.)
getch() it just gets an input but never display that as an output on the screen despite of us pressing an enter key.
getchar() it gets an input and display it on the screen when we press the enter key.
getchar is standard C, found in stdio.h. It reads one character from stdin(the standard input stream = console input on most systems). It is a blocking call, since it requires the user to type a character then press enter. It echoes user input to the screen.
getc(stdin) is 100% equivalent to getchar, except it can also be use for other input streams.
getch is non-standard, typically found in the old obsolete MS DOS header conio.h. It works just like getchar except it isn't blocking after the first keystroke, it allows the program to continue without the user pressing enter. It does not echo input to the screen.
getche is the same as getch, also non-standard, but it echoes input to the screen.

Resources