Check if stdin has bytes pending (for escape sequences) - c

In plain C, how can I check whether standard input has bytes pending without blocking?
The reason for this is handling escape sequences which are multi-byte.
For example, if I use getchar() and the user presses Ctrl-Up Arrow, then 3 bytes are immediately placed in standard input: 1B 4F 41, however getchar() only reads ONE of those bytes. I need to read all three before continuing on. Different escape sequences are of different length, so the logic is that if the first character is an escape character then I read ALL characters currently in the buffer and then process that as an escape unit. But I can't do that with getchar() because it will block when it reaches the end of the buffer. I need to know how many characters are in the buffer.

There's no such provision in standard C; you need to use OS-specific calls to either do non-blocking reads or, as I think is more appropriate here, to read bytes from the input immediately rather than wait for a newline. For the latter, see
setvbuf not able to make stdin unbuffered
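For the non-blocking route, here is a minimal POSIX-only sketch of asking the kernel whether a descriptor has bytes waiting. The name fd_has_input is made up for illustration. It only sees what the kernel holds, not bytes already sitting in stdio's own buffer, and on a terminal it is only useful once canonical (line) mode has been turned off.

```c
#include <poll.h>
#include <unistd.h>   /* STDIN_FILENO, for callers */

/* Hypothetical helper: returns 1 if at least one byte can be read from fd
   right now, 0 if nothing is pending, and -1 if poll() itself fails. */
int fd_has_input(int fd)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, 0);   /* timeout 0: report and return at once */
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A caller would typically test fd_has_input(STDIN_FILENO) and then use read() rather than getchar(), so that stdio buffering does not hide bytes from the check.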
Note:
the logic is that if the first character is an escape character then I read ALL characters currently in the buffer and then process that as an escape unit
That really isn't the way to do it, and will fail if keys are pressed rapidly or are autorepeated, or, say, you're reading from a key log file. Each byte determines how many more bytes of escape sequence there are; you should read conditionally based on what you have already seen.
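As a rough sketch of "read conditionally based on what you have already seen", the function below decides from the bytes collected so far whether an escape sequence is complete. It assumes xterm-style sequences (ESC O plus one byte, and ESC [ ... ending in a byte from 0x40 to 0x7E); the name esc_seq_len is made up, and real terminals send more variants than this covers.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the complete unit that starts
   at buf[0], or 0 if more bytes must be read before that can be decided. */
size_t esc_seq_len(const unsigned char *buf, size_t n)
{
    if (n == 0)
        return 0;
    if (buf[0] != 0x1B)
        return 1;                     /* ordinary byte, not an escape */
    if (n < 2)
        return 0;                     /* need the byte after ESC */

    if (buf[1] == 'O')                /* ESC O <one final byte> */
        return n >= 3 ? 3 : 0;

    if (buf[1] == '[') {              /* ESC [ <parameters> <final byte> */
        for (size_t i = 2; i < n; i++) {
            if (buf[i] >= 0x40 && buf[i] <= 0x7E)
                return i + 1;         /* the final byte ends the sequence */
        }
        return 0;                     /* no final byte seen yet */
    }
    return 2;                         /* ESC plus one other byte */
}
```

With the bytes 1B 4F 41 from the question this returns 3, and it still returns 3 when a second sequence follows immediately, which is the autorepeat case where "read everything in the buffer" goes wrong. A lone press of the Esc key is ambiguous under this scheme and usually needs a timeout.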

Related

Different behaviour of Ctrl-D (Unix) and Ctrl-Z (Windows)

As per title I am trying to understand the exact behavior of Ctrl+D / Ctrl+Z in a while loop with a gets (which I am required to use). The code I am testing is the following:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char str[80];
    while(printf("Insert string: ") && gets(str) != NULL) {
        puts(str);
    }
    return 0;
}
If my input is simply a Ctrl+D (or Ctrl+Z on Windows) gets returns NULL and the program exits correctly. The unclear situation is when I insert something like house^D^D (Unix) or house^Z^Z\n (Windows).
In the first case my interpretation is a getchar (or something similar inside the gets function) waits for read() to get the input, the first Ctrl+D flushes the buffer which is not empty (hence not EOF) then the second time read() is called EOF is triggered.
In the second case though, I noticed that the first Ctrl+Z is inserted into the buffer while everything that follows is simply ignored. Hence my understanding is the first read() call inserted house^Z and discarded everything else returning 5 (number of characters read). (I say 5 because otherwise I think a simple Ctrl+Z should return 1 without triggering EOF). Then the program waits for more input from the user, hence a second read() call.
I'd like to know what I get right and wrong of the way it works and which part of it is simply implementation dependent, if any.
Furthermore I noticed that in both Unix and Windows, even after EOF is triggered, it seems to reset to false in the following gets() call, and I don't understand why this happens or in which line of the code.
I would really appreciate any kind of help.
(12/20/2016) I heavily edited my question in order to avoid confusion
The CTRL-D and CTRL-Z "end of file" indicators serve a similar purpose on Unix and Windows systems respectively, but are implemented quite differently.
On Unix systems (including Unix clones like Linux) CTRL-D, while officially described as the end-of-file character, is actually a delimiter character. It does almost the same thing as the end-of-line character (usually carriage return or CTRL-M) which is used to delimit lines. Both characters tell the operating system that the input line is finished and to make it available to the program. The only difference is that with the end-of-line character a line feed (CTRL-J) character is inserted at the end of the input buffer to mark the end of the line, while with the end-of-file character nothing is inserted.
This means when you enter house^D^D on Unix the read system call will first return a buffer of length 5 with the 5 characters house in it. When read is called again to obtain more input, it will then return a buffer of length 0 with no characters in it. Since a zero length read on a normal file indicates that the end of file has been reached, the gets library function also interprets this as the end of file and stops reading the input. However, since it filled the buffer with 5 characters it doesn't return NULL to indicate that it reached the end of the file. And since it hasn't actually reached end of file, as terminal devices aren't actually files, further calls to gets after this will make further calls to read which will return any subsequent characters that the user types.
On Windows CTRL-Z is handled much differently. The biggest difference is that it's not treated specially by the operating system at all. When you type house^Z^Z^M on Windows only the carriage return character is given special treatment. Just like on Unix, the carriage return makes the typed line available to the program, though in this case a carriage return and a line feed are added to the buffer to mark the end of the line. So the result is that the ReadFile function returns a 9 byte long buffer with the 9 characters house^Z^Z^M^J in it.
It is actually the program itself, specifically the C runtime library, that treats CTRL-Z specially. In the case of the Microsoft C runtime library, when it sees the CTRL-Z character in the buffer returned by ReadFile it treats it as an end-of-file marker and ignores everything else after it. Using the example in the previous paragraph, gets ends up calling ReadFile to get more input because the fact that it has seen the CTRL-Z character isn't remembered when reading from the console (or other device) and it hasn't yet seen the end-of-line (which was ignored). If you then press enter again, gets will return with the buffer filled with the 7 bytes house^Z\0 (adding a 0 byte to indicate the end of the string). By default, it does much the same thing when reading from normal files: if a CTRL-Z character appears in a file, it and everything after it is ignored. This is for backward-compatibility with CP/M which only supported files in lengths that were multiples of 128 and used CTRL-Z to mark where text files really were supposed to end.
Note that both the Unix and Windows behaviours described above are only the normal default handling of user input. The Unix handling of CTRL-D only occurs when reading from a terminal device in canonical mode and it's possible to change the "end-of-file" character to something else. On Windows the operating system never treats CTRL-Z specially, but whether the C runtime library does or not depends on whether the FILE stream being read is in text or binary mode. This is why in portable programs you should always include the character b in the mode string when opening binary files (eg. fopen("foo.gif", "rb")).
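To see the text/binary distinction for yourself, a small sketch like the following can help. The helper names and the file name passed in are made up for illustration. It writes "house", a CTRL-Z byte (0x1A) and "boat", then counts how many bytes come back for a given fopen mode.

```c
#include <stdio.h>

/* Hypothetical helper: writes "house", a CTRL-Z byte and "boat" (10 bytes)
   in binary mode. Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    static const unsigned char data[] = "house\x1a" "boat";
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, sizeof data - 1, fp);
    return (fclose(fp) == 0 && written == sizeof data - 1) ? 0 : -1;
}

/* Hypothetical helper: counts the bytes fgetc() delivers when the file is
   opened with the given mode string. Returns -1 if it cannot be opened. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

With mode "rb" the count is 10 on any platform. With mode "r" a Unix-like system also reports 10, while, going by the description above, the Microsoft runtime in text mode would be expected to stop at the CTRL-Z.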

File input and output stream in c

Suppose I open a text file in write mode in C and add some text data to it.
1. Internally, how is the data stored in the file? Is each character stored as an 8-bit ASCII code?
We never add EOF when we finish writing to the file; we just use fclose() to close it.
2. How is EOF then added to the file? How is it stored in the file?
When we read that file character by character using getchar(), we are able to detect EOF. Now if EOF is Ctrl+Z, two characters, ^ and z, are saved at the end of the file, so getchar() will get ^ and then z. So,
3. How does getchar() detect EOF?
EOF is not a character that gets stored in a file, it is a special return code that you get when you read a file. The file I/O system knows how many characters there are in a file, because it stores the exact length of the file. When your program tries to read a character after the last available character, the file I/O system returns a special value EOF, which is outside the range of char (it is for that reason that character reading routines such as getchar() return an int instead of a char).
The Ctrl+Z sequence is not an EOF character either. It is a special sequence of keys that tells the shell to close the console input stream associated with the program. Once the stream is closed, the next read returns EOF to your program. It is important to understand, however, that Ctrl+Z is merely a keyboard sequence that is interpreted by the command line processor - in the same way that Ctrl+C is a sequence that tells the command line processor to terminate the program.
Finally, ^Z is not two characters that get stored in a file, it's a screen representation of the Ctrl+Z sequence produced by the command line processor to confirm visually that the keyboard sequence has been accepted.
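A short sketch of why the return type matters: the loop below keeps the result of fgetc() in an int, so a data byte with value 0xFF is still distinguishable from the EOF return code. The function name is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts bytes until fgetc() reports EOF. */
long count_until_eof(FILE *fp)
{
    long n = 0;
    int c;                            /* int, not char: EOF must stay distinct */
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

For a file holding the two bytes 0xFF and 'a', this counts 2. Nothing extra is stored in the file; the loop ends only because the read ran past the recorded length. If c were declared as a plain char, the 0xFF byte could compare equal to EOF on platforms where char is signed and end the loop early.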
1. Typically C will be using Latin-1 or some other single byte
encoding, but it should be possible to use UTF-8 locale setting.
Note that most C character/string handling routines will not
properly handle UTF-8 or any other multibyte encoding -- you have to use special libraries.
It depends on the Operating System used, but most will simply store
a continuous stream of characters, with a Line-End (CR-LF in
Windows, \n in Unixy systems) character to mark the end of the line
(YOU have to explicitly put it there).
2. Some Operating Systems, such as MS-DOS, may explicitly write an EOF
character to the end of the file, but most don't. They simply run
off the end of the file and report a status of EOF.
3. See 2.

Misunderstand line-buffer in Unix

I'm reading Advanced Programming in the UNIX Environment, 3rd Edition and misunderstanding a section in it (page 145, Section 5.4 Buffering, Chapter 5).
Line buffering comes with two caveats. First, the size of the buffer that the
standard I/O library uses to collect each line is fixed, so I/O might take place if
we fill this buffer before writing a newline. Second, whenever input is
requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel),
all line-buffered output streams are flushed. The reason for the qualifier on (b)
is that the requested data may already be in the buffer, which doesn’t require
data to be read from the kernel. Obviously, any input from an unbuffered
stream, item (a), requires data to be obtained from the kernel.
I can't get the bold lines. My English isn't good. So, could you clarify it for me? Maybe in an easier way. Thanks.
The point behind the machinations described is to ensure that prompts appear before the system goes into a mode where it is waiting for input.
If an input stream is unbuffered, every time the standard I/O library needs data, it has to go to the kernel for some information. (That's the last sentence.) That's because the standard I/O library does not buffer any data, so when it needs more data, it has to read from the kernel. (I think that even an unbuffered stream might buffer one character of data, because it would need to read up to a space character, for example, to detect when it has reached the end of a %s format string; it has to put back (ungetc()) the extra character it read so that the next time it needs a character, there is the character it put back. But it never needs more than the one character of buffering.)
If an input stream is line buffered, there may already be some data in its input buffer, in which case it may not need to go to the kernel for more data. In that case, it might not flush anything. This can occur if the scanf() format requested "%s" and you typed hello world; it would read the whole line, but the first scan would stop after hello, and the next scanf() would not need to go to the kernel for the world word because it is already in the buffer.
However, if there isn't any data in the buffer, it has to ask the kernel to read the data, and it ensures that any line-buffered output streams are flushed so that if you write:
printf("Enter name: ");
if (scanf("%63s", name) != 1)
…handle error or EOF…
then the prompt (Enter name:) appears. However, if you'd previously typed hello world and previously read just hello, then the prompt wouldn't necessarily appear because the world was already waiting in the (line buffered) input stream.
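If you would rather not depend on that implicit flush (the C standard leaves this buffering behaviour implementation-defined), one option is to flush the prompt yourself before reading. A minimal sketch, with a made-up helper name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a prompt, flushes it explicitly, then reads
   one line into buf. Returns 0 on success, -1 on EOF or error. */
int prompt_line(FILE *in, FILE *out, const char *prompt, char *buf, size_t size)
{
    if (size == 0)
        return -1;

    fputs(prompt, out);
    fflush(out);                       /* make sure the prompt is visible */

    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';    /* drop the trailing newline, if any */
    return 0;
}
```

A typical call would be prompt_line(stdin, stdout, "Enter name: ", name, sizeof name).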
This may explain the point.
Let's imagine that you have a pipe in your program and you use it for communication between different parts of your program (single thread program writing and reading from this single pipe).
If you write to the writing end of the pipe, say the letter 'A', and then call the read operation to read from the reading end of the pipe, you would expect the letter 'A' to be read. However, the read operation is a system call to the kernel. To be able to return the letter 'A', it must have been written to the kernel first. This means that the writing of 'A' must be flushed, otherwise it would stay in your local writing buffer and your program would be locked forever.
In consequence, before calling a read operation all write buffers are flushed. This is what the section (b) says.
The size of the buffer that the standard I/O library is using to collect each line is fixed.
When we read lines repeatedly with the fgets function, each call reads up to the specified buffer size or up to a newline, whichever comes first.
Second, whenever input is requested through the standard I/O library, it can use an unbuffered stream or line-buffered stream.
unbuffered stream - Characters are not collected in a buffer; each one is passed on immediately.
line-buffered - Characters are stored in the buffer and flushed when the line is complete.
For example, if we print text with printf without a \n, the text stays in the buffer until we flush it or print a newline. In the same way, the stream buffer is flushed internally when the operation is completed.
(b) is that the requested data may already be in the buffer, which doesn't require data to be read from the kernel
In a line-buffered stream the requested data may already be in the buffer, because the data can be buffered, so it does not need to be read from the kernel again.
(a) requires data to be obtained from the kernel.
Any input from an unbuffered stream has to be obtained from the kernel, because an unbuffered stream can't store anything in a buffer.

What is the role of '\n' in fprintf C?

I am confused about the role of '\n' in fprintf. I understand that it copies characters into an array and the \n character signals when to stop reading the character in the current buffer. But then, when does it know to make the system call write.
for example, fprintf(stdout,"hello") prints but I never gave the \n character so how does it know to make the system call.
The system call is made when the channel is synced/flushed. This can be when the buffer is full (try writing a LOT without a \n and you'll see output at some point), when a \n is seen (IF you have line buffering configured for the channel, which is the default for tty devices), or when a call to fflush() is made.
When the path is closed, it will be flushed as well. When the process terminates, the operating system will close any open paths it has. Each of these events will lead to the system call to emit the output happening.
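To watch these cases happen, here is a POSIX-only sketch that attaches a stdio stream to a pipe, picks a buffering mode with setvbuf(), writes some text, and then asks the kernel whether anything has arrived at the other end. The helper name reached_kernel is made up, and the pipe is just a convenient stand-in for a terminal or file.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() and poll() */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo helper: writes text with the given buffering mode
   (_IONBF, _IOLBF or _IOFBF), optionally calls fflush(), and reports
   whether any bytes have reached the kernel.
   Returns 1 if bytes are waiting in the pipe, 0 if not, -1 on failure. */
int reached_kernel(int mode, const char *text, int do_flush)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (setvbuf(out, NULL, mode, BUFSIZ) != 0) {
        fclose(out);
        close(fds[0]);
        return -1;
    }

    fputs(text, out);
    if (do_flush)
        fflush(out);

    /* Zero timeout: only ask whether the read end has data right now. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int waiting = poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);

    fclose(out);                  /* closing flushes whatever is still buffered */
    close(fds[0]);
    return waiting;
}
```

With _IOLBF, "hello" alone gives 0 and "hello\n" gives 1. With _IOFBF even "hello\n" gives 0 unless do_flush is set, and with _IONBF "hello" gives 1 straight away.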
First of all, fprintf("hello"); isn't correct, it should be for instance fprintf(stdout, "hello");. Next, \n doesn't imply stopping the reading or writing of characters, because \n is itself a character, the linefeed (code 10 in the ASCII table).
The assumptions you've stated about the purpose and use of \n are wrong. \n is simply one of the 128 ASCII characters that are read, i.e. they are not commands, or directives, that cause anything to happen, they are just passive characters that when used in various C functions (eg. sprintf(), printf(), fprintf(), etc.) are interpreted such that presentation of output is manipulated to appear in the desired way.
By the way, \n (new line) is in good company with many other formatting codes which you can see represented HERE

What is the difference between getch() and getchar()?

What is the exact difference between the getch and getchar functions?
getchar() is a standard function that gets a character from the stdin.
getch() is non-standard. It gets a character from the keyboard (which may be different from stdin) and does not echo it.
The Standard C function is getchar(), declared in <stdio.h>. It has existed basically since the dawn of time. It reads one character from standard input (stdin), which is typically the user's keyboard, unless it has been redirected (for example via the shell input redirection character <, or a pipe).
getch() and getche() are old MS-DOS functions, declared in <conio.h>, and still popular on Windows systems. They are not Standard C functions; they do not exist on all systems. getch reads one keystroke from the keyboard immediately, without waiting for the user to hit the Return key, and without echoing the keystroke. getche is the same, except that it does echo. As far as I know, getch and getche always read from the keyboard; they are not affected by input redirection.
The question naturally arises, if getchar is the standard function, how do you use it to read one character without waiting for the Return key, or without echoing? And the answers to those questions are at least a little bit complicated. (In fact, they're complicated enough that I suspect they explain the enduring popularity of getch and getche, which if nothing else are very easy to use.)
And the answer is that getchar has no control over details like echoing and input buffering -- as far as C is concerned, those are lower-level, system-dependent issues.
But it is useful to understand the basic input model which getchar assumes. Confusingly, there are typically two different levels of buffering.
1. As the user types keys on the keyboard, they are read by the operating system's terminal driver. Typically, in its default mode, the terminal driver echoes keystrokes immediately as they are typed (so the user can see what they are typing). Typically, in its default mode, the terminal driver also supports some amount of line editing -- for example, the user can hit the Delete or Backspace key to delete an accidentally-typed character. In order to support line editing, the terminal driver is typically collecting characters in an input buffer. Only when the user hits Return are the contents of that buffer made available to the calling program. (This level of buffering is present only if standard input is in fact a keyboard or other serial device. If standard input has been redirected to a file or pipe, the terminal driver is not in effect and this level of buffering does not apply.)
2. The stdio package reads characters from the operating system into its own input buffer. getchar simply fetches the next character from that buffer. When the buffer is empty, the stdio package attempts to refill it by reading more characters from the operating system.
So, if we trace what happens starting when a program calls getchar for the first time: stdio discovers that its input buffer is empty, so it tries to read some characters from the operating system, but there aren't any characters available yet, so the read call blocks. Meanwhile, the user may be typing some characters, which are accumulating in the terminal driver's input buffer, but the user hasn't hit Return yet. Finally, the user hits Return, and the blocked read call returns, returning a whole line's worth of characters to stdio, which uses them to fill its input buffer, out of which it then returns the first one to that initial call to getchar, which has been patiently waiting all this time. (And then if the program calls getchar a second or third time, there probably are some more characters -- the next characters on the line the user typed -- available in stdio's input buffer for getchar to return immediately. For a bit more on this, see section 6.2 of these C course notes.)
But in all of this, as you can see, getchar and the stdio package have no control over details like echoing or input line editing, because those are handled earlier, at a lower level, in the terminal driver, in step 1.
So, at least under Unix-like operating systems, if you want to read a character without waiting for the Return key, or control whether characters are echoed or not, you do that by adjusting the behavior of the terminal driver. The details vary, but there's a way to turn echo on and off, and a way (actually a couple of ways) to turn input line editing on and off. (For at least some of those details, see this SO question, or question 19.1 in the old C FAQ list.)
When input line editing is turned off, the operating system can return characters immediately (without waiting for the Return key), because in that case it doesn't have to worry that the user might have typed a wrong keystroke that needs to be "taken back" with the Delete or Backspace key. (But by the same token, when a program turns off input line editing in the terminal driver, if it wants to let the user correct mistakes, it must implement its own editing, because it is going to see --- that is, successive calls to getchar are going to return -- both the user's wrong character(s) and the character code for the Delete or Backspace key.)
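On POSIX systems the usual way to adjust the terminal driver is the termios interface. The following is a minimal sketch, not the only way to do it; the function names are made up, and a real program would also want to restore the saved settings on every exit path, including signals.

```c
#include <termios.h>
#include <unistd.h>   /* STDIN_FILENO, for callers */

/* Hypothetical helper: edits a settings struct in memory so that reads
   return after a single byte, with no line editing and no echo. */
void make_char_at_a_time(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t->c_cc[VMIN] = 1;     /* read() returns once 1 byte is available */
    t->c_cc[VTIME] = 0;    /* no inter-byte timeout */
}

/* Hypothetical helper: applies the change to fd and stores the previous
   settings in *saved. Returns 0 on success, -1 if fd is not a terminal. */
int enter_char_mode(int fd, struct termios *saved)
{
    struct termios t;
    if (tcgetattr(fd, saved) != 0)
        return -1;
    t = *saved;
    make_char_at_a_time(&t);
    return tcsetattr(fd, TCSAFLUSH, &t) == 0 ? 0 : -1;
}

/* Hypothetical helper: puts the saved settings back. */
int leave_char_mode(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSAFLUSH, saved) == 0 ? 0 : -1;
}
```

After enter_char_mode(STDIN_FILENO, &saved) succeeds, reads from the terminal should return each keystroke as it is typed, without echo, until leave_char_mode(STDIN_FILENO, &saved) is called.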
getch() gets a character of input but never displays it on the screen, even after we press the Enter key.
getchar() gets a character of input and displays it on the screen when we press the Enter key.
getchar is standard C, found in stdio.h. It reads one character from stdin (the standard input stream = console input on most systems). It is a blocking call, since it requires the user to type a character then press enter. It echoes user input to the screen.
getc(stdin) is 100% equivalent to getchar, except it can also be used for other input streams.
getch is non-standard, typically found in the old obsolete MS DOS header conio.h. It works just like getchar except it isn't blocking after the first keystroke, it allows the program to continue without the user pressing enter. It does not echo input to the screen.
getche is the same as getch, also non-standard, but it echoes input to the screen.
