How is this buffer really working? - c

As a Linux system programming exercise I've written my own version of the tree command, which reads from stdin and writes to stdout using only the basic read() and write() C library functions. The program terminates when an asterisk (*) is entered. I have managed to get it to work properly; my problem is that I don't really understand why it works the way it does. What confuses me is the buffer. First of all, here is the code portion in question:
char buf[1];
...
do {
read(STDIN_FILENO, buf, 1);
if( buf[0] == '*') break;
write(STDOUT_FILENO, buf, 1);
} while( buf[0] != '*');
...
My idea was to read from stdin char by char, thereby storing the char in buf, check if it was an asterisk, then write the char from buf to stdout.
The behaviour is the following: I type a string of any number of chars, press ENTER, that string gets output to stdout, at which point I can type a new char string. If the string ends with an asterisk, the string is output up until the asterisk, then the program is terminated.
My problems are:
1) buf is supposed to contain only one char. How is it possible that I enter any number of chars and upon pressing ENTER all of them are output to stdout? I would expect one char at a time to be output, or only the last one. How does a one-char buffer store all of those chars? Or do many one-char buffers get created? By whom?
2) What is so special about the newline character that prompts the string to be output? Why is it not just another char within the string? Is it just a matter of definition within the function read()?
Thank you for any help in understanding the working of the buffer!

This follows from the way the I/O calls read() and write() work on most operating systems.
You are reading only 1 byte at a time, so whatever you type is held in an I/O buffer (not yours) until your loop reads it. Since there are no sleeps in the loop, it reads, or waits to read, faster than you can humanly type.
Also as R Sahu suggests - the input buffer may not be presented to your program until you press enter on the console you are typing at. This depends on the console and its config - but most will buffer lines and wait for enter too. This would be different if you were piping into stdin.
The last parameter to read, the '1', is what instructs it to read one byte here.
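The second part concerns the output. write() is a system call and does not go through the stdio output buffer, so each byte is written as soon as your loop processes it; the text appears only after Enter because that is when the terminal hands the line to read(). If you were using stdio functions such as putchar() or printf() instead, the output would also be buffered, with newline commonly used to flush and show the line, and an fflush(stdout) call after each character would make it appear character by character.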

When you type in your input at a console, the input characters are not immediately fed to stdin. After you press the Enter button, the entire line you typed, including the newline character, is fed to stdin by the run time environment.

Related

C confusion of printf, and gets method

I'm new to C. I have a problem understanding a piece of code.
There are two things I don't understand. The second argument of fgets is the maximum number of bytes that can be stored in the buffer.
Why, if I type more letters in the terminal and hit Enter, is the string still printed back in full? I assume that if the string entered in the console is longer than the buffer, the buffer will overflow and printf will still work because it stops at the string terminator. But then what is the point of setting a maximum limit as the second argument to fgets?
#define buff_size 4
//3.71
void good_echo(){
char buf[buff_size];
while(1) {
char* p = fgets(buf, 8, stdin);
if(p == NULL) {
break;
}
printf("%s", p);
}
return;
}
fgets will stop when it reads the enter key OR when the buffer is full.
If you say the buffer is 8 characters long, and you type abcdef<enter> it will put in the buffer a, b, c, d, e, f, \n and \0 (8 characters).
If you say the buffer is 8 characters long, and you type abcdefgh<enter>, it will put in the buffer a, b, c, d, e, f, g and \0 (8 characters). There is still h<enter> left over, which will be read the next time you call fgets (or gets or getchar or scanf etc)
You still get back everything you typed because stdin is buffered: when a call reads fewer characters than you actually typed, the rest stay in the stream. Since you are in a while loop and fgets reports no error (so the if statement does not break out), the next fgets call picks up the remaining characters.
Why if I type more letters in the stdin still the string is printed back. I am assuming that if the length of the string inserted in the console is larger than the buffer will overflow and the printf will work because it stops on the termination of the string,
You get the full input echoed back by the program because the fgets() and printf() calls are inside a loop. Each fgets() call will read and store as many characters as will fit in the buffer, up to and including a newline if one is in fact encountered before the available buffer space is exhausted. If there is more data than will fit in the buffer -- because you typed more characters than can fit at once, or because you typed ahead multiple lines at hyperspeed -- then whatever is not read by fgets() remains in the stream, waiting to be read via some future call to an I/O function.
In your program, the printf() echoes back the characters that were read and stored, and control then loops back to fgets(). If there are more characters available to read, then it will read them, up to, again, the buffer capacity or a newline, whichever comes first. In this way, one long line may be consumed from the standard input and echoed to the standard output over multiple iterations of the loop.
but then what it is the point of setting a max limit as second argument to fgets?
It does exactly what it is advertized to do. On any call, fgets() will write up to that many bytes into the provided buffer.
I believe the confusion is related to the behavior of the terminal rather than the behavior of the program. When you type a single character into the terminal, it is displayed on the screen by the terminal driver. That behavior has nothing to do with your program. Eventually, the fgets function will read some data from stdin, but that may happen many seconds or minutes (or hours, if you go on a long lunch) after you initially typed the key. Generally, the terminal driver will hold on to the data until you hit 'enter' (or 'return'), at which point a line of text will be sent to the program. It might be easier to see if you actually write some data and see what your program is doing. eg:
$ cat a.c
#include <signal.h>
#include <stdio.h>
#define buff_size 4
void
good_echo(void)
{
char buf[buff_size];
char *p;
while( (p = fgets(buf, sizeof buf, stdin)) != NULL ){
printf("good_echo read: %s\n", p);
}
}
int
main(int argc, char **argv)
{
good_echo();
return 0;
}
$ gcc a.c
$ ./a.out
this is some text that is typed
good_echo read: thi
good_echo read: s i
good_echo read: s s
good_echo read: ome
good_echo read:  te
good_echo read: xt
good_echo read: tha
good_echo read: t i
good_echo read: s t
good_echo read: ype
good_echo read: d
In the above, you can see that each call to fgets consumes at most 3 bytes: the buffer holds 4, and one is reserved for the terminating null byte.

Using puts(), gets(), getchar(), and putchar() together in one program

I am confused about using puts(), gets(), putchar() and getchar() together in the same code.
When I run the code below, it performs all the steps:
taking the input, printing the output, taking input again, printing the output again.
#include <stdio.h>
int main() {
char ch[34];
gets(ch);
puts(ch);
char g;
g = getchar();
putchar(g);
}
Output:
Priyanka
Priyanka
J
J
But when I use this code, it performs only two steps:
taking the input, printing the input, then a blank line. I don't understand why it behaves like this.
Code:
#include <stdio.h>
int main() {
char g;
g = getchar();
putchar(g);
char ch[34];
gets(ch);
puts(ch);
getch();
}
Output:
P
P
There are some problems in the code and the input mechanisms are more complex than you infer:
you should not read input with gets(): this function cannot be used safely because it does not receive information about the destination array size so any sufficiently long input line will cause a buffer overflow. It has been removed from the C Standard. You should use fgets() instead and deal with the newline at the end of the buffer.
g should have type int to accommodate for all the values returned by getc(), namely all values of type unsigned char (in most current systems 0 to 255) and the special negative value EOF (usually -1).
Here is a modified version:
#include <stdio.h>
int main() {
char ch[34];
if (fgets(ch, sizeof ch, stdin))
fputs(ch, stdout);
int g = getchar();
if (g != EOF)
putchar(g);
return 0;
}
Output:
Priyanka
Priyanka
J
J
Regarding the behavior of the console in response to your program's input requests, it is implementation defined but usually involves 2 layers of buffering:
the FILE stream package implements a buffering scheme where data is read from or written to the system in chunks. This buffering can be controlled with setvbuf(). 3 settings are available: no buffering (which is the default for stderr), line buffered (usually the default for stdin and stdout when attached to a character device) and fully buffered with a customisable chunk size (common sizes are 512 and 4096).
when you call getchar() or more generally getc(stream), if a byte is available in the stream's buffer, it is returned and the stream position is incremented, otherwise a request is made to the system to fill the buffer.
if the stream is attached to a file, filling the buffer performs a read system call or equivalent, which succeeds unless at the end of file or upon a read error.
if the stream is attached to a character device, such as a terminal or a virtual tty like a terminal window on the graphics display, another layer of buffering gets involved where the device driver reads input from the input device and handles some keys in a special way such as Backspace to erase the previous character, cursor movement keys to move inside the input line, Ctrl-D (unix) or Ctrl-Z (windows) to signal the end of file. This layer of buffering can be controlled via the tcsetattr() system call or other system specific APIs. Interactive applications such as text editors typically disable this and retrieve raw input directly from the input device.
the keys typed by the user are handled by the terminal to form an input line, which is sent back to the C stream API when the user types Enter (which is translated as a system specific end of line sequence); the stream functions perform another set of transformations (i.e. converting CR/LF to '\n' on legacy systems) and the line of bytes is stored in the stream buffer. When getc() finally gets a chance to return the first available byte, the full line has already been typed and entered by the user and is pending in the stream or the device buffers.
In both programs, getchar() does not return the next byte read from stdin until a full line has been read from the terminal and stored in the stream buffer. In the first program, the rest of this line is ignored as the program exits, but in the second program, the rest of this line is available for the subsequent gets() to read. If you typed J and Enter, the line read is J\n and getchar() returns the 'J', leaving the newline pending in the input stream; gets() then reads the newline and returns an empty line.
Between the putchar() and gets() calls (I don't recommend using gets()), discarding the rest of the input line, up to a newline or EOF, solves the problem:
.
.
int c;
while ((c = getchar()) != EOF && c != '\n')
;
.
.
I would recommend using fgets(3), which is much safer to use, for example:
char str[1024];
if (fgets(str, sizeof str, stdin) == NULL) {
// Some problem, handle the error...
}
// or, Input is okay...
Well, you have a problem here. You use a function in your second sample code that is not part of the stdio package.
You call getch(), which is not a stdio function. It is part of the ncurses library and, if you don't specify at compile time that you will use it, you cannot get an executable program. So this makes me think you are not telling the whole truth.
Just taking the function getch() out of the program, you get the full line
Priyanka
as output, and the program terminates. I guess you used getch() to hold the output until you press a key. But as the curses library requires you to call initscr() before calling any other curses function, it is not correctly initialized, and the output you get can be wrong because of this.
I'll not repeat what others have already told you about the use of gets(). It is still shipped by many C libraries, and if you know what you are doing, you can still use it in properly controlled environments. Despite that, the recommendation others have given you is not what is at stake here, as you have not overflowed the short buffer you have used (only 34 chars, which is too short and makes it too easy to hang or crash your program).
The functions from stdio use a buffer, and the unix tty driver is also interfering here. Your terminal will not make any character you input available to the program until you press the <ENTER> key; then all those characters are read by the program into a buffer. They are consumed from the buffer until it is empty, so it doesn't matter if you read them one by one (with fgetc()) or all at once (with fgets(); I'll use this safer function from this point on). Everything just happens once you press the <ENTER> key.
fgetc() only takes one character, so if more than one is available, only one character is taken from the buffer, and the rest wait their turn. But fgets() reads everything (and fills the buffer) until a \n is read. This is why gets() is so dangerous: it doesn't have a parameter indicating the size of the buffer, as fgets() has, and so it cannot stop reading before overflowing it.
So, in your case, as you press a series of characters and then hit return, the first sample reads the full string, and then the getchar() takes the first character of the second line (but you need to input two complete lines at that point). The second sample reads the first char when you call getchar(), and the rest of the line when you call gets().
To read one character at a time, without waiting for a full line to be input, the terminal driver has to be programmed to read characters in raw mode. Cooked (canonical) mode, the default, is used by unix to read complete lines; this allows you to edit and erase characters on the line, and only submit it when you are ready and hit the <ENTER> key.
If you are interested in reading chars one by one from the terminal, read the manual page termios(4), which explains the interface and ioctls of the tty device. The curses library does the necessary housekeeping to put the terminal in raw mode to allow programs like vi(1) to read the input char by char, but then you must not use stdio directly, as its buffering system will consume the characters you are trying to read with curses.

using fgets() with stdin as input: ^D fails to signal EOF

I'm writing a program in C on my MacBook which uses Mojave and I'm trying to use fgets() to get a string from stdin.
My code compiles - the only issue is that when I run the program in the terminal, after fgets() is called and I type in the desired input, I can't figure out how to signal the end of the input so that the program can continue running.
I recognise many people have had this issue and that there are many pages on this site addressing it. But none of the solutions (that I have understood) have worked for me. I've read this and this but these aren't helping.
I've checked out the documentation for fgets() which says:
"fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an *EOF* or a newline. If a newline is read, it is stored into the buffer. A terminating null byte (\0) is stored after the last character in the buffer." - from this page.
Entering 'stty all' in the terminal shows that EOF indeed corresponds to ^D. I've tried entering ^D twice, three times, pressing Enter then ^D, ^D then Enter, etc. etc. Nothing seems to work.
What am I doing wrong? Here's the relevant bit of the code (originally from here, under the 'Pointers to Structures Containing Pointers' section):
#include <stdio.h>
typedef struct
{
char name[21];
char city[21];
char phone[21];
char *comment;
} Address;
int main(void)
{
Address s;
char comm[100];
fgets(s.name, 20, stdin);
fgets(s.city, 20, stdin);
fgets(s.phone, 20, stdin);
fgets(comm, 100, stdin);
return 0;
}
You do not test the return value of fgets(): if you indeed signal an end of file from the terminal, the subsequent calls to fgets() will return NULL and the destination arrays will be left uninitialized.
There is nothing in your code that prevents program operation at end of file. Just hit enter after each piece of input. Why do you think you need to signal end of file?

About the mechanism of using fgets() and stdin together

I would like to have a better understanding of using fgets() and stdin.
The following is my code:
int main()
{
char inputBuff[6];
while(fgets(inputBuff, 6, stdin))
{
printf("%s", inputBuff);
}
return 0;
}
Let's say my input is aaaabbbb and I press Enter. By using a loopcount, I understand that actually the loop will run twice (including the one I input aaaabbbb) before my next input.
Loop 1: After I have typed in the characters, aaaabbbb\n will be stored in the buffer of stdin file stream. And fgets() is going to retrieve a specific number of data from the file stream and put them in inputBuff. In this case, it will retrieve 5 (6 - 1) characters at a time. So that when fgets() has already run once, inputBuff will store aaaab, and then be printed.
Loop 2: Then, since bbb\n are left in the file stream, fgets() will execute for the second time so that inputBuff contains bbb\n, and then be printed.
Loop 3: The program will ask for my input (the 2nd time) as the file stream has reached the end (EOF).
Question: It seems that fgets() will only ask for my keyboard input after the stdin stream has no data left in its buffer (EOF). I am just wondering why I couldn't use the keyboard to input anything in loop 2, and why fgets() just keeps on retrieving 5 characters from the stdin stream and leaves the excess data in the file stream for the next retrieval. Do I have any misunderstanding about stdin or fgets()? Thank you for your time!
The behavior of your program is somewhat more subtle than you expect:
fgets(inputBuff, 6, stdin) reads at most 5 bytes from stdin and stops reading when it gets a newline character, which is stored into the destination array.
Hence, as you correctly diagnose, the first call reads the 5 bytes aaaab and prints them, the second call reads the 4 bytes bbb\n and prints them, and the third call finds an empty input stream and waits for user input.
The tricky part is how stdin gets input from the user, also known as console input.
Both console input and stdin are usually line buffered by default, so you can type a complete line of input regardless of the size of the buffer passed to fgets(). Yet if you set stdin as unbuffered and console input as uncooked, the first fgets() would indeed read the first 5 bytes as soon as you type them.
Console input is an intricate subject. Here is an in depth article about its inner workings: https://www.linusakesson.net/programming/tty/
Everything you are asking about is in the manual page of fgets(); it just needs to be read carefully. It says:
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters
from stream and stores them into the buffer pointed to by s. Reading
stops after an EOF or a newline. If a newline is read, it is
stored into the buffer. A terminating null byte ('\0') is stored
after the last character in the buffer.
If the input is aaaabbbb and the second argument of fgets() is 6, it reads one less than that, 5 characters, and appends the terminating \0, so the first time inputBuff holds aaaab. Since neither EOF nor \n has occurred yet, the next call stores bbb\n in inputBuff, because the newline is stored as well.
You should also check the return value of fgets(), and break out of the loop if the line is just \n. For example:
char *ptr = NULL;
while( (ptr = fgets(inputBuff, 6, stdin))!= NULL){
if(*ptr == '\n')
break;
printf("%s", inputBuff);
}
fgets() reads only until '\n', EOF, or until size - 1 characters have been stored. Everything after that is left in stdin and is therefore read when you call fgets() again. You can, however, remove the excess characters from stdin, for example by calling getc() until you reach '\n' or EOF. You might want to look at the man pages for that.

Clarification regarding functioning of getchar()/putchar()

In this code:
#include<stdio.h>
int main()
{
int i,p=0;
while(i!=EOF)
{
i=getchar();
putchar(i);
printf("\n");
}
return 0;
}
When I enter hello as input in one go, the output is h, then e on the next line, and so on. But after h is printed, why doesn't getchar() pause to take input from me before printing e, as it did the first time?
getchar() returns either a successfully read character from stdin or an error indication, so which function requests terminal input and sends it to stdin?
Input from a terminal is generally buffered. This means it is held in memory waiting for your program to read it.
This buffering is performed by multiple pieces of software. The software that is actually reading your input in the terminal window generally accumulates characters you type until you press enter or press certain other keys or combinations that end the current input. Then the line that has been read is made available to your program.
Inside your program, the C standard library, of which getchar is a part, reads the data that has been sent to it and holds it in a buffer of its own. The getchar routine reads the next character from this buffer. (If the buffer is empty when getchar wants another character, getchar will block, waiting for new data to arrive from the terminal software.)
It's because of the loop condition. You are continuing to loop until EOF is received. When you type "hello", it works exactly as you expect except STDIN has more characters in the buffer and none of them are EOF. The program prints out "h", then a newline, and goes back to check the loop condition. EOF has not been found, so then it gets the next character from STDIN (which you have already provided) and the cycle repeats.
If you remove the loop it will only print one character.

Resources