I'm really confused about how exactly a buffer works, so I wrote a little snippet to check:
#include<stdio.h>
#define BUF_SIZE 1024
char buf[BUF_SIZE];
char arr[20];
int main()
{
FILE* fs=fopen("test.txt","r");
setvbuf(fs,buf,_IOFBF,1024);
fread(arr,1,1,fs);
printf("%s",arr);
getchar();
return 0;
}
As you can see, I set the file stream fs to be fully buffered (I know it would default to fully buffered most of the time; I'm just making sure). I also set its buffer to size 1024, which means that the stream would not be flushed until it contains 1024 bytes of stuff (right?).
As I understand it, fread() reads data from the file stream, stores it in the stream's buffer buf, and then sends the data in buf on to arr as soon as buf is full with 1024 bytes of data (right?).
But here I read only one character from the stream, and there are only four characters in the file test.txt. Why do I find something in arr when only one character was read (I can print that one character out)?
The distinctions between fully buffered, line-buffered, and unbuffered really only matter for output streams. I'm pretty sure that input streams pretty much always act as if they're fully buffered.
But even for fully-buffered input streams, there's at least one case where the buffer won't be fully full, and as you've discovered, that's where there aren't enough characters left in the input to fill the buffer. If there are only 4 characters in the file, then when the system goes to fill the buffer, it gets those 4 characters and puts them in the buffer, and then you can start taking them out, as usual.
(The same situation would arise any time the file contains a number of characters that's not an exact multiple of the buffer size. For example, if the input file contained 1028 characters, then after filling the buffer with the first 1024 characters and letting you read them, the next time it filled the buffer, it'd end up with 4 again.)
What were you expecting it to do in this case? Block waiting to read 1,020 more characters from the file (that were never going to come)?
P.S. You said "the stream would not be flushed until it contained 1024 bytes of stuff, right?" But flushing is only defined for output streams; it doesn't mean anything for input streams.
From what I understand, an input buffer works differently from what you suggested: if you request one byte, the system reads up to 1023 more bytes into the buffer, so the next 1023 one-byte reads can return data directly from the buffer instead of having to go to the file.
I want to know how to deal with the situation where I set the buffer size in fgets to n bytes and the user enters 2n bytes. fgets will read the first n bytes and the other n bytes will be left in stdin, right? Is it a good idea to flush stdin after each fgets, or is there another option for dealing with that situation?
You know when the input was truncated by the absence of a '\n' newline at the end of the input string.
Except possibly for the last line in a text file, which might not contain any newline.
If there is more input, you can realloc the buffer (originally acquired with malloc) and repeat with the appropriate buffer pointer (and size) to append to what you already have.
No need to flush anything, just pass the right pointer and additional buffer size to fgets.
In any case, flushing input with fflush() is not well defined.
Consider the following, albeit very messy, code in C:
#include<stdio.h>
int main() {
char buf[3]; //a new, small buffer
FILE *fp = fopen("test.txt", "r"); //our test file, with the contents "123abc"
setvbuf(fp, buf, _IOFBF, 2); //we assign our small buffer as fp's buffer \
//in fully buffered mode
char character = fgetc(fp); // get the first character...
character = fgetc(fp); // and the next...
character = fgetc(fp); // and the next... (third character, '3')
buf[2] = '\0'; //add a terminating null character for display
fputs(buf, stderr); //write our buffer to stderr, should show up immediately
}
Compiling and running the code will print '3a' as the contents of our self-designated buffer, buf. My question is: how does this buffer get filled? Does a call to fgetc() mean several calls until the buffer is full, after which it stops (we only made three calls to fgetc, which should not include the 'a' that is present)? The first buffer was "12", so when another fgetc() call is made and the pointer references something outside the bounds of the buffer, is the buffer purged and then filled with the next block of data, or simply overwritten? I understand buffer sizes are platform dependent, so I'm more concerned with how, in general, an fopen()ed stream in read mode pulls characters into its buffer.
The buffer, and exactly how and when it is filled, is an implementation detail inside the stdio package. But the way it is likely to be implemented is that fgetc gets one character from the buffer, if there are characters available in the buffer. If the buffer is empty, it fills it by reading (in your case) two more characters from the file.
So your first fgetc will read 12 from the file and put it in the buffer, and then return '1'. Your second fgetc will not read from the file, since a character is available in the buffer, and return '2'. Your third fgetc will find that the buffer is empty, so it will read 3a from the file and put it in the buffer, and then return '3'. Therefore, when you print the content of the buffer, it will be 3a.
Note that there are two levels of "reading" happening here. First you have your fgetc calls, and then, below that level, code inside the stdio package which is reading from the file. If we assume this is on a Unix or Linux system, the second type of reading is done using the system call read(2).
The lower-level reading fills the entire buffer at once, so you don't need as many calls to read as calls to fgetc. (Which is the entire point of having the buffer.)
I am having difficulty with a feature of a segment of code that is designed to illustrate the fgets() function for input. Before I proceed, I would like to make sure that my understanding of I/O and streams is correct and that I'm not completely off base:
Input and Output in C has no specific viable function for working with strings. The one function specific for working with strings is the 'gets()' function, which will accept input beyond the limits of the char array to store the input (thus making it effectively illegal for all but backward compatibility), and create buffer overflows.
This brings up the topic of streams, which to the best of my understanding is a model to explain I/O in a program. A stream is considered 'flowing water' on which the data utilized by programs is conveyed. See links: (also as a conveyor belt)
Can you explain the concept of streams?
What is a stream?
In the C language, there are 3 predefined ANSI streams for standard input and output, and 2 additional streams if using Windows or DOS, which are as follows:
stdin (keyboard)
stdout (screen)
stderr (screen)
stdprn (printer)
stdaux (serial port)
As I understand, to make things manageable it is okay to think of these as rivers that exist in your operating system, and a program uses I/O functions to put data in them, take data out of them, or change the direction of where the streams are flowing (such as reading or writing a file would require). Never think of the 'beginning' or 'end' of the streams: this is handled by the operating system. What you need to be concerned with is where the water takes your data, and that is mediated by use of specific functions (such as printf(), puts(), gets(), fgets(), etc.).
This is where my questions start to take form. Now I am interested in getting a grasp on the fgets() function and how it ties into streams. fgets() uses the 'stdin' stream (naturally) and has the built in fail safe (see below) that will not allow user input to exceed the array used to store the input. Here is the outline of the fgets() function, rather its prototype (which I don't see why one would ever need to declare it?):
char *fgets(char *str , int n , FILE *fp);
Note the three parameters that the fgets function takes:
p1 is the address of where the input is stored (a pointer, which will likely just be the name of the array you use, e.g., 'buffer')
p2 is the maximum length of characters to be input (I think this is where my question is!)
p3 specifies the input stream, which in this code is 'stdin' (when would it ever be different?)
Now, the code I have below will allow you to type characters until your heart is content. When you hit return, the input is printed on the screen in rows of the length of the second parameter minus 1 (MAXLEN -1). When you enter a return with no other text, the program terminates.
#include <stdio.h>
#define MAXLEN 10
int main(void)
{
char buffer[MAXLEN];
puts("Enter text a line at a time: enter a blank line to exit");
while(1)
{
fgets(buffer, MAXLEN, stdin); //Read comments below. Note 'buffer' is indeed a pointer: just to array's first element.
if(buffer[0] == '\n')
{
break;
}
puts(buffer);
}
return 0;
}
Now, here are my questions:
1) Does this program allow me to input UNLIMITED characters? I fail to see the mechanism that makes fgets() safer than gets(), because the array that I am storing input in is of a limited size (MAXLEN, which is 10 in this case). The only thing that I see happening is my long strings of input being parsed into MAXLEN - 1 slices. What am I not seeing with fgets() that stops buffer overflow that gets() does not? I do not see in the parameters of fgets() where that fail-safe exists.
2) Why does the program print out input in rows of MAXLEN-1 instead of MAXLEN?
3) What is the significance of the second parameter of the fgets() function? When I run the program, I am able to type as many characters as I want. What is MAXLEN doing to guard against buffer overflow? From what I can guess, when the user inputs a big long string, once the user hits return, the MAXLEN chops up the string in to MAXLEN sized bites/bytes (both actually work here lol) and sends them to the array. I'm sure I'm missing something important here.
That was a mouthful, but my lack of grasp on this very important subject is making my code weak.
Question 1
You can actually type as many characters as your command-line tool allows per input. However, your call to fgets() will handle at most MAXLEN - 1 of them at a time, because that is the limit you gave it.
Moreover, there is no safety check inside fgets() beyond that: the second parameter you pass to fgets is the "safety" argument. Try changing your call to fgets(buffer, MAXLEN + 10, stdin); and then type more than MAXLEN characters. fgets will write past the end of the array, which is undefined behavior; your program may well crash because it is accessing memory that does not belong to buffer.
Question 2
When you make a call to fgets(), it will read at most MAXLEN - 1 characters, because the last element is reserved for the character code \0, which marks the end of the string.
The second parameter of fgets() is not the number of characters you want to store but the maximum capacity of your buffer. You always have to account for the string termination character \0.
Question 3
If you understood the two previous answers, you will be able to answer this one yourself. Try playing with this value, and try a value different from the one used for your buffer size.
Also, you said
p3 specifies the input stream, which in this code is 'stdin' (when would it ever be different?)
You can use fgets to read files stored on your computer. Here is an example :
char buffer[20];
FILE *stream = fopen("myfile.txt", "r"); //Open the file "myfile.txt" in readonly mode
fgets(buffer, 20, stream); //Read the 19 first characters of the file "myfile.txt"
puts(buffer);
When you call fgets(), it lets you type in as much as you want into stdin, so everything stays in stdin. It seems fgets() takes the first 9 characters, attaches a null character, and assigns it to buffer. Then puts() displays buffer then creates a newline.
The key is it's in a while loop -- the code loops again then takes what was remaining in stdin and feeds it into fgets(), which takes the next 9 characters and repeats. Stdin just still had stuff "in queue".
Input and Output in C has no specific viable function for working with strings.
There are several functions for outputting strings, such as printf and puts.
Strings can be input with fgets or scanf; however there is no standard function that both inputs and allocates memory. You need to pre-allocate some memory, and then read some characters into that memory.
Your analogy of a stream as a river is not great. Rivers flow whether or not you are taking items out of them, but streams don't. A better analogy might be a line of people at the gates to a stadium.
C also has the concept of a "line", lines are marked by having a '\n' character at the end. In my analogy let's say the newline character is represented by a short person.
When you do fgets(buf, 20, stdin) it is like "Let the next 19 people in, but if you encounter a short person during this, let him through but not anybody else". Then the fgets function creates a string out of these 0 to 19 characters, by putting the end-of-string marker on the end; and that string is placed in buf.
Note that the second argument to fgets is the buffer size , not the number of characters to read.
When you type in characters, that is like more people joining the queue.
If there were fewer than 19 people and no short people, then fgets waits for more people to arrive. In standard C there's no way to check if people are waiting without blocking to wait for them if they aren't.
By default, C streams attached to a terminal are usually line buffered (streams attached to files are usually fully buffered). In my analogy, this is like there is a "pre-checking" gate earlier on than the main gate, where all people that arrive go into a holding pen until a short person arrives; and then everyone from the holding pen plus that short person get sent on to the main gate. This can be changed using setvbuf.
Never think of the 'beginning' or 'end' of the streams: this is handled by the operating system.
This is something you do have to worry about. stdin etc. are already begun before you enter main(), but other streams (e.g. if you want to read from a file on your hard drive), you have to begin them.
Streams may end. When a stream is ended, fgets will return NULL. Your program must handle this. In my analogy, the gate is closed.
I am having a confusion regarding the following code,
#include<stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
read(stdin,buf,4); //attaching to stdin
printf("buffer is %s\n",buf);
return 1;
}
suppose on runtime I provided with the input 10abcd so as per my understanding following should happen:
scanf should place 10 in data
and abcd will still be on the stdin buffer
when read tries to read the stdin (already abcd is there) it should place the abcd into the buf
so printf should print abcd
but it is not happening ,printf showing no o/p
am I missing something here?
First of all, read(stdin, ...) should give warnings (if you have them enabled), which you would be wise to heed. read() takes an integer file descriptor as its first parameter, specifying which channel to read from; stdin is of type FILE *.
Even if you changed it to read(0, ...), this is not recommended practice. scanf reads from FILE *stdin, which is buffered on top of file descriptor 0. read(0, ...) reads directly from the underlying file descriptor and ignores any characters that were already buffered. This will cause strange results unless stdin is set to unbuffered.
Ignoring mechanical issues related to the syntax of the read() function call, there are two cases to consider:
Input is from a terminal.
Input is from a file.
Terminal
No data will be available for reading until the user hits return. At that point, the standard I/O library will read all the available data into the buffer associated with stdin (that would be "10abcd\n"). It will then parse the number, leaving the a in the buffer to be read later by other standard I/O functions.
When the read() occurs, it will also wait for the user to provide some input. It has no clue about the data in the stdin buffer. It will hang until the user hits return, and will then read the next lot of data, returning up to 4 bytes in the buffer (no null termination unless it so happens that the fourth character is an ASCII NUL '\0').
File
Actually, this isn't all that much different, except that instead of reading a line of data into the buffer, the standard I/O library will probably read an entire buffer full, (BUFSIZ bytes, which might be 512 or larger). It will then convert the 10 and leave the a for later use. (If the file is shorter than the buffer size, it will all be read into the stdin buffer.)
The read will then collect the next 4 bytes from the file. If the whole file was read already, then it will return nothing — 0 bytes read.
You need to record and check the return value from read(). You should also check the return value from scanf() to ensure it did actually read a number.
Try man read first.
read is declared as ssize_t read(int fd, void *buf, size_t count);
whereas stdin is declared as FILE *. That's the issue. Use fread() instead and you will be sorted.
#include <stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
fread(buf, 1, 4, stdin);
printf("buffer is %s\n",buf);
return 1;
}
EDIT: Your understanding is almost correct, but not totally.
To address your question properly, I agree with Jonathen Laffer.
How your code works:
1) scanf places 10 in data.
2) abcd is still in the stdin buffer when you press ENTER.
3) read() then waits for input again, and you have to press ENTER again for the program to continue.
4) If you typed anything before pressing ENTER the second time, printf prints it; otherwise you get nothing on output other than the fixed text of your printf statement.
That's why I asked you to use fread instead. Hope it helps.
/*
Low Level I/O - Read and Write
Chapter 8 - The C Programming Language - K&R
Header file in the original code is "syscalls.h"
Also BUFSIZ is supposed to be defined in the same header file
*/
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#define BUFSIZ 1
int main() /* copy input to output */
{
char buf[BUFSIZ];
int n;
while ((n = read(0, buf, BUFSIZ)) > 0)
write(1, buf, n);
return 0;
}
When I feed "∂∑∑®†¥¥¥˚π∆˜˜∫∫√ç tu 886661~EOF" as input, the same text is copied to the output.
How are so many non-ASCII characters stored at the same time?
BUFSIZ is the number of bytes to be transferred.
How is BUFSIZ limiting the byte transfer if, for any value, anything can be copied from input to output?
How does char buf[BUFSIZ] store non-ASCII characters?
You read by little chunks until EOF:
while ((n = read(0, buf, BUFSIZ)) > 0)
That's why. You literally copy the input to the output byte by byte. How the bytes are converted back to Unicode is the console's problem, not yours. My guess is that it does not display anything until it can recognize the data as a complete symbol.
Since you are calling read in a loop until end of file is reached or an error is encountered, you get precisely 1 byte in buf after each call of read. That byte is then written out via the write system call. It is guaranteed that read will read no more than the count specified in its last argument. If you passed 10, for example, while buf still had only one element, read would go ahead and try to copy the data beyond the array bounds.
As for the characters you have fed in: depending on the terminal's encoding, these are either single bytes in an extended 8-bit character set (codes 128-255) or multi-byte sequences such as UTF-8. Either way they are just bytes to read and write, so no problem here.
When you call read on standard input, you are reading from a pipe that is bound to the terminal or to another program. There are, of course, one or more buffers between the writer (the terminal or other program) and your program. When this buffer is empty, the reader (your program) blocks in read. When the buffer is full, the writer (the terminal etc.) blocks in write.
When you write to standard output, you are writing to a pipe that is bound to the terminal or to another program.
So if your program is run by the shell from a terminal, then your program's input and output are bound to the (pseudo)terminal. A (pseudo)terminal is a program that converts the user's key presses into characters and converts encoded strings (ISO8859-1, UTF-8, etc.) into symbols on the screen.
Characters are held in the terminal until you press EOF or EOL. This is the terminal's canonical mode. After you press Enter, the bytes are written to the pipe bound to your program.
BUFSIZ is the number of bytes that you try to read from the input in one operation. The return value n is the number of bytes that were actually read when the operation completed. So BUFSIZ is the maximum number of bytes your program can read from the pipe per call.
char buf[BUFSIZ] is an array of bytes (not of characters in some charset), so it can hold any values (including non-printable ones and even zero).