My question is regarding the following paragraph on page 15 (Section 1.5) of The C Programming Language, 2nd edition (ANSI C), by Kernighan and Ritchie (emphasis added):
The model of input and output supported by the standard library is very simple.
Text input or output, regardless of where it originates or where it goes to,
is dealt with as a stream of characters. A text stream is a sequence of characters divided
into lines; each line consists of zero or more characters followed by a newline character.
It is the responsibility of the library to make each input or output stream conform to
this model; the C programmer using the library need not worry about how lines are
represented outside the program.
I'm unsure of what is meant by the text in bold, especially the line "it is the responsibility of the library to make each input or output stream conform to this model." Could someone please help me understand what this means?
At first, I thought it had something to do with the line-buffering of stdin that I was seeing when calling getchar() while stdin is empty, but then I learned that the buffering mode varies across implementations (see here). So I don't think this is what the text in bold is referring to when it talks about conforming to the text stream model.
Consider running code like printf("hello world"); in the firmware of a USB device. Suppose that whatever characters you pass to printf are sent over USB from the device to the computer. The way the USB protocol works, the characters must be split up into groups of characters called packets. There is a maximum packet size depending on how your USB hardware and descriptors are configured. Also, for efficiency, you want to fill up the packets whenever possible, because sending a packet that is less than the maximum size means the computer will stop letting you send more data for a while. Also, if the computer doesn't receive your packet, you might need to re-send it. Also, if your USB packet buffers are already filled, you might need to wait a while until one of them gets sent.
To make programming in C a manageable task, the implementation of printf needs to handle all of these details so the user doesn't need to worry about them when calling printf. For example, it would be really bad if printf were only able to send a single packet of 1 to 8 bytes whenever you call it, and thus returned an error whenever you gave it more than 8 characters.
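To make this concrete, here is a rough sketch of what such a library routine might do internally. The function usb_send_packet and the 8-byte packet size are made up for illustration; in real firmware the send routine would queue bytes for a USB endpoint and block until a packet can go out, while the stand-in below just prints each packet so the sketch can run anywhere:

#include <stdio.h>
#include <string.h>

#define PACKET_SIZE 8   /* stand-in for the hardware's maximum packet size */

/* Stand-in for the real driver call: in firmware this would queue the
   bytes for a USB endpoint and block until a packet buffer is free.
   Here it just prints the packet so the sketch is runnable on a PC. */
static void usb_send_packet(const char *data, size_t len)
{
    printf("[packet of %zu bytes] %.*s\n", len, (int)len, data);
}

/* Send a string of any length by splitting it into packet-sized chunks;
   this is the kind of detail a printf-like routine hides from the caller. */
static void usb_print(const char *s)
{
    size_t remaining = strlen(s);
    while (remaining > 0) {
        size_t chunk = remaining < PACKET_SIZE ? remaining : PACKET_SIZE;
        usb_send_packet(s, chunk);
        s += chunk;
        remaining -= chunk;
    }
}

int main(void)
{
    usb_print("hello world");   /* goes out as "hello wo" + "rld" */
    return 0;
}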
This is called an abstraction: the underlying system has some complexity (like USB endpoints, packets, buffers, retries). You don't want to think about that stuff all the time so you make a library that transforms that stuff into a more abstract interface (like a stream of characters). Or you just use a "standard library" written by someone else that takes care of that for you.
If you want a more PC-centric example... I believe that printf is implemented on many systems by calling the write system call. Since write isn't always guaranteed to actually write all of the data you give it, the implementation of printf needs to try multiple times to write the data you give it. Also, for efficiency, the printf implementation might buffer the data you give it in RAM for a while before passing it to the kernel with write. You don't generally have to worry about retrying or buffering details while programming in C because once your program terminates or you flush the buffer, the standard library makes sure all your data has been written.
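As a sketch of the retry part, assuming the POSIX write() function and keeping error handling minimal, a "write everything" helper could look roughly like this (an illustration only, not the actual code of any particular C library):

#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all `count` bytes are out, because a single
   write() call is allowed to write fewer bytes than requested. */
ssize_t write_all(int fd, const char *buf, size_t count)
{
    size_t written = 0;
    while (written < count) {
        ssize_t n = write(fd, buf + written, count - written);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* a real error: give up */
        }
        written += (size_t)n;
    }
    return (ssize_t)written;
}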
Related
I was reading a book from 1997 that teaches how to program in C, and it always uses the word "usually" when specifying that functions like scanf take input from the keyboard. Because of this, I'm curious whether functions like scanf can take input from other devices, or whether they used to.
scanf takes input from the program's standard input. What this is connected to is a matter of the operating environment and the way the program is launched. (Look up "I/O redirection"). It is not unusual for a program's standard input to be connected to a file on disk or to the output of another program. It sometimes is connected to a socket. More rarely, it is connected to a serial port, or to a null device, or a source of zeroes or random bytes.
Historically, it might have been connected to a card or paper tape reader.
In principle, it can be connected to any device that produces data -- a mouse, for example -- but just because something is possible doesn't make it useful.
After freopen(..., ..., stdin), scanf() input can come from many possible sources.
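For example (the file name numbers.txt is made up), after redirecting stdin with freopen() the very same scanf() call reads from a file instead of the keyboard:

#include <stdio.h>

int main(void)
{
    /* After this call stdin is connected to "numbers.txt" rather than
       the keyboard; scanf() itself doesn't know or care. */
    if (freopen("numbers.txt", "r", stdin) == NULL) {
        perror("freopen");
        return 1;
    }

    int n;
    if (scanf("%d", &n) == 1)
        printf("read %d\n", n);
    return 0;
}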
This seems like a simple question, but I have had a really hard time finding an answer. I am writing a program in C where this seems possible (though remotely so) on some systems, as it appears there are situations where stdin has a buffer of only 4k.
So, my question is: is there a standard way an OS deals with stdin filling up (e.g., a de facto standard, a POSIX requirement, etc.)? How predictable is the outcome, if there is in fact some sort of standard way to deal with the situation?
The OS will have a buffer that stores the unread stdin input. In general, things writing to stdin use blocking calls, so if the buffer fills up they simply stall until room is available, and no data is lost. If this behaviour is undesirable (you don't want to block the writer), then you need to make sure you are reading the buffer in time so that it doesn't fill up.
One thing you could do is create a worker thread that simply sits in a tight loop reading stdin as fast as it can and puts the data somewhere else (in a much larger buffer, for example), and then the main program accesses the data from your new buffer rather than reading from stdin itself.
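A minimal sketch of that idea using POSIX threads; the fixed-size application buffer and the handling of a full buffer are deliberately simplified, and a real program would want a growable buffer and a proper consumer:

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BIG_BUF_SIZE (1024 * 1024)   /* a much larger application buffer */

static char big_buf[BIG_BUF_SIZE];
static size_t big_len = 0;
static int done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker thread: drain stdin as fast as possible into the large buffer
   so that whatever is writing to stdin is never blocked for long. */
static void *drain_stdin(void *arg)
{
    (void)arg;
    char chunk[4096];
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, stdin)) > 0) {
        pthread_mutex_lock(&lock);
        if (big_len + n <= BIG_BUF_SIZE) {   /* simplification: drop on overflow */
            memcpy(big_buf + big_len, chunk, n);
            big_len += n;
        }
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, drain_stdin, NULL);

    /* Main program: do other work and consume big_buf when convenient. */
    for (;;) {
        pthread_mutex_lock(&lock);
        size_t have = big_len;
        int finished = done;
        pthread_mutex_unlock(&lock);
        printf("buffered so far: %zu bytes\n", have);
        if (finished)
            break;
        sleep(1);   /* stand-in for real work */
    }
    pthread_join(tid, NULL);
    return 0;
}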
I am having a really hard time understanding the depths of buffering, especially in C programming, and I have searched this topic for a long time but haven't found anything satisfying so far.
I will be a little more specific:
I do understand the concept behind it (i.e. coordinating operations between different hardware devices and minimizing the difference in speed of these devices), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean the longer and deeper the better). It would also be really nice to see some concrete examples of how buffering is implemented in I/O streams.
The other question is that I have noticed that some rules about buffer flushing don't seem to be followed by my programs, as weird as that sounds. Take the following simple fragment:
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();
    return 0;
}
The program is intended to demonstrate that pending input flushes an arbitrary stream immediately when the first getchar() is called, but this simply doesn't happen, no matter how often I try it and no matter how I modify the program. For stdout (with printf(), for example), the stream is flushed without any input being requested, which also contradicts the rule. So am I understanding this rule wrongly, or is there something else to consider?
I am using GNU GCC on Windows 8.1.
Update:
I forgot to ask: I have read on some sites that people refer to, e.g., string literals as buffers, or even arrays as buffers. Is this correct, or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
Hardware buffers: These are buffers where data is stored before being moved to a HW device, or buffers where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory: if the memory is not set up properly, the system may crash. Or sound devices that need sub-microsecond precision, or they will work poorly.
Cache buffers: These are buffers where data is grouped before being written to, or after being read from, a file/device, so that performance is generally improved.
Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in the real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i = 0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us); that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us), or 100 * 11ms. That's just 1.1 seconds!
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
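If you want to control the stdio buffer yourself, setvbuf() lets you supply the buffer and its size; the 10000-byte figure below just mirrors the example above:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("file.txt", "w");
    if (f == NULL)
        return 1;

    /* Ask stdio to use a fully buffered, 10000-byte buffer for this stream.
       setvbuf() must be called before the first read or write. */
    char *buf = malloc(10000);
    setvbuf(f, buf, _IOFBF, 10000);

    for (int i = 0; i < 1000000; ++i)
        fputc('x', f);

    fclose(f);    /* flushes the buffer; only free it after closing */
    free(buf);
    return 0;
}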
About your problem with flushing: files are usually flushed only when they are closed or flushed manually. Some streams, such as stdout, are line-buffered, that is, they are flushed whenever a '\n' is written. Also, stdin/stdout are special: when you read from stdin, then stdout is flushed. Other files are untouched, only stdout. That is handy if you are writing an interactive program.
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
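A minimal sketch of that pattern: accumulate characters in a buffer until the newline arrives, then hand the completed line over to whatever parses it:

#include <stdio.h>

int main(void)
{
    char line[256];
    size_t len = 0;
    int c;

    /* Characters pile up in the buffer; a full line is only available
       once the newline (the Enter key) arrives. */
    while ((c = getchar()) != EOF) {
        if (c == '\n') {
            line[len] = '\0';
            printf("got a line: \"%s\"\n", line);
            len = 0;                       /* start collecting the next line */
        } else if (len < sizeof line - 1) {
            line[len++] = (char)c;
        }
    }
    return 0;
}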
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose the part you're not immediately ready to consume. Similar considerations apply to output.
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including storing messages before they are actually used, DMA transfers, speedups, and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a starting point, use Wikipedia; it is almost always helpful.
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
fflush() or equivalent
File is closed
The buffer overflows.
Since none of these cases applies at the point of your getchar() calls, the file is not flushed there. (Normal program termination, by returning from main() or calling exit(), does flush and close all open streams, but that only happens at the very end.)
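Applied to your program, an explicit fflush() is the simplest way to make the first 'A' appear in hallo.txt before the program waits at getchar():

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    if (fp == NULL)
        return 1;

    fputc('A', fp);
    fflush(fp);     /* force the buffered 'A' out to the file right now */
    getchar();      /* hallo.txt already contains 'A' while we wait here */

    fputc('A', fp);
    fclose(fp);     /* closing the file flushes the second 'A' */
    return 0;
}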
A buffer is a small area in your memory (RAM) that is responsible for storing information before it is sent to your program. As long as I am typing characters on the keyboard, those characters are stored inside the buffer, and as soon as I press the Enter key they are transported from the buffer into your program. So, with the help of the buffer, all these characters become available to your program at once (preventing lag and slowness), and the program can then send them to the output display screen.
One of the purposes of the library I am developing is to retrieve a specified number of bytes from a file; in this specific case I want access to /dev/random to retrieve entropy-based random sequences.
My main issue with fread is that it will hang indefinitely while waiting for more entropy, and this is unwanted. My next choice would have been wrapping fread with feof to take bytes in chunks; then I could at least report a completion percentage for a better experience, although from what I can gather it will be hard to make the bytes from iterations 1, 2, 3, 4, ... add up to exactly the amount needed.
Is there a method in a C standard that allows for what I am looking for: the exact amount wanted, delivered in chunks? If I were to look at timeouts for this, would threading the data request be a good option?
Define "standard". Do you mean the ISO C standard? POSIX? The Linux Standard Base (LSB)? For POSIX, the read call lets you specify how many bytes you want to read into your buffer. You can use pselect or poll to determine whether there are bytes available to be read, with a timeout instead of blocking. On Linux, it is possible to use the "FIONREAD" ioctl call to obtain the exact number of bytes available for reading.
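Here is a rough sketch of that POSIX approach (open(), then poll() with a timeout, then read()); the 32-byte request and one-second timeout are arbitrary choices for the example, and error handling is kept minimal:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[32];
    size_t want = sizeof buf, got = 0;

    int fd = open("/dev/random", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    while (got < want) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, 1000);   /* wait at most one second */
        if (ready == 0) {
            printf("still waiting... %zu/%zu bytes\n", got, want);
            continue;                      /* timeout: report progress */
        }
        if (ready < 0) {
            perror("poll");
            break;
        }
        ssize_t n = read(fd, buf + got, want - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fd);
    printf("collected %zu random bytes\n", got);
    return 0;
}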
That said, you should ask yourself if you need that level of entropy. You might (or might not) be able to get away with reading from "/dev/urandom". Of course, you would have to determine if that is the case.
Try this
Here is the man page for a function I think will solve your problem.
http://www.manpagez.com/man/3/fgets/
I just saw that fread wasn't working; fgets reads a certain number of bytes from a file stream into a buffer.
I was wondering if there are any resources available online that explain what happens with something like C's printf at the very low level (BIOS/kernel calls).
Linux:
printf() ---> printf() in the C library ---> write() in C library ---> write() system call in kernel.
To understand the interface between user space and kernel space, you will need to have some knowledge of how system calls work.
To understand what is going on at the lowest levels, you will need to analyze the source code in the kernel.
The Linux system call quick reference (pdf link) may be useful as it identifies where in the kernel source you might begin looking.
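To see the layering yourself, you can skip stdio entirely and issue the write() call directly (file descriptor 1 is standard output); printf() ultimately funnels its buffered characters into the same system call, and running a program under strace makes that call visible:

#include <unistd.h>

int main(void)
{
    /* Roughly what printf("hello\n") eventually boils down to on Linux:
       a write() system call on file descriptor 1 (stdout). */
    const char msg[] = "hello\n";
    write(1, msg, sizeof msg - 1);
    return 0;
}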
Something like printf, or printf specifically? That is somewhat vague.
printf outputs to the stdout FILE* stream; what that is associated with is system dependent and can moreover be redirected to any other stream device for which the OS provides a suitable device driver. I work in embedded systems, and most often stdout is by default directed to a UART for serial I/O - often that is the only stream I/O device supported, and cannot be redirected. In a GUI OS for console mode applications, the output is 'drawn' graphically in the system defined terminal font to a window, in Windows for example this may involve GDI or DirectDraw calls, which in turn access the video hardware's device driver. On a modern desktop OS, console character output does not involve the BIOS at all other than perhaps initial bootstrapping.
So in short, there typically is a huge amount of software between a printf() call and the hardware upon which it is output.
This is very platform-specific. From a hardware perspective, the back-end implementation of printf() could be directed to a serial port, a non-serial LCD, etc. You're really asking two questions:
How does printf() interpret arguments and format string to generate correct output?
How does output get from printf() to your target device?
You must remember that an OS, kernel, and BIOS are not required for an application to function. Embedded apps typically have printf() and other I/O routines write to a character ring buffer. An interrupt handler may then drain that buffer and manipulate the output hardware (LCD, serial port, laser show, etc.) to send the buffered output to the correct destination.
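A rough sketch of that ring-buffer idea; the "interrupt" is simulated here by an ordinary drain function so the example runs on a PC, and the size and names are made up:

#include <stdio.h>

#define RING_SIZE 64

static volatile unsigned char ring[RING_SIZE];
static volatile unsigned int head = 0;   /* next slot to write */
static volatile unsigned int tail = 0;   /* next slot to read  */

/* Called by the printf-style output code: drop a character into the ring. */
static void ring_putc(char c)
{
    unsigned int next = (head + 1) % RING_SIZE;
    if (next != tail) {                  /* simplification: drop if full */
        ring[head] = (unsigned char)c;
        head = next;
    }
}

/* In firmware this would be the UART/USB transmit interrupt handler;
   here it just forwards the buffered bytes to the console. */
static void drain_ring(void)
{
    while (tail != head) {
        putchar(ring[tail]);
        tail = (tail + 1) % RING_SIZE;
    }
}

int main(void)
{
    const char *msg = "buffered output\n";
    for (const char *p = msg; *p; ++p)
        ring_putc(*p);
    drain_ring();                        /* the "interrupt" empties the buffer */
    return 0;
}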
By definition, BIOS and kernel calls are platform-specific. What platform are you interested in? Several links to Linux-related information have already been posted.
Also note that printf may not even result in any BIOS or kernel calls, as your platform may not have a kernel or BIOS present (embedded systems are a good example of this).
The printf() function takes multiple arguments (it is a variable-length-argument function). The user supplies a format string and input arguments.
The printf() function creates an internal buffer for constructing the output string.
Now, printf() iterates through each character of the user's string and copies the character to the output buffer. It only stops at "%".
"%" means there is an argument to convert (arguments can be of type char, int, long, float, double or string). printf() converts the argument to a string and appends it to the output buffer. If the argument is a string, it does a string copy.
Finally, printf() reaches the end of the user's string and copies the entire buffer to the stdout file.
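A much-simplified sketch of that loop, handling only %d, %s and %c with no width or precision support, and leaning on snprintf for the actual number conversion; it is only meant to show the shape of the iteration described above:

#include <stdarg.h>
#include <stdio.h>

/* Toy printf: walk the format string, copying ordinary characters and
   converting an argument whenever a '%' conversion is found. */
static void my_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);

    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%') {
            putchar(*p);                 /* ordinary character: copy it */
            continue;
        }
        ++p;                             /* look at the conversion letter */
        if (*p == '\0')
            break;                       /* stray '%' at the very end */
        switch (*p) {
        case 'd': {
            char num[32];
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            fputs(num, stdout);          /* append the converted number */
            break;
        }
        case 's':
            fputs(va_arg(ap, const char *), stdout);
            break;
        case 'c':
            putchar(va_arg(ap, int));    /* char arguments promote to int */
            break;
        default:                         /* unknown conversion: echo it */
            putchar('%');
            putchar(*p);
            break;
        }
    }
    va_end(ap);
}

int main(void)
{
    my_printf("value = %d, name = %s%c", 42, "stream", '\n');
    return 0;
}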