What exactly is a stream in the C language? - c

I can't understand the meaning of "stream" in the C language. Is it an abstraction (just a name describing many operations)? Is it an object (monitor, keyboard, file on a hard drive) that a program exchanges data with? Or is it a memory space in RAM that temporarily holds the exchanged data?
Thanks for the help.

A stream is an abstraction of an I/O channel. It can map to a physical device such as a terminal or tape drive or a printer, or it can map to a file in a file system, or a network socket, or something else completely. How that mapping is accomplished is not exposed to you, the programmer.
From the perspective of your code, a stream is simply a source (input stream) or sink (output stream) of characters (text stream) or bytes (binary stream). Streams are managed through FILE objects and the stdio routines.
As far as your code is concerned, all streams behave the same way, regardless of what they are mapped to. It's a uniform interface to operations that can have wildly different implementations.

A stream is just a sequence of data made available over time. It is distinct from a file, for example, because you can't set the position. Examples: data coming or going through RS232, USB, Ethernet, IP networks, etc.
but my question is what exactly a stream is at the machine level
Nothing special. The machine level does not know anything about streams.
What exactly is a stream in the C language?
Same - the C language does not know anything about streams.

In C, when we use the term stream, we mean any input source or output destination.
Some examples:
stdin (standard input which is the keyboard by default)
stdout (standard output which by default is the screen)
stderr (standard error which is the screen by default)
Functions such as printf, scanf, gets, puts and getchar use stdin as their input stream and stdout as their output stream, so by default they read from the keyboard and write to the screen. (gets was removed from the language in C11; use fgets instead.)
But we can create streams to files too!
The stdio.h library supports two types of files, text files and binary files. Within a text file, the bytes represent characters, which makes it possible for a human to read what the file contains. By contrast, in a binary file, bytes do not necessarily represent characters. In summary, text files have two things that binary files do not. First, text files are divided into lines, and each line ends with one or two special characters; which characters are used depends on the operating system. Second, text files may contain an end-of-file marker.

Streams are also specific to the running program. Let me explain this further.
When you run a program from a terminal (Unix-like or Windows), essentially the following happens:
The terminal forks a child process, which runs the program you specified (./name_of_program).
The output of all printf calls goes to the stdout inherited from the parent process that forked. The same holds for scanf calls, which read from the inherited stdin.
The operating system handles the characteristics of the streams, i.e. how many bytes can be streamed to stdin/stdout at once. On Unix this is typically 4096 bytes. (Hint: use pipes to overcome this issue.)
A stream in C can be buffered in one of three ways: fully buffered, line-buffered or unbuffered. (Hint: put a delay, for example a call to sleep() on POSIX systems, between printf() calls to see what this means.)
Read and write access to files is handled by another OS service: file descriptors. These are non-negative integers that the OS uses to keep track of open files and ports (such as a serial port).

Related

Difference between stream and direct I/O in C?

In C, I believe (correct me if I'm wrong) there are two different types of input/output functions, direct and stream, which result in binary and ASCII files respectively.
What is the difference between stream (ASCII) and direct (Binary) I/O in terms of retrieving (read/write) and printing data?
No, yes, sort of, maybe…
In C, … there are two different types of input/output functions, direct and stream, which result in binary and ASCII files respectively.
In Standard C, there are only file streams, FILE *. In POSIX C, there are what might be termed 'direct' file access functions, mainly using file descriptors instead of file streams. AFAIK, Windows also provides alternative I/O functions, mainly using handles instead of file streams. So "No" — Standard C has one type of I/O function; but POSIX (and Windows) provide alternatives.
In Standard C, you can create binary files and text files using:
FILE *bfp = fopen("binary-file.bin", "wb");
FILE *tfp = fopen("regular-file.txt", "w");
On Windows (and maybe other systems for Windows compatibility), you can be explicit about opening a text file:
FILE *tcp = fopen("regular-file.txt", "wt");
So the standard distinguishes between text and binary files, but file streams can be used to access either type of file. Further, on Unix systems, there is no difference between a text file and a binary file; they will be treated the same. On Windows, a text file will have its CRLF (carriage return, line feed) line endings mapped to newline on input, and newlines mapped to CRLF line endings on output. That translation does not occur with binary files.
Note that there is also a concept 'direct I/O' on Linux, activated using the O_DIRECT flag, which is probably not what you're thinking of. It is a refinement of file descriptor I/O.
What is the difference between stream (ASCII) and direct (Binary) I/O in terms of retrieving (read/write) and printing data?
There are multiple issues.
First, the dichotomy between text files and binary files is separate from the dichotomy between stream I/O and direct I/O.
With stream I/O, line endings are mapped from the native form (e.g. CRLF) to newline when processing text files, whereas no such mapping occurs when processing binary files.
With text I/O, it is assumed that there will be no null bytes, '\0' in the data. Such bytes in the middle of a line mess up text processing code that expects to read up to a null. With binary I/O, all 256 byte values are expected; code that breaks because of a null byte is broken.
Complicating this is the distinction between different code sets for encoding text files. If you have a single-byte code set, such as ISO 8859-15, then null bytes don't generally appear. If you have a multi-byte code set such as UTF-8, again, null bytes don't generally appear. However, if you have a wide character code set such as UTF-16 (whether big-endian or little-endian), then you will often get zero bytes in the body of the file — it is not intended to be read or written as a byte stream but rather as a stream of 16-bit units.
The major difference between stream I/O and direct I/O is that the stream library buffers data for both input and output, unless you override it with setvbuf(). That is, if you repeatedly read a single character in the user code (getchar() for example), the stream library first reads a chunk of data from the file and then doles out one character at a time from the chunk, only going back to the file for more data when the previous chunk has been delivered completely. By contrast, direct I/O reading a single byte at a time will make a system call for each byte. Granted, the kernel will buffer the I/O (it does that for the stream I/O too — so there are multiple layers of buffering here, which is part of what O_DIRECT I/O attempts to avoid whenever possible), but the overhead of a system call per byte is rather substantial.
Generally, you have more fine-grained control over access with file descriptors; there are operations you can do with file descriptors that are simply not feasible with streams because the stream interface functions simply don't cover the possibility. For example, setting FD_CLOEXEC or O_CLOEXEC on a file descriptor means that the file descriptor will be closed automatically by the system when the program executes another one — the stream library simply doesn't cover the concept, let alone provide control over it. The cost of gaining the fine-grained control is that you have to write more code — or, at least, different code that does what is handled for you by the stream library functions.
Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O. A stream is a file or a physical device (such as a monitor) which is manipulated with a pointer to the stream.
Stream I/O is BUFFERED: that is to say, a fixed chunk is read from or written to a file via some temporary storage area (the buffer). Data written to a buffer does not appear in the file (or device) until the buffer is flushed or written out. (On a line-buffered stream, writing \n does this.)
In direct or low-level I/O:
This form of I/O is UNBUFFERED -- each read/write request results in accessing disk (or device) directly to fetch/put a specific number of bytes.
There are no formatting facilities -- we are dealing with bytes of information.
This means we are now using binary (and not text) files.

What is meant by stream buffering?

I have just started learning C programming, so I'm a beginner. While learning about the standard text streams, I came across the statement that the "stdout" stream is buffered while the "stderr" stream is not buffered, but I am not able to make sense of it.
I have already read about "buffer" on this forum and I like the candy analogy, but I am not able to figure out what is meant when one says: "This stream is buffered and the other one is not." What is the effect?
What is the difference?
Update: Does it affect the speed of processing?
A buffer is a block of memory that belongs to a stream and is used to hold stream data temporarily. When the first I/O operation occurs on a file, malloc is called and a buffer is obtained. Characters that are written to a stream are normally accumulated in the buffer (before being transmitted to the file in chunks), instead of appearing as soon as they are output by the application program. Similarly, streams retrieve input from the host environment in blocks rather than on a character-by-character basis. This is done to increase efficiency, as file and console I/O is slow in comparison to memory operations.
The C library provides three types of buffering: unbuffered, block buffered, and line buffered. Unbuffered means that characters appear in the destination file as soon as they are written (for an output stream), or that input is read from a file character by character instead of in blocks (for input streams). Block buffered means that characters are saved up in the buffer and written or read as a block. Line buffered means that characters are saved up only until a newline is written into or read from the buffer.
stdin and stdout are block buffered if and only if they can be determined not to refer to an interactive device; otherwise they are line buffered (this is true of any stream). stderr is unbuffered by default.
The standard library provides functions to alter the default behaviour of streams. You can use fflush to force the data out of the output stream buffer (fflush is undefined for input streams). You can make the stream unbuffered using the setbuf function.
Buffering is collecting up many elements before writing them, or reading many elements at once before processing them. There is lots of information about it out there on the Internet, and in other SO questions.
EDIT in response to the question update: yes, it's done for performance reasons. Most devices read or write a 'block' of some sort in any case, and there's a fair overhead in doing so, so batching these operations up can make a dramatic performance difference.
A program writing to buffered output can perform the output in the time it takes to write to the buffer which is typically very fast, independent of the speed of the output device which may be slow.
With buffered output the information is queued and a separate process deals with the output rendering.
With unbuffered output, the data is written directly to the output device, so it runs at the speed of the device. This is important for error output because if the output were buffered it would be possible for the process to fail before the buffered output has made it to the display, so the program might terminate with no diagnostic output.

How does the OS control files?

I know that each block of the OS array contains one FCB, but I don't understand how the OS uses them to control files. I don't understand the relation. Please explain simply.
C views each file simply as a sequential stream of bytes. Each file ends either with an end-of-file marker or at a specific byte number recorded in a system-maintained, administrative data structure. When a file is opened, a stream is associated with the file. Three files and their associated streams are automatically opened when program execution begins: the standard input, the standard output and the standard error. Opening a file returns a pointer to a FILE structure (defined in <stdio.h>) that contains information used to process the file. This structure includes a file descriptor, i.e., an index into an operating system array called the open file table. Each array element contains a file control block (FCB) that the operating system uses to administer a particular file. The standard input, standard output and standard error are manipulated using the file pointers stdin, stdout and stderr.
Deitel, C How to Program, 6th edition, page 420

How are files written? Why do I not see my data written immediately?

I understand the general process of writing and reading from a file, but I was curious as to what is happening under the hood during file writing. For instance, I have written a program that writes a series of numbers, line by line, to a .txt file. One thing that bothers me however is that I don't see the information written until after my c program is finished running. Is there a way to see the information written while the program is running rather than after? Is this even possible to do? This is a hard question to phrase in one line, so please forgive me if it's already been answered elsewhere.
The reason I ask this is because I'm writing to a file and was hoping that I could scan the file for the highest and lowest values (the program would optimally be able to run for hours).
Research buffering and caching.
There are a number of layers of optimisation performed by:
your application,
your OS, and
your disk driver,
in order to extend the life of your disk and increase performance.
With the careful use of flushing commands, you can generally make things happen "quite quickly" when you really need them to, though you should generally do so sparingly.
Flushing can be particularly useful when debugging.
The GNU C Library documentation has a good page on the subject of file flushing, listing functions such as fflush which may do what you want.
You observe an effect solely caused by the C standard I/O (stdio) buffers. I claim that any OS or disk driver buffering has nothing to do with it.
In stdio, I/O happens in one of three modes:
Fully buffered, data is written once BUFSIZ (from <stdio.h>) characters have accumulated. This is the default when I/O is redirected to a file or pipe. This is what you observe. Typically BUFSIZ is anywhere from 1k to several kBytes.
Line buffered, data is written once a newline is seen (or BUFSIZ is reached). This is the default when I/O is to a terminal.
Unbuffered, data is written immediately.
You can use the setvbuf() (<stdio.h>) function to change the default, using the _IOFBF, _IOLBF or _IONBF macros, respectively. See your friendly setvbuf man page.
In your case, you can set your output stream (stdout or the FILE * returned by fopen) to line buffered.
Alternatively, you can call fflush() on the output stream whenever you want I/O to happen, regardless of buffering.
Indeed, there are several layers between the writing commands or functions and the actual file.
First, you open the file for writing. This causes the file to be either created or emptied. If you then write, the write doesn't actually occur immediately; the data are cached until the buffer is full or the file is flushed or closed.
You can call fflush() after writing each portion of data, or you can wait until the file is closed.
Yes, it is possible to see what's written in the file(s). If you program under Linux, you can open a new terminal and watch the progress with, for example, "less Filename".

buffer confusion

Could anyone clarify on the types of buffers used by a program?
For example:
I have a C program that reads from stdin and writes to stdout.
What are the buffers involved here? I'm aware that there are 2.
One provided by the kernel, over which the user doesn't have any control.
One provided with the standard streams, namely stdout, stdin and stderr, each having a separate buffer.
Is my understanding correct?
Thanks,
John
If you are working on Linux/Unix, it is easier to see that there are three streams, namely:
1. STDIN: file descriptor value 0 (in Unix)
2. STDOUT: file descriptor value 1
3. STDERR: file descriptor value 2
By default these streams correspond to the keyboard and the monitor. In Unix we can change these streams to read input from a file instead of the keyboard, or to send output to a file rather than the monitor, using the close() and dup() system calls. Yes, there are 3 buffers involved. To force out the contents of an output buffer in C we use the fflush() function; calling it on an input stream is undefined behaviour in Standard C.
If you want to know more about handling these streams in Unix, let me know.
The kernel (or other underlying system) could have any number of layers of buffering, depending on what device is being read from and the details of the kernel implementation; in some systems there is no buffering at that level, with the data being read directly into the userspace buffer.
The stdio library allocates a buffer for stdin; the size is implementation-dependent, but you can control the size and even use your own buffer with setvbuf. It also allows you to control whether I/O is fully buffered (as much data is read into the buffer as is available), line buffered (data is only read until a newline is encountered), or unbuffered. The default is line buffering if the system can determine that the input is a terminal, else fully buffered.
The story is similar for stdout. stderr is by default unbuffered.

Resources