I was reading a bit about z/OS's concepts of blocked IO. It states:
Blocked I/O is an extension to the ISO standard. For files opened in block format, z/OS® XL C/C++ reads and writes one block at a time. If you try to write more data to a block than the block can hold, the data is truncated. For blocked I/O, z/OS XL C/C++ allows only the use of fread() and fwrite() to read and write to files.
Then it goes to say:
The fflush() function has no effect for blocked I/O files.
However, in another article, it says:
For terminals, because I/O is always unblocked, line buffering is
equivalent to full buffering.
For record I/O files, buffering is
meaningful only for blocked files or for record I/O files in z/OS UNIX
file system using full buffering. For unblocked files, the buffer is
full after every write and is therefore written immediately, leaving
nothing to flush. For blocked files or fully-buffered UNIX file system
files, however, the buffer can contain one or more records that have
not been flushed and that require a flush operation for them to go to
the system.
For blocked I/O files, buffering is always meaningless.
I'm extremely confused by all this. If I/O is unblocked, how would line buffering be equivalent to full buffering? Why wouldn't flush make a difference in block I/O? In addition, what does it mean that blocked I/O cause buffering to be always meaningless? Any intuition regarding what's happening here with blocked vs unblocked I/O and how it plays into the effects of buffering would be much appreciated.
My take on what you provided is that this is referring to Blocked I/O for MVS datasets. These would be different than files stored in Unix System Services HFS / ZFS. And different than terminal I/O.
I'm extremely confused by all this. If I/O is unblocked, how would
line buffering be equivalent to full buffering?
I think your referring to the reference to terminal I/O which indicates that a line is a record and is the same as block size so every record is a full block of data. Which is to say an LRECL = BLKSIZE == 1 record per block so its not buffered, or, the buffer is the record.
Why wouldn't flush make a difference in block I/O?
Where there is more than one record per block fflush will not write a block until the block is full. I suspect it has to do with the I/O implementation in z/OS wich predated C on the platform so they made a design decision to no cause different behaviours for different languages in how I/O is conducted.
In addition, what does it mean that blocked I/O cause buffering to be always meaningless?
Again, z/OS writes full blocks except for the last block in a file which may be short because it does not contain enough records for a full block.
There was a lot of history in z/OS before C came to the platform and z/OS goes to great lengths to provide consistency.
Related
I am going through Mark Burgess's "The GNU C Programming Tutorial". I have come across the following information:
Even though low-level fle routines do not use buffering, and once you call write, your data can be read from the file immediately, it may take up to a minute before your data is physically written to disk. (Page:142)
Firstly, is "it may take up to a minute(some time) before your data is written to disk" true?
Secondly, when low level file routines are not using buffering why will the delay take place?
There are two places where I/O buffering can occur (at least — it could be more than just two).
One is in the application; the standard I/O functions using FILE * use buffered I/O unless you use setvbuf() to prevent it.
The other is in the kernel. Disk I/O normally goes into the kernel buffer pool, and eventually gets written by the kernel to disk. There are ways around that (O_DIRECT on Linux; raw devices on classic Unix; etc). The key point is that the write() system call normally writes to he kernel buffer pool. The kernel takes responsibility for ensuring that the data is written to disk safely and correctly (journalling, …).
The kernel doesn't write everything to disk immediately because (a) you may add more changes to the data, (b) other people may need to read or write the data, (c) the disk drive may be busy writing something else at the other end of its 1 TiB of storage and it will take time to get the write head in position to take your data, and it would be better for the overall performance of the system if it scheduled other work before writing your changed buffer to disk. It will get written to disk. It is just not defined when, and it could be fractions of a second or multiple seconds or longer, though most often it will not take minutes for the data to be written to disk.
These days, there could also be buffering in the RAID controllers, and maybe in the individual disks inside the RAID setup, and maybe there's network buffering too if it is a remotely-mounted file system. Those add extra levels of buffering.
The read() and write() and related low-level I/O functions do not have any client-side (application) buffering — unlike the standard C I/O functions.
A file is said to be buffered, when its contents are not outputted or inputted directly. Instead, the file's bytes are written to a temporary buffer in memory.
For example, if you are reading from a file, you are reading from the buffer. Once you have read all the characters in the buffer, it is replenished with new bytes from the file. The reason for this indirectness, is that a memory read is much faster than a hard disk read.
The calls read and write are low-level, and do not perform buffering. The stdio.h calls like getc and putc, do use buffering. These higher-level APIs only call the low level ones, when the buffer must be replenished.
Writing to the hard drive is much slower than writing to RAM. When you write to a drive it writes to memory, but doesn't always write to the disk immediately. The data might not be written to disk until that part of memory needs to be overwritten to make room for something else. This is called a Write-Back cache.
When writing directly to a device in /dev, I open a file descriptor and perform a UNIX write() followed by a read(). Can I have multiple threads doing this write()/read() sequence on the same file descriptor, and not get jumbled data if two threads enter the write() function at the same time?
References to std documentation would be immensely helpful. I've not been able to find anything though. Someone has mentioned that such operations are atomic in the kernel, but I am sceptical.
Also, to clarify this is a file in /dev, so any insight as to how far the "file pointer" concept applies here is helpful as well.
File pointers (FILE *fp, for example), are a layer in the user-side code sitting above the function calls (such as write()). Access to fp is controlled by locks in a threaded environment (you can't have to threads modifying the same structure at the same time).
Inside the kernel, I'd expect there to be a lock on the file descriptor (and/or 'open file description') to prevent it being used from two threads at once.
You can look up the POSIX specification for read() and
getchar_unlocked()
to find out more about locking etc — at least for a POSIX compliant implementation.
Note that POSIX still uses C99. Therefore, it is not cognizant of the C11 thread facilities. The C11 standard does not have read() et al (file I/O using file descriptors), so it says nothing about such system calls. Neither does it provide a getchar_unlocked() or any of its relatives.
Caveat: I have not been in kernels for awhile, but this is the way it used to work.
For disk files:
Can you open the file in append mode, writing block sizes <= BLKSIZE ?
Small enough block sizes guarantee, in POSIX environments, atomic writes in POSIX environments (actually, the limit may be greater than BLKSIZE... I'm too lazy to hunt around for the alternate symbol).
Append guarantees seeks to the end of the file... for devices supporting seeks. Combine with atomic writes you should be golden.
Each buffer must stand by itself under the assumption some "foreign" information may follow it.
For ttys:
Append mode makes no sense here. As before, but paying attention to line endings gets even more important. And this very much does not apply to reads. Codes ttys treat as control sequences can also trip you up if even the modes the sequences enable split across blocks.
For other devices:
Can get tricky here. Depends on the device.
I'm going to assume that you are referring to a generic character device (e.g. a tty) since you were not specific. As far as I know, each fd-type operation (e.g. read()/write()) maps directly into a call into the driver.
Therefore, the driver will receive each write()'s data chunk as a whole and not see the next one's data until it is done (e.g. data is queued to be transmitted).
However, if the driver does not consume the entire chunk of data at once (i.e. write() returns less than the specified number of bytes, then there is no guarantee, that the thread will be able to write again with the remainder before another thread does a different write().
Also, as Johnathan Leffler noted, if you use standard I/O with process-level buffering, all bets are off.
Bottom line, if you are using direct fd writes, each write will map directly to one driver function call. From there, it's up to the driver as to if write is atomic.
Edit: wlformyd brings up the question of locking between multiple threads on multiple processors. To my knowledge, there is no locking on a FD and, in fact, that would be ineffective as multiple FDs could be used to access the same device.
I believe it is up to the driver itself to do locking to prevent contention to internal queues and/or hardware. in that sense, on a multi-processor system, the kernel doesn't prevent multiple simultaneous access to a driver's write routine. However, a properly written driver should do the locking to prevent mixing of output between two write calls.
Why would you want to set aside a block of memory in setvbuf()?
I have no clue why you would want to send your read/write stream to a buffer.
setvbuf is not intended to redirect the output to a buffer (if you want to perform IO on a buffer you use sprintf & co.), but to tightly control the buffering behavior of the given stream.
In facts, C IO functions don't immediately pass the data to be written to the operating system, but keep an intermediate buffer to avoid continuously performing (potentially expensive) system calls, waiting for the buffer to fill before actually performing the write.
The most basic case is to disable buffering altogether (useful e.g. if writing to a log file, where you want the data to go to disk immediately after each output operation) or, on the other hand, to enable block buffering on streams where it is disabled by default (or is set to line-buffering). This may be useful to enhance output performance.
Setting a specific buffer for output can be useful if you are working with a device that is known to work well with a specific buffer size; on the other side, you may want to have a small buffer to cut down on memory usage in memory-constrained environments, or to avoid losing much data in case of power loss without disabling buffering completely.
In C files opened with e.g. fopen are by default buffered. You can use setvbuf to supply your own buffer, or make the file operations completely unbuffered (like to stderr is).
It can be used to create fmemopen functionality on systems that doesn't have that function.
The size of a files buffer can affect Standard library call I/O rates. There is a table in Chap 5 of Steven's 'Advanced Programming in the UNIX Environment' that shows I/O throughput increasing dramatically with I/O buffer size, up to ~16K then leveling off. A lot of other factor can influenc overall I/O throughtput, so this one "tuning" affect may or may not be a cureall. This is the main reason for "why" other than turning off/on buffering.
Each FILE structure has a buffer associated with it internally. The reason behind this is to reduce I/O, and real I/O operations are time costly.
All your read/write will be buffered until the buffer is full. All the data buffered will be output/input in one real I/O operation.
Why would you want to set aside a block of memory in setvbuf()?
For buffering.
I have no clue why you would want to send your read/write stream to a buffer.
Neither do I, but as that's not what it does the point is moot.
"The setvbuf() function may be used on any open stream to change its buffer" [my emphasis]. In other words it alread has a buffer, and all the function does is change that. It doesn't say anything about 'sending your read/write streams to a buffer". I suggest you read the man page to see what it actually says. Especially this part:
When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin).
I understand the general process of writing and reading from a file, but I was curious as to what is happening under the hood during file writing. For instance, I have written a program that writes a series of numbers, line by line, to a .txt file. One thing that bothers me however is that I don't see the information written until after my c program is finished running. Is there a way to see the information written while the program is running rather than after? Is this even possible to do? This is a hard question to phrase in one line, so please forgive me if it's already been answered elsewhere.
The reason I ask this is because I'm writing to a file and was hoping that I could scan the file for the highest and lowest values (the program would optimally be able to run for hours).
Research buffering and caching.
There are a number of layers of optimisation performed by:
your application,
your OS, and
your disk driver,
in order to extend the life of your disk and increase performance.
With the careful use of flushing commands, you can generally make things happen "quite quickly" when you really need them to, though you should generally do so sparingly.
Flushing can be particularly useful when debugging.
The GNU C Library documentation has a good page on the subject of file flushing, listing functions such as fflush which may do what you want.
You observe an effect solely caused by the C standard I/O (stdio) buffers. I claim that any OS or disk driver buffering has nothing to do with it.
In stdio, I/O happens in one of three modes:
Fully buffered, data is written once BUFSIZ (from <stdio.h>) characters were accumulated. This is the default when I/0 is redirected to a file or pipe. This is what you observe. Typically BUFSIZ is anywhere from 1k to several kBytes.
Line buffered, data is written once a newline is seen (or BUFSIZ is reached). This is the default when i/o is to a terminal.
Unbuffered, data is written immediately.
You can use the setvbuf() (<stdio.h>) function to change the default, using the _IOFBF, _IOLBF or _IONBF macros, respectively. See your friendly setvbuf man page.
In your case, you can set your output stream (stdout or the FILE * returned by fopen) to line buffered.
Alternatively, you can call fflush() on the output stream whenever you want I/O to happen, regardless of buffering.
Indeed, there are several layers between the writing commands resp. functions and the actual file.
First, you open the file for writing. This causes the file to be either created or emptied. If you write then, the write doesn't actually occur immediately, but the data are cached until the buffer is full or the file is flushed or closed.
You can call fflush() for writing each portion of data, or you can actually wait until the file is closed.
Yes, it is possible to see whats written in the file(s). If you programm under Linux you can open a new Terminal and watch the progress with for example "less Filename".
The posix standard specified that when write less than PIPE_BUF bytes to pipe or FIFO are granted atomic, that is, our write doesn't mix with other processes'. But I failed to find out how standard specify about regular file. I mean it's true that when we write less than PIPE_BUF, it will also granted be atomic. But I want to know does regular file have such limitation? I mean, the pipe has the capacity, so that when write to the pipe and beyond its capacity, kernel will put the writer to sleep, so other process will get chance to write, but regular file seems that doesn't have to have such limitation, am i right?
What I'm doing is several processes generate log to a file. Of course, with O_APPEND set.
Quote from http://pubs.opengroup.org/onlinepubs/9699919799/toc.htm (Single UNIX Specification, Version 4, 2010 Edition):
This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.
The specification does address how semantics of writes regarding writes occur in case of multiple readers, but as you can see from above, the behaviour for multiple, concurrent writers is not defined by the specification.
Note above talks about files. For pipes and FIFOs the PIPE_MAX semantics apply, that concurrent writes are guaranteed to be non-divisible up to PIPE_MAX bytes.
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
For real file systems the situation is complex. Some local file systems may enforce atomic writes up to arbitrary sizes (memory limit) by locking a file handle during writing, some might not (I tried to look at ext4 logic, but lost track somewhere around http://lxr.linux.no/linux+v3.5.3/fs/jbd2/transaction.c#L147).
For non-local file systems the result is more or less for grabs. Just don't try concurrent writing on a networked file system without some form of explicit locking (or you're positively absolutely sure about the semantics of the network file system you're using).
BTW, O_APPEND guarantees that all writes by different processes go to the end of the file. However as SUS above notes, if the writes are really concurrent (occuring at the same time), then the behavior is undefined. On earlier uniprocess and non-pre-emptive UNIXes this didn't really matter, as a call to write(2) completed before someone else got a chance to write...
This question could be answered definitely for specific combination of operating system (Linux?) and file system (ext4?). A general answer? As SUS reads -- "undefined behavior".
I think this is useful to you: "the data written by writev() is written as a single block that is not intermingled with output from writes in other processes", so you can use writev
Several writers to a file may mix up things. But files opened with O_APPEND are appended atomically per write access.
If you want to keep to the C stdio interface instead of the lower level one, fopene the file with "a" or "a+" (which map to O_APPEND), set up a buffer large enough that there is no need to write inside your records and use fsync to force the write when you are done building them. I'm not sure it is guaranteed by POSIX (C says nothing about that).
There is the ultimate solut8ion to all questions of atomicity; a mutex. Wrap your writes to the log file in a mutex and all will be done atomically.
A simpler solution might be to use the GLOG libraries from Google. A fantastic logging system, far better than anything I ever dreamed up, free, not-GPL, and atomic.
One way to interleave them safely would be to have all writers lock the file, write, and unlock.
Functions that can be used for locking are flock(), lockf(), and fcntl().
Beware that ALL writers must lock (and they should all use the same mechanism to do the locking) or one that doesn't bother getting a lock could still write at the same time as another that holds a lock.