I'm reading from /proc/pid/task/stat to keep track of the CPU usage of a thread.
fopen on /proc/pid/task/stat
fgets a string from the stream
sscanf on the string
However, I am having issues getting the stream's buffer to update.
If I fgets 1024 characters it refreshes, but if I fgets 128 characters it never updates and I always get the same stats.
I rewind the stream before the read and have tried fsync.
I do this very frequently, so I'd rather not reopen the file each time.
What is the right way to do this?
Not every program benefits from the use of buffered I/O.
In your case, I think I would just use read(2) [1]. This way, you:
eliminate all stale-buffer issues [2]
probably run faster via the elimination of double buffering
probably use less memory
definitely simplify the implementation
For a case like the one you describe, the efficiency gain may not matter on today's remarkably powerful CPUs. But I will point out that programs like cp(1) and other heavy-duty data movers don't use buffered I/O packages.
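For illustration, here is a minimal sketch of that unbuffered approach (my sketch, not the asker's code): open once, then pread() from offset 0 on every sample, so there is no stdio buffer to go stale. The path and the sampled fields are placeholders; substitute the per-thread stat path you actually need.

/* Minimal sketch: open once, pread() from offset 0 each time so every
 * sample re-reads the file with no stdio buffering involved.
 * The path is a placeholder; utime/stime are fields 14 and 15 of stat.
 * (Simplification: assumes the comm field contains no spaces.) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/proc/self/stat", O_RDONLY);   /* substitute your task's stat path */
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 5; ++i) {
        char buf[1024];
        ssize_t n = pread(fd, buf, sizeof buf - 1, 0);   /* fresh data on every call */
        if (n <= 0) break;
        buf[n] = '\0';

        unsigned long utime, stime;
        if (sscanf(buf,
                   "%*d %*s %*s %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                   &utime, &stime) == 2)
            printf("utime=%lu stime=%lu\n", utime, stime);

        sleep(1);
    }

    close(fd);
    return 0;
}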
[1] That is, open(2), read(2), lseek(2), and close(2).
[2] And perhaps to preempt an argument: on questions related to this one, someone usually offers a "helpful" suggestion along the lines of fflush(stdin), and then someone else comes along to accurately point out that fflush() is defined by C99 on output streams only, and that it's usually unwise to depend on implementation-specific behavior.
Is there a portable way to discard a number of incoming bytes from a socket without copying them to userspace? On a regular file, I could use lseek(), but on a socket, it's not possible. I have two scenarios where I might need it:
A stream of records is arriving on a file descriptor (which can be a TCP socket, a SOCK_STREAM type UNIX domain socket, or potentially a pipe). Each record is preceded by a fixed-size header specifying its type and length, followed by data of variable length. I want to read the header first, and if it's not of the type I'm interested in, discard the following data segment without transferring it into a dummy buffer in user space.
A stream of records of varying and unpredictable length is arriving on a file descriptor. Due to the asynchronous nature of the stream, the records may still be incomplete when the fd becomes readable, or they may be complete but a piece of the next record may already be there when I try to read a fixed number of bytes into a buffer. I want to stop reading the fd at the exact boundary between the records so I don't need to manage partially loaded records I accidentally read from the fd. So, I use recv() with the MSG_PEEK flag to read into a buffer, parse the record to determine its completeness and length, and then read again properly (thus actually removing the data from the socket) to the exact length. This would copy the data twice; I want to avoid that by simply discarding an exact amount of the data buffered in the socket.
On Linux, I gather it is possible to achieve that by using splice() and redirecting the data to /dev/null without copying them to userspace. However, splice() is Linux-only, and the similar sendfile() that is supported on more platforms can't use a socket as input. My questions are:
Is there a portable way to achieve this? Something that would work on other UNIXes (primarily Solaris) as well that do not have splice()?
Is splice()-ing into /dev/null an efficient way to do this on Linux, or would it be a waste of effort?
Ideally, I would love to have a ssize_t discard(int fd, size_t count) that simply removes count readable bytes from a file descriptor fd in the kernel (i.e., without copying anything to userspace), blocks on a blockable fd until the requested number of bytes is discarded, or returns the number of successfully discarded bytes (or EAGAIN) on a non-blocking fd, just like read() would do. And advances the seek position on a regular file, of course :)
The short answer is No, there is no portable way to do that.
The sendfile() approach is Linux-specific, because on most other OSes implementing it, the source must be a file or a shared memory object. (I haven't even checked if, or in which Linux kernel versions, sendfile() from a socket descriptor to /dev/null is supported. I would be very suspicious of code that does that, to be honest.)
Looking at e.g. Linux kernel sources, and considering how little a ssize_t discard(fd, len) differs from a standard ssize_t read(fd, buf, len), it is obviously possible to add such support. One could even add it via an ioctl (say, SIOCISKIP) for easy support detection.
However, the problem is that you have designed an inefficient approach, and rather than fix the approach at the algorithmic level, you are looking for crutches that would make your approach perform better.
You see, it is very hard to show a case where the "extra copy" (from kernel buffers to userspace buffers) is an actual performance bottleneck. The number of syscalls (context switches between userspace and kernel space) sometimes is. If you sent a patch upstream implementing e.g. ioctl(socketfd, SIOCISKIP, bytes) for TCP and/or Unix domain stream sockets, the kernel developers would point out that the performance increase it hopes to achieve is better obtained by not trying to obtain the data you don't need in the first place. (In other words, the way you are trying to do things is inherently inefficient, and rather than create crutches to make that approach work better, you should just choose a better-performing approach.)
In your first case, a process receiving structured data framed by a type and length identifier, wishing to skip unneeded frames, is better served by fixing the transfer protocol. For example, the receiving side could inform the sending side which frames it is interested in (i.e., a basic filtering approach). If you are stuck with a stupid protocol that you cannot replace for external reasons, you're on your own. (The FLOSS developer community is not, and should not be, responsible for maintaining stupid decisions just because someone wails about it. Anyone is free to do so, but they'd need to do it in a manner that does not require others to do extra work too.)
In your second case, you already read your data. Don't do that. Instead, use a userspace buffer large enough to hold two full-size frames. Whenever you need more data, but the start of the frame is already past the midway point of the buffer, memmove() the frame to start at the beginning of the buffer first.
When you have a partially read frame, and you have N unread bytes left from it that you are not interested in, read them into the unused portion of the buffer. There is always enough room, because you can overwrite the portion already used by the current frame, and its beginning is always within the first half of the buffer.
If the frames are small, say 65536 bytes maximum, you should use a tunable for the maximum buffer size. On most desktop and server machines, with high-bandwidth stream sockets, something like 2 MiB (2097152 bytes or more) is much more reasonable. It's not too much memory wasted, but you rarely do any memory copies (and when you do, they tend to be short). (You can even optimize the memory moves so that only full cachelines are copied, aligned, since leaving almost one cacheline of garbage at the start of the buffer is insignificant.)
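As a rough illustration (my sketch, not the answerer's code), here is one way the two-frame buffer with memmove() can look. The 4-byte type plus 4-byte length header, host byte order, and the names are all assumptions:

/* Sketch of the double-size buffer described above. Data lives in
 * buf[start .. used); the current frame is slid to the front only when it
 * starts past the midpoint, so memmove() happens rarely. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAME 65536

static unsigned char buf[2 * MAX_FRAME];
static size_t start = 0, used = 0;

/* Ensure at least "need" bytes of the current frame are buffered.
 * Returns 1 on success, 0 on EOF or error. */
static int fill(int fd, size_t need)
{
    while (used - start < need) {
        if (start > MAX_FRAME) {                     /* frame starts past midpoint: */
            memmove(buf, buf + start, used - start); /* slide it to the front       */
            used -= start;
            start = 0;
        }
        ssize_t n = read(fd, buf + used, sizeof buf - used);
        if (n <= 0)
            return 0;
        used += (size_t)n;
    }
    return 1;
}

/* Example driver: assumed header = 4-byte type + 4-byte length (host order),
 * then "len" payload bytes; unwanted frames are skipped by bumping "start". */
int consume(int fd, uint32_t wanted, void (*handle)(const unsigned char *, uint32_t))
{
    for (;;) {
        uint32_t type, len;
        if (!fill(fd, 8))
            return 0;                                /* clean EOF between frames */
        memcpy(&type, buf + start, 4);
        memcpy(&len,  buf + start + 4, 4);
        if (len > MAX_FRAME - 8)
            return -1;                               /* malformed frame */
        if (!fill(fd, 8 + len))
            return -1;                               /* truncated frame */
        if (type == wanted)
            handle(buf + start + 8, len);            /* process the payload */
        start += 8 + len;                            /* skipping costs nothing */
    }
}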
I do HPC with large datasets (including text-form molecular data, where records are separated by newlines, and custom parsers for converting decimal integers or floating-point values are used for better performance), and this approach does work well in practice. Simply put, skipping data already in your buffer is not something you need to optimize; it is insignificant overhead compared to simply avoiding doing the things you do not need.
There is also the question of what you wish to optimize by doing that: the CPU time/resources used, or the wall-clock time taken by the overall task. They are completely different things.
For example, if you need to sort a large number of text lines from some file, you use the least CPU time if you simply read the entire dataset to memory, construct an array of pointers to each line, sort the pointers, and finally write each line (using either internal buffering and/or POSIX writev() so that you do not need to do a write() syscall for each separate line).
However, if you wish to minimize the wall clock time used, you can use a binary heap or a balanced binary tree instead of an array of pointers, and heapify or insert-in-order each line completely read, so that when the last line is finally read, you already have the lines in their correct order. This is because the storage I/O (for all but pathological input cases, something like single-character lines) takes longer than sorting them using any robust sorting algorithm! The sorting algorithms that work inline (as data comes in) are typically not as CPU-efficient as those that work offline (on complete datasets), so this ends up using somewhat more CPU time; but because the CPU work is done at a time that is otherwise wasted waiting for the entire dataset to load into memory, it is completed in less wall clock time!
If there is need and interest, I can provide a practical example to illustrate the techniques. However, there is absolutely no magic involved, and any C programmer should be able to implement these (both the buffering scheme and the sort scheme) on their own. (I do consider using resources like the Linux man pages online, and Wikipedia articles and pseudocode on, say, binary heaps, to be doing it "on your own". As long as you do not just copy-paste existing code, I consider it doing it "on your own", even if somebody or some resource helps you find the good, robust ways to do it.)
I am writing a function to read binary files that are organized as a succession of (key, value) pairs, where keys are small ASCII strings and values are int or double stored in binary format.
If implemented naively, this function makes a lot of calls to fread to read very small amounts of data (usually no more than 10 bytes). Even though fread internally uses a buffer to read the file, I have implemented my own buffer and I have observed a speed-up by a factor of 10 on both Linux and Windows. The buffer size used by fread is large enough, and the function call overhead cannot be responsible for such a slowdown. So I went and dug into the GNU implementation of fread and discovered a lock on the file, and many other things, such as verifying that the file is open with read access and so on. No wonder fread is so slow.
But what is the rationale behind fread being thread-safe? It seems that multiple threads can call fread on the same file, which is mind-boggling to me. These requirements make it slow as hell. What are the advantages?
Imagine you have a file where each 5 bytes can be processed in parallel (let's say, pixel by pixel in an image):
123456789A
One thread needs to pick 5 bytes "12345", the next one the next 5 bytes "6789A".
If it were not thread-safe, different threads could pick up the wrong chunks, for example "12367" and "4589A", or even worse (unexpected behaviour, repeated bytes, or worse).
As suggested by nemequ:
Note that if you're on glibc you can use the _unlocked variants (e.g., fread_unlocked). On Windows you can define _CRT_DISABLE_PERFCRIT_LOCKS.
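A small sketch of what the glibc variant can look like; the key/value record layout here is just an assumed example, not taken from the question:

/* Sketch using fread_unlocked (glibc, needs _GNU_SOURCE). Safe only if a
 * single thread touches this FILE*; the record layout below is assumed. */
#define _GNU_SOURCE
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("data.bin", "rb");     /* placeholder file name */
    if (!f) { perror("fopen"); return 1; }

    char key[4];
    double value;
    /* many tiny reads, without taking the stream lock on every call */
    while (fread_unlocked(key, 1, sizeof key, f) == sizeof key &&
           fread_unlocked(&value, sizeof value, 1, f) == 1) {
        /* ... use key and value ... */
    }

    fclose(f);
    return 0;
}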
Stream I/O is already as slow as molasses. Programmers think that a read from main memory (1000x longer than a CPU cycle) takes ages. A read from the physical disk or a network may as well be an eternity.
I don't know if that's the #1 reason why the library implementers were ok with adding the lock overhead, but I guarantee it played a significant part.
Yes, it slows it down, but as you discovered, you can manually buffer the read and use your own handling to increase the speed when performance really matters. (That's the key--when you absolutely must read the data as fast as possible. Don't bother manually buffering in the general case.)
That's a rationalization. I'm sure you could think of more!
I am making something similar to the commit log in a database system. The system is able to handle ~20,000 events/sec. Each event occupies ~16 bytes. Roughly, the system will write to the commit log at a speed of ~312.5 KiB/sec. Each commit log file will contain at most 500,000 events.
I have a question that: Should I call fopen - fwrite - fclose for each event, OR should I call fopen once when creating a new file, then a series of fwrite and finally fclose?
In such cases, it might be even better to revert to open/write/close and get rid of C buffered output completely. Log files typically consist of a high volume of nearly identical (size-wise) writes and do not really gain much from C buffering. Low-level, unbuffered I/O would also relieve you of calling fflush() and can guarantee that every single log entry is written as an atomic entity.
Given the volume you mentioned, you should probably still not close and re-open the file between writes.
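A minimal sketch of that open-once, write-per-event approach (the file name and the 16-byte event layout are assumptions, not from the question):

/* Open the log once, issue one write() per ~16-byte event, close when the
 * file is full. Each entry goes out in a single write() call, which is what
 * lets it reach the file as one unit, as suggested above. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct event { uint64_t id; uint64_t payload; };   /* ~16 bytes, as in the question */

int main(void)
{
    int fd = open("commit.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (uint64_t i = 0; i < 500000; ++i) {         /* one file holds 500,000 events */
        struct event ev = { i, i * 2 };
        if (write(fd, &ev, sizeof ev) != (ssize_t)sizeof ev) {
            perror("write");
            break;
        }
    }

    close(fd);                                      /* then start the next log file */
    return 0;
}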
fopen/fwrite/fclose 20k times per second looks pretty expensive.
Consider calling fflush as an alternative.
If you are looking to use it to record database transactions for possible recovery, you may need to rethink it. The f* family of functions uses buffering, so in the event of a crash the final buffer may or may not have actually made it to disk.
You're not obliged to, no... and in fact it would be a much better idea to call fflush as suggested by EvilTeach's answer.
However, better yet, if you can avoid calling fflush, that would be ideal, since the C standard library might (probably will) implement system-specific caching to combine smaller physical writes into larger ones, making your 20k writes per second more efficient.
Calling fopen/fwrite/fclose as you suggested, or fflush as EvilTeach suggested, would bypass that caching, which will probably degrade performance.
I am having a really hard time understanding the depths of buffering, especially in C programming. I have searched this topic for a long time but haven't found anything satisfying so far.
I will be a little more specific:
I do understand the concept behind it (i.e., coordinating operations between hardware devices of different speeds and minimizing the impact of that difference), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean the longer and deeper the better). It would also be really nice to see some concrete examples of how buffering is implemented in I/O streams.
My other question is that I noticed some rules of buffer flushing aren't followed by my programs, as weird as that sounds. Take the following simple fragment:
#include <stdio.h>
int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();
    return 0;
}
The program is intended to demonstrate that impending input flushes other streams immediately when the first getchar() is called, but no matter how often I try it, and with whatever modifications, this simply doesn't happen. Conversely, for stdout (with printf(), for example) the stream is flushed without any input being requested, which also negates the rule. Am I understanding this rule wrongly, or is there something else to consider?
I am using GNU GCC on Windows 8.1.
Update:
I forgot to ask: I read on some sites that people refer to, e.g., string literals as buffers, or even arrays as buffers. Is this correct, or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
Hardware buffers: These are buffers where data is stored before being moved to a HW device, or buffers where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory: if the memory is not set up properly, the system may crash. Or sound devices that must be fed with sub-microsecond precision or they will work poorly.
Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1 ms just for the call, plus 1 us for each byte (bear with me, things are more complicated in the real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i = 0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us), that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us), about 1.1 seconds!
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing: files are usually flushed only when closed or manually flushed. Some streams, such as stdout, are line-buffered, that is, they are flushed whenever a '\n' is written. Also, stdin and stdout are special: when you read from stdin, stdout is flushed; other files are untouched, only stdout. That is handy if you are writing an interactive program.
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
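A tiny sketch of that "accumulate until Enter" helper buffer (the buffer size and the handling are arbitrary):

/* Collect characters one at a time; hand the whole line to the parsing
 * step only when '\n' arrives. */
#include <stdio.h>

int main(void)
{
    char line[256];
    size_t len = 0;
    int c;

    while ((c = getchar()) != EOF) {
        if (c == '\n') {
            line[len] = '\0';
            printf("got line: \"%s\"\n", line);   /* parse/handle the full line here */
            len = 0;
        } else if (len < sizeof line - 1) {
            line[len++] = (char)c;                /* just accumulate */
        }
    }
    return 0;
}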
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose the part you're not immediately ready to consume. Similar considerations apply to output.
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a starting point, Wikipedia is almost always helpful.
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
fflush() or equivalent
File is closed
The buffer overflows.
Since none of these cases applies while you are waiting in getchar(), the file is not flushed at that point (note that application termination as such is not in this list; the flush you see at normal exit happens only because the streams are closed then).
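If the goal is to see each 'A' appear in hallo.txt while the program is sitting in getchar(), an explicit fflush() at those points does it; a small variation on the fragment from the question:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("hallo.txt", "w");
    fputc('A', fp);
    fflush(fp);       /* force the byte out before waiting for input */
    getchar();        /* hallo.txt already contains the first 'A' here */
    fputc('A', fp);
    fflush(fp);
    getchar();
    fclose(fp);       /* closing would flush the rest anyway */
    return 0;
}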
A buffer is a small area inside your memory (RAM) that is responsible for storing information before it is sent to your program. As long as I am typing characters on the keyboard, these characters are stored inside the buffer; as soon as I press the Enter key, they are transported from the buffer into your program. With the help of the buffer, all these characters are then instantly available to the program (preventing lag and slowness) and can be sent on to the output display screen.
I'm writing a program where performance is quite important, but not critical. Currently I am reading in text from a FILE* line by line and I use fgets to obtain each line. After using some performance tools, I've found that 20% to 30% of the time my application is running, it is inside fgets.
Are there faster ways to get a line of text? My application is single-threaded with no intentions to use multiple threads. Input could be from stdin or from a file. Thanks in advance.
You don't say which platform you are on, but if it is UNIX-like, then you may want to try the read() system call, which does not perform the extra layer of buffering that fgets() et al do. This may speed things up slightly, on the other hand it may well slow things down - the only way to find out is to try it and see.
Use fgets_unlocked(), but read carefully what it does first.
Get the data with fgetc() or fgetc_unlocked() instead of fgets(). With fgets(), your data is copied into memory twice, first by the C runtime library from the file to an internal buffer (stream I/O is buffered), then from that internal buffer to an array in your program.
Read the whole file in one go into a buffer.
Process the lines from that buffer.
That's the fastest possible solution.
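For illustration, a rough sketch of that approach, assuming a regular, seekable file (error handling trimmed, file name is a placeholder):

/* Read the whole file into one buffer, then walk the lines in memory. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("input.txt", "rb");
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END);                    /* find the file size ...     */
    long size = ftell(f);
    rewind(f);

    char *data = malloc((size_t)size + 1);    /* ... and slurp it in one go */
    size_t got = fread(data, 1, (size_t)size, f);
    fclose(f);
    data[got] = '\0';

    size_t lines = 0;
    for (char *p = data, *nl; (nl = strchr(p, '\n')) != NULL; p = nl + 1) {
        *nl = '\0';                           /* p now points at one line   */
        /* ... process the line here ... */
        ++lines;
    }
    printf("%zu lines\n", lines);

    free(data);
    return 0;
}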
You might try minimizing the amount of time you spend reading from the disk by reading large amounts of data into RAM then working on that. Reading from disk is slow, so minimize the amount of time you spend doing that by reading (ideally) the entire file once, then working on it.
Sorta like the way CPU cache minimizes the time the CPU actually goes back to RAM, you could use RAM to minimize the number of times you actually go to disk.
Depending on your environment, using setvbuf() to increase the size of the internal buffer used by file streams may or may not improve performance.
This is the syntax:
setvbuf (InputFile, NULL, _IOFBF, BUFFER_SIZE);
Where InputFile is a FILE* to a file just opened using fopen() and BUFFER_SIZE is the size of the buffer (which is allocated by this call for you).
You can try various buffer sizes to see if any have positive influence. Note that this is entirely optional, and your runtime may do absolutely nothing with this call.
If the data is coming from disk, you could be IO bound.
If that is the case, get a faster disk (but first check that you're getting the most out of your existing one...some Linux distributions don't optimize disk access out of the box (hdparm)), stage the data into memory (say by copying it to a RAM disk) ahead of time, or be prepared to wait.
If you are not IO bound, you could be wasting a lot of time copying. You could benefit from so-called zero-copy methods, such as memory-mapping the file and accessing it only through pointers.
That is a bit beyond my expertise, so you should do some reading or wait for more knowledgeable help.
BTW-- You might be getting into more work than the problem is worth; maybe a faster machine would solve all your problems...
NB-- It is not clear that you can memory map the standard input either...
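For completeness, a hedged sketch of the memory-mapping idea mentioned above. It needs a regular file (so not stdin or a pipe), and the file name is a placeholder:

/* Map the file read-only and walk it through a pointer; the kernel pages
 * the data in on demand and no read() copies into user buffers are made. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                /* the mapping stays valid */

    size_t lines = 0;
    for (const char *p = data, *end = data + st.st_size; p < end; ) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        /* ... process the line [p, nl) here ... */
        ++lines;
        p = nl ? nl + 1 : end;
    }
    printf("%zu lines\n", lines);

    munmap(data, (size_t)st.st_size);
    return 0;
}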
If the OS supports it, you can try asynchronous file reading, that is, the file is read into memory whilst the CPU is busy doing something else. So, the code goes something like:
start asynchronous read
loop:
wait for asynchronous read to complete
if end of file goto exit
start asynchronous read
do stuff with data read from file
goto loop
exit:
If you have more than one CPU, then one CPU reads the file and parses the data into lines, while the other CPU takes each line and processes it.
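For reference, a rough sketch of the read-ahead loop above using POSIX AIO (<aio.h>, link with -lrt on glibc). This is only one possible shape for it; the file name and process() are placeholders:

/* Two buffers alternate: while the kernel fills one, the program works on
 * the other, which is the overlap the pseudocode above describes. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 65536

static void process(const char *data, size_t len)
{
    (void)data; (void)len;                      /* split into lines, parse, ... */
}

int main(void)
{
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    static char buf[2][CHUNK];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf[0];
    cb.aio_nbytes = CHUNK;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }   /* start first read */

    int cur = 0;
    for (;;) {
        const struct aiocb *list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)   /* wait for the pending read */
            aio_suspend(list, 1, NULL);

        ssize_t got = aio_return(&cb);
        if (got <= 0)                           /* 0 = end of file, -1 = error */
            break;

        int next = 1 - cur;
        cb.aio_buf     = buf[next];             /* queue the next chunk ...    */
        cb.aio_offset += got;
        int queued = (aio_read(&cb) == 0);

        process(buf[cur], (size_t)got);         /* ... while handling this one */
        cur = next;
        if (!queued)
            break;
    }

    close(fd);
    return 0;
}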