Block Linux read(2) until all of count bytes have arrived - c

I'm using read(2) to read from a file (/dev/random, where data arrives very slowly).
However, read() returns after reading only a few bytes, while I'd like it to wait until the specified number of bytes has been read (or an error has occurred), so the return value is always count, or -1.
Is there any way to enable this behaviour? The open(2) and read(2) man pages do not contain any useful information on the topic, nor have I found anything about it elsewhere.
I am fully aware of the workaround of just putting the read() inside a while loop and calling it until all data has been read. I'd just like to know if this can be achieved in a proper way that yields deterministic behaviour and involves only O(1) syscalls, instead of the nondeterministic O(n) of the while-loop solution.
The following minimal example reproduces the problem.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void) {
    int fd = open("/dev/random", O_RDONLY);
    char buf[128];
    ssize_t bytes = read(fd, buf, sizeof(buf));
    printf("Bytes read: %zd\n", bytes); // output varies, usually 8
    close(fd);
    return 0;
}

Since read() can be interrupted by a signal before the requested data has arrived, this really cannot be done without a loop.
You have to check the return value and count the bytes yourself, unfortunately.
And yes, the easiest way is to write a wrapping function.
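For example, a minimal sketch of such a wrapper (the name read_full and the EINTR handling are illustrative, not any standard API):
#include <errno.h>
#include <unistd.h>
/* Loop over read() until `count` bytes have arrived, an error occurs,
 * or end-of-file is reached. Returns the number of bytes read, or -1. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data: retry */
            return -1;         /* real error */
        }
        if (n == 0)
            break;             /* end-of-file: return the short count */
        done += (size_t)n;
    }
    return (ssize_t)done;
}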

As everyone has said,
There's no way to guarantee that 128 bytes of randomness are available before your read returns, and
The overhead involved in getting eight bytes at a time is trivial compared to the amortized cost of generating the eight bytes; consequently,
You should remember that entropy comes at a huge cost and take that into account when consuming it.
Nonetheless, no answer to this question would be complete without noting that in man 4 random (on a vaguely recent Linux distro) you should find the following information:
The files in the directory /proc/sys/kernel/random
(present since 2.3.16) provide an additional interface
to the /dev/random device.
...
The file read_wakeup_threshold contains the number of bits of
entropy required for waking up processes that sleep waiting
for entropy from /dev/random. The default is 64.
That is, 64 bits, which is eight bytes. With superuser privileges you could increase this value, but imho increasing it to 1024 and then expecting your machine to keep working as normal is probably pretty optimistic. I don't know all the things that want a bit of entropy, but I've certainly noticed that my entropy pool goes up and down, so I know something wants it, and I strongly suspect that whatever that something is would not be happy having to wait for 1024 bits of it to be available. Anyway, you now have a bit of rope...
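For reference, a small sketch that just reads the current threshold (the path comes from the man page quoted above; the file may not be present on every kernel, and writing a new value the same way requires superuser privileges):
#include <stdio.h>
int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/random/read_wakeup_threshold", "r");
    int bits;
    if (f && fscanf(f, "%d", &bits) == 1)
        printf("read_wakeup_threshold: %d bits (%d bytes)\n", bits, bits / 8);
    if (f)
        fclose(f);
    return 0;
}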

According to the documentation, /dev/random does its best to return the most reliable random data it can, and that limits the number of bytes it returns in a single read.
But reading /dev/urandom (notice the 'u') will return as much data as you requested (up to your buffer size), at the cost of potentially less random data.
As for the read() behaviour, I'm quite sure this cannot be changed: read() returns the amount of data that the underlying plumbing (for example disk+driver+...) decided to return; it's by-design behaviour. The way to do things is, as you said, to loop until you have received as much data as expected.

Related

Is there a portable way to discard a number of readable bytes from a socket-like file descriptor?

Is there a portable way to discard a number of incoming bytes from a socket without copying them to userspace? On a regular file, I could use lseek(), but on a socket, it's not possible. I have two scenarios where I might need it:
A stream of records is arriving on a file descriptor (which can be a TCP socket, a SOCK_STREAM type UNIX domain socket or potentially a pipe). Each record is preceded by a fixed-size header specifying its type and length, followed by data of variable length. I want to read the header first and, if it's not of a type I'm interested in, discard the following data segment without transferring it into a dummy buffer in user space.
A stream of records of varying and unpredictable length is arriving on a file descriptor. Due to their asynchronous nature, the records may still be incomplete when the fd becomes readable, or they may be complete but a piece of the next record may already be there when I try to read a fixed number of bytes into a buffer. I want to stop reading the fd at the exact boundary between records so I don't need to manage partially loaded records I accidentally read from the fd. So, I use recv() with the MSG_PEEK flag to read into a buffer, parse the record to determine its completeness and length, and then read again properly (thus actually removing the data from the socket) up to the exact length. This copies the data twice - I want to avoid that by simply discarding an exact amount of the data buffered in the socket.
On Linux, I gather it is possible to achieve that by using splice() and redirecting the data to /dev/null without copying them to userspace. However, splice() is Linux-only, and the similar sendfile() that is supported on more platforms can't use a socket as input. My questions are:
Is there a portable way to achieve this? Something that would work on other UNIXes (primarily Solaris) as well that do not have splice()?
Is splice()-ing into /dev/null an efficient way to do this on Linux, or would it be a waste of effort?
Ideally, I would love to have a ssize_t discard(int fd, size_t count) that simply removes count readable bytes from a file descriptor fd in the kernel (i.e. without copying anything to userspace), blocks on a blockable fd until the requested number of bytes has been discarded, or returns the number of successfully discarded bytes or EAGAIN on a non-blocking fd, just like read() would do. And advances the seek position on a regular file, of course :)
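For illustration, a Linux-only sketch of such a discard built on splice() (untested; note that splice() needs a pipe on one end, so the bytes go fd -> pipe -> /dev/null, and error handling is trimmed):
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
/* Hypothetical discard(): move `count` bytes from `fd` into /dev/null
 * without copying them to userspace. */
ssize_t discard(int fd, size_t count)
{
    int devnull = open("/dev/null", O_WRONLY);
    int pipefd[2];
    size_t done = 0;
    if (devnull < 0 || pipe(pipefd) < 0)
        return -1;
    while (done < count) {
        ssize_t n = splice(fd, NULL, pipefd[1], NULL, count - done, 0);
        if (n <= 0)
            break;                                            /* error, EAGAIN or EOF */
        splice(pipefd[0], NULL, devnull, NULL, (size_t)n, 0);  /* drain the pipe */
        done += (size_t)n;
    }
    close(pipefd[0]); close(pipefd[1]); close(devnull);
    return (ssize_t)done;
}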
The short answer is No, there is no portable way to do that.
The sendfile() approach is Linux-specific, because on most other OSes implementing it, the source must be a file or a shared memory object. (I haven't even checked if/in which Linux kernel versions, sendfile() from a socket descriptor to /dev/null is supported. I would be very suspicious of code that does that, to be honest.)
Looking at e.g. Linux kernel sources, and considering how little a ssize_t discard(fd, len) differs from a standard ssize_t read(fd, buf, len), it is obviously possible to add such support. One could even add it via an ioctl (say, SIOCISKIP) for easy support detection.
However, the problem is that you have designed an inefficient approach, and rather than fix the approach at the algorithmic level, you are looking for crutches that would make your approach perform better.
You see, it is very hard to show a case where the "extra copy" (from kernel buffers to userspace buffers) is an actual performance bottleneck. The number of syscalls (context switches between userspace and kernel space) sometimes is. If you sent a patch upstream implementing e.g. ioctl(socketfd, SIOCISKIP, bytes) for TCP and/or Unix domain stream sockets, they would point out that the performance increase this hopes to achieve is better obtained by not trying to obtain the data you don't need in the first place. (In other words, the way you are trying to do things, is inherently inefficient, and rather than create crutches to make that approach work better, you should just choose a better-performing approach.)
In your first case, a process receiving structured data framed by a type and length identifier, wishing to skip unneeded frames, is better fixed by fixing the transfer protocol. For example, the receiving side could inform the sending side which frames it is interested in (i.e., basic filtering approach). If you are stuck with a stupid protocol that you cannot replace for external reasons, you're on your own. (The FLOSS developer community is not, and should not be responsible for maintaining stupid decisions just because someone wails about it. Anyone is free to do so, but they'd need to do it in a manner that does not require others to work extra too.)
In your second case, you already read your data. Don't do that. Instead, use a userspace buffer large enough to hold two full-size frames. Whenever you need more data, but the start of the current frame is already past the midway point of the buffer, memmove() the frame to the beginning of the buffer first.
When you have a partially read frame, and you have N unread bytes of it left that you are not interested in, read them into the unused portion of the buffer. There is always enough room, because you can overwrite the portion already used by the current frame, and its beginning is always within the first half of the buffer.
If the frames are small, say 65536 bytes maximum, you should use a tunable for the maximum buffer size. On most desktop and server machines, with high-bandwidth stream sockets, something like 2 MiB (2097152 bytes or more) is much more reasonable. It's not too much memory wasted, but you rarely do any memory copies (and when you do, they tend to be short). (You can even optimize the memory moves so that only full cachelines are copied, aligned, since leaving almost one cacheline of garbage at the start of the buffer is insignificant.)
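A rough sketch of that buffering scheme (the struct, names, and sizes below are illustrative, not a drop-in implementation):
#include <string.h>
#include <unistd.h>
#define MAX_FRAME 65536
#define BUF_SIZE  (2 * MAX_FRAME)
struct stream {
    char   buf[BUF_SIZE];
    size_t start;   /* offset of the current frame */
    size_t end;     /* offset just past the last byte read so far */
};
/* Make sure at least `need` bytes of the current frame are buffered.
 * Returns 1 on success, 0 on end-of-file, -1 on a read error. */
static int fill(int fd, struct stream *s, size_t need)
{
    if (s->start > BUF_SIZE / 2) {   /* frame starts past midway: slide it back */
        memmove(s->buf, s->buf + s->start, s->end - s->start);
        s->end -= s->start;
        s->start = 0;
    }
    while (s->end - s->start < need) {
        ssize_t n = read(fd, s->buf + s->end, BUF_SIZE - s->end);
        if (n < 0) return -1;
        if (n == 0) return 0;
        s->end += (size_t)n;
    }
    return 1;
}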
I do HPC with large datasets (including text-form molecular data, where records are separated by newlines, and custom parsers for converting decimal integers or floating-point values are used for better performance), and this approach does work well in practice. Simply put, skipping data already in your buffer is not something you need to optimize; it is insignificant overhead compared to simply avoiding doing the things you do not need.
There is also the question of what you wish to optimize by doing that: the CPU time/resources used, or the wall clock used in the overall task. They are completely different things.
For example, if you need to sort a large number of text lines from some file, you use the least CPU time if you simply read the entire dataset to memory, construct an array of pointers to each line, sort the pointers, and finally write each line (using either internal buffering and/or POSIX writev() so that you do not need to do a write() syscall for each separate line).
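For illustration, a rough sketch of that read-everything-then-sort-pointers variant (not the answerer's code; the file name is made up, error handling is trimmed, and plain puts() stands in for the buffered/writev() output):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int cmp(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}
int main(void)
{
    FILE *f = fopen("lines.txt", "r");
    if (!f) return 1;
    /* slurp the whole file into one buffer */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    char *data = malloc((size_t)size + 1);
    size_t got = fread(data, 1, (size_t)size, f);
    data[got] = '\0';
    fclose(f);
    /* build an array of pointers to the start of each line */
    size_t nlines = 0, cap = 1024;
    char **lines = malloc(cap * sizeof *lines);
    for (char *p = strtok(data, "\n"); p; p = strtok(NULL, "\n")) {
        if (nlines == cap)
            lines = realloc(lines, (cap *= 2) * sizeof *lines);
        lines[nlines++] = p;
    }
    /* sort the pointers, then write the lines out in order */
    qsort(lines, nlines, sizeof *lines, cmp);
    for (size_t i = 0; i < nlines; i++)
        puts(lines[i]);
    free(lines);
    free(data);
    return 0;
}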
However, if you wish to minimize the wall clock time used, you can use a binary heap or a balanced binary tree instead of an array of pointers, and heapify or insert-in-order each line completely read, so that when the last line is finally read, you already have the lines in their correct order. This is because the storage I/O (for all but pathological input cases, something like single-character lines) takes longer than sorting them using any robust sorting algorithm! The sorting algorithms that work inline (as data comes in) are typically not as CPU-efficient as those that work offline (on complete datasets), so this ends up using somewhat more CPU time; but because the CPU work is done at a time that is otherwise wasted waiting for the entire dataset to load into memory, it is completed in less wall clock time!
If there is need and interest, I can provide a practical example to illustrate the techniques. However, there is absolutely no magic involved, and any C programmer should be able to implement these (both the buffering scheme and the sort scheme) on their own. (I do consider using resources like Linux man pages online, Wikipedia articles, and pseudocode on, for example, binary heaps to be doing it "on your own". As long as you do not just copy-paste existing code, I consider it doing it "on your own", even if somebody or some resource helps you find the good, robust ways to do it.)

Amount of data read() syscall will actually read

Suppose I have a file for which the file descriptor has more than n bytes left until EOF, and I invoke the read() syscall for n bytes. Is the function guaranteed to read n bytes into the buffer? Or can it read less?
The read system call is guaranteed to read as many characters as you asked for, except when it can't. But it turns out that there are so many exceptions (so many cases where it can't read as many characters as you asked for) that it basically ends up being safest to assume that any given read call probably won't read as many characters as you asked for. I believe it's good practice to always write your code with that in mind.
The man page on my system says
The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.
So if it's not a normal file, or if it is a normal file but there aren't enough characters, you'll get fewer than you asked for. But in the case you asked about, yes, you should be guaranteed to get exactly as many characters as you asked for.
With that said, though, if you find yourself with a choice between assuming that read is allegedly guaranteed to read exactly the number of characters requested, versus acknowledging that it might return less, I would always write the code to assume it might return less. That is, if you have a call like
r = read(fd, buf, n);
there isn't usually much to be gained by assuming that if r is greater than 0, it must be exactly n. Your code has to be able to handle the r < n case so it behaves properly when it's almost at end-of-file, so unless you want two different code paths (one for "normal" reads, and one for the last read), you might as well write one piece of code that can handle the r < n case, and let it operate all the time.
(Also, as Zan Lynx reminds in a comment, don't have the code notice that r < n, and infer from that that end-of-file is coming up soon. Wait for r == 0 before deciding you're at end-of-file.)
You could've read it from the man page yourself:
On Linux, read() (and similar system calls) will transfer at most
0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
actually transferred. (This is true on both 32-bit and 64-bit
systems.)
So even if you had enough RAM and so on, you couldn't read a full-size DVD image in one go - however, this wouldn't be the sane thing to do either; to access such large files, mmap would be better.
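A sketch of that mmap() route for such a large file (the file name is illustrative and error handling is minimal):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int main(void)
{
    int fd = open("big.iso", O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;
    /* map the whole file and access it as ordinary memory */
    const unsigned char *p = mmap(NULL, (size_t)st.st_size,
                                  PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    /* ... use p[0] .. p[st.st_size - 1] ... */
    munmap((void *)p, (size_t)st.st_size);
    close(fd);
    return 0;
}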
Apart from that size limit, a signal might be delivered, which can cause read() to fail with EINTR and leave the buffer contents indeterminate.
ERRORS
[...]
EINTR The call was interrupted by a signal before any data was read; see signal(7).
Is the function guaranteed to read n bytes into the buffer? Or can it
read less?
No, even if your file has more than n bytes before its end, the read(fd, buf, n) function is not guaranteed to read n bytes into the buffer and then return n. It can read less and return a positive value that is less than n.
See Linux man page at https://man7.org/linux/man-pages/man2/read.2.html
RETURN VALUE
It is not an error if this number is smaller than the number of
bytes requested; this may happen for example because fewer bytes
are actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.

Why does fread have thread-safety requirements that slow down its calls

I am writing a function to read binary files that are organized as a succession of (key, value) pairs, where keys are small ASCII strings and values are ints or doubles stored in binary format.
If implemented naively, this function makes a lot of calls to fread to read very small amounts of data (usually no more than 10 bytes). Even though fread internally uses a buffer to read the file, I have implemented my own buffer and I have observed a speed-up by a factor of 10 on both Linux and Windows. The buffer size used by fread is large enough, and the function call overhead alone cannot be responsible for such a slowdown. So I went and dug into the GNU implementation of fread and discovered a lock on the file, and many other things, such as verifying that the file is open with read access and so on. No wonder fread is so slow.
But what is the rationale behind fread being thread-safe? It seems that multiple threads can call fread on the same file, which is mind-boggling to me. These requirements make it slow as hell. What are the advantages?
Imagine you have a file where each 5 bytes can be processed in parallel (let's say, pixel by pixel in an image):
123456789A
One thread needs to pick 5 bytes "12345", the next one the next 5 bytes "6789A".
If it were not thread-safe, different threads could pick up the wrong chunks, for example "12367" and "4589A", or even worse (unexpected behaviour, repeated bytes, or worse).
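A small sketch of what that per-call atomicity buys you (the file name is made up; build with -pthread):
#include <pthread.h>
#include <stdio.h>
static FILE *f;
/* Each fread() call is atomic with respect to other threads, so every
 * thread always gets a contiguous 5-byte record, never an interleaved one. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    char rec[6] = {0};                 /* 5 record bytes + terminating NUL */
    while (fread(rec, 1, 5, f) == 5)
        printf("thread %d got \"%s\"\n", id, rec);
    return NULL;
}
int main(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    f = fopen("records.bin", "rb");
    if (!f) return 1;
    pthread_create(&t1, NULL, worker, &id1);
    pthread_create(&t2, NULL, worker, &id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    fclose(f);
    return 0;
}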
As suggested by nemequ:
Note that if you're on glibc you can use the _unlocked variants (e.g., fread_unlocked). On Windows you can define _CRT_DISABLE_PERFCRIT_LOCKS
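On glibc, that might look roughly like this (a sketch; the helper name read_field is mine, and the availability of fread_unlocked depends on your libc and feature-test macros):
#define _GNU_SOURCE
#include <stdio.h>
/* Skip the per-call locking when only one thread ever touches the stream. */
static size_t read_field(void *dst, size_t len, FILE *f)
{
#ifdef __GLIBC__
    return fread_unlocked(dst, 1, len, f);   /* glibc extension, no locking */
#else
    return fread(dst, 1, len, f);
#endif
}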
Stream I/O is already as slow as molasses. Programmers think that a read from main memory (1000x longer than a CPU cycle) takes ages. A read from the physical disk or a network may as well be an eternity.
I don't know if that's the #1 reason why the library implementers were ok with adding the lock overhead, but I guarantee it played a significant part.
Yes, it slows it down, but as you discovered, you can manually buffer the reads and use your own handling to increase the speed when performance really matters. (That's the key: when you absolutely must read the data as fast as possible. Don't bother manually buffering in the general case.)
That's a rationalization. I'm sure you could think of more!

speed comparison between fgetc/fputc and fread/fwrite in C

So (just for fun), I was trying to write some C code to copy a file. I read around and it seems that all the functions that read from a stream call fgetc() (I hope this is true?), so I used that function:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define FILEr "img1.png"
#define FILEw "img2.png"
int main(void)
{
    clock_t start, diff;
    int msec, c;
    FILE *fr, *fw;
    fr = fopen(FILEr, "rb");
    fw = fopen(FILEw, "wb");
    start = clock();
    while ((c = fgetc(fr)) != EOF)   // check the return value instead of feof() to avoid writing a spurious byte at EOF
        fputc(c, fw);
    diff = clock() - start;
    msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
    fclose(fr);
    fclose(fw);
    return 0;
}
This gave a run time of 140 ms for this file on a 2.10 GHz Core 2 Duo T6500 Dell Inspiron laptop.
However, when I try using fread/fwrite, the run time decreases as I keep increasing the number of bytes (i.e. the variable st in the following code) transferred per call, until it bottoms out at around 10 ms! Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define FILEr "img1.png"
#define FILEw "img2.png"
int main(void)
{
    clock_t start, diff;
    // number of bytes copied at each step
    size_t st = 10000;
    size_t n;
    int msec;
    FILE *fr, *fw;
    // buffer for the data that is read
    char *x = malloc(st);
    fr = fopen(FILEr, "rb");
    fw = fopen(FILEw, "wb");
    start = clock();
    while ((n = fread(x, 1, st, fr)) > 0)
        fwrite(x, 1, n, fw);   // write only what was actually read, so the last partial chunk is handled correctly
    diff = clock() - start;
    msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
    fclose(fr);
    fclose(fw);
    free(x);
    return 0;
}
Why is this happening? I.e. if fread actually amounts to multiple calls to fgetc, then why the speed difference?
EDIT: specified that "increasing number of bytes" refers to the variable st in the second code
fread() is not calling fgetc() to read each byte.
It behaves as if calling fgetc() repeatedly, but it has direct access to the buffer that fgetc() reads from so it can directly copy a larger quantity of data.
You are forgetting about file buffering (inode, dentry and page caches).
Clear them before you run:
echo 3 > /proc/sys/vm/drop_caches
Backgrounder:
Benchmarking is an art. Refer to bonnie++, iozone and phoronix for proper filesystem benchmarking. As a characteristic, bonnie++ won't allow a benchmark with a written volume of less than 2x the available system memory.
Why?
(answer: buffering effects!)
Like sehe says, it's partly because of buffering, but there is more to it, and I'll explain why that is and, at the same time, why fgetc() gives you more latency.
fgetc() is called for every single byte that is read from the file.
fread() is called once for every n bytes of file data, n being the size of your local buffer.
So for a 10 MiB file:
fgetc() is called 10 485 760 times,
while fread() with a 1 KiB buffer is called only 10 240 times.
Let's say for simplicity that every function call takes 1 ms:
fgetc would take 10 485 760 ms = 10485.76 seconds ~ 2.9127 hours
fread would take 10 240 ms = 10.24 seconds
On top of that, the OS usually does the reading and writing on the same device; I suppose your example uses the same hard disk for both files. When reading your source file, the OS moves the hard disk heads over the spinning platters to find the file and reads 1 byte into memory, then moves the read/write head again to wherever the OS and the disk controller agreed to locate the destination file and writes 1 byte from memory. For the above example this happens over 10 million times for each file, totalling over 20 million times; with the buffered version it happens a grand total of roughly 20 000 times.
Besides that, when reading from the disk the OS loads a few more KiB of hard disk data into memory for performance purposes, and this can speed up the program even when using the less efficient fgetc, because the program then reads from the OS's memory instead of directly from the hard disk. This is what sehe's response refers to.
Depending on your machine configuration/load/OS/etc. your results from reading and writing can vary a lot, hence his recommendation to empty the disk caches to get more meaningful results.
When the source and destination files are on different hard disks, things are a lot faster. With SSDs, I'm not really sure whether reads and writes are absolutely exclusive of each other.
Summary: every call to a function has a certain overhead, reading from an HDD has further overheads, and caches/buffers help to get things done faster.
Other info
http://en.wikipedia.org/wiki/Disk_read-and-write_head
http://en.wikipedia.org/wiki/Hard_disk#Components
stdio functions will fill a read buffer, of size "BUFSIZ" as defined in stdio.h, and will only make one read(2) system call every time that buffer is drained. They will not do an individual read(2) system call for every byte consumed -- they read large chunks. BUFSIZ is typically something like 1024 or 4096.
You can also adjust that buffer's size, if you wish, to increase it -- see the man pages for setbuf/setvbuf/setbuffer on most systems -- though that is unlikely to make a huge difference in performance.
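For example, something along these lines enlarges the stream's buffer (the 1 MiB size is arbitrary):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    FILE *f = fopen("img1.png", "rb");
    if (!f) return 1;
    /* setvbuf() must be called before the first read from the stream */
    char *big = malloc(1 << 20);
    if (big)
        setvbuf(f, big, _IOFBF, 1 << 20);
    /* ... read from f as usual ... */
    fclose(f);
    free(big);   /* the buffer must stay valid until the stream is closed */
    return 0;
}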
On the other hand, as you note, you can make a read(2) system call of arbitrary size by setting that size in the call, though you get diminishing returns with that at some point.
BTW, you might as well use open(2) and not fopen(3) if you are doing things this way. There is little point in fopen'ing a file you are only going to use for its file descriptor.

Trouble implementing a (stable) method of retrieving n bytes from file

One of the purposes of the library I am developing is to retrieve a specified number of bytes from a file; in this specific case I want access to /dev/random to retrieve entropy-based random sequences.
My main issue with fread is that it will hang indefinitely while waiting for more entropy, and this is unwanted. My next choice would have been wrapping fread with feof to take the bytes in chunks; then I could at least report a completion percentage for a better experience, although from what I can gather the bytes from iterations 1, 2, 3, 4, ... will be hard to track so that they add up to exactly the amount needed.
Is there a method in a C standard that allows for what I am looking for: the exact amount wanted, read in chunks? If I were to add timeouts to this, would threading the data request be a good option to look at?
Define "standard". Do you mean the ISO C standard? POSIX? Linux standards base (LSB)? For POSIX, the read call lets you specify the size of the buffer that you are trying to read. You can use pselect or poll to determine if there are bytes available to be read, with a timeout instead of blocking. On Linux, it is possible to use the "FIONREAD" ioctl call to obtain the exact number of bytes available for reading.
That said, you should ask yourself if you need that level of entropy. You might (or might not) be able to get away with reading from "/dev/urandom". Of course, you would have to determine if that is the case.
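A sketch of the poll()-with-timeout route mentioned above, reading from /dev/random (the 2-second timeout and buffer size are arbitrary):
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    int fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
    struct pollfd p = { .fd = fd, .events = POLLIN };
    /* wait up to 2000 ms for the descriptor to become readable */
    if (fd >= 0 && poll(&p, 1, 2000) > 0 && (p.revents & POLLIN)) {
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);
        printf("read %zd bytes\n", n);
    } else {
        printf("timed out (or failed) waiting for entropy\n");
    }
    if (fd >= 0)
        close(fd);
    return 0;
}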
Try this
Here is the man page for a function I think will solve your problem.
http://www.manpagez.com/man/3/fgets/
I just saw that fread wasn't working; fgets reads up to a certain number of bytes from a file stream into a buffer.
