Fastest file reading in C - c

Right now I am using fread() to read a file, but I've been told that in other languages fread() is inefficient. Is this the same in C? If so, how would faster file reading be done?

It really shouldn't matter.
If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottleneck, and that's it.
Now, if you're being silly about your call to read/fread/whatever, and say, fread()-ing a byte at a time, then yes, it's going to be slow, as the overhead of fread() will outstrip the overhead of reading from the disk.
If you call read/fread/whatever and request a decent portion of data, you'll be fine. What counts as decent depends on what you're doing: sometimes all you want/need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap, go for something significant.)
If you're doing small reads, some of the higher level calls like fread() will actually help you by buffering data behind your back. If you're doing large reads, it might not be helpful, but switching from fread to read will probably not yield that much improvement, as you're bottlenecked on disk speed.
In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course, it's OS, hardware, and weather dependent.
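To make the "request a liberal amount" advice concrete, here is a minimal sketch that reads a file in 64 KiB chunks with fread(). The function name count_file_bytes and the CHUNK_SIZE value are assumptions made for illustration, not anything the benchmark below uses.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (64 * 1024)  /* assumed chunk size; tune for your workload */

/* Reads the whole file in CHUNK_SIZE pieces and returns the number of
 * bytes read, or -1 if the file cannot be opened or a read error occurs.
 * count_file_bytes is a hypothetical name used only for this sketch. */
long long count_file_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char *chunk = malloc(CHUNK_SIZE);
    if (chunk == NULL) {
        fclose(fp);
        return -1;
    }

    long long total = 0;
    size_t got;
    /* fread returns a short count (possibly 0) at end of file or on error. */
    while ((got = fread(chunk, 1, CHUNK_SIZE, fp)) > 0)
        total += (long long)got;

    int failed = ferror(fp);  /* distinguish a read error from plain EOF */
    free(chunk);
    fclose(fp);
    return failed ? -1 : total;
}
```

A real program would process each chunk inside the loop instead of just counting bytes; the point is only that each call asks for a sizeable block rather than a byte at a time.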
So, let's see if this might bring out any differences:
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)
double now()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec + tv.tv_usec / 1000000.;
}
int main()
{
unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer
double end_time;
double total_time;
int i, x, y = 0; // y is accumulated below, so it must start at 0
double start_time = now();
#ifdef USE_FREAD
FILE *fp;
fp = fopen("/dev/zero", "rb");
for(i = 0; i < ITERATIONS; ++i)
{
fread(buffer, BUFFER_SIZE, 1, fp);
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += buffer[x];
}
}
fclose(fp);
#elif USE_MMAP
unsigned char *mmdata;
int fd = open("/dev/zero", O_RDONLY);
for(i = 0; i < ITERATIONS; ++i)
{
mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd, (off_t)i * BUFFER_SIZE); // cast avoids int overflow past 2 GiB
// But if we don't touch it, it won't be read...
// I happen to know I have 4 KiB pages, YMMV
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += mmdata[x];
}
munmap(mmdata, BUFFER_SIZE);
}
close(fd);
#else
int fd;
fd = open("/dev/zero", O_RDONLY);
for(i = 0; i < ITERATIONS; ++i)
{
read(fd, buffer, BUFFER_SIZE);
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += buffer[x];
}
}
close(fd);
#endif
end_time = now();
total_time = end_time - start_time;
printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n", total_time, ITERATIONS / total_time);
return 0;
}
...yields:
$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.
...or no noticeable difference. (fread is winning sometimes, sometimes read)
Note: The slow mmap is surprising. This might be due to me asking it to allocate the buffer for me. (I wasn't sure about requirements of supplying a pointer...)
In really short: Don't prematurely optimize. Make it run, make it right, make it fast, that order.
Back by popular demand, I ran the test on a real file. (The first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO) These were the results:
# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.
...and one very bored programmer later, we've read the CD ISO off disk. 12 times. Before each test, the disk cache was cleared, and during each test there was enough, and approximately the same amount of, RAM free to hold the CD ISO twice in RAM.
One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly here. The other two solutions merely ran, mmap ran and, for reasons I can't explain, began pushing memory to swap, which killed its performance. (The program was not leaking, as far as I know (the source code is above) - the actual "used memory" stayed constant throughout the trials.)
read() posted the fastest time overall, fread() posted really consistent times. This may have been due to some small hiccup during the testing, however. All told, the three methods were just about equal. (Especially fread and read...)

If you are willing to go beyond the C spec into OS specific code, memory mapping is generally considered the most efficient way.
For POSIX, check out mmap; for Windows, check out CreateFileMapping and MapViewOfFile (OpenFileMapping only opens a named mapping that already exists).
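As a rough illustration of the POSIX side, this sketch maps a whole file read-only and walks over it. The function name sum_file_mmap is made up for the example, and summing the bytes is just a stand-in for real processing.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps the file read-only and returns the sum of its bytes, or -1 on
 * failure. sum_file_mmap is a hypothetical helper for this sketch. */
long long sum_file_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += data[i];         /* pages are faulted in as they are touched */

    munmap(data, (size_t)st.st_size);
    return sum;
}
```

Whether this beats read() depends on the access pattern and the OS, so it is worth measuring rather than assuming.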

What's slowing you down?
If you need the fastest possible file reading (while still playing nicely with the operating system), go straight to your OS's calls, and make sure you study how to use them most effectively.
How is your data physically laid out? For example, rotating drives might read data stored at the edges faster, and you want to minimize or eliminate seek times.
Is your data pre-processed? Do you need to do stuff between loading it from disk and using it?
What is the optimum chunk size for reading? (It might be some even multiple of the sector size. Check your OS documentation.)
If seek times are a problem, re-arrange your data on disk (if you can) and store it in larger, pre-processed files instead of loading small chunks from here and there.
If data transfer times are a problem, perhaps consider compressing the data.

I'm thinking of the read system call.
Keep in mind that fread is a wrapper for 'read'.
On the other hand, fread has an internal buffer, so a single 'read' call may be faster, but I think 'fread' will be more efficient when you make many small reads.
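To show what that internal buffer buys you, here is a small sketch that reads a file one character at a time with getc() while stdio refills a caller-sized buffer behind the scenes. The name count_lines_buffered and the idea of passing the buffer size in are assumptions for the example; setvbuf() itself is standard C.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts newline characters using getc() on a stream whose stdio buffer
 * has been replaced by one of buf_size bytes. Returns -1 on failure.
 * count_lines_buffered is a hypothetical name for this sketch. */
long count_lines_buffered(const char *path, size_t buf_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *buf = malloc(buf_size);
    /* setvbuf must be called before any other operation on the stream. */
    if (buf == NULL || setvbuf(fp, buf, _IOFBF, buf_size) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    long lines = 0;
    int c;
    /* Each getc() is cheap: the underlying read only happens when the
     * buffer runs dry. */
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            ++lines;

    fclose(fp);   /* close the stream before freeing the buffer it uses */
    free(buf);
    return lines;
}
```

Calling it with, say, a 1 MiB buffer means roughly one underlying read per mebibyte instead of one per default-sized stdio buffer.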

If fread is slow it is because of the additional layers it adds to the underlying operating system mechanism to read from a file that interfere with how your particular program is using fread. In other words, it's slow because you aren't using it the way it has been optimized for.
Having said that, faster file reading would be done by understanding how the operating system I/O functions work and providing your own abstraction that handles your program's particular I/O access patterns better. Most of the time you can do this with memory mapping the file.
However, if you are hitting the limits of the machine you are running on, memory mapping probably won't be sufficient. At that point it's really up to you to figure out how to optimize your I/O code.

It's not the fastest but it's pretty good and short.
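#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main() {
    int f = open("file1", O_RDONLY);
    if (f < 0)
        return 1;
    char buffer[4096];
    ssize_t n;
    // read() does not NUL-terminate, so write exactly the n bytes it returned
    while ((n = read(f, buffer, sizeof buffer)) > 0) {
        fwrite(buffer, 1, (size_t)n, stdout);
    }
    close(f);
    return 0;
}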

The problem that some people have noted here is that, depending on your source, your target buffer size, etc., you can create a custom handler for that specific case. But there are other cases, like block/character devices (i.e. /dev/*), where standard rules like that may or may not apply, and your backing source might be something that pops characters off serially without any buffering, like an I2C bus or standard RS-232. There are also sources where character devices expose large, memory-mappable sections of memory, as NVIDIA does with its video driver character device (/dev/nvidiactl).
One other design that many people have chosen in high-performance applications is asynchronous instead of synchronous I/O for handling how data is read. Look into libaio, and the ported versions of libaio which provide prepackaged solutions for asynchronous I/O, and also look into using read with shared memory between a worker and a consumer thread (but keep in mind that this will increase programming complexity if you go this route). Asynchronous I/O is also something that you can't get out of the box with stdio but can get with standard OS system calls. Just be careful, as there are bits of read which are 'portable' according to the spec, but not all operating systems (FreeBSD, for instance) support POSIX STREAMS (by choice).
Another thing that you can do (depending on how portable your data is) is look into compression and/or conversion into a binary format like database formats, e.g. BDB, SQL, etc. Some database formats are portable across machines using endianness conversion functions.
In general it would be best to take a set of algorithms and methods, run performance tests using the different methods, and evaluate the best algorithm that serves the mean task that your application would serve. That would help you determine what the best performing algorithm is.

Maybe check out how perl does it. Perl's I/O routines are optimized, and are, I gather, the reason why processing text with a perl filter can be twice as fast as doing the same transformation with sed.
Obviously perl is pretty complex, and I/O is only one small part of what it does. I've never looked at its source so I couldn't give you any better directions than to point you here.

Related

How to correctly time speed of writing to a disk (i.e. file) in C

I have a Zynq SoC which runs a Linux system and would like to time the speed at which I can write to its SD card in C.
I have tried out clock() and clock_gettime() where for the latter, I have done the following:
#define BILLION 1000000000.0
#define BUFFER_SIZE 2097152
...
for (int i=0; i<100; i++) {
memcpy(buf, channel->dma + offset, BUFFER_SIZE);
/* Open file for output */
FILE *f = fopen(path, "w");
/* Start timer */
clock_gettime(CLOCK_MONOTONIC, &start);
/* Write data from cpu ram to sd card*/
fwrite(buf, 1, BUFFER_SIZE, f);
fclose(f);
/* Stop timer and report result */
clock_gettime(CLOCK_MONOTONIC, &end);
elapsed_time = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / BILLION;
printf("%lf seconds\n", elapsed_time);
}
The times that I'm getting are around 0.019 seconds for 2097152 bytes, which I believe is wrong because the speed of writing to the SD card is quoted as 10 MB/s and can't possibly be the 110 MB/s that I appear to be getting. When I instead put the timing code outside the loop and, basically, time how long it takes to write the whole 100*BUFFER_SIZE, I get a more reasonable 18 MB/s.
My question is, is it incorrect to try to time a single fwrite operation? Does clock_gettime() not give me the adequate accuracy/precision for that? I would like to have the most accurate value possible for how quickly I can write to the disk because it is a very important parameter in the design of the system and so this discrepancy I'm getting is rather disconcerting.
The Linux kernel often caches read and write access to disk storage. That is, it returns from the write call and does the actual writing in the background. It does that transparently, so that if you read the same file just after writing, you get the results you wrote, even if they are not yet transferred and written completely to disk.
To force a complete write you can call the fsync function. It blocks until all file IO for a specific file has completed. In your case a call to
#include <unistd.h>
...
fwrite(...);
fsync(fileno(f));
should suffice.
Edit
As @pmg mentioned in the comments, there is also buffering at the stream level. Although it probably is not that large a buffer, you can force the stream buffer to be written with a call to fflush() before the fsync.
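Putting the two flushes together, a timing helper might look roughly like this. It is only a sketch: timed_write is a made-up name, and it times one write of an arbitrary buffer rather than reproducing the DMA loop from the question.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime, fileno, fsync */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Writes len bytes to path and returns the elapsed wall-clock seconds,
 * including the time for fflush() and fsync(), or -1.0 on error.
 * timed_write is a hypothetical helper for this sketch. */
double timed_write(const char *path, const void *data, size_t len)
{
    struct timespec start, end;
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int ok = fwrite(data, 1, len, f) == len
          && fflush(f) == 0            /* push the stdio buffer to the kernel */
          && fsync(fileno(f)) == 0;    /* block until the kernel has written it out */
    clock_gettime(CLOCK_MONOTONIC, &end);

    fclose(f);
    if (!ok)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Dividing len by the returned value gives a throughput figure that includes the time the device actually needed, which is the number that matters for sizing the system.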

Same program is 10 times slower on windows

I wrote a simple c program that copies 10 million bytes from a file and pastes them in reverse order on another file (this is done one byte at a time, I know it's not efficient but it's just to make some tests), I don't understand why on linux it takes 2.5 seconds while on windows it takes more than 20 seconds. I run the same program changing only the paths.
I use windows 10 and archlinux, the files are on an ntfs partition.
code on windows
#include <stdio.h>
#include <time.h>
void get_nth_byte(FILE *fp, int nth_index,unsigned char* output){
fseek(fp,nth_index,SEEK_SET);
fread(output, sizeof(unsigned char), 1,fp);
}
int main() {
clock_t begin = clock();
//
FILE* input = fopen( "C:\\Users\\piero\\Desktop\\input.txt","rb");
FILE* output = fopen("C:\\Users\\piero\\Desktop\\output.txt","wb");
unsigned char byte;
for (int i = 10000000; i > 0; i--) {
get_nth_byte(input,i,&byte);
fwrite(&byte, sizeof(unsigned char),1,output);
}
//
clock_t end = clock();
double result = (double) (end - begin)/CLOCKS_PER_SEC;
printf("%f",result);
return 0;
}
code on linux
#include <stdio.h>
#include <time.h>
void get_nth_byte(FILE *fp, int nth_index,unsigned char* output){
fseek(fp,nth_index,SEEK_SET);
fread(output, sizeof(unsigned char), 1,fp);
}
int main() {
clock_t begin = clock();
//
FILE* input = fopen( "/run/media/piero/Windows/Users/piero/Desktop/input.txt","rb");
FILE* output = fopen("/run/media/piero/Windows/Users/piero/Desktop/output.txt","wb");
unsigned char byte;
for (int i = 10000000; i > 0; i--) {
get_nth_byte(input,i,&byte);
fwrite(&byte, sizeof(unsigned char),1,output);
}
//
clock_t end = clock();
double result = (double) (end - begin)/CLOCKS_PER_SEC;
printf("%f",result);
return 0;
}
output on linux : 2.224549
output on windows : 25.349647
UPDATE
I solved the problem by using Cygwin rather than MinGW; now it takes about 4.3 seconds.
This is a great demonstration of how it's not the code we write that runs, it's the executable that the compiler makes from the code that runs.
It is possible that your Windows C compiler is not as advanced as your Linux C compiler, and is not optimizing your code as well as it could, or it's possible that the libraries that the Windows compiler is linking to for fread() and fwrite() are slower than the equivalent libraries in the Linux system.
If I had to put up my best guess, the Linux C compiler probably noticed that it would be more efficient to read more than one byte at a time, and it could do that without affecting the semantics of your program, and the Windows compiler either didn't infer the same, or wasn't able to optimize in the same way due to some underlying proprietary filesystem thing that only Microsoft engineers understand.
I can't say for sure without a peek at the disassembled binaries.
One of the strengths of Unix/Linux is that files are designed to be treated as streams of bytes, with it being maximally easy and efficient to seek to the n'th byte using fseek or lseek.
Non-Unix operating systems, such as Windows, tend to have to work much harder to implement those seek operations. In the worst case, they may actually need to read through the file, counting characters as they go.
Your code opens both files in binary mode, and this should reduce the need for the fseek implementation to perform any expensive emulations. In text mode, a 10x performance penalty for heavy fseek use wouldn't surprise me. I'm much more surprised you're seeing it in binary mode.
[Disclaimer: strictly speaking, in text mode fseek is not defined as seeking to an arbitrary byte offset at all, but rather, only to a position defined by the number returned by a previous call to ftell. If an implementation takes advantage of that freedom, it can reduce the performance penalty for text-mode fseek operations, also, but it then means that code like yours, that constructs positions to seek to on the assumption that they're pure byte offsets, may not work at all.]
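Whatever the C library does with fseek, the per-byte cost mostly disappears if the program seeks once per block instead of once per byte. The sketch below reverses a file in 64 KiB chunks; reverse_copy and REV_CHUNK are names invented for this example, and it assumes a POSIX-like system where seeking to SEEK_END on a binary stream reports the file size.

```c
#include <stdio.h>

#define REV_CHUNK 65536  /* assumed chunk size for this sketch */

/* Copies in_path to out_path with the byte order reversed.
 * Returns 0 on success, -1 on failure. reverse_copy is a hypothetical
 * helper; the static buffer makes it non-reentrant. */
int reverse_copy(const char *in_path, const char *out_path)
{
    static unsigned char chunk[REV_CHUNK];
    FILE *in = fopen(in_path, "rb");
    FILE *out = fopen(out_path, "wb");
    long remaining = -1;
    int rc = -1;

    if (in == NULL || out == NULL)
        goto done;
    if (fseek(in, 0, SEEK_END) != 0 || (remaining = ftell(in)) < 0)
        goto done;

    /* Walk backwards through the file one chunk at a time. */
    while (remaining > 0) {
        size_t n = remaining < REV_CHUNK ? (size_t)remaining : REV_CHUNK;
        remaining -= (long)n;
        if (fseek(in, remaining, SEEK_SET) != 0)
            goto done;
        if (fread(chunk, 1, n, in) != n)
            goto done;
        /* Reverse the chunk in memory before writing it out. */
        for (size_t i = 0, j = n - 1; i < j; ++i, --j) {
            unsigned char t = chunk[i];
            chunk[i] = chunk[j];
            chunk[j] = t;
        }
        if (fwrite(chunk, 1, n, out) != n)
            goto done;
    }
    rc = 0;

done:
    if (in != NULL)
        fclose(in);
    if (out != NULL && fclose(out) != 0)
        rc = -1;
    return rc;
}
```

For a 10 million byte file this makes about 150 seeks instead of 10 million, so differences between fseek implementations should stop dominating the run time.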

C's write() throughput inconsistency for larger buffer

I wrote a small program to measure the system time for C's write() call. I keep appending a buffer to a file until it reaches a certain file size. But for two different buffer sizes, I get drastically different numbers.
Here is a snippet of case #1:
char buffer1[] = malloc(4* 1024);
for(int i=0; i< 1048576; i++){
int w = write(outputfile, buffer1, sizeof(buffer1));
}
Here is a snippet of case #2:
char buffer2[] = malloc(1024* 1024);
for(int i=0; i< 4096; i++){
int w = write(outputfile, buffer2, sizeof(buffer2));
}
You can see in both cases the program writes 1048576x4kB = 4096x1024kB = ~4096 Mbytes of data to a file.
In my machine (8 gig ddr3 ram, core i7, 240gb ssd), case#1 takes 14.96 sys time to finish, giving me a throughput approx 274 MB/s.
Whereas case#2 takes 0.9 seconds sys time to finish, giving me a throughput approx 4551 MB/s.
There are some intermediate runs I did for some other buffer sizes, they also produce highly varying numbers.
I know a larger buffer size means fewer calls to the write() function. But shouldn't each call then take longer, so that the overall time to finish writing the file is the same regardless of buffer size? Why does the throughput vary so much for varying buffer sizes?
Here is the program:
https://drive.google.com/file/d/1Bj_CnO8DqFrOO3WwbsZbzFYjHLijTW7A/view?usp=sharing

Fast I/O in c, stdin/out

In a coding competition specified at this link there is a task where you need to read much data on stdin, do some calculations and present a whole lot of data on stdout.
In my benchmarking it is almost only i/o that takes time although I have tried optimizing it as much as possible.
What you have as input is a string (1 <= len <= 100'000) and q rows of pair of int where q also is 1 <= q <= 100'000.
I benchmarked my code on a 100 times larger dataset (len = 10M, q = 10M) and this is the result:
Activity        time   accumulated
Read text:      0.004  0.004
Read numbers:   0.146  0.150
Parse numbers:  0.200  0.350
Calc answers:   0.001  0.351
Format output:  0.037  0.388
Print output:   0.143  0.531
By implementing my own formatting and number parsing inline I managed to get the time down to 1/3 of the time when using printf and scanf.
However when I uploaded my solution to the competitions webpage my solution took 1.88 seconds (I think that is the total time over 22 datasets). When I look in the high-score there are several implementations (in c++) that finished in 0.05 seconds, nearly 40 times faster than mine! How is that possible?
I guess that I could speed it up a bit by using 2 threads, then I can start calculating and writing to stdout while still reading from stdin. This will however decrease the time to min(0.150, 0.143) in a theoretical best case on my large dataset. I'm still nowhere close to the highscore..
In the image below you can see the statistics of the consumed time.
The program gets compiled by the website with these options:
gcc -g -O2 -std=gnu99 -static my_file.c -lm
and timed like this:
time ./a.out < sample.in > sample.out
My code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LEN (100000 + 1)
#define ROW_LEN (6 + 1)
#define DOUBLE_ROW_LEN (2*ROW_LEN)
int main(int argc, char *argv[])
{
int ret = 1;
// Set custom buffers for stdin and out
char stdout_buf[16384];
setvbuf(stdout, stdout_buf, _IOFBF, 16384);
char stdin_buf[16384];
setvbuf(stdin, stdin_buf, _IOFBF, 16384);
// Read stdin to buffer
char *buf = malloc(MAX_LEN);
if (!buf) {
printf("Failed to allocate buffer");
return 1;
}
if (!fgets(buf, MAX_LEN, stdin))
goto EXIT_A;
// Get the num tests
int m ;
scanf("%d\n", &m);
char *num_buf = malloc(DOUBLE_ROW_LEN);
if (!num_buf) {
printf("Failed to allocate num_buffer");
goto EXIT_A;
}
int *nn;
int *start = calloc(m, sizeof(int));
int *stop = calloc(m, sizeof(int));
int *staptr = start;
int *stpptr = stop;
char *cptr;
for(int i=0; i<m; i++) {
fgets(num_buf, DOUBLE_ROW_LEN, stdin);
nn = staptr++;
cptr = num_buf-1;
while(*(++cptr) > '\n') {
if (*cptr == ' ')
nn = stpptr++;
else
*nn = *nn*10 + *cptr-'0';
}
}
// Count for each test
char *buf_end = strchr(buf, '\0');
int len, shift;
char outbuf[ROW_LEN];
char *ptr_l, *ptr_r, *out;
for(int i=0; i<m; i++) {
ptr_l = buf + start[i];
ptr_r = buf + stop[i];
while(ptr_r < buf_end && *ptr_l == *ptr_r) {
++ptr_l;
++ptr_r;
}
// Print length of same sequence
shift = len = (int)(ptr_l - (buf + start[i]));
out = outbuf;
do {
out++;
shift /= 10;
} while (shift);
*out = '\0';
do {
*(--out) = "0123456789"[len%10];
len /= 10;
} while(len);
puts(outbuf);
}
ret = 0;
free(start);
free(stop);
EXIT_A:
free(buf);
return ret;
}
Thanks to your question, I went and solved the problem myself. Your time is better than mine, but I'm still using some stdio functions.
I simply do not think the high score of 0.05 seconds is bona fide. I suspect it's the product of a highly automated system that returned that result in error, and that no one ever verified it.
How to defend that assertion? There's no real algorithmic complexity: the problem is O(n). The "trick" is to write specialized parsers for each aspect of the input (and avoid work done only in debug mode). The total time for 22 trials is 50 milliseconds, meaning each trial averages about 2.3 ms? We're down near the threshold of measurability.
Competitions like the problem you addressed yourself to are unfortunate, in a way. They reinforce the naive idea that performance is the ultimate measure of a program (there's no score for clarity). Worse, they encourage going around things like scanf "for performance" while, in real life, getting a program to run correctly and fast basically never entails avoiding or even tuning stdio. In a complex system, performance comes from things like avoiding I/O, passing over the data only once, and minimizing copies. Using the DBMS effectively is often key (as it were), but such things never show up in programming challenges.
Parsing and formatting numbers as text does take time, and in rare circumstances can be a bottleneck. But the answer is hardly ever to rewrite the parser. Rather, the answer is to parse the text into a convenient binary form, and use that. In short: compilation.
That said, a few observations may help.
You don't need dynamic memory for this problem, and it's not helping. The problem statement says the input array may be up to 100,000 elements, and the number of trials may be as many as 100,000. Each trial is two integer strings of up to 6 digits each, separated by a space and terminated by a newline: 6 + 1 + 6 + 1 = 14. Total input, maximum, is 100,000 + 1 + 6 + 1 + 100,000 * 14 = 1,500,008 bytes: about 1.5 MB. You are allowed 1 GB of memory.
I just allocated a single static buffer big enough for that maximum, and read it in all at once with read(2). Then I made a single pass over that input.
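I haven't posted my solution, but the read-then-single-pass idea can be sketched like this. Both helpers, read_all and parse_uint, are names invented for the illustration; the loop around read(2) is there because a pipe can hand back less than you asked for.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads from fd until EOF or until cap bytes are stored.
 * Returns the number of bytes read, or -1 on error.
 * read_all is a hypothetical helper for this sketch. */
ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    while (used < cap) {
        ssize_t n = read(fd, buf + used, cap - used);
        if (n == 0)
            break;                 /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        used += (size_t)n;
    }
    return (ssize_t)used;
}

/* Skips non-digits, parses one unsigned decimal number and advances *p
 * past it. Returns 0 if no digits remain before end.
 * parse_uint is a hypothetical helper for this sketch. */
unsigned parse_uint(const char **p, const char *end)
{
    const char *s = *p;
    while (s < end && (*s < '0' || *s > '9'))
        ++s;
    unsigned value = 0;
    while (s < end && *s >= '0' && *s <= '9')
        value = value * 10 + (unsigned)(*s++ - '0');
    *p = s;
    return value;
}
```

In use, you would declare one static char array sized for the maximum input, call read_all(STDIN_FILENO, ...) once, then walk a pointer through it with parse_uint for each pair of numbers.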
You got suggestions to use asynchronous I/O and threads. The problem statement says you're measured on CPU time, so neither of those help. The shortest distance between two points is a straight line; a single read into statically allocated memory wastes no motion.
One ridiculous aspect of the way they measure performance is that they compile with gcc -g and never define NDEBUG. That means assert(3) is active in code that is measured for performance! I couldn't get under 4 seconds on test 22 until I removed my asserts.
In sum, you did pretty well, and I suspect the winner you're baffled by is a phantom. Your code does faff about a bit, and you can dispense with dynamic memory and tuning stdio. I bet your time can be trimmed by simplifying it. To the extent that performance matters, that's where I'd direct your attention.
You should allocate all your buffers contiguously.
Allocate one buffer the size of all your buffers combined (num_buf, start, stop), then point each pointer at its offset within that buffer according to the sizes.
This can reduce your cache misses and page faults.
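Roughly, the single-allocation idea looks like this. The struct and function names are invented for the sketch; the int arrays are placed first so that the char buffer after them needs no extra alignment padding, and overflow of the size computation is not checked.

```c
#include <stdlib.h>

/* Three logical buffers carved out of one allocation.
 * query_buffers and alloc_query_buffers are hypothetical names. */
struct query_buffers {
    void *block;    /* the single allocation; this is what you free() */
    int  *start;    /* m ints */
    int  *stop;     /* m ints */
    char *num_buf;  /* row_len chars */
};

/* Allocates one zeroed block and points the three views into it.
 * Returns 0 on success, -1 if the allocation fails. */
int alloc_query_buffers(struct query_buffers *qb, size_t m, size_t row_len)
{
    size_t int_bytes = 2 * m * sizeof(int);
    qb->block = calloc(1, int_bytes + row_len);
    if (qb->block == NULL)
        return -1;
    qb->start = qb->block;                 /* first m ints */
    qb->stop = qb->start + m;              /* next m ints */
    qb->num_buf = (char *)(qb->stop + m);  /* remaining row_len bytes */
    return 0;
}
```

Cleanup is then a single free(qb.block) instead of one free per buffer.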
Since the read and write operations seem to consume a lot of time, you should consider adding threads. One thread should deal with I/O and another should deal with the computation. (It is worth checking whether another thread for printing could speed things up as well.) Make sure you don't use any locks while doing this.
Answering this question is tricky because optimization heavily depends on the problem you have.
One idea is to look at the content of the file you are trying to read and see if there are patterns or things that you can use in your favor.
The code you wrote is a "general" solution for reading from a file, executing something and then writing to a file. But if the file is not randomly generated each time and the content is always the same, why not try to write a solution for that file?
On the other hand, you could try to use low-level system functions. One that comes to mind is mmap, which allows you to map a file directly to memory and access that memory instead of using scanf and fgets.
Another thing I found that might help: in your solution you have two loops, so why not try to use only one? Another thing would be to do some asynchronous I/O reading, so instead of reading the whole file in a loop and then doing the calculation in another loop, you can try to read a portion at the beginning, start processing it asynchronously and continue reading.
This link might help for the async part

How to use /dev/random or urandom in C?

I want to use /dev/random or /dev/urandom in C. How can I do it? I don't know how can I handle them in C, if someone knows please tell me how. Thank you.
In general, it's a better idea to avoid opening files to get random data, because of how many points of failure there are in the procedure.
On recent Linux distributions, the getrandom system call can be used to get crypto-secure random numbers, and it cannot fail if GRND_RANDOM is not specified as a flag and the read amount is at most 256 bytes.
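A minimal sketch of that call follows. fill_random is a made-up wrapper name, and it assumes a glibc recent enough to ship sys/random.h; on older kernels the call can still fail with ENOSYS, so the return value is checked anyway.

```c
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fills buf with len random bytes using getrandom(2) with no flags.
 * Restricted to len <= 256, the range where a short read is not
 * expected. Returns 0 on success, -1 otherwise.
 * fill_random is a hypothetical wrapper for this sketch. */
int fill_random(void *buf, size_t len)
{
    if (len > 256)
        return -1;
    ssize_t n = getrandom(buf, len, 0);  /* flags 0: urandom source, may block only at early boot */
    return n == (ssize_t)len ? 0 : -1;
}
```

For larger requests you would loop, since getrandom may then return fewer bytes than requested.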
As of October 2017, OpenBSD, Darwin and Linux (with -lbsd) now all have an implementation of arc4random that is crypto-secure and that cannot fail. That makes it a very attractive option:
char myRandomData[50];
arc4random_buf(myRandomData, sizeof myRandomData); // done!
Otherwise, you can use the random devices as if they were files. You read from them and you get random data. I'm using open/read here, but fopen/fread would work just as well.
int randomData = open("/dev/urandom", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
ssize_t result = read(randomData, myRandomData, sizeof myRandomData);
if (result < 0)
{
// something went wrong
}
}
You may read many more random bytes before closing the file descriptor. /dev/urandom never blocks and always fills in as many bytes as you've requested, unless the system call is interrupted by a signal. It is considered cryptographically secure and should be your go-to random device.
/dev/random is more finicky. On most platforms, it can return fewer bytes than you've asked for and it can block if not enough bytes are available. This makes the error handling story more complex:
int randomData = open("/dev/random", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomData)
{
ssize_t result = read(randomData, myRandomData + randomDataLen, (sizeof myRandomData) - randomDataLen);
if (result < 0)
{
// something went wrong; stop instead of adding -1 to the length
break;
}
randomDataLen += result;
}
close(randomData);
}
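If you want the retry logic in one place, something along these lines handles short reads and signal interruptions for either device. device_random_bytes is a name invented for the sketch, and treating an unexpected end of file as failure is my own choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens the given random device (e.g. "/dev/urandom") and reads exactly
 * len bytes into buf. Returns 0 on success, -1 on any failure.
 * device_random_bytes is a hypothetical helper for this sketch. */
int device_random_bytes(const char *device, void *buf, size_t len)
{
    int fd = open(device, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;     /* short read: keep going */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        break;                     /* real error or unexpected EOF */
    }

    close(fd);
    return done == len ? 0 : -1;
}
```

Note that with /dev/random the loop can sit in read() for a while if the device blocks.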
There are other accurate answers above. I needed to use a FILE* stream, though. Here's what I did...
int byte_count = 64;
char data[64];
FILE *fp;
fp = fopen("/dev/urandom", "r");
fread(&data, 1, byte_count, fp);
fclose(fp);
Just open the file for reading and then read data. In C++11 you may wish to use std::random_device which provides cross-platform access to such devices.
Zneak is 100% correct. It's also very common to read a buffer of random numbers that is slightly larger than what you'll need on startup. You can then populate an array in memory, or write them to your own file for later re-use.
A typical implementation of the above:
typedef struct prandom {
struct prandom *prev;
int64_t number;
struct prandom *next;
} prandom_t;
This becomes more or less like a tape that just advances which can be magically replenished by another thread as needed. There are a lot of services that provide large file dumps of nothing but random numbers that are generated with much stronger generators such as:
Radioactive decay
Optical behavior (photons hitting a semi transparent mirror)
Atmospheric noise (not as strong as the above)
Farms of intoxicated monkeys typing on keyboards and moving mice (kidding)
Don't use 'pre-packaged' entropy for cryptographic seeds, in case that doesn't go without saying. Those sets are fine for simulations, not fine at all for generating keys and such.
Not being concerned with quality, if you need a lot of numbers for something like a monte carlo simulation, it's much better to have them available in a way that will not cause read() to block.
However, remember, the randomness of a number is as deterministic as the complexity involved in generating it. /dev/random and /dev/urandom are convenient, but not as strong as using a HRNG (or downloading a large dump from a HRNG). Also worth noting that /dev/random refills via entropy, so it can block for quite a while depending on circumstances.
zneak's answer covers it simply, however the reality is more complicated than that. For example, you need to consider whether /dev/{u}random really is the random number device in the first place. Such a scenario may occur if your machine has been compromised and the devices replaced with symlinks to /dev/zero or a sparse file. If this happens, the random stream is now completely predictable.
The simplest way (at least on Linux and FreeBSD) is to perform an ioctl call on the device that will only succeed if the device is a random generator:
int data;
int result = ioctl(fd, RNDGETENTCNT, &data);
// Upon success data now contains amount of entropy available in bits
If this is performed before the first read of the random device, then there's a fair bet that you've got the random device. So @zneak's answer can better be extended to be:
int randomData = open("/dev/random", O_RDONLY);
int entropy;
int result = ioctl(randomData, RNDGETENTCNT, &entropy);
if (result != 0) {
// Error - ioctl returns 0 on success, so a nonzero result means /dev/random isn't actually a random device
return;
}
if (entropy < sizeof(int) * 8) {
// Error - there's not enough bits of entropy in the random device to fill the buffer
return;
}
int myRandomInteger;
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomInteger)
{
ssize_t result = read(randomData, ((char*)&myRandomInteger) + randomDataLen, (sizeof myRandomInteger) - randomDataLen);
if (result < 0)
{
// error, unable to read /dev/random; stop instead of adding -1 to the length
break;
}
randomDataLen += result;
}
close(randomData);
The Insane Coding blog covered this and other pitfalls not so long ago; I strongly recommend reading the entire article. I have to give them credit, as that is where this solution was pulled from.
Edited to add (2014-07-25)...
Coincidentally, I read last night that as part of the LibreSSL effort, Linux appears to be getting a getrandom() syscall. As of the time of writing, there's no word on when it will be available in a general kernel release. However, this would be the preferred interface for getting cryptographically secure random data, as it removes all the pitfalls that access via files brings. See also the LibreSSL possible implementation.

Resources