Fast I/O in c, stdin/out

Fast I/O in c, stdin/out - c

In a coding competition specified at this link there is a task where you need to read much data on stdin, do some calculations and present a whole lot of data on stdout.
In my benchmarking it is almost only i/o that takes time although I have tried optimizing it as much as possible.
What you have as input is a string (1 <= len <= 100'000) and q rows of pair of int where q also is 1 <= q <= 100'000.
I benchmarked my code on a 100 times larger dataset (len = 10M, q = 10M) and this is the result:
Activity time accumulated
Read text: 0.004 0.004
Read numbers: 0.146 0.150
Parse numbers: 0.200 0.350
Calc answers: 0.001 0.351
Format output: 0.037 0.388
Print output: 0.143 0.531
By implementing my own formating and number parsing inline i managed to get the time down to 1/3 of the time when using printf and scanf.
However when I uploaded my solution to the competitions webpage my solution took 1.88 seconds (I think that is the total time over 22 datasets). When I look in the high-score there are several implementations (in c++) that finished in 0.05 seconds, nearly 40 times faster than mine! How is that possible?
I guess that I could speed it up a bit by using 2 threads, then I can start calculating and writing to stdout while still reading from stdin. This will however decrease the time to min(0.150, 0.143) in a theoretical best case on my large dataset. I'm still nowhere close to the highscore..
In the image below you can see the statistics of the consumed time.
The program gets compiled by the website with this options:
gcc -g -O2 -std=gnu99 -static my_file.c -lm
and timed like this:
time ./a.out < sample.in > sample.out
My code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LEN (100000 + 1)
#define ROW_LEN (6 + 1)
#define DOUBLE_ROW_LEN (2*ROW_LEN)
int main(int argc, char *argv[])
{
int ret = 1;
// Set custom buffers for stdin and out
char stdout_buf[16384];
setvbuf(stdout, stdout_buf, _IOFBF, 16384);
char stdin_buf[16384];
setvbuf(stdin, stdin_buf, _IOFBF, 16384);
// Read stdin to buffer
char *buf = malloc(MAX_LEN);
if (!buf) {
printf("Failed to allocate buffer");
return 1;
}
if (!fgets(buf, MAX_LEN, stdin))
goto EXIT_A;
// Get the num tests
int m ;
scanf("%d\n", &m);
char *num_buf = malloc(DOUBLE_ROW_LEN);
if (!num_buf) {
printf("Failed to allocate num_buffer");
goto EXIT_A;
}
int *nn;
int *start = calloc(m, sizeof(int));
int *stop = calloc(m, sizeof(int));
int *staptr = start;
int *stpptr = stop;
char *cptr;
for(int i=0; i<m; i++) {
fgets(num_buf, DOUBLE_ROW_LEN, stdin);
nn = staptr++;
cptr = num_buf-1;
while(*(++cptr) > '\n') {
if (*cptr == ' ')
nn = stpptr++;
else
*nn = *nn*10 + *cptr-'0';
}
}
// Count for each test
char *buf_end = strchr(buf, '\0');
int len, shift;
char outbuf[ROW_LEN];
char *ptr_l, *ptr_r, *out;
for(int i=0; i<m; i++) {
ptr_l = buf + start[i];
ptr_r = buf + stop[i];
while(ptr_r < buf_end && *ptr_l == *ptr_r) {
++ptr_l;
++ptr_r;
}
// Print length of same sequence
shift = len = (int)(ptr_l - (buf + start[i]));
out = outbuf;
do {
out++;
shift /= 10;
} while (shift);
*out = '\0';
do {
*(--out) = "0123456789"[len%10];
len /= 10;
} while(len);
puts(outbuf);
}
ret = 0;
free(start);
free(stop);
EXIT_A:
free(buf);
return ret;
}

Thanks to your question, I went and solved the problem myself. Your time is better than mine, but I'm still using some stdio functions.
I simply do not think the high score of 0.05 seconds is bona fide. I suspect it's the product of a highly automated system that returned that result in error, and that no one ever verified it.
How to defend that assertion? There's no real algorithmic complexity: the problem is O(n). The "trick" is to write specialized parsers for each aspect of the input (and avoid work done only in debug mode). The total time for 22 trials is 50 milliseconds, meaning each trial averages 2.25 ms? We're down near the threshold of measurability.
Competitions like the problem you addressed yourself to are unfortunate, in a way. They reinforce the naive idea that performance is the ultimate measure of a program (there's no score for clarity). Worse, they encourage going around things like scanf "for performance" while, in real life, getting a program to run correctly and fast basically never entails avoiding or even tuning stdio. In a complex system, performance comes from things like avoiding I/O, passing over the data only once, and minimizing copies. Using the DBMS effectively is often key (as it were), but such things never show up in programming challenges.
Parsing and formatting numbers as text does take time, and in rare circumstances can be a bottleneck. But the answer is hardly ever to rewrite the parser. Rather, the answer is to parse the text into a convenient binary form, and use that. In short: compilation.
That said, a few observations may help.
You don't need dynamic memory for this problem, and it's not helping. The problem statement says the input array may be up to 100,000 elements, and the number of trials may be as many as 100,000. Each trial is two integer strings of up to 6 digits each separated by a space and terminated by a newline: 6 + 1 + 6 + 1 = 14. Total input, maximum is 100,000 + 1 + 6 + 1 + 100,000 * 14: under 16 KB. You are allowed 1 GB of memory.
I just allocated a single 16 KB buffer, and read it in all at once with read(2). Then I made a single pass over that input.
You got suggestions to use asynchronous I/O and threads. The problem statement says you're measured on CPU time, so neither of those help. The shortest distance between two points is a straight line; a single read into statically allocated memory wastes no motion.
One ridiculous aspect of the way they measure performance is that they use gcc -g. That means assert(3) is invoked in code that is measured for performance! I couldn't get under 4 seconds on test 22 until I removed the my asserts.
In sum, you did pretty well, and I suspect the winner you're baffled by is a phantom. Your code does faff about a bit, and you can dispense with dynamic memory and tuning stdio. I bet your time can be trimmed by simplifying it. To the extent that performance matters, that's where I'd direct your attention.

You should allocate all your buffers continuously.
Allocate a buffer which is the size of all your buffers (num_buff, start, stop) then rearrange the points to the corresponding offsets by their size.
This can reduce your cache miss \ page faults.
Since the read and the write operation seems to consume a lot of time you should consider adding threads. One thread should deal with I\O and another should deal with the computation. (It is worth checking if another thread for prints could speed things up as well). Make sure you don't use any locks while doing this.

Answering this question is tricky because optimization heavily depends on the problem you have.
One idea is to look at the content of the file you are trying to read and see if there patterns or things that you can use in your favor.
The code you wrote is a "general" solution for reading from a file, executing something and then writing to a file. But if you the file is not randomly generated each time and the content is always the same why not try to write a solution for that file?
On the other hand, you could try to use low-level system functions. One that comes to my thinking is mmap which allows you to map a file directly to memory and access that memory instead of using scanf and fgets.
Another thing I found that might help is in your solutin you are having two while loops, why not try and use only one? Another thing would be to do some Asynchronous I/O reading, so instead of reading the whole file in a loop, and then doing the calculation in another loop, you can try and read a portion at the beginning, start processing it async and continue reading.
This link might help for the async part

Related

How to make reading the same file at the same time in multiple processes slow down the reading?

I'm doing some research work on Ubuntu 15.10 x64. I want to study if there's a way to make 2 or more processes reading a text file simultaneously slow down each other's reading.
For example, two processes P1 and P2. A text file /etc/example.txt. It has 1KB data.
P1's pseudo code:
for (int i = 0; i < 1000000; i ++) {
str = read_file ('/etc/example.txt', 'r');
print(str);
}
P2's pseudo code:
for (int i = 0; i < 100; i ++) {
str = read_file ('/etc/example.txt', 'r');
print(str);
}
time = get_the_whole_run_time();
print(time / 100);
Condition 1:
P1 is running. P2 is used to "race" with P1 and it calculates the average reading time TIME_1.
Condition 2:
P1 is NOT running. Only run P2 and it calculates the average reading time TIME_2.
My goal is to make TIME_1 significantly higher than TIME_2 (this is for research purpose). But my experiments don't work out that way. TIME_1 is nearly the same as TIME_2.
I know there maybe are some things like file system cache that affects the result. I used the command: echo 3 > /proc/sys/vm/drop_caches to clear the cache. But it doesn't work.
Any ideas? Thanks!

Use huge files to experiment.
Beware that if P1 and P2 run simultaneously, the average time may be less than with a single process; because one of the process may benefits of the fresh cache the other has set up just before, thus has no need to wait for physical I/O for example. Your experiment is very difficult to set up, as there is many many variables in it and many system-internal mechanisms that have non orthogonal effects; and surprising results may appear.

Why is my Memory dumping soo slow?

The idea behind this program is to simply access the ram and download the data from it to a txt file.
Later Ill convert the txt file to jpeg and hopefully it will be readable .
However when I try and read from the RAM using NEW[] it takes waaaaaay to long to actually copy all the values into the file?
Isnt it suppose to be really fast? I mean I save pictures everyday and it doesn't even take a second?
Is there some other method I can use to dump memory to a file?
#include <stdio.h>
#include <stdlib.h>
#include <hw/pci.h>
#include <hw/inout.h>
#include <sys/mman.h>
main()
{
FILE *fp;
fp = fopen ("test.txt","w+d");
int NumberOfPciCards = 3;
struct pci_dev_info info[NumberOfPciCards];
void *PciDeviceHandler1,*PciDeviceHandler2,*PciDeviceHandler3;
uint32_t *Buffer;
int *BusNumb; //int Buffer;
uint32_t counter =0;
int i;
int r;
int y;
volatile uint32_t *NEW,*NEW2;
uintptr_t iobase;
volatile uint32_t *regbase;
NEW = (uint32_t *)malloc(sizeof(uint32_t));
NEW2 = (uint32_t *)malloc(sizeof(uint32_t));
Buffer = (uint32_t *)malloc(sizeof(uint32_t));
BusNumb = (int*)malloc(sizeof(int));
printf ("\n 1");
for (r=0;r<NumberOfPciCards;r++)
{
memset(&info[r], 0, sizeof(info[r]));
}
printf ("\n 2");
//Here the attach takes place.
for (r=0;r<NumberOfPciCards;r++)
{
(pci_attach(r) < 0) ? FuncPrint(1,r) : FuncPrint(0,r);
}
printf ("\n 3");
info[0].VendorId = 0x8086; //Wont be using this one
info[0].DeviceId = 0x3582; //Or this one
info[1].VendorId = 0x10B5; //WIll only be using this one PLX 9054 chip
info[1].DeviceId = 0x9054; //Also PLX 9054
info[2].VendorId = 0x8086; //Not used
info[2].DeviceId = 0x24cb; //Not used
printf ("\n 4");
//I attached the device and give it a handler and set some setting.
if ((PciDeviceHandler1 = pci_attach_device(0,PCI_SHARE|PCI_INIT_ALL, 0, &info[1])) == 0)
{
perror("pci_attach_device fail");
exit(EXIT_FAILURE);
}
for (i = 0; i < 6; i++)
//This just prints out some details of the card.
{
if (info[1].BaseAddressSize[i] > 0)
printf("Aperture %d: "
"Base 0x%llx Length %d bytes Type %s\n", i,
PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? PCI_MEM_ADDR(info[1].CpuBaseAddress[i]) : PCI_IO_ADDR(info[1].CpuBaseAddress[i]),
info[1].BaseAddressSize[i],PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? "MEM" : "IO");
}
printf("\nEnd of Device random info dump---\n");
printf("\nNEWs Address : %d\n",*(int*)NEW);
//Not sure if this is a legitimate way of memory allocation but I cant see to read the ram any other way.
NEW = mmap_device_memory(NULL, info[1].BaseAddressSize[3],PROT_READ|PROT_WRITE|PROT_NOCACHE, 0,info[1].CpuBaseAddress[3]);
//Here is where things are starting to get messy and REALLY long to just run through all the ram and dump it.
//Is there some other way I can dump the data in the ram into a file?
while (counter!=info[1].BaseAddressSize[3])
{
fprintf(fp, "%x",NEW[counter]);
counter++;
}
fclose(fp);
printf("0x%x",*Buffer);
}

A few issues that I can see:
You are writing blocks of 4 bytes - that's quite inefficient. The stream buffering in the C library may help with that to a degree, but using larger blocks would still be more efficient.
Even worse, you are writing out the memory dump in hexadecimal notation, rather than the bytes themselves. That conversion is very CPU-intensive, not to mention that the size of the output is essentially doubled. You would be better off writing raw binary data using e.g. fwrite().
Depending on the specifics of your system (is this on QNX?), reading from I/O-mapped memory may be slower than reading directly from physical memory, especially if your PCI device has to act as a relay. What exactly is it that you are doing?
In any case I would suggest using a profiler to actually find out where your program is spending most of its time. Even a rudimentary system monitor would allow you to determine if your program is CPU-bound or I/O-bound.
As it is, "waaaaaay to long" is hardly a valid measurement. How much data is being copied? How long does it take? Where is the output file located?
P.S.: I also have some concerns w.r.t. what you are trying to do, but that is slightly off-topic for this question...

For fastest speed: write the data in binary form and use the open() / write() / close() API-s. Since your data is already available in a contiguous block of (virtual) memory it is a waste to copy it to a temporary buffer (used by the fwrite(), fprintf(), etc. API-s).
The code using write() will be similar to:
int fd = open("filename.bin", O_RDWR|O_CREAT, S_IRWXU);
write(fd, (void*)NEW, 4*info[1].BaseAddressSize[3]);
close(fd);
You will need to add error handling and make sure that the buffer size is specified correctly.
To reiterate, you get the speed-up from:
avoiding the conversion from binary to ASCII (as pointed out by others above)
avoiding many calls to libc
reducing the number of system-calls (from inside libc)
eliminating the overhead of copying data to a temporary buffer inside the fwrite()/fprintf() and related functions (buffering would be useful if your data arrived in small chunks, including the case of converting to ASCII in 4 byte units)
I intentionally ignore commenting on other parts of your code as it is apparently not intended to be production quality yet and your question is focused on how to speed up writing data to a file.

Parallelize while loop with OpenMP

I have a very large data file, and each record in this data file has 4 lines. I have written a very simple C program to analyze files of this type and print out some useful information. The basic idea of the program is this.
int main()
{
char buffer[BUFFER_SIZE];
while(fgets(buffer, BUFFER_SIZE, stdin))
{
fgets(buffer, BUFFER_SIZE, stdin);
do_some_simple_processing_on_the_second_line_of_the_record(buffer);
fgets(buffer, BUFFER_SIZE, stdin);
fgets(buffer, BUFFER_SIZE, stdin);
}
print_out_result();
}
This of course leaves out some details (sanity/error checking, etc), but that is not relevant to the question.
The program works fine, but the data files I'm working with are huge. I figured I would try to speed up the program by parallelizing the loop with OpenMP. After a bit of searching, though, it appears that OpenMP can only handle for loops where the number of iterations is know beforehand. Since I don't know the size of the files beforehand, and even simple commands like wc -l take a long time to run, how can I parallelize this program?

As thiton mentioned, this code could be I/O bounded. However, these days many computers may have SSDs and high-throughput RAID disks. In such case, you can get speedup from parallelization. Moreover, if the computation is not trivial, then parallelize wins. Even if the I/O is effectively serialized due to saturated bandwidth, you can still get speedup by distributing the computation to multicore.
Back to the question itself, you can parallelize this loop by OpenMP. With stdin, I have no idea to parallelize because it needs to read sequentially and no prior information of the end. However, if you're working a typical file, you can do it.
Here is my code with omp parallel. I used some Win32 API and MSVC CRT:
void test_io2()
{
const static int BUFFER_SIZE = 1024;
const static int CONCURRENCY = 4;
uint64_t local_checksums[CONCURRENCY];
uint64_t local_reads[CONCURRENCY];
DWORD start = GetTickCount();
omp_set_num_threads(CONCURRENCY);
#pragma omp parallel
{
int tid = omp_get_thread_num();
FILE* file = fopen("huge_file.dat", "rb");
_fseeki64(file, 0, SEEK_END);
uint64_t total_size = _ftelli64(file);
uint64_t my_start_pos = total_size/CONCURRENCY * tid;
uint64_t my_end_pos = min((total_size/CONCURRENCY * (tid + 1)), total_size);
uint64_t my_read_size = my_end_pos - my_start_pos;
_fseeki64(file, my_start_pos, SEEK_SET);
char* buffer = new char[BUFFER_SIZE];
uint64_t local_checksum = 0;
uint64_t local_read = 0;
size_t read_bytes;
while ((read_bytes = fread(buffer, 1, min(my_read_size, BUFFER_SIZE), file)) != 0 &&
my_read_size != 0)
{
local_read += read_bytes;
my_read_size -= read_bytes;
for (int i = 0; i < read_bytes; ++i)
local_checksum += (buffer[i]);
}
local_checksums[tid] = local_checksum;
local_reads[tid] = local_read;
fclose(file);
}
uint64_t checksum = 0;
uint64_t total_read = 0;
for (int i = 0; i < CONCURRENCY; ++i)
checksum += local_checksums[i], total_read += local_reads[i];
std::cout << checksum << std::endl
<< total_read << std::endl
<< double(GetTickCount() - start)/1000. << std::endl;
}
This code looks a bit dirty because I needed to precisely distribute the amount of the file to be read. However, the code is fairly straightforward. One thing keep in mind is that you need to have a per-thread file pointer. You can't simply share a file pointer because the internal data structure may not be thread-safe. Also, this code can be parallelized by parallel for. But, I think this approach is more natural.
Simple experimental results
I have tested this code to read a 10GB file on either a HDD (WD Green 2TB) and a SSD (Intel 120GB).
With a HDD, yes, no speedups were obtained. Even slowdown was observed. This clearly shows that this code is I/O bounded. This code virtually has no computation. Just I/O.
However, with a SSD, I had a speedup of 1.2 with 4 cores. Yes, the speedup is small. But, you still can get it with SSD. And, if the computation becomes a bit more (I just put a very short busy-waiting loop), speedups would be significant. I was able to get speedup of 2.5.
In sum, I'd like to recommend that you try to parallelize this code.
Also, if the computation is not trivial, I would recommend pipelining. The above code simply divides into several big chunks, resulting in poor cache efficiency. However, pipeline parallelization may yield better cache utilization. Try to use TBB for pipeline parallelization. They provide a simple pipeline construct.

Have you checked that your process is actually CPU-bound and not I/O-bound? Your code looks very much like I/O-bound code, which would gain nothing from parallelization.

In response to "minding", I don't think your code actually optimize anything here. There are a lot of common misunderstanding about this statement "#pragma omp parallel", this one would actually just spawn the threads, without the "for" key word, all the threads will just execute whatever codes that are following. So your code would actually be duplicating the computation on each thread. In response to Daniel, you were right, OpenMP can't optimize while loop, the only way to optimize it is by restructuring the code so that iteration is known in advance (such as while loop it once with a counter). Sorry about posting another answer, as I can't comment yet, but hopefully, this clears out the common misunderstandings.

fopen Segfault error on large files

Hello everyone I'm new to C but I've recently been getting a weird segfault error with my fopen.
FILE* thefile = fopen(argv[1],"r");
The problem I've been having is that this code works on other smaller text files, but when I try with a file around 400MB it will give a sefault error. I've even tried hardcoding the filename but that doesn't work either. Could there be a problem in the rest of the code causing the segfault on this line?(doubt it but would like to know if its possible. It's just really odd that no errors come up for a small text file, but a large text file does get errors.
Thanks!
EDIT* didn't want to bog this down with too much but heres my code
int main(int argc, char *argv[])
{
if(argc != 3)
{
printf("[ERROR] Invalid number of arguments. Please pass 2 arguments, input_bound_file (column 1:probe, columne 2,...: samples) and desired_output_file_name");
exit(2);
}
int i,j;
rankAvg= g_hash_table_new(g_direct_hash, g_direct_equal);
rankCnt= g_hash_table_new(g_direct_hash, g_direct_equal);
table = g_hash_table_new_full (g_direct_hash, g_direct_equal, NULL, g_free);
getCounts(argv[1]);
printf("NC=: %i nR =: %i",nC,nR);
double srcMat[nR][nC];
int rankMat[nR][nC];
double normMat[nR][nC];
int sorts[nR][nC];
char line[100];
FILE* thefile = fopen(argv[1],"r");
printf("%s\n", strerror(errno));
FILE* output = fopen(argv[2],"w");
char* rownames[100];
i=0;j = 1;
int processedProbeNumber = 0;
int previousStamp = 0;
fgets(line,sizeof(line),thefile); //read file
while(fgets(line,sizeof(line),thefile) != NULL)
{
cleanSpace(line); //creates only one space between entries
char dest[100];
int len = strlen(line);
for(i = 0; i < len; i++)
{
if(line[i] == ' ') //read in rownames
{
rownames[j] = strncpy(dest, line, i);
dest[i] = '\0';
break;
}
}
char* token = strtok(line, " ");
token = strtok(NULL, " ");
i=1;
while(token!=NULL) //put words into array
{
rankMat[j][i]= abs(atof(token));
srcMat[j][i] = abs(atof(token));
token = strtok(NULL, " ");
i++;
}
// set the first column as a row id
j++;
processedProbeNumber++;
if( (processedProbeNumber-previousStamp) >= 10000)
{
previousStamp = processedProbeNumber;
printf("\tnumber of loaded lines = %i",processedProbeNumber);
}
}
printf("\ttotal number of loaded lines = %i \n",processedProbeNumber);
fclose(thefile);

How do you know that fopen is seg faulting? If you're simply sprinkling printf in the code, there's a chance the standard output isn't sent to the console before the error occurs. Obviously, if you're using a debugger you will know exactly where the segfault occured.
Looking at your code, nR and nC aren't defined so I don't know how big rankMat and srcMat are, but two thoughts crossed my mind while looking at your code:
You don't check i and j to ensure that they don't exceed nR and nC
If nR and nC are sufficiently large, that may mean you're using a very large amount of memory on the stack (srcMat, rankMat, normMat, and sorts are all huge). I don't know what environemnt you're running in, but some systems my not be able to handle huge stacks (Linux, Windows, etc. should be fine, but I do a lot of embedded work). I normally allocate very large structures in the heap (using malloc).

Generally files 2GB (2**31) or larger are the ones you can expect to get this on. This is because you are starting to run out of space in a 32-bit integer for things like file indices, and one bit is typically taken up for directions in relative offsets.
Supposedly on Linux you can get around this issue by using the following macro defintion:
#define _FILE_OFFSET_BITS 64
Some systems also provide a separate API call for large file opens (eg: fopen64() in MKS).

400Mb should not be considered a "large file" nowadays. I would reserve this for files larger than, say, 2Gb.
Also, just opening a file is very unlikely to give a segfault. WOuld you show us the code that access the file? I suspect some other factor is at play here.
UPDATE
I still can't tell exactly what's happening here. There are strange things that could be legitimate: you discard the first line and also the first token of each line.
You also assign to all the rownames[j] (except the first one) the address of dest which is a variable that has a block scope and whose associated memory is most likely to be reused outside that block. I hope you don't rely on rownames[j] to be any meaningful (but then why you have them?) and you never try to access them.
C99 allows you to mix variable declarations with actual instructions but I would suggest a little bit of cleaning to make the code clearer (also a better indentation would help).
From the symptoms I would look for some memory corruption somewhere. On small files (and hence less tokens) it may go unnoticed, but with larger files (and many more token) it fires a segfault.

How to use /dev/random or urandom in C?

I want to use /dev/random or /dev/urandom in C. How can I do it? I don't know how can I handle them in C, if someone knows please tell me how. Thank you.

In general, it's a better idea to avoid opening files to get random data, because of how many points of failure there are in the procedure.
On recent Linux distributions, the getrandom system call can be used to get crypto-secure random numbers, and it cannot fail if GRND_RANDOM is not specified as a flag and the read amount is at most 256 bytes.
As of October 2017, OpenBSD, Darwin and Linux (with -lbsd) now all have an implementation of arc4random that is crypto-secure and that cannot fail. That makes it a very attractive option:
char myRandomData[50];
arc4random_buf(myRandomData, sizeof myRandomData); // done!
Otherwise, you can use the random devices as if they were files. You read from them and you get random data. I'm using open/read here, but fopen/fread would work just as well.
int randomData = open("/dev/urandom", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
ssize_t result = read(randomData, myRandomData, sizeof myRandomData);
if (result < 0)
{
// something went wrong
}
}
You may read many more random bytes before closing the file descriptor. /dev/urandom never blocks and always fills in as many bytes as you've requested, unless the system call is interrupted by a signal. It is considered cryptographically secure and should be your go-to random device.
/dev/random is more finicky. On most platforms, it can return fewer bytes than you've asked for and it can block if not enough bytes are available. This makes the error handling story more complex:
int randomData = open("/dev/random", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomData)
{
ssize_t result = read(randomData, myRandomData + randomDataLen, (sizeof myRandomData) - randomDataLen);
if (result < 0)
{
// something went wrong
}
randomDataLen += result;
}
close(randomData);
}

There are other accurate answers above. I needed to use a FILE* stream, though. Here's what I did...
int byte_count = 64;
char data[64];
FILE *fp;
fp = fopen("/dev/urandom", "r");
fread(&data, 1, byte_count, fp);
fclose(fp);

Just open the file for reading and then read data. In C++11 you may wish to use std::random_device which provides cross-platform access to such devices.

Zneak is 100% correct. Its also very common to read a buffer of random numbers that is slightly larger than what you'll need on startup. You can then populate an array in memory, or write them to your own file for later re-use.
A typical implementation of the above:
typedef struct prandom {
struct prandom *prev;
int64_t number;
struct prandom *next;
} prandom_t;
This becomes more or less like a tape that just advances which can be magically replenished by another thread as needed. There are a lot of services that provide large file dumps of nothing but random numbers that are generated with much stronger generators such as:
Radioactive decay
Optical behavior (photons hitting a semi transparent mirror)
Atmospheric noise (not as strong as the above)
Farms of intoxicated monkeys typing on keyboards and moving mice (kidding)
Don't use 'pre-packaged' entropy for cryptographic seeds, in case that doesn't go without saying. Those sets are fine for simulations, not fine at all for generating keys and such.
Not being concerned with quality, if you need a lot of numbers for something like a monte carlo simulation, it's much better to have them available in a way that will not cause read() to block.
However, remember, the randomness of a number is as deterministic as the complexity involved in generating it. /dev/random and /dev/urandom are convenient, but not as strong as using a HRNG (or downloading a large dump from a HRNG). Also worth noting that /dev/random refills via entropy, so it can block for quite a while depending on circumstances.

zneak's answer covers it simply, however the reality is more complicated than that. For example, you need to consider whether /dev/{u}random really is the random number device in the first place. Such a scenario may occur if your machine has been compromised and the devices replaced with symlinks to /dev/zero or a sparse file. If this happens, the random stream is now completely predictable.
The simplest way (at least on Linux and FreeBSD) is to perform an ioctl call on the device that will only succeed if the device is a random generator:
int data;
int result = ioctl(fd, RNDGETENTCNT, &data);
// Upon success data now contains amount of entropy available in bits
If this is performed before the first read of the random device, then there's a fair bet that you've got the random device. So #zneak's answer can better be extended to be:
int randomData = open("/dev/random", O_RDONLY);
int entropy;
int result = ioctl(randomData, RNDGETENTCNT, &entropy);
if (!result) {
// Error - /dev/random isn't actually a random device
return;
}
if (entropy < sizeof(int) * 8) {
// Error - there's not enough bits of entropy in the random device to fill the buffer
return;
}
int myRandomInteger;
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomInteger)
{
ssize_t result = read(randomData, ((char*)&myRandomInteger) + randomDataLen, (sizeof myRandomInteger) - randomDataLen);
if (result < 0)
{
// error, unable to read /dev/random
}
randomDataLen += result;
}
close(randomData);
The Insane Coding blog covered this, and other pitfalls not so long ago; I strongly recommend reading the entire article. I have to give credit to their where this solution was pulled from.
Edited to add (2014-07-25)...
Co-incidentally, I read last night that as part of the LibReSSL effort, Linux appears to be getting a GetRandom() syscall. As at time of writing, there's no word of when it will be available in a kernel general release. However this would be the preferred interface to get cryptographically secure random data as it removes all pitfalls that access via files provides. See also the LibReSSL possible implementation.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight