How to use /dev/random or /dev/urandom in C?

I want to use /dev/random or /dev/urandom in C. How can I do it? I don't know how to handle them in C; if someone knows, please tell me how. Thank you.

In general, it's a better idea to avoid opening files to get random data, because of how many points of failure there are in the procedure.
On recent Linux distributions, the getrandom system call can be used to get crypto-secure random numbers, and it cannot fail if GRND_RANDOM is not specified as a flag and the read amount is at most 256 bytes.
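For illustration, here is a minimal sketch of calling it directly (this assumes glibc 2.25 or newer, which exposes the wrapper in <sys/random.h>; the buffer size and error handling are just for the example):
#include <stdio.h>
#include <sys/types.h>
#include <sys/random.h>   /* glibc 2.25+ wrapper for the getrandom syscall */

int main(void)
{
    unsigned char buf[32];
    /* Requests of at most 256 bytes without GRND_RANDOM neither fail nor
     * return short reads once the entropy pool is initialized. */
    if (getrandom(buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
        perror("getrandom");
        return 1;
    }
    /* buf now holds 32 cryptographically secure random bytes */
    return 0;
}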
As of October 2017, OpenBSD, Darwin and Linux (with -lbsd) now all have an implementation of arc4random that is crypto-secure and that cannot fail. That makes it a very attractive option:
#include <stdlib.h>   // on Linux, include <bsd/stdlib.h> and link with -lbsd instead

char myRandomData[50];
arc4random_buf(myRandomData, sizeof myRandomData); // done!
Otherwise, you can use the random devices as if they were files. You read from them and you get random data. I'm using open/read here, but fopen/fread would work just as well.
#include <fcntl.h>
#include <unistd.h>

int randomData = open("/dev/urandom", O_RDONLY);
if (randomData < 0)
{
    // something went wrong
}
else
{
    char myRandomData[50];
    ssize_t result = read(randomData, myRandomData, sizeof myRandomData);
    if (result < 0)
    {
        // something went wrong
    }
    close(randomData);
}
You may read many more random bytes before closing the file descriptor. /dev/urandom never blocks and always fills in as many bytes as you've requested, unless the system call is interrupted by a signal. It is considered cryptographically secure and should be your go-to random device.
/dev/random is more finicky. On most platforms, it can return fewer bytes than you've asked for and it can block if not enough bytes are available. This makes the error handling story more complex:
int randomData = open("/dev/random", O_RDONLY);
if (randomData < 0)
{
    // something went wrong
}
else
{
    char myRandomData[50];
    size_t randomDataLen = 0;
    while (randomDataLen < sizeof myRandomData)
    {
        ssize_t result = read(randomData, myRandomData + randomDataLen, (sizeof myRandomData) - randomDataLen);
        if (result < 0)
        {
            // something went wrong
        }
        randomDataLen += result;
    }
    close(randomData);
}

There are other accurate answers above. I needed to use a FILE* stream, though. Here's what I did...
int byte_count = 64;
char data[64];
FILE *fp;
fp = fopen("/dev/urandom", "r");
fread(data, 1, byte_count, fp);
fclose(fp);

Just open the file for reading and then read data. In C++11 you may wish to use std::random_device which provides cross-platform access to such devices.

Zneak is 100% correct. It's also very common to read a buffer of random numbers slightly larger than what you'll need at startup. You can then populate an array in memory, or write them to your own file for later re-use.
A typical implementation of the above:
typedef struct prandom {
    struct prandom *prev;
    int64_t number;
    struct prandom *next;
} prandom_t;
This becomes more or less like a tape that just advances which can be magically replenished by another thread as needed. There are a lot of services that provide large file dumps of nothing but random numbers that are generated with much stronger generators such as:
Radioactive decay
Optical behavior (photons hitting a semi transparent mirror)
Atmospheric noise (not as strong as the above)
Farms of intoxicated monkeys typing on keyboards and moving mice (kidding)
Don't use 'pre-packaged' entropy for cryptographic seeds, in case that doesn't go without saying. Those sets are fine for simulations, not fine at all for generating keys and such.
If you're not concerned with quality and need a lot of numbers for something like a Monte Carlo simulation, it's much better to have them available in a way that will not cause read() to block.
However, remember that the quality of a random number is bounded by the process that generated it. /dev/random and /dev/urandom are convenient, but not as strong as using a HRNG (or downloading a large dump from one). Also worth noting that /dev/random refills via entropy collection, so it can block for quite a while depending on circumstances.
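As a rough sketch of the "prefetched pool" idea (the names random_pool, pool_refill, and pool_next are made up for illustration, not from any library): fill a buffer from /dev/urandom up front, hand out values from it, and refill when it runs dry.
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define POOL_COUNT 4096   /* how many 64-bit values to keep on hand */

static int64_t random_pool[POOL_COUNT];
static size_t  pool_pos = POOL_COUNT;   /* force a refill on first use */

/* Refill the whole pool from /dev/urandom; returns 0 on success. */
static int pool_refill(void)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    size_t want = sizeof random_pool;
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, (char *)random_pool + got, want - got);
        if (n <= 0) { close(fd); return -1; }
        got += (size_t)n;
    }
    close(fd);
    pool_pos = 0;
    return 0;
}

/* Hand out the next prefetched value, refilling as needed. */
static int64_t pool_next(void)
{
    if (pool_pos >= POOL_COUNT && pool_refill() != 0)
        return 0;   /* a real caller should check for failure here */
    return random_pool[pool_pos++];
}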

zneak's answer covers it simply; however, the reality is more complicated than that. For example, you need to consider whether /dev/{u}random really is the random number device in the first place. Such a scenario may occur if your machine has been compromised and the devices replaced with symlinks to /dev/zero or a sparse file. If this happens, the random stream is now completely predictable.
The simplest way (at least on Linux and FreeBSD) is to perform an ioctl call on the device that will only succeed if the device is a random generator:
#include <sys/ioctl.h>
#include <linux/random.h>   /* for RNDGETENTCNT */

int data;
int result = ioctl(fd, RNDGETENTCNT, &data);
// Upon success, data now contains the amount of entropy available, in bits
If this is performed before the first read of the random device, then there's a fair bet that you've got the random device. So zneak's answer can be extended as follows:
int randomData = open("/dev/random", O_RDONLY);
int entropy;
int result = ioctl(randomData, RNDGETENTCNT, &entropy);
if (result < 0) {
    // Error - /dev/random isn't actually a random device
    return;
}
if (entropy < sizeof(int) * 8) {
    // Error - there aren't enough bits of entropy in the random device to fill the buffer
    return;
}
int myRandomInteger;
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomInteger)
{
    ssize_t result = read(randomData, ((char*)&myRandomInteger) + randomDataLen, (sizeof myRandomInteger) - randomDataLen);
    if (result < 0)
    {
        // error, unable to read /dev/random
    }
    randomDataLen += result;
}
close(randomData);
The Insane Coding blog covered this, and other pitfalls, not so long ago; I strongly recommend reading the entire article. I have to give credit to them, as that's where this solution was pulled from.
Edited to add (2014-07-25)...
Coincidentally, I read last night that as part of the LibReSSL effort, Linux appears to be getting a getrandom() syscall. As of the time of writing, there's no word on when it will be available in a general kernel release. However, this would be the preferred interface for getting cryptographically secure random data, as it removes all the pitfalls that access via files brings. See also the LibReSSL possible implementation.

Related

Fast I/O in c, stdin/out

In a coding competition specified at this link there is a task where you need to read a lot of data on stdin, do some calculations, and present a whole lot of data on stdout.
In my benchmarking it is almost only I/O that takes time, although I have tried optimizing it as much as possible.
The input is a string (1 <= len <= 100'000) and q rows, each a pair of ints, where 1 <= q <= 100'000.
I benchmarked my code on a 100 times larger dataset (len = 10M, q = 10M) and this is the result:
Activity          time    accumulated
Read text:        0.004   0.004
Read numbers:     0.146   0.150
Parse numbers:    0.200   0.350
Calc answers:     0.001   0.351
Format output:    0.037   0.388
Print output:     0.143   0.531
By implementing my own formatting and number parsing inline, I managed to get the time down to a third of the time when using printf and scanf.
However, when I uploaded my solution to the competition's webpage, it took 1.88 seconds (I think that is the total time over 22 datasets). When I look at the high scores there are several implementations (in C++) that finished in 0.05 seconds, nearly 40 times faster than mine! How is that possible?
I guess that I could speed it up a bit by using 2 threads, so I can start calculating and writing to stdout while still reading from stdin. This will however only decrease the time by at most min(0.150, 0.143) in a theoretical best case on my large dataset. I'm still nowhere close to the high score.
The program gets compiled by the website with these options:
gcc -g -O2 -std=gnu99 -static my_file.c -lm
and timed like this:
time ./a.out < sample.in > sample.out
My code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LEN (100000 + 1)
#define ROW_LEN (6 + 1)
#define DOUBLE_ROW_LEN (2*ROW_LEN)
int main(int argc, char *argv[])
{
    int ret = 1;
    // Set custom buffers for stdin and out
    char stdout_buf[16384];
    setvbuf(stdout, stdout_buf, _IOFBF, 16384);
    char stdin_buf[16384];
    setvbuf(stdin, stdin_buf, _IOFBF, 16384);
    // Read stdin to buffer
    char *buf = malloc(MAX_LEN);
    if (!buf) {
        printf("Failed to allocate buffer");
        return 1;
    }
    if (!fgets(buf, MAX_LEN, stdin))
        goto EXIT_A;
    // Get the num tests
    int m;
    scanf("%d\n", &m);
    char *num_buf = malloc(DOUBLE_ROW_LEN);
    if (!num_buf) {
        printf("Failed to allocate num_buffer");
        goto EXIT_A;
    }
    int *nn;
    int *start = calloc(m, sizeof(int));
    int *stop = calloc(m, sizeof(int));
    int *staptr = start;
    int *stpptr = stop;
    char *cptr;
    for(int i=0; i<m; i++) {
        fgets(num_buf, DOUBLE_ROW_LEN, stdin);
        nn = staptr++;
        cptr = num_buf-1;
        while(*(++cptr) > '\n') {
            if (*cptr == ' ')
                nn = stpptr++;
            else
                *nn = *nn*10 + *cptr-'0';
        }
    }
    // Count for each test
    char *buf_end = strchr(buf, '\0');
    int len, shift;
    char outbuf[ROW_LEN];
    char *ptr_l, *ptr_r, *out;
    for(int i=0; i<m; i++) {
        ptr_l = buf + start[i];
        ptr_r = buf + stop[i];
        while(ptr_r < buf_end && *ptr_l == *ptr_r) {
            ++ptr_l;
            ++ptr_r;
        }
        // Print length of same sequence
        shift = len = (int)(ptr_l - (buf + start[i]));
        out = outbuf;
        do {
            out++;
            shift /= 10;
        } while (shift);
        *out = '\0';
        do {
            *(--out) = "0123456789"[len%10];
            len /= 10;
        } while(len);
        puts(outbuf);
    }
    ret = 0;
    free(start);
    free(stop);
EXIT_A:
    free(buf);
    return ret;
}
Thanks to your question, I went and solved the problem myself. Your time is better than mine, but I'm still using some stdio functions.
I simply do not think the high score of 0.05 seconds is bona fide. I suspect it's the product of a highly automated system that returned that result in error, and that no one ever verified it.
How to defend that assertion? There's no real algorithmic complexity: the problem is O(n). The "trick" is to write specialized parsers for each aspect of the input (and avoid work done only in debug mode). The total time for 22 trials is 50 milliseconds, meaning each trial averages about 2.3 ms? We're down near the threshold of measurability.
Competitions like the problem you addressed yourself to are unfortunate, in a way. They reinforce the naive idea that performance is the ultimate measure of a program (there's no score for clarity). Worse, they encourage going around things like scanf "for performance" while, in real life, getting a program to run correctly and fast basically never entails avoiding or even tuning stdio. In a complex system, performance comes from things like avoiding I/O, passing over the data only once, and minimizing copies. Using the DBMS effectively is often key (as it were), but such things never show up in programming challenges.
Parsing and formatting numbers as text does take time, and in rare circumstances can be a bottleneck. But the answer is hardly ever to rewrite the parser. Rather, the answer is to parse the text into a convenient binary form, and use that. In short: compilation.
That said, a few observations may help.
You don't need dynamic memory for this problem, and it's not helping. The problem statement says the input array may be up to 100,000 elements, and the number of trials may be as many as 100,000. Each trial is two integer strings of up to 6 digits each, separated by a space and terminated by a newline: 6 + 1 + 6 + 1 = 14. Total input, maximum, is 100,000 + 1 + 6 + 1 + 100,000 * 14: about 1.5 MB. You are allowed 1 GB of memory.
I just allocated a single buffer big enough for the whole input, and read it in all at once with read(2). Then I made a single pass over that input.
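A rough sketch of that approach (the buffer size, and the toy "sum the integers" pass, are illustrative, not the answerer's actual code): slurp all of stdin with read(2), then hand-parse it in one pass.
#include <stdio.h>
#include <unistd.h>

#define INPUT_MAX (2 * 1024 * 1024)   /* comfortably larger than the worst case */

static char input[INPUT_MAX];

int main(void)
{
    /* Slurp all of stdin in one or a few read() calls. */
    size_t len = 0;
    for (;;) {
        ssize_t n = read(0, input + len, INPUT_MAX - len);
        if (n < 0)
            return 1;        /* read error */
        if (n == 0)
            break;           /* EOF */
        len += (size_t)n;
    }

    /* Single-pass, hand-rolled integer parsing over the buffer. */
    const char *p = input;
    const char *end = input + len;
    unsigned long sum = 0;           /* just to do something with the numbers */
    while (p < end) {
        if (*p >= '0' && *p <= '9') {
            unsigned v = 0;
            while (p < end && *p >= '0' && *p <= '9')
                v = v * 10 + (unsigned)(*p++ - '0');
            sum += v;
        } else {
            ++p;
        }
    }
    printf("%lu\n", sum);
    return 0;
}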
You got suggestions to use asynchronous I/O and threads. The problem statement says you're measured on CPU time, so neither of those help. The shortest distance between two points is a straight line; a single read into statically allocated memory wastes no motion.
One ridiculous aspect of the way they measure performance is that they compile with gcc -g and without -DNDEBUG. That means assert(3) is invoked in code that is measured for performance! I couldn't get under 4 seconds on test 22 until I removed my asserts.
In sum, you did pretty well, and I suspect the winner you're baffled by is a phantom. Your code does faff about a bit, and you can dispense with dynamic memory and tuning stdio. I bet your time can be trimmed by simplifying it. To the extent that performance matters, that's where I'd direct your attention.
You should allocate all your buffers contiguously.
Allocate one buffer the size of all your buffers combined (num_buf, start, stop), then set the pointers to the corresponding offsets within it.
This can reduce your cache misses / page faults.
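A minimal sketch of that layout (the helper name alloc_contiguous and the sizes are illustrative): one allocation carved into the three arrays the original code allocates separately.
#include <stdlib.h>
#include <string.h>

/* One block instead of three: start, stop, and the number scratch buffer
 * end up adjacent in memory, which can be friendlier to the cache. */
static int alloc_contiguous(int m, int **start, int **stop, char **num_buf, size_t num_buf_len)
{
    size_t ints = (size_t)m * sizeof(int);
    char *block = malloc(2 * ints + num_buf_len);
    if (!block)
        return -1;
    memset(block, 0, 2 * ints);        /* start/stop were calloc'ed originally */
    *start   = (int *)block;
    *stop    = (int *)(block + ints);
    *num_buf = block + 2 * ints;
    return 0;                          /* free(*start) later releases everything */
}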
Since the read and the write operations seem to consume a lot of time, you should consider adding threads. One thread should deal with I/O and another should deal with the computation. (It is worth checking whether another thread for printing could speed things up as well.) Make sure you don't use any locks while doing this.
Answering this question is tricky because optimization heavily depends on the problem you have.
One idea is to look at the content of the file you are trying to read and see if there are patterns or things that you can use in your favor.
The code you wrote is a "general" solution for reading from a file, executing something, and then writing to a file. But if the file is not randomly generated each time and the content is always the same, why not write a solution tailored to that file?
On the other hand, you could try to use low-level system functions. One that comes to mind is mmap, which allows you to map a file directly into memory and access that memory instead of using scanf and fgets.
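A hedged sketch of the mmap route (this assumes the input comes from a regular file whose path is known, since stdin from a pipe can't be mapped; the line-counting pass is just a placeholder for real parsing):
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s inputfile\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only; the kernel pages it in on demand. */
    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Parse directly out of data[0 .. st.st_size) here, no fgets/scanf. */
    size_t newlines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            newlines++;
    printf("%zu lines\n", newlines);

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}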
Another thing I found that might help: your solution has two loops, so why not try to use only one? You could also do some asynchronous I/O reading, so instead of reading the whole file in one loop and then doing the calculation in another loop, you can read a portion at the beginning, start processing it asynchronously, and continue reading.
This link might help for the async part

Why is my memory dumping so slow?

The idea behind this program is simply to access the RAM and dump the data from it to a txt file.
Later I'll convert the txt file to JPEG and hopefully it will be readable.
However, when I try to read from the RAM using NEW[] it takes way too long to actually copy all the values into the file.
Isn't it supposed to be really fast? I mean, I save pictures every day and it doesn't even take a second.
Is there some other method I can use to dump memory to a file?
#include <stdio.h>
#include <stdlib.h>
#include <hw/pci.h>
#include <hw/inout.h>
#include <sys/mman.h>
main()
{
FILE *fp;
fp = fopen ("test.txt","w+d");
int NumberOfPciCards = 3;
struct pci_dev_info info[NumberOfPciCards];
void *PciDeviceHandler1,*PciDeviceHandler2,*PciDeviceHandler3;
uint32_t *Buffer;
int *BusNumb; //int Buffer;
uint32_t counter =0;
int i;
int r;
int y;
volatile uint32_t *NEW,*NEW2;
uintptr_t iobase;
volatile uint32_t *regbase;
NEW = (uint32_t *)malloc(sizeof(uint32_t));
NEW2 = (uint32_t *)malloc(sizeof(uint32_t));
Buffer = (uint32_t *)malloc(sizeof(uint32_t));
BusNumb = (int*)malloc(sizeof(int));
printf ("\n 1");
for (r=0;r<NumberOfPciCards;r++)
{
memset(&info[r], 0, sizeof(info[r]));
}
printf ("\n 2");
//Here the attach takes place.
for (r=0;r<NumberOfPciCards;r++)
{
(pci_attach(r) < 0) ? FuncPrint(1,r) : FuncPrint(0,r);
}
printf ("\n 3");
info[0].VendorId = 0x8086; //Wont be using this one
info[0].DeviceId = 0x3582; //Or this one
info[1].VendorId = 0x10B5; //WIll only be using this one PLX 9054 chip
info[1].DeviceId = 0x9054; //Also PLX 9054
info[2].VendorId = 0x8086; //Not used
info[2].DeviceId = 0x24cb; //Not used
printf ("\n 4");
//I attached the device and give it a handler and set some setting.
if ((PciDeviceHandler1 = pci_attach_device(0,PCI_SHARE|PCI_INIT_ALL, 0, &info[1])) == 0)
{
perror("pci_attach_device fail");
exit(EXIT_FAILURE);
}
for (i = 0; i < 6; i++)
//This just prints out some details of the card.
{
if (info[1].BaseAddressSize[i] > 0)
printf("Aperture %d: "
"Base 0x%llx Length %d bytes Type %s\n", i,
PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? PCI_MEM_ADDR(info[1].CpuBaseAddress[i]) : PCI_IO_ADDR(info[1].CpuBaseAddress[i]),
info[1].BaseAddressSize[i],PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? "MEM" : "IO");
}
printf("\nEnd of Device random info dump---\n");
printf("\nNEWs Address : %d\n",*(int*)NEW);
//Not sure if this is a legitimate way of memory allocation but I cant see to read the ram any other way.
NEW = mmap_device_memory(NULL, info[1].BaseAddressSize[3],PROT_READ|PROT_WRITE|PROT_NOCACHE, 0,info[1].CpuBaseAddress[3]);
//Here is where things are starting to get messy and REALLY long to just run through all the ram and dump it.
//Is there some other way I can dump the data in the ram into a file?
while (counter!=info[1].BaseAddressSize[3])
{
fprintf(fp, "%x",NEW[counter]);
counter++;
}
fclose(fp);
printf("0x%x",*Buffer);
}
A few issues that I can see:
You are writing blocks of 4 bytes - that's quite inefficient. The stream buffering in the C library may help with that to a degree, but using larger blocks would still be more efficient.
Even worse, you are writing out the memory dump in hexadecimal notation, rather than the bytes themselves. That conversion is very CPU-intensive, not to mention that the size of the output is essentially doubled. You would be better off writing raw binary data using e.g. fwrite().
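A minimal sketch of that "raw bytes in one big call" suggestion (dump_raw is a made-up helper; the mapped pointer from the question would additionally need a cast to drop volatile):
#include <stdint.h>
#include <stdio.h>

/* Dump `count` 32-bit words to a file as raw binary in a single call,
 * instead of one fprintf("%x", ...) per word. */
static int dump_raw(const char *path, const uint32_t *data, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (!fp)
        return -1;
    size_t written = fwrite(data, sizeof *data, count, fp);
    fclose(fp);
    return written == count ? 0 : -1;
}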
Depending on the specifics of your system (is this on QNX?), reading from I/O-mapped memory may be slower than reading directly from physical memory, especially if your PCI device has to act as a relay. What exactly is it that you are doing?
In any case I would suggest using a profiler to actually find out where your program is spending most of its time. Even a rudimentary system monitor would allow you to determine if your program is CPU-bound or I/O-bound.
As it is, "way too long" is hardly a valid measurement. How much data is being copied? How long does it take? Where is the output file located?
P.S.: I also have some concerns w.r.t. what you are trying to do, but that is slightly off-topic for this question...
For fastest speed: write the data in binary form and use the open() / write() / close() APIs. Since your data is already available in a contiguous block of (virtual) memory, it is a waste to copy it to a temporary buffer (as the fwrite(), fprintf(), etc. APIs do).
The code using write() will be similar to:
int fd = open("filename.bin", O_RDWR|O_CREAT, S_IRWXU);
write(fd, (void*)NEW, 4*info[1].BaseAddressSize[3]);
close(fd);
You will need to add error handling and make sure that the buffer size is specified correctly.
To reiterate, you get the speed-up from:
avoiding the conversion from binary to ASCII (as pointed out by others above)
avoiding many calls to libc
reducing the number of system-calls (from inside libc)
eliminating the overhead of copying data to a temporary buffer inside the fwrite()/fprintf() and related functions (buffering would be useful if your data arrived in small chunks, including the case of converting to ASCII in 4 byte units)
I intentionally ignore commenting on other parts of your code as it is apparently not intended to be production quality yet and your question is focused on how to speed up writing data to a file.

File size lookup in C

I was wondering if there was any significant performance increase in using sys/stat.h versus fseek() and ftell()?
Choosing between fstat() and the fseek()/ftell() combination, there isn't going to be much difference. The single function call should be slightly quicker than the double function call, but the difference won't be great.
Choosing between stat() and the combination isn't a very fair comparison. For the combination calls, the hard work was done when the file was opened, so the inode information is readily available. The stat() call has to parse the file path and then report what it finds. It should almost always be slower - unless you recently opened the file anyway so the kernel has most of the information cached. Even so, the pathname lookup required by stat() is likely to make it slower than the combination.
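For reference, a small sketch of both routes on an already-open stream (illustrative only; fileno() is POSIX, and st_size is truncated to long here for brevity):
#include <stdio.h>
#include <sys/stat.h>

/* Size via fstat() on the descriptor behind an open FILE*. */
static long size_by_fstat(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Size via fseek()/ftell(); restores the original position afterwards. */
static long size_by_seek(FILE *fp)
{
    long pos = ftell(fp);
    if (pos < 0 || fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    fseek(fp, pos, SEEK_SET);
    return size;
}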
If you're not sure, try it!
I just coded this test. I generated 10,000 files of 2KB each, and iterated over all of them, asking for their file size.
Results on my machine by measuring with the "time" command and doing an average of 10 runs:
fseek/ftell version: 0.22 secs
stat version: 0.06 secs
So, the winner (at least on my machine): stat!
Here's the test code:
#include <stdio.h>
#include <sys/stat.h>

#if 0
size_t getFileSize(const char * filename)
{
    struct stat st;
    stat(filename, &st);
    return st.st_size;
}
#else
size_t getFileSize(const char * filename)
{
    FILE * fd = fopen(filename, "rb");
    if(!fd)
    {
        printf("ERROR on file %s\n", filename);
        return 0;
    }
    fseek(fd, 0, SEEK_END);
    size_t size = ftell(fd);
    fclose(fd);
    return size;
}
#endif

int main()
{
    char buf[256];
    int i;
    for(i = 0; i < 10000; ++i)
    {
        sprintf(buf, "file_%d", i);
        if(getFileSize(buf) != 2048)
            printf("WRONG!\n");
    }
    return 0;
}
Logically, one would assume that fseek() when prompted to seek to the end of the file uses stat to know how far to seek, or rather, where the end of the file is.
This would make fseek slower than using the facilities directly, and it also requires you to fopen the file in the first place.
Still, any performance difference is likely to be negligible, and if you need to open the file for some reason anyway, fseek/ftell likely improves the readability of your code significantly.
You mainly use sys/stat.h to get the stats of a file, for example to tell whether it's a regular file or a directory, etc.
However, if you want to do manipulations with the file, then you'll probably want to use ftell() and fseek(), since then you're actually operating on the file stream itself.
So in terms of performance, it really comes down to what you need.
Hope it helps :) Cheers!
Depending on the circumstances, stat() can be hundreds of times faster than seek()/tell(). I am currently toying around with sshfs/FUSE, and getting the file size of a few thousand files with seek()/tell() takes well over a minute, while doing it with stat() takes a second. So the difference is pretty huge when working over sshfs/FUSE.

Asynchronous I/O in C using the Windows API: which method to use, and why does my code execute synchronously?

I have a C application which generates a lot of output and for which speed is critical. The program is basically a loop over a large (8-12 GB) binary input file which must be read sequentially. In each iteration the read bytes are processed and output is generated and written to multiple files, but never to multiple files at the same time. So if you are at the point where output is generated and there are 4 output files, you write to either file 0, 1, 2, or 3. At the end of the iteration I now write the output using fwrite(), thereby waiting for the write operation to finish. The total number of output operations is large, up to 4 million per file, and the output size of files ranges from 100 MB to 3.5 GB. The program runs on a basic multicore processor.
I want to write output in a separate thread and I know this can be done with
Asynchronous I/O
Creating threads
I/O completion ports
I have 2 types of questions, namely conceptual and code-specific.
Conceptual Question
What would be the best approach? Note that the application should be portable to Linux; however, I don't see how that would be very important for my choice among 1-3, since I would write a wrapper around anything kernel/API specific. For me the most important criterion is speed. I have read that option 1 is not that likely to increase the performance of the program and that the kernel in any case creates new threads for the I/O operation, so then why not use option (2) immediately, with the advantage that it seems easier to program (also since I did not succeed with option (1), see code issues below)?
Note that I read https://stackoverflow.com/questions/3689759/how-can-i-run-a-specific-function-of-thread-asynchronously-in-c-c, but I don't see a motivation for what to use based on the nature of the application. So I hope somebody could provide me with some advice on what would be best in my situation. Also, from the book "Windows System Programming" by Johnson M. Hart, I know that the recommendation is using threads, mainly because of the simplicity. However, will it also be fastest?
Code Question
This question involves the attempts I made so far to make asynchronous I/O work. I understand that it's a big piece of code, so it's not that easy to look into. In any case I would really appreciate any attempt.
To decrease execution time I try to write the output by means of a new thread using the WINAPI via CreateFile() with FILE_FLAG_OVERLAPPED and an overlapped structure. I have created a sample program in which I try to get this to work. However, I encountered 2 problems:
The file is only opened in overlapped mode when I delete an already existing file (I have tried using CreateFile in different modes (CREATE_ALWAYS, CREATE_NEW, OPEN_EXISTING), but this does not help).
Only the first WriteFile is executed asynchronously. The remainder of the WriteFile commands are synchronous. For this problem I already consulted http://support.microsoft.com/kb/156932. It seems that the problem I have is related to the fact that "any write operation to a file that extends its length will be synchronous". I've already tried to solve this by increasing the file size/valid data size (commented region in code). However, I still do not get it to work. I'm aware of the fact that, to get the most out of asynchronous I/O, I should perhaps CreateFile with FILE_FLAG_NO_BUFFERING; however I cannot get this to work either.
Please note that the program creates a file of about 120 MB in the path of execution. Also note that print statements "not ok" are not desirable; I would like to see "can do work in background" appear on my screen... What goes wrong here?
#include <windows.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ASYNC // remove this definition to run synchronously (i.e. using fwrite)
#ifdef ASYNC
struct _OVERLAPPED *pOverlapped;
HANDLE *pEventH;
HANDLE *pFile;
#else
FILE *pFile;
#endif
#define DIM_X 100
#define DIM_Y 150000
#define _PRINTERROR(msgs)\
{printf("file: %s, line: %d, %s",__FILE__,__LINE__,msgs);\
fflush(stdout);\
return 0;} \
#define _PRINTF(msgs)\
{printf(msgs);\
fflush(stdout);} \
#define _START_TIMER \
time_t time1,time2; \
clock_t clock1; \
time(&time1); \
printf("start time: %s",ctime(&time1)); \
fflush(stdout);
#define _END_TIMER\
time(&time2);\
clock1 = clock();\
printf("end time: %s",ctime(&time2));\
printf("elapsed processor time: %.2f\n",(((float)clock1)/CLOCKS_PER_SEC));\
fflush(stdout);
double aio_dat[DIM_Y] = {0};
double do_compute(double A,double B, int arr_len);
int main()
{
_START_TIMER;
const char *pName = "test1.bin";
DWORD dwBytesToWrite;
BOOL bErrorFlag = FALSE;
int j=0;
int i=0;
int fOverlapped=0;
#ifdef ASYNC
// create / open the file
pFile=CreateFile(pName,
GENERIC_WRITE, // open for writing
0, // share write access
NULL, // default security
CREATE_ALWAYS, // create new/overwrite existing
FILE_FLAG_OVERLAPPED, // | FILE_FLAG_NO_BUFFERING, // overlapped file
NULL); // no attr. template
// check whether file opening was ok
if(pFile==INVALID_HANDLE_VALUE){
printf("%x\n",GetLastError());
_PRINTERROR("file not opened properly\n");
}
// make the overlapped structure
pOverlapped = calloc(1,sizeof(struct _OVERLAPPED));
pOverlapped->Offset = 0;
pOverlapped->OffsetHigh = 0;
// put event handle in overlapped structure
if(!(pOverlapped->hEvent = CreateEvent(NULL,TRUE,FALSE,NULL))){
printf("%x\n",GetLastError());
_PRINTERROR("error in createevent\n");
}
#else
pFile = fopen(pName,"wb");
#endif
// create some output
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
// determine how many bytes should be written
dwBytesToWrite = (DWORD)sizeof(aio_dat);
for(i=0;i<DIM_X;i++){ // do this DIM_X times
#ifdef ASYNC
//if(i>0){
//SetFilePointer(pFile,dwBytesToWrite,NULL,FILE_CURRENT);
//if(!(SetEndOfFile(pFile))){
// printf("%i\n",pFile);
// _PRINTERROR("error in set end of file\n");
//}
//SetFilePointer(pFile,-dwBytesToWrite,NULL,FILE_CURRENT);
//}
// write the bytes
if(!(bErrorFlag = WriteFile(pFile,aio_dat,dwBytesToWrite,NULL,pOverlapped))){
// check whether io pending or some other error
if(GetLastError()!=ERROR_IO_PENDING){
printf("lasterror: %x\n",GetLastError());
_PRINTERROR("error while writing file\n");
}
else{
fOverlapped=1;
}
}
else{
// if you get here output got immediately written; bad!
fOverlapped=0;
}
if(fOverlapped){
// do background, this msgs is what I want to see
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
_PRINTF("can do work in background\n");
}
else{
// not overlapped, this message is bad
_PRINTF("not ok\n");
}
// wait to continue
if((WaitForSingleObject(pOverlapped->hEvent,INFINITE))!=WAIT_OBJECT_0){
_PRINTERROR("waiting did not succeed\n");
}
// reset event structure
if(!(ResetEvent(pOverlapped->hEvent))){
printf("%x\n",GetLastError());
_PRINTERROR("error in resetevent\n");
}
pOverlapped->Offset+=dwBytesToWrite;
#else
fwrite(aio_dat,sizeof(double),DIM_Y,pFile);
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
for(j=0;j<DIM_Y;j++){
aio_dat[j] = do_compute(i, j, DIM_X);
}
#endif
}
#ifdef ASYNC
CloseHandle(pFile);
free(pOverlapped);
#else
fclose(pFile);
#endif
_END_TIMER;
return 1;
}
double do_compute(double A,double B, int arr_len)
{
int i;
double res = 0;
double *xA = malloc(arr_len * sizeof(double));
double *xB = malloc(arr_len * sizeof(double));
if ( !xA || !xB )
abort();
for (i = 0; i < arr_len; i++) {
xA[i] = sin(A);
xB[i] = cos(B);
res = res + xA[i]*xA[i];
}
free(xA);
free(xB);
return res;
}
Useful links
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/cref_cls/common/cppref_asynchioC_aio_read_write_eg.htm
http://www.ibm.com/developerworks/linux/library/l-async/?ca=dgr-lnxw02aUsingPOISIXAIOAPI
http://www.flounder.com/asynchexplorer.htm#Asynchronous%20I/O
I know this is a big question and I would like to thank everybody in advance who takes the trouble reading it and perhaps even respond!
You should be able to get this to work using the OVERLAPPED structure.
You're on the right track: the system is preventing you from writing asynchronously because every WriteFile extends the size of the file. However, you're doing the file size extension wrong. Simply calling SetEndOfFile will not actually reserve space in the MFT. Use the SetFileValidData function. This will allocate clusters for your file (note that they will contain whatever garbage the disk had there) and you should be able to execute WriteFile and your computation in parallel.
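A hedged, untested sketch of how that preallocation might slot into the question's code, right after CreateFile succeeds and before the write loop (pFile, DIM_X, and aio_dat come from the question; SetFileValidData only succeeds if the process holds SeManageVolumePrivilege):
// Preallocate the file and mark the bytes as valid so that later
// overlapped WriteFile calls do not have to extend the file (which
// would force them to complete synchronously).
LARGE_INTEGER fileSize;
fileSize.QuadPart = (LONGLONG)DIM_X * sizeof(aio_dat);   // total bytes we will write

if (!SetFilePointerEx(pFile, fileSize, NULL, FILE_BEGIN) ||
    !SetEndOfFile(pFile) ||
    !SetFileValidData(pFile, fileSize.QuadPart)) {        // needs SeManageVolumePrivilege
    printf("preallocation failed: %lx\n", GetLastError());
}

// Rewind before issuing the overlapped writes (the OVERLAPPED offset
// governs where each write lands anyway).
LARGE_INTEGER zero;
zero.QuadPart = 0;
SetFilePointerEx(pFile, zero, NULL, FILE_BEGIN);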
I would stay away from FILE_FLAG_NO_BUFFERING. You're after more performance with parallelism I presume? Don't prevent the cache from doing its job.
Another option that you did not consider is a memory mapped file. Those are available on Windows and Linux. There is a handy Boost abstraction that you could use.
With a memory mapped file, every thread in your process could write its output to the file on its own time, assuming that the record sizes are known and each thread has its own output area.
The operating system will take care of writing the mapped pages to disk when needed or when it gets around to it or when you close the file. Maybe when you close the file. Now that I think about it, some operating systems may require that you call msync to guarantee it.
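A minimal sketch of the memory-mapped route on Windows (the file name and size are illustrative; a real program would size the mapping to the known total output and let each thread memcpy into its own region):
#include <windows.h>

int main(void)
{
    const DWORD size = 4 * 1024 * 1024;   // illustrative output size

    HANDLE file = CreateFileA("mapped_out.bin", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, size, NULL);
    if (!mapping) { CloseHandle(file); return 1; }

    unsigned char *view = MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, size);
    if (!view) { CloseHandle(mapping); CloseHandle(file); return 1; }

    // Write anywhere in [view, view + size); here, just fill a pattern.
    for (DWORD i = 0; i < size; i++)
        view[i] = (unsigned char)(i & 0xFF);

    FlushViewOfFile(view, size);          // roughly the msync of the Windows world
    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}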
I don't see why you would want to write asynchronously. Doing things in parallel does not make them faster in all cases. If you write two files at the same time to the same disk, it will almost always be a lot slower. If that is the case, just write them one after another.
If you have some fancy drive like an SSD or a virtual RAM drive, parallel writing could be faster. You would have to create a file at full size and then do your parallel magic.
Asynchronous writing is nice, but is done by any OS anyway. The potential gain for you is that you can do other things than writing to disk like displaying a progress bar. This is where multi-threading can help you.
So imho you should use serial writing or parallel writing to multiple disks.
hth

XOR on a very big file

I would like to XOR a very big file (~50 GB).
More precisely, I would like to do so by XORing each block of 32 bytes of a plaintext file (because of lack of memory) with the key 3847611839, creating (block after block) a new cipher file.
Thank you for any help!
This sounded like fun, and doesn't sound like a homework assignment.
I don't have a previously XOR-encrypted file to try with, but if you convert one back and forth, there's no diff.
That much I tried, at least. Enjoy! :) This XORs every 4 bytes with 0xE555E5BF; I presume that's what you wanted.
Here's bloxor.c
// bloxor.c - by Peter Boström 2009, public domain, use as you see fit. :)
#include <stdio.h>

unsigned int xormask = 0xE555E5BF; // 3847611839 in decimal.

int main(int argc, char *argv[])
{
    printf("%x\n", xormask);
    if(argc < 3)
    {
        printf("usage: bloxor 'file' 'outfile'\n");
        return -1;
    }

    FILE *in = fopen(argv[1], "rb");
    if(in == NULL)
    {
        printf("Cannot open: %s", argv[1]);
        return -1;
    }

    FILE *out = fopen(argv[2], "wb");
    if(out == NULL)
    {
        fclose(in);
        printf("unable to open '%s' for writing.", argv[2]);
        return -1;
    }

    char buffer[1024]; // presuming 1024 is a good block size, I dunno...
    int count;
    while((count = fread(buffer, 1, 1024, in)) > 0)
    {
        int i;
        int end = count/4;
        if(count % 4)
            ++end; // XOR the trailing partial word too; only `count` bytes get written
        for(i = 0; i < end; ++i)
        {
            ((unsigned int *)buffer)[i] ^= xormask;
        }
        if(fwrite(buffer, 1, count, out) != (size_t)count)
        {
            fclose(in);
            fclose(out);
            printf("cannot write, disk full?\n");
            return -1;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}
As starblue mentioned in a comment, "Be aware that this is at best obfuscation, not encryption". And it's probably not even obfuscation.
One property of XOR is that (Y xor 0) == Y. What this means for your algorithm is that for anyplace in your very big file where there are runs of zeros (which seems pretty likely given the size of the file), your key will show up in the cipher file. Plain as day.
Another nice feature of XOR-encrypted data is that if someone has both the plaintext and the ciphertext, XOR'ing those items together gives an output that has the key used to perform the cipher repeated over and over. If the person knows that the two files are a plaintext/ciphertext pair, they've learned the key, which is bad if the key is used for more than one encryption. If the attacker isn't sure whether the plaintext and ciphertext are related, they have a pretty good idea after this, since the key shows up as a repeated pattern in the output. None of this is a problem with a one-time pad, because each bit of the key is used only once, so no one learns anything new from this attack.
A lot of people make the mistake of assuming that because a one time pad is provably unbreakable, that an XOR encryption might be OK 'if done well' since the fundamental operation performed is the same. The difference is that a one time pad uses each random bit of the key exactly once. So among other things, if the plaintext has a run of zeros, nothing is learned about the key, unlike with a simple fixed-key XOR cipher.
As Bruce Schneier said: "There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files."
An XOR cipher is barely kid sister proof - if even that.
You need to craft a solution around a streaming architecture: you read the input file as a "stream", modify it, and write the result to the output file.
This way, you don't have to read all the file at once.
If your question is how to do it without using extra space on the disk, I would just read in the chunks in multiples of 32 bytes (as big as you can), work with the chunk in memory, then write it out again. You should be able to use the ftell and fseek functions to do that (assuming your long type is large enough, of course).
It may be faster to memory-map the file if you can spare that much out of your address space (and your OS supports it) but I'd try the easiest solution first.
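A rough sketch of that in-place approach (the chunk size and the 4-byte key handling are illustrative; for a ~50 GB file you would want fseeko/ftello or a 64-bit long):
#include <stdio.h>

#define CHUNK (32 * 1024)   /* a multiple of 32 bytes, as the question asks */

int xor_in_place(const char *path, const unsigned char key[4])
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    unsigned char buf[CHUNK];
    size_t n;
    long pos = 0;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++)
            buf[i] ^= key[i % 4];

        /* Seek back and overwrite the chunk we just read. */
        if (fseek(f, pos, SEEK_SET) != 0 || fwrite(buf, 1, n, f) != n) {
            fclose(f);
            return -1;
        }
        pos += (long)n;
        /* A seek (or fflush) is required between a write and the next read
         * on a file opened for update. */
        fseek(f, pos, SEEK_SET);
    }
    fclose(f);
    return 0;
}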
Of course, if space isn't a problem, just read the chunks in and write them to a new file, something like the following (pseudo-code):
open infile
open outfile
while not end of infile:
    read chunk from file
    change chunk
    write chunk to outfile
close outfile
close infile
This sort of read/process/write is pretty basic stuff. If you have more complicated requirements, you should update your question with them.
