how to decrypt large files in memory using c

A C-based application uses AES with a 256-bit key. The data is in binary form; it has been encrypted and written to a binary file. The requirement is to decrypt this binary file in RAM (i.e. on-the-fly / real-time encryption and decryption). How can this be done efficiently? Any good web links or code references for understanding on-the-fly encryption would be appreciated.
Put more simply: how do I decrypt large files in memory using C (Linux), like TrueCrypt does?

Use mmap on the file; the file is then accessible as a stream of bytes in memory. For example, here is a simple function that XORs each byte of a large (say, 400 GB) file in memory:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
// The encryption function: XOR every byte with a key byte (placeholder cipher)
void xor_ram (unsigned char *buffer, size_t len, unsigned char key) {
while (len--) *buffer++ ^= key;
}
// The file we want to encrypt
int fd = open ("/path/to/file", O_RDWR);
// Figure out the file length
struct stat st;
fstat (fd, &st);
size_t length = st.st_size;
// Memory map the file using the fd.
// MAP_PRIVATE keeps the changes in memory only; use MAP_SHARED to write them back to the file.
unsigned char *mapped_file = mmap (NULL, length,
PROT_READ | PROT_WRITE, MAP_PRIVATE,
fd, 0);
// Call the encryption function (0x5A is an arbitrary example key)
xor_ram (mapped_file, length, 0x5A);
// All done now
munmap (mapped_file, length);
close (fd);
You can read the manpage for mmap here: http://unixhelp.ed.ac.uk/CGI/man-cgi?mmap
You should really consult the documentation for mmap on your particular platform, though (man mmap if you're on a Unix system of some sort, or search the platform's library documentation if not).
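As a rough illustration of the "in memory" part of the question, the sketch below maps a file with MAP_PRIVATE so that the transformed bytes exist only in RAM and the file on disk is left untouched. The XOR step is only a stand-in for a real AES-256 routine, and the names xor_buf and map_private_xor are made up for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Placeholder "cipher": XOR each byte with a key byte (not real encryption). */
void xor_buf(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Map `path` copy-on-write, transform it in memory, and return the mapping.
 * On success *len_out holds the mapping length; the caller must munmap() it.
 * Returns NULL on error or if the file is empty. */
unsigned char *map_private_xor(const char *path, unsigned char key,
                               size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    /* MAP_PRIVATE + PROT_WRITE: writes go to private pages, not to the file. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    xor_buf(p, len, key);
    *len_out = len;
    return p;
}
```

Only the pages that are actually touched get copied, but a full pass over a very large file still ends up holding a private copy of every page, so for files larger than RAM it is probably better to map and process one window at a time.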

Related

MMAP segmentation fault

int fp, page;
char *data;
if(argc > 1){
printf("Read the docs");
exit(1);
}
fp = open("log.txt", O_RDONLY); //Opening file to read
page = getpagesize();
data = mmap(0, page, PROT_READ, 0,fp, 0);
initscr(); // Creating the ncurse screen
clear();
move(0, 0);
printw("%s", data);
endwin(); //Ends window
fclose(fp); //Closing file
return 0;
Here is my code; I keep getting a segmentation fault for some reason.
All my header files have been included, so that's not the problem (clearly, because it's something to do with memory). Thanks in advance.
Edit: Got it - it wasn't being formatted as a string. and also had to use stat() to get the file info rather than getpagesize()

You can't fclose() a file descriptor you got from open(). You must use close(fp) instead. What you are doing is passing a small int that gets treated as a pointer, which causes a segmentation fault.
Note that your choice of identifier naming is unfortunate. Usually fp would be a pointer-to-FILE (FILE*, as used by the standard IO library), while fd would be a file descriptor (a small integer), used by the kernel's IO system calls.
Your compiler should have told you that you pass an int where a pointer-to-FILE was expected, or that you use fclose() without a prototype in scope. Did you enable the maximum warning level of your compiler?
Another segfault is possible if the data pointer does not point to a NUL (0) terminated string. Does your log.txt contain NUL-terminated strings?
You should also check if mmap() fails returning MAP_FAILED.
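Put together, the checks suggested above might look like the following sketch. The helper name map_file_ro is an assumption made for this example; it calls close() on the descriptor, maps the real file size obtained from fstat(), and tests the result against MAP_FAILED.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns the mapping and stores its length in
 * *len_out, or returns NULL on any failure (including an empty file). */
char *map_file_ro(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);   /* fd is a file descriptor, not a FILE* */
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);                   /* close(), never fclose(), for an fd */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)             /* mmap reports failure as MAP_FAILED */
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

Because the mapped data is not guaranteed to be NUL-terminated, a caller would print it with an explicit length, for example fwrite(data, 1, len, stdout), and release it with munmap(data, len).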

Okay, so here is the code that got it working:
#include <sys/stat.h>
int status;
struct stat s;
status = stat(file, &s);
if(status < 0){
perror("Stat:");
exit(1);
}
data = mmap(NULL, s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Before, I was using 'getpagesize();'. Thanks beej!!!

mmap's man page gives you information on the parameters:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
As you can see, your second argument may be wrong (unless you really want to map only the part of the file that fits into a single page).
Also, 0 is probably not a valid flags value. Let's have another look at the man page:
The flags argument determines whether updates to the mapping are
visible to other processes mapping the same region, and whether
updates are carried through to the underlying file. This behavior is
determined by including exactly one of the following values in flags: MAP_SHARED or MAP_PRIVATE
So you could try something like
data = mmap(0, size, PROT_READ, MAP_SHARED, fp, 0);
Always use the provided flags, as the underlying value may differ from machine to machine.
Also, the mapped area should not be larger than the underlying file. Check the size of log.txt beforehand.

The second argument to mmap should not be page size, it should be the size of your file. Here is a nice example.

Mmap a large 1TB file and write 1's to it?

I am very new to mmap and memset. I have been assigned a task to create a large file (1TB) and write 1's to it as we are trying to understand the performance.
Now from what I understand, I can basically fallocate a 1TB file, then in a C function mmap it with PROT_READ, PROT_WRITE, MAP_SHARED and then fill that mmap'ed pointer with memset, like this:
int fd = open(FILEPATH, O_RDWR, (mode_t)0700);
size_t data_length = 1000000000000;
char *data = (char*)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd , 0);
memset(data, '1', data_length);
Is this correct?
Do I need to sync or anything to make this data persistent?
If so, I'm basically writing a 1TB file, so why does my function finish in a fraction of a second?
I've tried to cat the output file and there are indeed 1's in it; it's just that my terminal craps out after a few seconds, but the data is actually getting written.
Am I doing this correctly or should I actually go and write data to memory rather than memset. If so, how should I do it?
Thanks for any help.

Copying files using memory map

I want to implement an effective file copying technique in C for my process which runs on BSD OS. As of now the functionality is implemented using read-write technique. I am trying to make it optimized by using memory map file copying technique.
Basically I will fork a process which mmaps both src and dst file and do memcpy() of the specified bytes from src to dst. The process exits after the memcpy() returns. Is msync() required here, because when I actually called msync with MS_SYNC flag, the function took lot of time to return. Same behavior is seen with MS_ASYNC flag as well?
i) So to summarize is it safe to avoid msync()?
ii) Is there a better way of copying files on BSD, given that BSD does not seem to support sendfile() or splice()? Are there any equivalents?
iii) Is there any simple method for implementing our own zero-copy like technique for this requirement?
My code
/* mmcopy.c
Copy the contents of one file to another file, using memory mappings.
Usage mmcopy source-file dest-file
*/
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "tlpi_hdr.h"
int
main(int argc, char *argv[])
{
char *src, *dst;
int fdSrc, fdDst;
struct stat sb;
if (argc != 3)
usageErr("%s source-file dest-file\n", argv[0]);
fdSrc = open(argv[1], O_RDONLY);
if (fdSrc == -1)
errExit("open");
/* Use fstat() to obtain size of file: we use this to specify the
size of the two mappings */
if (fstat(fdSrc, &sb) == -1)
errExit("fstat");
/* Handle zero-length file specially, since specifying a size of
zero to mmap() will fail with the error EINVAL */
if (sb.st_size == 0)
exit(EXIT_SUCCESS);
src = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fdSrc, 0);
if (src == MAP_FAILED)
errExit("mmap");
fdDst = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
if (fdDst == -1)
errExit("open");
if (ftruncate(fdDst, sb.st_size) == -1)
errExit("ftruncate");
dst = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdDst, 0);
if (dst == MAP_FAILED)
errExit("mmap");
memcpy(dst, src, sb.st_size); /* Copy bytes between mappings */
if (msync(dst, sb.st_size, MS_SYNC) == -1)
errExit("msync");
exit(EXIT_SUCCESS);
}

Short answer: msync() is not required.
When you do not specify msync(), the operating system flushes the memory-mapped pages in the background after the process has been terminated. This is reliable on any POSIX-compliant operating system.
To answer the secondary questions:
Typically the method of copying a file on any POSIX-compliant operating system (such as BSD) is to use open() / read() / write() and a buffer of some size (16kb, 32kb, or 64kb, for example). Read data into buffer from src, write data from buffer into dest. Repeat until read(src_fd) returns 0 bytes (EOF).
However, depending on your goals, using mmap() to copy a file in this fashion is probably a perfectly viable solution, so long as the files being copied are relatively small (relative to the expected memory constraints of your target hardware and your application). The mmap copy operation will require roughly 2x the total physical memory of the file. So if you're trying to copy a file that's 8MB, your application will use 16MB to perform the copy. If you expect to be working with even larger files then that duplication could become very costly.
So does using mmap() have other advantages? Actually, no.
The OS will often be much slower about flushing mmap pages than writing data directly to a file using write(). This is because the OS will intentionally prioritize other things ahead of page flushes so to keep the system 'responsive' for foreground tasks/apps.
During the time the mmap pages are being flushed to disk (in the background), the chance of sudden loss of power to the system will cause loss of data. Of course this can happen when using write() as well but if write() finishes faster then there's less chance for unexpected interruption.
The long delay you observe when calling msync() is roughly the time it takes the OS to flush your copied file to disk. When you don't call msync() it happens in the background instead (and also takes even longer for that reason).
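The open() / read() / write() loop described above could be sketched roughly as follows. The function name copy_file and the 64 KB buffer are choices made for this example, not anything prescribed by POSIX; the inner loop is there because write() is allowed to write fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src_path to dst_path with a fixed-size buffer.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src_path, const char *dst_path)
{
    char buf[64 * 1024];             /* 64 KB, one of the sizes mentioned above */
    int result = -1;

    int src = open(src_path, O_RDONLY);
    if (src == -1)
        return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (dst == -1) {
        close(src);
        return -1;
    }

    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n == 0) {                /* EOF: everything has been copied */
            result = 0;
            break;
        }
        if (n == -1) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal, retry */
            break;
        }
        /* write() may write fewer bytes than requested, so loop */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w == -1) {
                if (errno == EINTR)
                    continue;
                goto done;
            }
            off += w;
        }
    }
done:
    close(src);
    if (close(dst) == -1)
        result = -1;
    return result;
}
```

Memory use stays at the size of the buffer regardless of how large the file is, which is the main practical difference from the mmap approach.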

reading and writing in chunks on linux using c

I have a ASCII file where every line contains a record of variable length. For example
Record-1:15 characters
Record-2:200 characters
Record-3:500 characters
...
...
Record-n: X characters
As the file size is about 10GB, I would like to read the records in chunks. Once read, I need to transform them and write them to another file in binary format.
So, for reading, my first reaction was to create a char array such as
FILE *stream;
char buffer[104857600]; //100 MB char array
fread(buffer, 1, sizeof(buffer), stream); // element size 1, count = buffer size
Is it correct to assume, that linux will issue one system call and fetch the entire 100MB?
As the records are separated by new line, i search for character by character for a new line character in the buffer and reconstruct each record.
My question is that is this how i should read in chunks or is there a better alternative to read data in chunks and reconstitute each record? Is there an alternative way to read x number of variable sized lines from an ASCII file in one call ?
Next during write, i do the same. I have a write char buffer, which i pass to fwrite to write a whole set of records in one call.
fwrite(buffer, 1, sizeof(buffer), stream); // element size 1, count = buffer size
UPDATE: If i setbuf(stream, buffer), where buffer is my 100MB char buffer, would fgets return from buffer or cause a disk IO?

Yes, fread will fetch the entire thing at once. (Assuming it's a regular file.) But it won't read 105 MB unless the file itself is 105 MB, and if you don't check the return value you have no way of knowing how much data was actually read, or if there was an error.
Use fgets (see man fgets) instead of fread. This will search for the line breaks for you.
char linebuf[1000];
FILE *file = ...;
while (fgets(linebuf, sizeof(linebuf), file)) {
// decode one line
}
There is a problem with your code.
char buffer[104857600]; // too big
If you try to allocate a large buffer (105 MB is certainly large) on the stack, then it will fail and your program will crash. If you need a buffer that big, you will have to allocate it on the heap with malloc or similar. I'd certainly keep stack usage for a single function in the tens of KB at most, although you could probably get away with a few MB on most stock Linux systems.
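If you do want to read in large chunks, a heap-allocated buffer combined with a check of fread's return value might look like the sketch below. The function name count_newlines_chunked and the chunk_size parameter (assumed to be nonzero) are made up for this example; counting newlines stands in for whatever per-record work you need.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `stream` in chunks of `chunk_size` bytes using a heap buffer and
 * count the newline characters seen. Returns the count, or -1 on error. */
long count_newlines_chunked(FILE *stream, size_t chunk_size)
{
    char *buffer = malloc(chunk_size);   /* heap, not stack */
    if (buffer == NULL)
        return -1;

    long lines = 0;
    size_t got;
    /* size 1, count chunk_size: the return value is the number of bytes read */
    while ((got = fread(buffer, 1, chunk_size, stream)) > 0) {
        for (size_t i = 0; i < got; i++)
            if (buffer[i] == '\n')
                lines++;
    }

    int failed = ferror(stream);         /* distinguish EOF from a read error */
    free(buffer);
    return failed ? -1 : lines;
}
```

A record that straddles two chunks has to be stitched together by the caller, which is part of why fgets is the simpler option here.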
As an alternative, you could just mmap the entire file into memory. This will not improve or degrade performance in most cases, but it is easier to work with.
int r, fdes;
struct stat st;
void *ptr;
size_t sz;
fdes = open(filename, O_RDONLY);
if (fdes < 0) abort();
r = fstat(fdes, &st);
if (r) abort();
if (st.st_size > (size_t) -1) abort(); // too big to map
sz = st.st_size;
ptr = mmap(NULL, sz, PROT_READ, MAP_SHARED, fdes, 0);
if (ptr == MAP_FAILED) abort();
close(fdes); // file no longer needed
// now, ptr has the data, sz has the data length
// you can use ordinary string functions
The advantage of using mmap is that your program won't run out of memory. On a 64-bit system, you can put the entire file into your address space at the same time (even a 10 GB file), and the system will automatically read new chunks as your program accesses the memory. The old chunks will be automatically discarded, and re-read if your program needs them again.
It's a very nice way to plow through large files.
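As a sketch of how the mapped data could be walked record by record: a mapping is not guaranteed to end in a NUL byte, so this example uses memchr with an explicit length instead of the str* functions. The function name count_records_mmap and the longest_out parameter are assumptions for illustration only.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Walk every newline-terminated record in a mapped file. Returns how many
 * records were seen (a final line without '\n' is counted too) and stores
 * the length of the longest one in *longest_out. Returns -1 on error. */
long count_records_mmap(const char *filename, size_t *longest_out)
{
    *longest_out = 0;
    int fd = open(filename, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {           /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t sz = (size_t)st.st_size;
    const char *data = mmap(NULL, sz, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED)
        return -1;

    long records = 0;
    const char *p = data;
    const char *end = data + sz;
    while (p < end) {
        /* memchr takes an explicit length, so no NUL terminator is needed */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *line_end = nl ? nl : end;
        /* the record is the byte range [p, line_end); process it here */
        size_t rec_len = (size_t)(line_end - p);
        if (rec_len > *longest_out)
            *longest_out = rec_len;
        records++;
        p = nl ? nl + 1 : end;
    }

    munmap((void *)data, sz);
    return records;
}
```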

If you can, you might find that mmaping the file is easiest. mmap maps (a portion of) a file into memory so that it can be accessed essentially as an array of bytes. In your case you might not be able to map the whole file at once, so it would look something like:
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
/* ... */
struct stat stat_buf;
long pagesz = sysconf(_SC_PAGESIZE);
int fd = fileno(stream);
off_t line_start = 0;
char *file_chunk = NULL;
char *input_line;
off_t cur_off = 0;
off_t map_offset = 0;
/* map 16M plus pagesize to ensure any record <= 16M will always fit in the mapped area */
size_t map_size = 16*1024*1024+pagesz;
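/* get the file size first: it is needed to limit the mapping */
fstat(fd, &stat_buf);
if (map_offset + map_size > stat_buf.st_size) {
map_size = stat_buf.st_size - map_offset;
}
/* map the first chunk of the file */
file_chunk = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, map_offset);
/* the first line starts at the beginning of the first chunk */
input_line = file_chunk;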
// until we reach the end of the file
while (cur_off < stat_buf.st_size) {
/* check if we're about to read outside the current chunk */
if (!(cur_off-map_offset < map_size)) {
// destroy the previous mapping
munmap(file_chunk, map_size);
// round down to the page before line_start
map_offset = (line_start/pagesz)*pagesz;
// limit mapped region to size of file
if (map_offset + map_size > stat_buf.st_size) {
map_size = stat_buf.st_size - map_offset;
}
// map the next chunk
file_chunk = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, map_offset);
// adjust the line start for the new mapping
input_line = &file_chunk[line_start-map_offset];
}
if (file_chunk[cur_off-map_offset] == '\n') {
// found a new line, process the current line
process_line(input_line, cur_off-line_start);
// set up for the next one
line_start = cur_off+1;
input_line = &file_chunk[line_start-map_offset];
}
cur_off++;
}
Most of the complication is to avoid making too huge a mapping. You might be able to map the whole file using
char *file_data = mmap(NULL, stat_buf.st_size, PROT_READ, MAP_SHARED, fd, 0);

My suggestion is to use fgets(buff) so that newlines are detected automatically,
and then use strlen(buff) to keep track of how much has been buffered:
if( (total+strlen(buff)) > 104857600 )
then write out the buffer and start a new chunk.
Note that a chunk will rarely be exactly 104857600 bytes.
Correct me if I'm wrong.

What is the fastest way to overwrite an entire file with zeros in C?

What I need to do is fill the entire contents of a file with zeros as fast as possible. I know some Linux commands like cp work out the best block size to write at a time, but I wasn't able to figure out whether using that block size is enough to get good performance, and it looks like the st_blksize from stat() isn't giving me that block size.
Thank you !
Some answers to the comments:
This needs to be done in C, not using utilities like shred.
There is no error in the usage of the stat()
st_blksize is returning a block greater than the file size,
don't know how can I handle that.
Using truncate()/ftruncate(), only the extra space is filled with
zeros, I need to overwrite the entire file data.
I'm thinking of something like:
fd = open("file.txt", O_WRONLY);
// check for errors (...)
while(TRUE)
{
ret = write(fd, buffer, sizeof(buffer));
if (ret == -1) break;
}
close(fd);
The problem is how to define the best buffer size "programmatically".

Fastest and simplest:
int fd = open("file", O_WRONLY);
off_t size = lseek(fd, 0, SEEK_END);
ftruncate(fd, 0);
ftruncate(fd, size);
Obviously it would be nice to add some error checking.
This solution is not what you want for secure obliteration of the file though. It will simply mark the old blocks used by the file as unused and leave a sparse file that doesn't occupy any physical space. If you want to clear the old contents of the file from the physical storage medium, you might try something like:
static const char zeros[4096];
int fd = open("file", O_WRONLY);
off_t size = lseek(fd, 0, SEEK_END);
lseek(fd, 0, SEEK_SET);
while (size>sizeof zeros)
size -= write(fd, zeros, sizeof zeros);
while (size)
size -= write(fd, zeros, size);
You could increase the size of zeros up to 32768 or so if testing shows that it improves performance, but beyond a certain point it should not help and will just be a waste.
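With the error checking added back in, the overwrite loop above might look like this sketch. The function name zero_fill is made up for the example, and the final fsync() is an addition that the snippet above does not have; it only asks the kernel to push the data out and says nothing about what the storage device does with the old blocks.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite every byte of `path` with zeros, keeping the file size.
 * Returns 0 on success, -1 on error. */
int zero_fill(const char *path)
{
    static const char zeros[4096];   /* static storage is zero-initialized */

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    off_t remaining = lseek(fd, 0, SEEK_END);
    if (remaining == -1 || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }

    while (remaining > 0) {
        size_t want = sizeof zeros;
        if ((off_t)want > remaining)
            want = (size_t)remaining;
        ssize_t w = write(fd, zeros, want);
        if (w == -1) {
            if (errno == EINTR)
                continue;            /* interrupted, retry the same write */
            close(fd);
            return -1;
        }
        remaining -= w;              /* a short write just shrinks the rest */
    }

    /* fsync asks the kernel to push the data to the storage device */
    int rc = fsync(fd);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```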

With mmap (and without error checking):
stat(filename,&stat_buf);
len=stat_buf.st_size;
fd=open(filename,O_RDWR);
ptr=mmap(NULL,len,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0);
memset(ptr,0,len);
munmap(ptr,len);
close(fd);
This should use the kernel's idea of block size, so you don't need to worry about it. Unless the file is larger than your address space.

This is my idea; note that I removed all error-checking code for clarity.
int f = open("file", O_WRONLY); // open file for writing (open() takes int flags, not "w")
off_t len = lseek(f, 0, SEEK_END); // and get its length
lseek(f, 0, SEEK_SET); // then go back to the beginning
char *buff = malloc(len); // create a buffer large enough
memset(buff, 0, len); // fill it with 0s
write(f, buff, len); // write back to file
free(buff); // release the buffer
close(f); // and close