I have this function:
int cipher_file(char *file_path, uint8_t *key, int key_size){
    FILE *file;
    size_t read_char_count, wrote_char_count;
    fpos_t *pos = malloc(sizeof(fpos_t));
    char *block = malloc(16*sizeof(uint8_t));
    if ( !(file = fopen(file_path, "rb+")) ) {
        return EXIT_FAILURE;
    }
    while(!feof(file)){
        while( ( read_char_count = fread(block, 1, 16*sizeof(uint8_t), file) ) > 0 ) {
            block = cipher_block(block, key, key_size);
            fseek(file, -read_char_count, SEEK_CUR);
            wrote_char_count = fwrite(block , 1, 16*sizeof(uint8_t), file);
        }
    }
    fclose(file);
    return EXIT_SUCCESS;
}
(I know ECB mode is not safe btw)
It takes a file, breaks it down into 128-bit blocks, ciphers them using AES, and writes them back to the file, effectively replacing the plaintext with ciphertext.
I also wrote a function decipher_file() to decipher the file.
The issue is that if the file size is not a multiple of 128 bits, at the end fread() only partially replaces the content of "block" (which is 16 bytes long) with the successfully read characters, leaving garbage from the previous ciphered block.
When deciphering, since decipher_file() normally has no way of knowing the size of the original file, it deciphers all the content, including the garbage characters, and writes it back to the file.
I also tried re-initializing "block" with zeros on each round but, unsurprisingly, they were added to the file too, which can be very problematic.
So my question is, is there a way (like a function) to signify where the file ends, or tell fwrite() to stop writing?
You can't use a special character because the encrypted data might end up looking like it. It would not be an elegant solution anyway.
There are multiple solutions:
- Prefix the file with the decrypted content length. That's very clean and easy to implement (see the sketch below).
- Use a cipher mode that retains length information. ECB does not. Use a padding scheme, or a mode that preserves the length, such as counter (CTR) mode.
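A minimal sketch of the first option, writing to a separate output file; the prototype for cipher_block() is an assumption inferred from the question's usage:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* assumed prototype, matching the question's usage */
char *cipher_block(char *block, uint8_t *key, int key_size);

int cipher_file_prefixed(FILE *in, FILE *out, uint8_t *key, int key_size)
{
    /* measure the plaintext and store its length first */
    fseek(in, 0L, SEEK_END);
    uint64_t len = (uint64_t)ftell(in);
    fseek(in, 0L, SEEK_SET);

    uint8_t hdr[8];
    for (int i = 0; i < 8; i++)
        hdr[i] = (uint8_t)(len >> (8 * i));   /* fixed-width little-endian */
    if (fwrite(hdr, 1, 8, out) != 8)
        return EXIT_FAILURE;

    char block[16];
    size_t n;
    while ((n = fread(block, 1, 16, in)) > 0) {
        memset(block + n, 0, 16 - n);         /* zero-pad only the last block */
        if (fwrite(cipher_block(block, key, key_size), 1, 16, out) != 16)
            return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

decipher_file() then reads the 8-byte header first, decrypts all the blocks, and truncates its output to the stored length, so the padding never reaches the final file.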
I am reading data from an input file and compressing it with the bzip2 library function BZ2_bzCompress in C. I can compress the data successfully, but I cannot write all of the compressed data to an output file; only the first compressed line is written. Am I missing something here?
int main()
{
    bz_stream bz;
    FILE* f_d;
    FILE* f_s;
    BZFILE* b;
    int bzerror = -10;
    unsigned int nbytes_in;
    unsigned int nbytes_out;
    char buf[3000] = {0};
    int result = 0;
    char buf_read[500];
    char file_name[] = "/path/file_name";
    long int save_pos;

    f_d = fopen ( "myfile.bz2", "wb+" );
    f_s = fopen(file_name, "r");
    if ((!f_d) && (!f_s)) {
        printf("Cannot open files");
        return(-1);
    }
    bz.opaque = NULL;
    bz.bzalloc = NULL;
    bz.bzfree = NULL;
    result = BZ2_bzCompressInit(&bz, 1, 2, 30);
    while (fgets(buf_read, sizeof(buf_read), f_s) != NULL)
    {
        bz.next_in = buf_read;
        bz.avail_in = sizeof(buf_read);
        bz.next_out = buf;
        bz.avail_out = sizeof(buf);
        printf("%s\n", buf_read);
        save_pos = ftell(f_d);
        fseek(f_d, save_pos, SEEK_SET);
        while ((result == BZ_RUN_OK) || (result == 0) || (result == BZ_FINISH_OK))
        {
            result = BZ2_bzCompress(&bz, (bz.avail_in) ? BZ_RUN : BZ_FINISH);
            printf("2 result:%d,in:%d,outhi:%d, outlo:%d \n",result, bz.total_in_lo32, bz.total_out_hi32, bz.total_out_lo32);
            fwrite(buf, 1, bz.total_out_lo32, f_d);
        }
        if (result == BZ_STREAM_END)
        {
            result = BZ2_bzCompressEnd(&bz);
        }
        printf("3 result:%d, out:%d\n", result, bz.total_out_lo32);
        result = BZ2_bzCompressInit(&bz, 1, 2, 30);
        memset(buf, 0, sizeof(buf));
    }
    fclose(f_d);
    fclose(f_s);
    return(0);
}
TL;DR: there are multiple problems, but the main one that explains the problem you asked about is likely that you compress each line of the file independently, instead of the whole file as a unit.
According to the docs of BZ2_bzCompressInit, the bz_stream argument should be allocated and initialized before the call. Yours is (automatically) allocated, but not (fully) initialized. It would be clearer and easier to change to
bz_stream bz = { 0 };
and then skip the assignments to bz.opaque, bz.bzalloc, and bz.bzfree.
You store but do not really check the return value of your BZ2_bzCompressInit call. It does eventually get tested in the condition of the inner while loop, but there you detect only success and normal-completion values, not error conditions.
Your handling of the input buffer is significantly flawed.
In the first place, you set the number of available input bytes incorrectly:
bz.avail_in = sizeof(buf_read);
Since you're using fgets() to read data into the buffer, under no circumstances is the full size of the buffer occupied by input data, because fgets() ensures that a string terminator is written into the array. In fact, it can be worse than that: fgets() stops after a newline, so a successful read may provide as little as one input byte.
If you want to stick with fgets() then you need to use strlen() to determine the number of bytes available from each read, but I would suggest that you instead switch to fread(), which will more reliably fill the buffer, indicate with its return value how many bytes were read, and correctly handle inputs containing null bytes.
In the second place, you use BZ2_bzCompress() to compress each buffer of input as if it were a complete file. When you come to the end of the buffer, you finish a compression run and reinitialize the bz_stream. This will definitely interfere with decompressing, and it may explain why your program (seems to) compress only the first line of its input. You should be reading the whole content of the file (in suitably-sized chunks) and feeding all of it to BZ2_bzCompress(... BZ_RUN) before you finish up. There should be one sequence of calls to BZ2_bzCompress(... BZ_FINISH) and finally one call to BZ2_bzCompressEnd() for the whole file, not per line.
You do not perform error detection or handling for any of your calls to standard library or bzip2 functions. You do handle the expected success-case return values for some of them, but you need to be prepared for errors, too.
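Putting the main points together, here is a minimal sketch of the shape such a loop could take: one stream per file, input refilled only once it is consumed, and every return value checked. The function name and buffer sizes are mine; the init parameters are the question's:

#include <stdio.h>
#include <bzlib.h>

/* Compress all of src into dst as one bzip2 stream. Returns 0 on success. */
int compress_whole_file(FILE *src, FILE *dst)
{
    bz_stream bz = { 0 };                  /* zero-initialized, as recommended */
    if (BZ2_bzCompressInit(&bz, 1, 0, 30) != BZ_OK)
        return -1;

    char in[4096], out[4096];
    int action = BZ_RUN, status;
    do {
        if (action == BZ_RUN && bz.avail_in == 0) {
            size_t n = fread(in, 1, sizeof in, src);
            bz.next_in = in;
            bz.avail_in = (unsigned)n;
            if (n < sizeof in)             /* EOF (or read error): wrap up */
                action = BZ_FINISH;
        }
        bz.next_out = out;
        bz.avail_out = sizeof out;
        status = BZ2_bzCompress(&bz, action);
        if (status != BZ_RUN_OK && status != BZ_FINISH_OK && status != BZ_STREAM_END) {
            BZ2_bzCompressEnd(&bz);
            return -1;
        }
        size_t produced = sizeof out - bz.avail_out;
        if (fwrite(out, 1, produced, dst) != produced) {
            BZ2_bzCompressEnd(&bz);
            return -1;
        }
    } while (status != BZ_STREAM_END);     /* one finish, one end, per file */

    return BZ2_bzCompressEnd(&bz) == BZ_OK ? 0 : -1;
}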
There are some additional oddities:
- You have unused variables nbytes_in, nbytes_out, bzerror, and b.
- You open the input file as a text file, though whether that makes any difference is platform-dependent.
- The ftell() / fseek() pair has no overall effect other than setting save_pos, which is not otherwise used.
- Although it is not harmful, it also is not useful to memset() the output buffer to all-zeroes at the end of each line (or initially).
- Given that you're compressing the input, it's odd (but again not harmful) that you provide six times as much output buffer as input buffer.
I took over a project that uses the following function to read files:
char *fetchFile(char *filename) {
    char *buffer;
    int len;
    FILE *f = fopen(filename, "rb");
    if(f) {
        if(verbose) {
            fprintf(stdout, "Opened file %s successfully\n", filename);
        }
        fseek(f, 0, SEEK_END);
        len = ftell(f);
        fseek(f, 0, SEEK_SET);
        if(verbose) {
            fprintf(stdout, "Allocating memory for buffer for %s\n", filename);
        }
        buffer = malloc(len + 1);
        if(buffer) fread (buffer, 1, len, f);
        fclose (f);
        buffer[len] = '\0';
    } else {
        fprintf(stderr, "Error reading file %s\n", filename);
        exit(1);
    }
    return buffer;
}
The rb mode is used because sometimes the file can be a spreadsheet and therefore I want the information as in a text file.
The program runs on a Linux machine, but the files to read come from both Linux and Windows.
I am not sure which approach is better for keeping Windows line endings from messing with my code.
I was thinking of using dos2unix at the start of this function.
I also thought of opening in r mode, but I believe that could potentially mess things up when opening non-text files.
I would like to better understand the differences between using:
- dos2unix,
- r vs rb mode,
- or any other solution which would fit the problem better.
Note: I believe I understand r vs rb modes, but I would appreciate an explanation of why each is a good or bad choice for this specific situation (I think r wouldn't be good because the function sometimes opens spreadsheets, but I am not sure of that).
If my understanding is correct the rb mode is used because sometimes the file can be a spreadsheet and therefore the programs just want the information as in a text file.
You seem uncertain, and though perhaps you do understand correctly, your explanation does not give me any confidence in that.
C knows about two distinct kinds of streams: binary streams and text streams. A binary stream is simply an ordered sequence of bytes, written and / or read as-is without any kind of transformation. On the other hand,
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. [...]
(C2011 7.21.2/2)
For some implementations, such as POSIX-compliant ones, this is a distinction without a difference. For other implementations, such as those targeting Windows, the difference matters. In particular, on Windows, text streams convert on the fly between carriage-return / line-feed pairs in the external representation and newlines (only) in the internal representation.
The b in your fopen() mode specifies that the file should be opened as a binary stream -- that is, no translation will be performed on the bytes read from the file. Whether this is the right thing to do depends on your environment and the application's requirements. This is moot on Linux or another Unix, however, as there is no observable difference between text and binary streams on such systems.
dos2unix converts carriage-return / line-feed pairs in the input file to single line-feed (newline) characters. This will convert a Windows-style text file or one with mixed Windows / Unix line terminators to Unix text file convention. It is irreversible if there are both Windows-style and Unix-style line terminators in the file, and it is furthermore likely to corrupt your file if it is not a text file in the first place.
If your inputs are sometimes binary files, then opening in binary mode is appropriate, and conversion via dos2unix probably is not. If that's the case and you also need translation for text-file line terminators, then you first and foremost need a way to distinguish which case applies to any particular file -- for example, by command-line argument or by pre-analyzing the file via libmagic. You then must provide different handling for text files; your main options are:
- Perform the line terminator conversion in your own code (see the sketch below).
- Provide separate versions of the fetchFile() function for text and binary files.
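For the first option, a minimal in-place sketch; the helper name crlf_to_lf is my own, not part of the project:

#include <stddef.h>

/* Strip the CR of each CR/LF pair from a buffer read in binary mode.
   Rewrites the buffer in place and returns the new length. Safe here
   because fetchFile() reads the whole file into one buffer, so a pair
   cannot straddle two reads. */
size_t crlf_to_lf(char *buf, size_t len) {
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;   /* drop the CR; the LF is kept on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}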
The code just copies the contents of a file to an allocated buffer. The UNIX way (YMMV) is to just memory map the file instead of reading it. Much faster.
// untested code
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void *mapfile(const char *name)
{
    int fd;
    struct stat st;

    if ((fd = open(name, O_RDONLY)) == -1)
        return NULL;
    if (fstat(fd, &st)) {
        close(fd);
        return NULL;
    }
    /* note the argument order: the fd comes before the offset */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        p = NULL;
    return p;
}
Something along these lines will work. Adjust settings if you want to write to the file as well.
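A hypothetical caller might look like this; note that mapfile() as written does not report the length, so the caller has to obtain the size separately (e.g. via stat) in order to munmap later:

#include <sys/stat.h>
#include <sys/mman.h>

void process_file(const char *name)
{
    struct stat st;
    if (stat(name, &st) != 0)
        return;
    char *data = mapfile(name);
    if (data) {
        /* ... read up to st.st_size bytes from data ... */
        munmap(data, st.st_size);
    }
}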
I am designing an image decoder, and as a first step I tried to just copy an image using C, i.e. open the file and write its contents to a new file. Below is the code that I used.
while((c=getc(fp))!=EOF)
    fprintf(fp1,"%c",c);
where fp is the source file and fp1 is the destination file.
The program executes without any error, but the image file (".bmp") is not copied properly. I have observed that the size of the copied file is smaller and only 20% of the image is visible; all the rest is black. When I tried with simple text files, the copy was complete.
Do you know what the problem is?
Make sure that the type of the variable c is int, not char. In other words, post more code.
This is because the value of the EOF constant is typically -1, and if you read characters as char-sized values, every byte that is 0xFF will look like the EOF constant. With the extra bits of an int, there is room to separate the two.
Did you open the files in binary mode? What are you passing to fopen?
It's one of the most "popular" C gotchas.
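A minimal sketch of the corrected loop, with c declared as int and both files assumed to be opened in binary mode:

int c;                  /* int, not char: it must be able to hold EOF */
while ((c = getc(fp)) != EOF)
    putc(c, fp1);       /* putc avoids the fprintf format machinery */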
You should use fread and fwrite, one block at a time:
FILE *fd1 = fopen("source.bmp", "rb");        /* binary mode on both files */
FILE *fd2 = fopen("destination.bmp", "wb");
if(!fd1 || !fd2) {
    // handle open error
}
size_t l1;
unsigned char buffer[8192];
//Data to be read
while((l1 = fread(buffer, 1, sizeof buffer, fd1)) > 0) {
    size_t l2 = fwrite(buffer, 1, l1, fd2);
    if(l2 < l1) {
        if(ferror(fd2)) {
            // handle error
        } else {
            // Handle media full
        }
    }
}
fclose(fd1);
fclose(fd2);
It's substantially faster to read in bigger blocks. Also note that both files are opened in binary mode ("rb"/"wb"): that is what prevents a \n from getting transformed to \r\n in the output (on Windows and DOS) or \r (on old Macs). fread and fwrite themselves do not bypass text-mode translation; the open mode controls it.
I am currently working on a project in which I have to read from a binary file and send it through sockets and I am having a hard time trying to send the whole file.
Here is what I wrote so far:
FILE *f = fopen(line,"rt");
//size = lseek(f, 0, SEEK_END)+1;
fseek(f, 0L, SEEK_END);
int size = ftell(f);
unsigned char buffer[MSGSIZE];
FILE *file = fopen(line,"rb");
while(fgets(buffer,MSGSIZE,file)){
sprintf(r.payload,"%s",buffer);
r.len = strlen(r.payload)+1;
res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
return -1;
}
}
I think it has something to do with the size of the buffer that I read into, but I don't know the correct formula for it.
One more thing: is the sprintf done correctly?
If you are reading binary files, a NUL character may appear anywhere in the file.
Thus, using string functions like sprintf and strlen is a bad idea.
If you really need to use a second buffer (buffer), you could use memcpy.
You could also directly read into r.payload (if r.payload is already allocated with sufficient size).
You are looking for fread for a binary file.
The return value of fread tells you how many bytes were read into your buffer.
You may also want to call fseek again to rewind the stream after measuring its size.
See here: How can I get a file's size in C?
Maybe your code could look like this:
#include <stdint.h>
#include <stdio.h>
#define MSGSIZE 512
struct r_t {
uint8_t payload[MSGSIZE];
int len;
};
int send_message(struct r_t *t);
int main() {
struct r_t r;
FILE *f = fopen("test.bin","rb");
fseek(f, 0L, SEEK_END);
size_t size = ftell(f);
fseek(f, 0L, SEEK_SET);
do {
r.len = fread(r.payload, 1, sizeof(r.payload), f);
if (r.len > 0) {
int res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
fclose(f);
return -1;
}
}
} while (r.len > 0);
fclose(f);
return 0;
}
No, the sprintf is not done correctly. It is prone to buffer overflow, a very serious security problem.
I would consider sending the file as e.g. 1024-byte chunks instead of as line-by-line, so I would replace the fgets call with an fread call.
Why are you opening the file twice? Apparently to get its size, but you could open it only once and jump back to the beginning of the file. And, you're not using the size you read for anything.
Is it a binary file or a text file? fgets() assumes you are reading a text file -- it stops on a line break -- but you say it's a binary file and open it with "rb" (actually, the first time you opened it with "rt", I assume that was a typo).
IMO you should never ever use sprintf. The number of characters written to the buffer depends on the parameters that are passed in, and in this case if there is no '\0' in buffer then you cannot predict how many bytes will be copied to r.payload, and there is a very good chance you will overflow that buffer.
I think sprintf() would be the first thing to fix. Use memcpy() and you can tell it exactly how many bytes to copy.
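A minimal sketch of that replacement, assuming the r, buffer, and file variables from the question (and <string.h> for memcpy):

size_t n = fread(buffer, 1, sizeof buffer, file);
if (n > 0) {
    memcpy(r.payload, buffer, n);   /* copy exactly n bytes, no string semantics */
    r.len = (int)n;                 /* assumes n <= sizeof r.payload */
}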
I would like to XOR a very big file (~50 GB).
More precisely, I would like to do so by XORing each 32-byte block of a plaintext file (because of lack of memory) with the key 3847611839, and create (block after block) a new cipher file.
Thank you for any help!
This sounded like fun, and doesn't sound like a homework assignment.
I don't have a previously XOR-encrypted file to try with, but if you convert one back and forth, there's no diff. That much I tried, at least. Enjoy! :) This XORs every 4 bytes with 0xE555E5BF (3847611839 in decimal), which I presume is what you wanted.
Here's bloxor.c:
// bloxor.c - by Peter Boström 2009, public domain, use as you see fit. :)
#include <stdio.h>

unsigned int xormask = 0xE555E5BF; // 3847611839 in decimal.

int main(int argc, char *argv[])
{
    printf("%x\n", xormask);
    if(argc < 3)
    {
        printf("usage: bloxor 'file' 'outfile'\n");
        return -1;
    }

    FILE *in = fopen(argv[1], "rb");
    if(in == NULL)
    {
        printf("Cannot open: %s", argv[1]);
        return -1;
    }

    FILE *out = fopen(argv[2], "wb");
    if(out == NULL)
    {
        fclose(in);
        printf("unable to open '%s' for writing.", argv[2]);
        return -1;
    }

    char buffer[1024]; // presuming 1024 is a good block size, I dunno...
    int count;
    while((count = fread(buffer, 1, sizeof buffer, in)) > 0)
    {
        int i;
        int end = count/4;
        if(count % 4)
            ++end;
        for(i = 0; i < end; ++i)
        {
            ((unsigned int *)buffer)[i] ^= xormask;
        }
        if(fwrite(buffer, 1, count, out) != (size_t)count)
        {
            fclose(in);
            fclose(out);
            printf("cannot write, disk full?\n");
            return -1;
        }
    }
    fclose(in);
    fclose(out);
    return 0;
}
As starblue mentioned in a comment, "Be aware that this is at best obfuscation, not encryption". And it's probably not even obfuscation.
One property of XOR is that (Y xor 0) == Y. What this means for your algorithm is that anywhere in your very big file where there are runs of zeros (which seems pretty likely given the size of the file), your key will show up in the cipher file. Plain as day.
Another nice feature of XOR-encrypted data is that if someone has both the plaintext and the ciphertext, XORing them together yields an output containing the key repeated over and over. If the person knows that the two files are a plaintext/ciphertext pair, they have learned the key, which is bad if the key is used for more than one encryption. If the attacker isn't sure whether the plaintext and ciphertext are related, they have a pretty good idea afterwards, since the key appears as a repeating pattern in the output. None of this is a problem with a one-time pad, because each bit of the key is used only once, so no one learns anything new from this attack.
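A tiny self-contained demonstration of that known-plaintext property; the key bytes here are just the mask from the answer above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t key[4] = { 0xE5, 0x55, 0xE5, 0xBF };
    uint8_t plain[8] = "AAAAAAAA";          /* any known plaintext */
    uint8_t cipher[8];

    for (int i = 0; i < 8; i++)             /* "encrypt" */
        cipher[i] = plain[i] ^ key[i % 4];

    for (int i = 0; i < 8; i++)             /* plaintext ^ ciphertext = key */
        printf("%02X ", plain[i] ^ cipher[i]);
    putchar('\n');                          /* prints the 4-byte key twice */
    return 0;
}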
A lot of people make the mistake of assuming that because a one time pad is provably unbreakable, that an XOR encryption might be OK 'if done well' since the fundamental operation performed is the same. The difference is that a one time pad uses each random bit of the key exactly once. So among other things, if the plaintext has a run of zeros, nothing is learned about the key, unlike with a simple fixed-key XOR cipher.
As Bruce Schneier said: "There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files."
An XOR cipher is barely kid sister proof - if even that.
You need to craft a solution around a streaming architecture: you read the input file as a stream, modify it, and write the result to the output file.
This way, you don't have to read the whole file at once.
If your question is how to do it without using extra space on disk, I would read the file in chunks that are multiples of 32 bytes (as big as you can manage), transform each chunk in memory, then write it back over the same region. You should be able to use the ftell and fseek functions to do that (assuming your long type is large enough for a ~50 GB offset; on POSIX systems, ftello and fseeko with off_t are safer).
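A minimal sketch of one such in-place round trip, assuming a stream opened with "rb+" and a single-byte key as a placeholder for the real transformation. Note that C requires a file-positioning call whenever you switch between reading and writing on an update stream, hence the two fseek calls:

#include <stdio.h>

/* XOR one chunk in place; returns bytes processed, 0 at EOF, -1 on error. */
int xor_chunk_in_place(FILE *f, unsigned char *chunk, size_t cap,
                       unsigned char key)
{
    long pos = ftell(f);
    size_t n = fread(chunk, 1, cap, f);
    if (n == 0)
        return 0;                        /* EOF or read error */
    for (size_t i = 0; i < n; i++)
        chunk[i] ^= key;
    if (fseek(f, pos, SEEK_SET) != 0)    /* required before switching to write */
        return -1;
    if (fwrite(chunk, 1, n, f) != n)
        return -1;
    fseek(f, 0L, SEEK_CUR);              /* required again before the next read */
    return (int)n;
}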
It may be faster to memory-map the file if you can spare that much out of your address space (and your OS supports it) but I'd try the easiest solution first.
Of course, if space isn't a problem, just read the chunks in and write them to a new file, something like the following (pseudo-code):
open infile
open outfile
while not end of infile:
    read chunk from file
    change chunk
    write chunk to outfile
close outfile
close infile
This sort of read/process/write is pretty basic stuff. If you have more complicated requirements, you should update your question with them.