How to properly handle a file when encrypting and decrypting it? - c

The exact way I encrypt and decrypt files doesn't matter. I treat the file as a char array, and everything is almost fine until I get a file whose size is not a multiple of 8 bytes. The algorithm requires a 64-bit block size, so I can only encrypt and decrypt a file 8 bytes at a time.
So, for example, when I came across a .jpg and simply added spaces to the end of the file, the resulting file couldn't be opened (of course, nothing bad happens with .txt files).
Is there any way out of this?
If you want information about the algorithm, see http://en.wikipedia.org/wiki/GOST_(block_cipher).
UPD: I can't store how many bytes were added, because the initial file may be deleted or moved. What are we supposed to do when we know only the key and have the encrypted file?

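You need padding.
The best way to do this would be to use PKCS#7: every padding byte holds the number of bytes that were added, so the count can be recovered from the decrypted data itself and does not have to be stored anywhere else.
However, GOST is not a great choice; it would be better to use AES-CBC.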
There is an ongoing similar discussion in "python-channel".
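For illustration, here is a minimal sketch of PKCS#7 padding for an 8-byte block cipher. The function names pkcs7_pad and pkcs7_unpad and the BLOCK_SIZE constant are made up for this example; the cipher itself is left out. Note that a file whose size is already a multiple of 8 still gets one full block of padding, which is what makes unpadding unambiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8   /* 64-bit blocks, as in the question */

/* Appends PKCS#7 padding in place (hypothetical helper).
   buf must have room for at least len + BLOCK_SIZE bytes.
   Returns the padded length, always a multiple of BLOCK_SIZE. */
size_t pkcs7_pad(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t pad = BLOCK_SIZE - (len % BLOCK_SIZE);   /* 1..BLOCK_SIZE */
    memset(buf + len, (int)pad, pad);               /* each pad byte = pad count */
    return len + pad;
}

/* Checks the padding of decrypted data (hypothetical helper).
   Returns the original length, or (size_t)-1 if the padding is invalid. */
size_t pkcs7_unpad(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len == 0 || len % BLOCK_SIZE != 0)
        return (size_t)-1;
    size_t pad = buf[len - 1];                      /* last byte says how many */
    if (pad == 0 || pad > BLOCK_SIZE)
        return (size_t)-1;
    for (size_t i = len - pad; i < len; i++)
        if (buf[i] != pad)
            return (size_t)-1;
    return len - pad;
}
```

Pad before encrypting the last block and unpad after decrypting it; the decrypted file then has exactly the original size, so a .jpg stays a valid .jpg.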

Related

Two identical files, different sizes

I have been playing with file sizes a bit, as I use CheckSum to prevent creating duplicates of the same file. CheckSum works absolutely fine, exactly as I would expect. The problem I face is that the same files have different sizes. Let me explain: say I have a docx file, one of the words it contains is my first name, "Szymon", and the size of this file is 436,854 bytes. I then remove "Szymon" from the document and type it again in exactly the same way, so "Szymon". In the end I see a slight difference of 10-20 bytes between the initial size of the document (436,854 bytes) and the second one (436,875 bytes). My question is: what is the reason for this, given that both docx files contain exactly the same content?
Thanks in advance

After encryption, an exe file becomes non-executable

After writing a basic LFSR-based stream cipher encryption module in C, I tried it on ordinary text files, and then on an .exe file in Windows. However, after decrypting it, the file does not run and gives an error about being 16-bit. Evidently some error in decrypting. Or are files made so that if I tamper with their binary code they become corrupted?
I'm checking my program on text files in the hope of locating any error on my part. However, the question is: has anyone tried running their own encryption programs on an executable file? Is there any obvious answer to this?
There is nothing special about executables. They are obviously binary files and thus contain 00 bytes and bytes >127. As long as your algorithm is binary safe, it should work.
Compare the original file and the decrypted file using a hex editor to see how they differ.
The error you get means that you didn't decrypt the executable header correctly, so the decryption mistake must already affect the first few bytes of your file.
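If no hex editor is at hand, the same comparison can be sketched in a few lines of C. The helper name first_difference is made up for this example; it assumes both streams were opened in binary mode ("rb"), which matters on Windows.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the offset of the first byte at which the two streams differ,
   or -1 if they are identical (hypothetical helper).
   Both streams must be opened in binary mode and positioned at the start. */
long first_difference(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);
    long offset = 0;
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            return offset;      /* also catches one file ending early */
        if (ca == EOF)
            return -1;          /* both ended at the same point */
        offset++;
    }
}
```

An offset near 0 would match the diagnosis above: the executable header itself was not restored correctly.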
Evidently some error in decrypting. An exe is a bag o' bytes just like any other file; there's no magic. You are merely likely to run into byte values that you won't get in a text file, like a zero.
A decryption process should be the inverse of its encryption. In other words, Decrypt(Encrypt(X)) == X for all inputs X, of all possible lengths, of all possible byte values.
I suggest you build yourself a test harness that will run some pairwise checks with randomised data so you can prove to yourself that the two transformations do indeed cancel each other out. I mean something like:
for length from 0 to 1000000:
    generate a string of that length with random contents
    encrypt it to a fresh memory buffer
    decrypt it to a fresh memory buffer
    compare the decrypted string with the original string
Do this first of all on in-memory strings so you can isolate the algorithm from your file-handling code.
Once you've proved the algorithm is properly inverting, you can then do the same for files; as others have said you might well be running into issues with handling binary files, that's a common gotcha.
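A rough C version of that in-memory harness could look like this. The cipher here is only a stand-in, not the asker's code: a toy 16-bit LFSR keystream XORed over the data, under the made-up names lfsr_crypt and roundtrip_failures. Swap in your own encrypt and decrypt routines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in keystream generator (an assumption for this sketch):
   a 16-bit Fibonacci LFSR producing one byte per call. */
static uint8_t lfsr_next_byte(uint16_t *state)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint16_t s = *state;
        uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
        *state = (uint16_t)((s >> 1) | (bit << 15));
        out = (uint8_t)((out << 1) | (s & 1u));
    }
    return out;
}

/* XOR stream cipher: the same call encrypts and decrypts. */
void lfsr_crypt(const uint8_t *in, uint8_t *out, size_t len, uint16_t key)
{
    uint16_t state = key ? key : 0xACE1u;   /* an all-zero state never changes */
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ lfsr_next_byte(&state);
}

/* Runs Decrypt(Encrypt(X)) == X for every length 0..max_len with random
   contents. Returns the number of lengths that failed the round trip. */
int roundtrip_failures(size_t max_len, uint16_t key)
{
    int failures = 0;
    uint8_t *plain = malloc(max_len + 1);
    uint8_t *enc = malloc(max_len + 1);
    uint8_t *dec = malloc(max_len + 1);
    assert(plain && enc && dec);
    for (size_t len = 0; len <= max_len; len++) {
        for (size_t i = 0; i < len; i++)
            plain[i] = (uint8_t)(rand() & 0xFF);   /* includes 0x00 and bytes > 127 */
        lfsr_crypt(plain, enc, len, key);          /* encrypt to a fresh buffer */
        lfsr_crypt(enc, dec, len, key);            /* decrypt to a fresh buffer */
        if (memcmp(plain, dec, len) != 0)
            failures++;
    }
    free(plain);
    free(enc);
    free(dec);
    return failures;
}
```

Calling roundtrip_failures(2000, some_key) and expecting 0 is a reasonable first run; the total work grows with the square of the maximum length, so going all the way to 1000000 takes a while.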

openssl aes256 encryption of a file

I'd like to encrypt a file with aes256 using OpenSSL with C.
I did find a pretty nice example here.
Should I first read the whole file into a memory buffer and then do the aes256, or should I do it in parts with a ~16K buffer?
Any snippets or hints?
Loading the whole file in a buffer can get inefficient to impossible on larger files - do this only if all your files are below some size limit.
OpenSSL's EVP API (which is also used by the example you linked) has an EVP_EncryptUpdate function, which can be called multiple times, each time providing some more bytes to encrypt. Use this in a loop together with reading in the plaintext from a file into a buffer, and writing out the ciphertext to another file (or the same one). (Analogously for decryption.)
Of course, instead of inventing a new file format (which you are effectively doing here), think about implementing the OpenPGP Message format (RFC 4880). There are fewer chances to make mistakes which might destroy your security, and as an added bonus, if your program somehow ceases to work, your users can always use the standard tools (PGP or GnuPG) to decrypt the file.
It's better to reuse a fixed buffer, unless you know you'll always process small files - but I don't think that fits your backup files definition.
I said better in a non-cryptographic way :-) There won't be any difference at the end (for the encrypted file) but your computer might not like (or even be able) to load several MB (or GB) into memory.
Crypto-wise the operations are done in blocks; for AES the block size is 128 bits (16 bytes). So, for simplicity, you had better use a multiple of 16 bytes for your buffer. Otherwise the choice is yours. I would suggest buffers between 4 KB and 16 KB but, to be honest, I would test several values.
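The fixed-buffer loop both answers describe might look roughly like this. To keep the sketch free of external dependencies, the cipher step is a caller-supplied callback (the chunk_fn type, process_stream and xor_chunk are all made-up names); with OpenSSL the callback would wrap EVP_EncryptUpdate, and a final call after the loop would flush the padded last block.

```c
#include <assert.h>
#include <stdio.h>

#define BUF_SIZE (16 * 1024)   /* a multiple of the 16-byte AES block */

/* Hypothetical callback type: transforms len input bytes into out and
   returns how many output bytes it produced. */
typedef size_t (*chunk_fn)(const unsigned char *in, size_t len,
                           unsigned char *out, void *state);

/* Reads `in` chunk by chunk, passes each chunk through fn and writes the
   result to `out`. Returns the total bytes written, or -1 on I/O error. */
long process_stream(FILE *in, FILE *out, chunk_fn fn, void *state)
{
    static unsigned char inbuf[BUF_SIZE];
    static unsigned char outbuf[BUF_SIZE + 16];   /* slack for one held-back block */
    long total = 0;
    size_t n;

    assert(in != NULL && out != NULL && fn != NULL);
    while ((n = fread(inbuf, 1, sizeof inbuf, in)) > 0) {
        size_t produced = fn(inbuf, n, outbuf, state);
        if (fwrite(outbuf, 1, produced, out) != produced)
            return -1;
        total += (long)produced;
    }
    return ferror(in) ? -1 : total;
}

/* Toy stand-in for the cipher step: XOR every byte with a one-byte key.
   This is NOT encryption, it only makes the loop testable. */
size_t xor_chunk(const unsigned char *in, size_t len,
                 unsigned char *out, void *state)
{
    unsigned char key = *(unsigned char *)state;
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ key;
    return len;
}
```

Memory use stays at two fixed buffers no matter how large the file is, which is the whole point of not loading it in one go.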

ARM binary and hexedit

I have an ARM binary file and want to change some text.
I removed a couple of text characters from a comment.
But the binary won't start, with this log:
link_image[1710]: 3013 missing essential tables CANNOT LINK EXECUTABLE
Does anybody have an idea how to edit ARM binary files?
I removed a couple of text characters
Stop right there. If I am reading what you wrote correctly, you removed some characters, instead of replacing them with other characters.
This would shift the whole rest of the file. But binary files often have tables or offsets which point to other parts of the file. Shifting the contents of the file, even by a single byte, means these tables or offsets no longer point where they should. The code trying to read the file was rightly confused after that.
When editing binary files, you must never move the contents, unless you know what you are doing. If you are editing the text, your changes must not change the size of the text. If the new text is smaller, you must pad it so it keeps the same size; if the new text is larger, it will not fit and you must find a shorter text.
Of course, this assumes that the file format does not have checksums which would notice the change, or that you know how to recompute them.
Also, make sure you are using a proper editor. Normal text editors can silently add, remove, or replace characters, which could break the file, possibly in a hard-to-detect way.
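As a sketch of the "same size" rule, the following C helper overwrites a text at a known offset and pads the remainder so nothing after it moves. The name patch_text and its parameters are made up for this example; the right padding byte (space, NUL, ...) depends on the file format, and any checksums are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Overwrites old_len bytes at `offset` with new_text, padded with pad_byte
   up to old_len, so the file size and all later offsets stay unchanged.
   The file must be opened in binary update mode ("rb+").
   Returns 0 on success, -1 if new_text does not fit or I/O fails. */
int patch_text(FILE *f, long offset, size_t old_len,
               const char *new_text, unsigned char pad_byte)
{
    assert(f != NULL && new_text != NULL);
    size_t new_len = strlen(new_text);
    if (new_len > old_len)
        return -1;                       /* longer text would shift the file */
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(new_text, 1, new_len, f) != new_len)
        return -1;
    for (size_t i = new_len; i < old_len; i++)
        if (fputc(pad_byte, f) == EOF)
            return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

A replacement that is too long is refused outright instead of being written past the old text.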

Lots of questions about file I/O (reading/writing message strings)

For this university project I'm doing (for which I've made a couple of posts in the past), which is some sort of social network, users must be able to exchange messages.
At first, I designed my data structures to hold ALL messages in a linked list, limiting the message size to 256 chars. However, I think my instructors will prefer if I save the messages on disk and read them only when I need them. Of course, they won't say what they prefer, I need to make a choice and justify the best I can why I went that route.
One thing to keep in mind is that I only need to save the latest 20 messages from each user, no more.
Right now I have a hash table that will act as the inbox; it will be inside the user profile. This hash table will be indexed by name (the user that sent the message). The value for each element will be a data structure holding an array of size_t with 20 elements (20 messages, like I said above). The idea is to keep track of the disk file offsets and bytes written. Then, when I need to read a message, I just need to use fseek() and read the necessary bytes.
I think this could work nicely... I could use just one single file to hold all messages from all users in the network. I'm saying one single file because a colleague asked an instructor about saving the messages from each user independently, to which he replied that it might not be the best approach because the file system has its limits. That's why I'm thinking of going the single file route.
However, this presents a problem... Since I only need to save the latest 20 messages, I need to discard the older ones when I reach this limit.
I have no idea how to do this... All I know is about fread() and fwrite() to read/write bytes from/to files. How can I go to a file offset and say "hey, delete the following X bytes"? Even if I could do that, there's another problem... All offsets after that one will be completely different and I would have to process all users' mailboxes to fix the problem. Which would be a pain...
So, any suggestions to solve my problems? What do you suggest?
You can't arbitrarily delete bytes from the middle of a file; the only way that works is to rewrite the entire file without them. Disregarding the question of whether doing things this way is a good idea, if you have fixed length fields, one solution would be to just overwrite the oldest message with the newest one; that way, the size / position of the message on disk doesn't change, so none of the other offsets are affected.
Edit: If you're allowed to use external libraries, making a simple SQLite db could be a good solution.
You're complicating your life way more than you need to.
If your messages are 256 characters, then use an array of 256 characters to hold each message.
Write it to disk with fwrite, read with fread, delete it by changing the first character of the string to \0 (or whatever else strikes your fancy) and write that to disk.
Keep an index of the messages in a simple structure (username/recno) and bounce around in the file with fseek. You can either brute-force the next free record when writing a new one (start reading at the beginning of the file and stop when you hit your \0) or keep an index of free records in an array and grab one of them when writing a new one (or if your array is empty then fseek to the end of the file and write a complete new record.)
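Combining the fixed-length records with the "overwrite the oldest message" idea from the earlier answer, a minimal sketch could give every user 20 slots of 256 bytes and reuse them in a circle. The layout and the names record_offset, write_message and read_message are assumptions for illustration; users are identified by a plain index here instead of a name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE      256   /* fixed record size, as in the question */
#define MSGS_PER_USER 20    /* only the latest 20 messages are kept */

/* Byte offset of a slot in the single shared file (assumed layout). */
static long record_offset(long user, long slot)
{
    return (user * MSGS_PER_USER + slot) * (long)MSG_SIZE;
}

/* Stores the n-th message (counting from 0) ever sent to `user`.
   Once 20 messages exist, message n overwrites message n - 20, so the
   file never grows and no other offset changes. Returns 0 or -1. */
int write_message(FILE *f, long user, long n, const char *text)
{
    char rec[MSG_SIZE] = {0};
    assert(f != NULL && text != NULL);
    strncpy(rec, text, MSG_SIZE - 1);            /* truncate, keep NUL-terminated */
    if (fseek(f, record_offset(user, n % MSGS_PER_USER), SEEK_SET) != 0)
        return -1;
    return fwrite(rec, MSG_SIZE, 1, f) == 1 ? 0 : -1;
}

/* Reads the record stored in `slot` (0..19) of `user` into out. */
int read_message(FILE *f, long user, long slot, char out[MSG_SIZE])
{
    assert(f != NULL && out != NULL);
    if (fseek(f, record_offset(user, slot), SEEK_SET) != 0)
        return -1;
    return fread(out, MSG_SIZE, 1, f) == 1 ? 0 : -1;
}
```

The only per-user state needed in memory is the running message count; the offsets follow from it, so the array of 20 size_t values becomes optional.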
I want to suggest another solution for completeness' sake:
Strings should end with a null byte, as in "hello world\0", so you can read the raw binary data until you reach "\0".
Other data types have a fixed width; beware of byte order (endianness).
You could also put a length prefix before each message, so you know its string length:
"11hello world;2hi;15my name is loco"
Thus making it possible to treat raw snippets like data fields.
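A small parser for that length-prefixed layout might look like this in C. The function name next_field is made up, and the sketch assumes a payload never starts with a digit (otherwise the decimal prefix becomes ambiguous; a separator after the length, or a fixed-width binary length, would avoid that).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Reads one "<decimal length><payload>;" field starting at *cursor
   (hypothetical helper). Copies the payload into out as a NUL-terminated
   string and advances *cursor past the field.
   Returns the payload length, or -1 if the input is malformed, truncated,
   or the payload does not fit into out. */
long next_field(const char **cursor, char *out, size_t out_size)
{
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    const char *p = *cursor;
    char *end;

    if (!isdigit((unsigned char)*p))
        return -1;                               /* no length prefix here */
    unsigned long len = strtoul(p, &end, 10);
    if (len >= out_size || strlen(end) < len)
        return -1;                               /* too big or truncated */
    memcpy(out, end, len);
    out[len] = '\0';
    p = end + len;
    if (*p == ';')
        p++;                                     /* the last field has no separator */
    *cursor = p;
    return (long)len;
}
```

Calling it in a loop on "11hello world;2hi;15my name is loco" yields the three messages in order and then -1 at the end of the input.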
