After writing a basic LFSR-based stream cipher encryption module in C, I tried it on usual text files, and then on a .exe file in Windows. However, after decrypting it back the file is not running, giving some error about being a 16-bit. Evidently some error in decrypting. Or are files made so that if I tamper with their binary code they become corrupted?
I'm checking my program on text-files in the hope of locating any error on my part. However, the question is had anyone tried running your own encryption programs on an executable file? Is their any obvious answer to this?
There is nothing special about executables. They are obviously binary files and thus contain 00 bytes and bytes >127. As long as your algorithm is binary safe, it should work.
Compare the original file and the decrypted file using a hex-editor. To see how they differ.
The error you get means that you didn't decrypt the executable header correctly, so the decryption mistake must already affect the first few bytes of your file.
Evidently some error in decrypting. An exe is a bag 'o bytes just like any other file, there's no magic. You are merely likely to run into byte values that you won't get in a text file. Like a zero.
A decryption process should be the inverse of its encryption. In other words, Decrypt(Encrypt(X)) == X for all inputs X, of all possible lengths, of all possible byte values.
I suggest you build yourself a test harness that will run some pairwise checks with randomised data so you can prove to yourself that the two transformations do indeed cancel each other out. I mean something like:
for length from 0 to 1000000:
generate a string of that length with random contents
encrypt it to a fresh memory buffer
decrypt it to a fresh memory buffer
compare the decrypted string with the original string
Do this first of all on in-memory strings so you can isolate the algorithm from your file-handling code.
Once you've proved the algorithm is properly inverting, you can then do the same for files; as others have said you might well be running into issues with handling binary files, that's a common gotcha.
Related
Why does inserting characters into an executable binary file cause it to "break" ?
And, is there any way to add characters without breaking the compiled program?
Background
I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...
Example
As an example in the application below, Facebook could be changed to Lacebook, and the program will still execute just fine:
But it Breaks with new Characters
I'm also aware that if new characters are added, it will break the program and it won't run, or it will crash immediately. For example, adding My in front of Facebook would achieve this:
What I know
I've done some work with C and understand that code is written in human readable, compiled, and linked into an executable file.
I've done introductory studies of assembly language and understand the concepts about data, commands, and pointers being moved around
I've written small programs for Windows, Mac and Linux
What I don't know
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
I understand why having extra characters in a text string of the binary file would cause problems
What I'd like to know
Why do the extra characters cause the program to break?
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
Modern operating systems just map the file into memory. They don't bother loading pages of it until it's needed.
Why do the extra characters cause the program to break?
Because they put all the other information in the file in the wrong place, so the loader winds up loading the wrong things. Also, jumps in the code wind up being to the wrong place, perhaps in the middle of an instruction.
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
It depends on exactly what gets screwed up. It may be that you move a header and the loader notices that some parameters in the header have invalid data.
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
Probably not reliably. At a minimum, you'd need to reliably identify sections of code that need to be adjusted. That can be surprisingly difficult, particularly if someone has attempted to make it so deliberately.
When a program is compiled into machine code, it includes many references to the addresses of instructions and data in the program memory. The compiler determines the layout of all the memory of the program, and puts these addresses into the program. The executable file is also organized into sections, and there's a table of contents at the beginning that contains the number of bytes in each section.
If you insert something into the program, the address of everything after that is shifted up. But the parts of the program that contain references to the program and data locations are not updated, they continue to point to the original addresses. Also, the table that contains the sizes of all the sections is no longer correct, because you increased the size of whatever section you modified.
The format of a machine-language executable file is based on hard offsets, rather than on parsing a byte stream (like textual program source code). When you insert a byte somewhere, the file format continues to reference information which follows the insertion point at the original offsets.
Offsets may occur in the file format itself, such as the header which tells the loader where things are located in the file and how big they are.
Hard offsets also occur in machine language itself, such in instructions which refer to the program's data or in branch instructions.
Suppose an instruction says "branch 200 bytes down from where we are now", and you insert a byte into those 200 bytes (because a character string happens to be there that you want to alter). Oops; the branch still covers 200 bytes.
On some machines, the branch couldn't even be 201 bytes even if you fixed it up because it would be misaligned and cause a CPU exception; you would have to add, say, four bytes to patch it to 204 (along with a myriad other things needed to make the file sane).
It doesn't matter how I exactly encrypt and decode files. I operate with file as a char massive, everything is almost fine, until I get file, which size is not divide to 8 bytes. Because I can encrypt and decode file each round 8 bytes, because of particular qualities of algorithm (size of block must be 64 bit).
So then, for example, I face .jpg and tried simply add spaces to end of file, result file can't be opened ( ofc. with .txt files nothing bad happen).
Is any way out here?
If you want information about algorithm http://en.wikipedia.org/wiki/GOST_(block_cipher).
UPD: I can't store how many bytes was added, because initial file can be deleted or moved. And, what we are suppose to do then we know only key and have encrypted file.
Do you need padding.
The best way to do this would be to use PKCS#7.
However GOST is not so good, better using AES-CBC.
There is an ongoing similar discussion in "python-channel".
I'd like to encrypt a file with aes256 using OpenSSL with C.
I did find a pretty nice example here.
Should I first read the whole file into a memory buffer and than do the aes256, or should I do it partial with a ~16K buffer?
Any snippets or hints?
Loading the whole file in a buffer can get inefficient to impossible on larger files - do this only if all your files are below some size limit.
OpenSSL's EVP API (which is also used by the example you linked) has an EVP_EncryptUpdate function, which can be called multiple times, each time providing some more bytes to encrypt. Use this in a loop together with reading in the plaintext from a file into a buffer, and writing out the ciphertext to another file (or the same one). (Analogously for decryption.)
Of course, instead of inventing a new file format (which you are effectively doing here), think about implementing the OpenPGP Message format (RFC 4880). There are less chances to make mistakes which might destroy your security – and as an added bonus, if your program somehow ceases to work, your users can always use the standard tools (PGP or GnuPG) to decrypt the file.
It's better to reuse a fixed buffer, unless you know you'll always process small files - but I don't think that fits your backup files definition.
I said better in a non-cryptographic way :-) There won't be any difference at the end (for the encrypted file) but your computer might not like (or even be able) to load several MB (or GB) into memory.
Crypto-wise the operations are done in block, for AES it's 128 bits (16 bytes). So, for simplicity, you better use a multiple of 16 bytes for your buffer. Otherwise the choice is yours. I would suggest between 4kb to 16kb buffers but, to be honest, I would test several values.
I am designing a binary file format from scratch, and I would like to include some magic bytes at the beginning so that it can be identified easily. How do I go about choosing which bytes? I am not aware of any central registry of magic numbers, so is it just a matter of picking something fairly random that isn't already identified by, say, the file command on a nearby UNIX box?
Stay away from super-short magic numbers. Just because you're designing a binary format doesn't mean you can't use a text string for identifier. Follow that by an EOF char, and as an added bonus people who cat or type your binary file won't get a mangled terminal.
There is no universally correct way. Best practices can be suggested, but these often situational. For example, if you're checking the integrity of volatile memory, which has an undefined initial state when power is applied, it may be beneficial to incorporate many 0s or 1s in a sequence (i.e. FFF0 00FF F000) which can stand out against random noise.
If the file is mostly binary, a popular choice is using a text encoding like ASCII which stands out among the binary data in a hex editor. For example, GIF uses GIF89a, FLAC uses fLaC. On the other hand, a plain text identifier may be falsely detected in a random text file, so invalid/control characters might be incorporated.
In general, it does not matter that much what they are, even a bunch of NULL bytes can be used for file detection. But ideally you want the longest unique identifier you can afford, and at minimum 4 bytes long. Any identifier under 4 bytes will show up more often in random data. The longer it is, the less likely it will ever be detected as a false positive. Some known examples are as long as 40 bytes. In a way, it's like a password.
Also, it doesn't have to be at offset 0. The file signature has conventionally been at offset zero, since it made sense to store it first if it will be processed first.
That said, a single file signature should not be the only line of defense. The actual parsing process itself should be able to verify integrity and weed out invalid files even if the signature matches. This can be done with additional file signatures, using length-sensitive data, value/range checking, and especially, hash/checksum values.
How can I encrypt & decrypt binary files in C using OpenSSL?
I have a test program that encrypts and then decrypts the input it's given.
I executed my test program for text files, and the output is the same as the input, but when I execute my test program on a binary file the output is not the same as the input.
Just guessing: you are using Windows and missed O_BINARY flag in file operations?
Chances are you are using string functions like strlen() on the buffers you're reading. The OpenSSL functions work fine for binary files.
Without seeing your code we can only guess. But my first guess would be that your encryption or decryption routine is barfing on a \0 character or two within the binary file. The data must be treated as bytes not as character strings. (Same as the StrLen() problem mentioned elsewhere on this page.)
I'm not a C programmer(!) but the way I managed to get the encryption routines working within Delphi/Pascal was by downloading the OpenSSL source (in C) and stepping through the code for the openssl.exe application. Using the EVP_* functions became a whole lot easier once you work out how they do it themselves.