Using libsodium XChaCha20-Poly1305 for large files - c

I was looking through libsodium, and in particular at the symmetric encryption option XChaCha20-Poly1305. What I can't get my head around is that libsodium appears to provide no "context/update/finalise" style of working that you commonly find in crypto libraries.
It is clear from the libsodium documentation that there is "no practical limit" to the size of an XChaCha20-Poly1305 message. However, in practical terms, if I'm encrypting a multi-GB file, I'm not quite clear as to how you would use libsodium for that. After all, you would only be passing the contents of the fread buffer to crypto_aead_xchacha20poly1305_ietf_encrypt at any one time?
IMPORTANT NOTE TO THOSE WHO THINK THIS IS OFF TOPIC
After bowing to peer pressure, I did delete this post. However, I have re-opened it at the request of @MaartenBodewes, who felt strongly that it was on-topic, so strongly that he put some effort into writing an answer. Therefore, out of respect for his effort, I have undeleted the post. Please spare me more "off-topic" comments; I've read enough of them!

In the introduction of libsodium it reads: "Its goal is to provide all of the core operations needed to build higher-level cryptographic tools."
Libsodium is therefore a relatively high level library that provides limited access to the underlying structures.
That said, there are some inherent difficulties in encrypting such large files using an authenticated cipher. The problem is that you either need to first verify the authenticity and then start to decrypt, or you need to decrypt online before verifying the authentication tag. The latter in turn means that you have to write out the contents and then destroy them if verification fails.
Generally you can get around that by encrypting in blocks of, e.g., 16 KiB or so and then adding an authentication tag for each block. Of course you would need to make sure that you increase the nonce for each block (making sure that the counter of the stream cipher doesn't repeat). This will add some overhead of course, but nothing spectacular, and you'd have some overhead anyway. The disadvantage is that you cannot decrypt in place anymore (as that would leave gaps).
You could also store all the authentication tags at the end if you want to make a really advanced scheme. Or buffer all the authentication tags in memory and calculate a single (HMAC) tag over all the collected tags.
So calling crypto_aead_xchacha20poly1305_ietf_encrypt multiple times could be considered an option. You may want to calculate a file specific key if you go that way so you can start your nonce at zero.
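The per-block nonce handling described above could be sketched roughly as follows. This is only an illustration, not part of libsodium: chunk_nonce and NONCE_PREFIX_LEN are hypothetical names, and the layout (a 16-byte prefix followed by a 64-bit little-endian block index) is just one assumed way to keep nonces unique under a single file key. With a file-specific key the prefix can be all zeros, which matches "start your nonce at zero". Each block would then be passed to crypto_aead_xchacha20poly1305_ietf_encrypt with its own nonce. Newer libsodium releases also ship a crypto_secretstream_xchacha20poly1305 API aimed at this streaming use case, which may be preferable to a hand-rolled scheme.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NONCE_LEN 24        /* XChaCha20-Poly1305 nonce size in bytes */
#define NONCE_PREFIX_LEN 16 /* hypothetical split: 16-byte prefix + 8-byte index */

/* Build the nonce for block number `index` (hypothetical helper).
 * The first 16 bytes come from `prefix`; the last 8 bytes hold the block
 * index in little-endian order, so every block encrypted under the same
 * key gets a distinct nonce. */
static void chunk_nonce(unsigned char out[NONCE_LEN],
                        const unsigned char prefix[NONCE_PREFIX_LEN],
                        uint64_t index)
{
    assert(out != NULL && prefix != NULL);
    memcpy(out, prefix, NONCE_PREFIX_LEN);
    for (int i = 0; i < 8; i++) {
        out[NONCE_PREFIX_LEN + i] = (unsigned char)(index >> (8 * i));
    }
}
```

Binding the block index into the nonce also means that reordered blocks fail authentication. Detecting a truncated file would need something extra, such as a marker on the final block.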
If you just want confidentiality of the stored file you could consider leaving out the authentication tag. In that case you can manually set the counter used to create the key stream using crypto_stream_xchacha20_xor_ic, which takes the initial block counter as its ic argument.
This permits direct access to any block without having to compute the previous ones.
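To make the random-access remark a little more concrete, here is a small sketch of how a byte offset in the file might be mapped to that counter argument. It assumes the 64-byte keystream block size of ChaCha20; stream_pos_for_offset is a hypothetical helper, not a libsodium function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHACHA_BLOCK_BYTES 64u /* ChaCha20 produces keystream in 64-byte blocks */

/* Map a byte offset to the keystream block that covers it.
 * *block_counter is the value one would pass as the initial counter,
 * *skip is the number of keystream bytes to discard inside that block. */
static void stream_pos_for_offset(uint64_t byte_offset,
                                  uint64_t *block_counter,
                                  unsigned *skip)
{
    assert(block_counter != NULL && skip != NULL);
    *block_counter = byte_offset / CHACHA_BLOCK_BYTES;
    *skip = (unsigned)(byte_offset % CHACHA_BLOCK_BYTES);
}
```

For example, byte offset 130 falls in block 2 at position 2, so one would start the counter at 2 and ignore the first 2 keystream bytes of that block.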
Obviously you can still add an authentication tag using HMAC-SHA-2, which is also available in libsodium, but this will be rather slower than using Poly1305.
Finally, libsodium is open source. If you're exceedingly brave you could go into the gory details and construct your own context/update/finalize. The algorithm certainly supports it (hint: never buffer the authentication tag or nonce during decryption routines if you go this route - directly decrypt).

Related

Fast and robust checksum algo for a small data file (~10KB) [closed]

I have a data file that needs to be pushed to an embedded device. The typical size for the file ranges from a few bytes to about 10K max. My intention is to detect tampering with the contents of this file (chksum to be the last element in the data file). The data range is a mix of strings, signed and unsigned integers. I am looking for a robust algo that avoids a lot of collisions as well as something that does not use a lot of cycles to compute. I am considering Fletcher16(), CRC-32 and the solution discussed in this post.
Any suggestions for a simple algo for my kind of data size/contents?
Thanks in advance!
EDIT:-
Thanks everyone for the insightful answers and suggestions.
Some background: This is not a hyper secure data file. I just want to be able to detect whether someone wrote to it by mistake. The file gets generated by a module and should only be read by the SW. Recently there have been a few instances where folks have pulled it from the target file system, edited it and pushed it back to the target, hoping that would fix their problems. (Which btw it would, if edited carefully.) But this defeats the very purpose of auto-generating this file and the existence of this module. I would like to detect any such playful "hacks" and abort gracefully.
"My intention is to detect tampering with the contents of this file"
If you need to detect intentional tampering with a file, you need some sort of cryptographic signature -- not just a hash.
If you can protect a key within the device, using HMAC as a signature algorithm may be sufficient. However, if the secret is extracted from the device, users will be able to use this to forge signatures.
If you cannot protect a key within the device, you will need to use an asymmetric signature algorithm. Libsodium's crypto_sign APIs provide a nice API for this. Alternatively, if you want to use the underlying algorithms directly, EdDSA is a decent choice.
Either of these options will require a relatively large amount of space (32 to 64 bytes) to be allocated for a signature, and verifying that signature will take significantly more time than verifying a noncryptographic checksum. This is largely unavoidable if you need to effectively prevent tampering.
For your purpose, you can use a cryptographic hash such as SHA-256. It is very reliable and collisions are vanishingly unlikely, but you should test whether the speed is OK.
There is a sample implementation in this response: https://stackoverflow.com/a/55033209/4593267
To detect intentional tampering with the data, you can add a secret key to the hashed data. The device will need to have a copy of the secret key, so it is not a very secure method as the key could be extracted from the device through reverse engineering or other methods. If the device is well protected against that, for example if it is inside a secure location, a secure chip or in a very remote location such as a satellite in space and you are confident there are no flaws providing remote access, this may be sufficient.
Otherwise an asymmetrical cryptographic system is required, with a private key known only to the legitimate source(s) of those data files, and a public key used by the device to verify the cryptographic hash, as documented in duskwuff's answer.
If you're only concerned about accidental or non-malicious tampering, a CRC should be sufficient.
(I'm using a somewhat circular definition of 'malicious' here: if somebody goes to the trouble of recalculating or manipulating the CRC to get their edits to work, that counts as 'malicious' and we don't defend against it.)
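For the accidental-edit case, a CRC-32 check could look something like the sketch below. It is a plain bitwise implementation of the common reflected CRC-32 (polynomial 0xEDB88320, the variant used by zlib and PNG); crc32_compute is a name chosen for illustration, and a table-driven version would be faster if cycles matter on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over `len` bytes of `data`.
 * Uses the reflected polynomial 0xEDB88320 with the usual initial value
 * and final XOR of 0xFFFFFFFF. */
static uint32_t crc32_compute(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The generator would append the 4-byte result to the file, and the reader would recompute it over the preceding bytes and abort on a mismatch. The standard check value for the ASCII string "123456789" is 0xCBF43926, which is a handy self-test.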

How to add (and use) binary data to compiled executable?

There are several questions dealing with some aspects of this problem, but none seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add arbitrarily sized binary data to it (not necessarily by itself which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case is self-extracting archives, where a program (the archiving utility, such as zip) is capable of constructing such an executable, which contains a pre-built decompressor (the already compiled executable) and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (thanks, Mathias, for the note and for pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - In combination with the above, this would allow getting a file name to use for opening the exe, to access the added data. There are many more questions of this kind; however, none focuses specifically on the problem of getting a path suitable for actually opening the binary as a file (a goal which alone might (?) be easier to accomplish: you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform? (At least from a maintenance standpoint.) Note that it would be preferred if the program performing the adding of the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessitating those might also be useful.
Note the already compiled executable criterion (the first point in the above list), which requires a completely different approach than the solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable, which ask about embedding data at compile time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, which is to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem to be trivial (opening the executable for reading is, but finding the path to supply to the file open call is not, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program into seeing different binary data (provided by him) than what the executable actually has, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digitally signed in Windows
The exe format allows for verifying that the file has not been modified since publishing. This would allow you to:
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system, is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme is to use a resource. Windows resources can be added post-linking. They are protected by the Authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to increase the signature to include binary data. Unfortunately this has been banned. There were binaries which used data in the signature section. Unfortunately this was used maliciously. Some details here msdn blog
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self/exe (or /proc/<pid>/exe).
Unix does not seem to have a method which is reliable.
Data reading
The approach of the zip format is to have a directory written to the end of the file. This means the directory can be found at the end of the file, and the start of the data is then located by working backwards from it. The advantage here is that the data blob is signposted from the end of the file, rather than from its natural start.
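A much simplified version of that idea, for a single appended blob rather than a full zip directory, might look like the sketch below. The trailer layout (payload, then an 8-byte little-endian length, then an 8-byte magic string) and the names find_payload and TRAILER_MAGIC are assumptions made up for this example, not an existing format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRAILER_MAGIC "PAYLOAD1" /* hypothetical 8-byte marker */
#define TRAILER_SIZE 16          /* 8-byte length + 8-byte magic */

/* Locate a blob appended as: [payload][length, 8 bytes LE][magic, 8 bytes].
 * Returns 0 and fills *offset and *length on success, -1 if the file has
 * no valid trailer. */
static int find_payload(FILE *f, long *offset, long *length)
{
    unsigned char t[TRAILER_SIZE];
    uint64_t len = 0;
    long file_size;

    assert(f != NULL && offset != NULL && length != NULL);
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    file_size = ftell(f);
    if (file_size < TRAILER_SIZE) return -1;
    if (fseek(f, file_size - TRAILER_SIZE, SEEK_SET) != 0) return -1;
    if (fread(t, 1, TRAILER_SIZE, f) != TRAILER_SIZE) return -1;
    if (memcmp(t + 8, TRAILER_MAGIC, 8) != 0) return -1;

    /* Decode the little-endian payload length. */
    for (int i = 7; i >= 0; i--) len = (len << 8) | t[i];
    if (len > (uint64_t)(file_size - TRAILER_SIZE)) return -1;

    *length = (long)len;
    *offset = file_size - TRAILER_SIZE - (long)len;
    return 0;
}
```

The running program would open its own binary (using whichever path lookup the platform offers), call find_payload, and then fseek to the returned offset to read the blob. An executable with nothing appended simply fails the magic check.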

Modifying data written to disk by Ext4 filesystem

I'm working on an academic project, part of which is applying transparent encryption (AES-CTR) to selected Ext4 files stored on the disk (I can already mark them as encrypted using a new ioctl etc.).
In order to do so, I need to find the best spot to call my algorithm on the data while it's read from or written to the device. Due to the large number of features (like journal, inlines, O_DIRECT, extents) provided by the filesystem, I've been struggling for a few days now to find the proper solution - I need to operate on the raw data, as it's stored in the data blocks.
I had a few ideas in mind. One was to hook in somewhere on the call path from sys_read(...) and sys_write(...), more precisely ext4_file_write(...) and generic_file_aio_read(...) - but that wouldn't work with mmap, and probably is not the way to go. Another approach would be to do it through ext4_writepages(...) and ext4_readpages(...) (and its callback, as it's async), when the memory pages are written down to disk.
Because it's not a production version, just a proof of concept, I can switch off some Ext4 features in order to simplify the task. While using the algorithm I need to be able to access the inode's xattrs (where the key id is stored), and also be aware of the block number in order to generate the initialization vector used in [en/de]cryption. Do you have any ideas and/or suggestions regarding that issue?
There are many alternatives to design the solution for this.
One way could be to use Wrapfs (a stackable filesystem) which will help you intercept the call from VFS to underlying physical file system. You can choose to add your hook before or after the underlying filesystem call is invoked.
The benefits of doing it this way would be:
1. Your code can work with any physical filesystem seamlessly.
2. You need not change/modify the original filesystem code.
3. You will have a completely separate module.
So the call hierarchy would look like,
Application <=> VFS <=> Wrapfs <=> Physical FS (ext3/ext4/etc)
FUSE (Filesystem in Userspace) is a good alternative because it is easier to implement in user space than in kernel space. You have a wide set of languages to choose from. This approach will be much easier.

simple AES function (not library) in C?

novice to aes. in reading http://en.wikipedia.org/wiki/AES_implementations, I am a bit surprised. I should need just one function
char16 *aes128(char16 key, char16 *secrets, int len);
where char16 is an 8*16=128bit character type. and, presumably, ignoring memory leaks,
assert( bcmp( anystring, aes128(anykey, aes128(anykey, anystring, len), len), len ) == 0 );
I am looking over the description of the algorithm on wikipedia, and although I can see myself making enough coding mistakes to take me a few days to debug my own implementation, it does not seem too complex. maybe 100 lines? I did see versions in C#, such as Using AES encryption in C#. that seem themselves almost as long as the algorithm itself. earlier recommendations on stackoverflow mostly recommend the use of individual functions inside larger libraries, but it would be nice to have a go-to function for this task that one could compile into one's code.
so, is AES implementation too complex to be for the faint of heart? or is it reasonably short and simple?
how many lines does a C implementation take? is there a self-contained aes128() C function already in free form somewhere for the taking?
another question: is each block independently encoded? presumably, it would strengthen the encryption if the first block would create a salt that the second block would then use. otoh, this would mean that disk corruption of one block would make every subsequent block undecryptable.
/iaw
You're not seeing a single function like you expect because there are so many options. For example, the block encoding mechanism you described (CBC) is just one option or mode in AES encryption. See here for more information: http://www.heliontech.com/aes_modes_basic.htm
The general rule of thumb in any language is: Don't reinvent something that's already been done and done well. This is especially true in anything related to cryptography.
well using just the AES function is basically insecure as any block X will always be encoded to block Y with key K which is too much information to give an attacker... (according to cryptographers)
so you use some method to change the block cipher at each block. you can use a nonce or Cipher Block Chaining or some other method. but there is a pretty good example on wikipedia (the penguin picture): http://en.wikipedia.org/wiki/Electronic_code_book#Electronic_codebook_.28ECB.29
so in short you can implement AES in one function that is secure (as a block cipher), but it isn't secure if you have data that is longer than 16 bytes.
also AES is fairly complex because of all the round keys... I wouldn't really want to implement it, especially with all of the many good implementations around, but I guess it wouldn't be so bad if you had a good reason to do it.
so in short, to construct a secure stream cipher from a block cipher you need to adopt some strategy to change the effective key along the stream.
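The difference between encrypting every block the same way and varying something per block can be shown with a small sketch. Note the heavy assumption: toy_block_encrypt is a stand-in that only XORs with the key and is NOT AES and not secure; it is there purely so the two modes (ECB and counter mode) can be compared in a few lines. All function names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Stand-in for a real block cipher such as AES-128. It only XORs with the
 * key, so it is NOT secure; it exists to show how the modes differ. */
static void toy_block_encrypt(unsigned char out[BLOCK],
                              const unsigned char in[BLOCK],
                              const unsigned char key[BLOCK])
{
    for (int i = 0; i < BLOCK; i++) out[i] = in[i] ^ key[i];
}

/* ECB: each block is encrypted independently with the same key, so equal
 * plaintext blocks give equal ciphertext blocks. */
static void ecb_encrypt(unsigned char *out, const unsigned char *in,
                        size_t nblocks, const unsigned char key[BLOCK])
{
    assert(out != NULL && in != NULL);
    for (size_t b = 0; b < nblocks; b++)
        toy_block_encrypt(out + b * BLOCK, in + b * BLOCK, key);
}

/* CTR: encrypt nonce||counter to get a per-block keystream, then XOR it
 * with the plaintext, so equal plaintext blocks no longer match. */
static void ctr_encrypt(unsigned char *out, const unsigned char *in,
                        size_t nblocks, const unsigned char key[BLOCK],
                        const unsigned char nonce[8])
{
    assert(out != NULL && in != NULL);
    for (size_t b = 0; b < nblocks; b++) {
        unsigned char ctr[BLOCK], ks[BLOCK];
        memcpy(ctr, nonce, 8);
        for (int i = 0; i < 8; i++) /* big-endian block counter */
            ctr[8 + i] = (unsigned char)((uint64_t)b >> (8 * (7 - i)));
        toy_block_encrypt(ks, ctr, key);
        for (int i = 0; i < BLOCK; i++)
            out[b * BLOCK + i] = in[b * BLOCK + i] ^ ks[i];
    }
}
```

Feeding two identical 16-byte plaintext blocks through ecb_encrypt yields two identical ciphertext blocks, which is the pattern leak behind the penguin picture, while ctr_encrypt yields different ones. With a real cipher in place of the toy function the structure of the two loops would stay the same.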
ok, so I found a reasonable standalone implementation:
http://www.literatecode.com/aes256
About 400 lines. I will probably use this one.
hope it helps others, too.

Safely storing AES key

I'm using OpenSSL in a program that decrypts a text file and then re-encrypts it with new text and a new encryption key every time the program starts. I'd like to safely store the key between instances of the program running. Is there an easy/decently safe way of doing this?
If you don't expect hard core attacks on the machine that the application is installed on, you can always hardcode inside your application another encryption key that you would use in order to safely save the previous session AES key in the file system before you close the app and to retrieve it back when you start the app. You could improve a bit the security if:
you don't store the hardcoded key in a single string, but instead in several strings that you then concatenate in a function
you save the file in a relatively "unknown"/unpopular location like the Isolated Storage, or Windows\Temp instead of the application folder
you use an asymmetric key algorithm (makes cracking harder.. but in this case.. just a little bit)
you put other stuff (bogus) in the file not just the key
If your program is not in a safe area (if its binary code can be inspected to find any key it would contain or any algorithm it would define) there is no simple way:
You could obfuscate your key programmatically and store it in a file, but in that case, breaking your obfuscation algorithm would be sufficient to find the key. So this would actually reduce the strength of the encryption to that of the obfuscation algorithm. Not a good way to go.
You could also encrypt the key (called A here) itself, using a static key (called B) embedded in your program, but in that case, you would lose the interest of changing the key A every time. This because finding the key B embedded in your program would be sufficient to find any encrypted key A saved to the disk. This would not be satisfactory either.
Considering more complex solutions requires knowing your context a bit more (where the attack can come from, what the lifecycle of the file is, etc.). But before going that far... is it necessary to go that far? By this I mean: is your program at risk of cracking attempts? And should it be cracked, is that critical? If it is unlikely to be cracked or it is not critical, the second option above should be sufficient.
If your target host has a TPM chip, you can take advantage of it. OpenSSL can be configured to use the TPM with the help of the TrouSerS project.

Resources