I'm using OpenSSL in a program that decrypts a text file and then re-encrypts it with new text and a new encryption key every time the program starts. I'd like to safely store the key between instances of the program running. Is there an easy/decently safe way of doing this?
If you don't expect hard-core attacks on the machine the application is installed on, you can always hardcode another encryption key inside your application and use it to safely save the previous session's AES key to the file system before you close the app, then retrieve it when you start the app. You could improve the security a bit if:
you don't store the hardcoded key in a single string, but in several strings that you then concatenate in a function
you save the file in a relatively "unknown"/unpopular location like the Isolated Storage, or Windows\Temp instead of the application folder
you use an asymmetric key algorithm (makes cracking harder... but in this case, just a little bit)
you put other (bogus) stuff in the file, not just the key
If your program is not in a safe area (if its binary code can be inspected to find any key it would contain or any algorithm it would define) there is no simple way:
You could obfuscate your key programmatically and store it in a file, but in that case, breaking your obfuscation algorithm would be sufficient to find the key. So this would actually reduce the strength of the encryption to that of the obfuscation algorithm. Not a good way to go.
You could also encrypt the key (called A here) itself, using a static key (called B) embedded in your program, but in that case, you would lose the benefit of changing the key A every time. This is because finding the key B embedded in your program would be sufficient to recover any encrypted key A saved to the disk. This would not be satisfactory either.
Considering more complex solutions requires knowing your context a bit more (where can the attack come from, what is the lifecycle of the file, etc). But before going that far... is it needed to go that far? By this I mean: is your program at risk of cracking attempts? And should it be cracked, is that critical? If it is not likely to be cracked or not critical, the second option above should be sufficient.
If your target host has a TPM chip, you can take advantage of it. OpenSSL can be configured to use the TPM with the help of the TrouSerS project.
I have a data file that needs to be pushed to an embedded device. The typical size for the file ranges from a few bytes to about 10K max. My intention is to detect tampering with the contents of this file (the checksum would be the last element in the data file). The data is a mix of strings, signed and unsigned integers. I am looking for a robust algorithm that avoids a lot of collisions and does not use a lot of cycles to compute. I am considering Fletcher16(), CRC-32 and the solution discussed in this post
Any suggestions for a simple algo for my kind of data size/contents?
Thanks in advance!
EDIT:-
Thanks everyone for the insightful answers and suggestions.
Some background: This is not a hyper secure data file. I just want to be able to detect whether someone wrote to it by mistake. The file gets generated by a module and should only be read by the SW. Recently there have been a few instances where folks have pulled it from the target file system, edited it and pushed it back to the target hoping that would fix their problems. (Which btw it would if edited carefully.) But this defeats the very purpose of auto-generating this file and the existence of this module. I would like to detect any such playful "hacks" and abort gracefully.
My intention is to detect tampering with the contents of this file
If you need to detect intentional tampering with a file, you need some sort of cryptographic signature -- not just a hash.
If you can protect a key within the device, using HMAC as a signature algorithm may be sufficient. However, if the secret is extracted from the device, users will be able to use this to forge signatures.
If you cannot protect a key within the device, you will need to use an asymmetric signature algorithm. Libsodium's crypto_sign APIs provide a nice API for this. Alternatively, if you want to use the underlying algorithms directly, EdDSA is a decent choice.
Either of these options will require a relatively large amount of space (32 to 64 bytes) to be allocated for a signature, and verifying that signature will take significantly more time than a noncryptographic signature. This is largely unavoidable if you need to effectively prevent tampering.
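As a rough illustration of the HMAC option, here is a minimal Python sketch that appends an HMAC-SHA256 tag to the payload and checks it on load. The function names, the trailer layout (tag as the last 32 bytes) and the choice of SHA-256 are assumptions made for this example, not something the question or any library prescribes.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign_file(payload: bytes, key: bytes) -> bytes:
    """Return payload with an HMAC-SHA256 tag appended (hypothetical layout)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_file(blob: bytes, key: bytes):
    """Return the payload if the trailing tag matches, otherwise None."""
    if len(blob) < TAG_LEN:
        return None
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing
    return payload if hmac.compare_digest(tag, expected) else None
```

Anyone who extracts the key from the device can still forge a valid tag, which is the limitation described above.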
For your purpose, you can use a cryptographic hash such as SHA256. It is very reliable and collisions are vanishingly unlikely, but you should test whether the speed is OK.
There is a sample implementation in this response: https://stackoverflow.com/a/55033209/4593267
To detect intentional tampering with the data, you can add a secret key to the hashed data. The device will need to have a copy of the secret key, so it is not a very secure method as the key could be extracted from the device through reverse engineering or other methods. If the device is well protected against that, for example if it is inside a secure location, a secure chip or in a very remote location such as a satellite in space and you are confident there are no flaws providing remote access, this may be sufficient.
Otherwise an asymmetrical cryptographic system is required, with a private key known only to the legitimate source(s) of those data files, and a public key used by the device to verify the cryptographic hash, as documented in duskwuff's answer.
If you're only concerned about accidental or non-malicious tampering, a CRC should be sufficient.
(I'm using a somewhat circular definition of 'malicious' here: if somebody goes to the trouble of recalculating or manipulating the CRC to get their edits to work, that counts as 'malicious' and we don't defend against it.)
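For the accidental-edit case, a CRC trailer could look something like the Python sketch below. The 4-byte little-endian trailer and the function names are assumptions for illustration; on the device the same check would be done in C with whatever CRC-32 routine is available.

```python
import struct
import zlib


def append_crc(payload: bytes) -> bytes:
    """Append the CRC-32 of payload as a 4-byte little-endian trailer."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack("<I", crc)


def check_crc(blob: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the rest of the blob."""
    if len(blob) < 4:
        return False
    payload = blob[:-4]
    (stored,) = struct.unpack("<I", blob[-4:])
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored
```

Someone who edits the file by hand without recomputing the trailer fails the check, which is all this approach is meant to catch.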
I was looking through libsodium, and in particular at the symmetric encryption option XChaCha20-Poly1305. What I can't get my head around is that libsodium appears to provide no "context/update/finalise" style of working that you commonly find in crypto libraries.
It is clear from the libsodium documentation that there is "no practical limit" to the size of an XChaCha20-Poly1305 message. However, in practical terms, if I'm encrypting a multi-GB file, I'm not quite clear how you would use libsodium for that. Presumably you would only be passing the contents of the fread buffer to crypto_aead_xchacha20poly1305_ietf_encrypt?
IMPORTANT NOTE TO THOSE WHO THINK THIS IS OFF TOPIC
After bowing to peer pressure, I did delete this post. However, I have re-opened it at the request of @MaartenBodewes, who felt strongly that it was on-topic, so strongly that he put some effort into writing an answer. Therefore, out of respect for his effort, I have undeleted the post. Please, spare me more "off-topic" comments, I've read enough of them!
In the introduction of libsodium it reads: "Its goal is to provide all of the core operations needed to build higher-level cryptographic tools."
Libsodium is therefore a relatively high level library that provides limited access to the underlying structures.
That said, there are some inherent difficulties of encrypting such large files using an authenticated cipher. The problem is that you either need to first verify the authenticity and then start to decrypt or you need to decrypt online before verifying the authentication tag. That in turn means that you have to write / destroy the contents if verification fails.
Generally you can get around that by encrypting in e.g. blocks of 16KiB or so and then add an authentication tag for the block. Of course you would need to make sure that you increase the nonce (making sure that the counter of the stream cipher doesn't repeat). This will add some overhead of course, but nothing spectacular - and you'd have some overhead anyway. The disadvantage is that you cannot decrypt in place anymore (as that would leave gaps).
You could also store all the authentication tags at the end if you want to make a really advanced scheme. Or buffer all the authentication tags in memory and calculate a single (HMAC) tag over all the collected tags.
So calling crypto_aead_xchacha20poly1305_ietf_encrypt multiple times could be considered an option. You may want to calculate a file specific key if you go that way so you can start your nonce at zero.
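To make the bookkeeping of the chunked approach concrete, here is a small Python sketch that only plans the chunks: it walks a file of a given length in 16 KiB pieces and assigns each piece its own nonce by incrementing a 24-byte little-endian counter that starts at zero. It does no encryption itself; each (offset, length, nonce) triple would be handed to one crypto_aead_xchacha20poly1305_ietf_encrypt call. The helper names are made up for this example, and starting at a zero nonce is only reasonable with a file-specific key, as noted above.

```python
CHUNK_SIZE = 16 * 1024  # plaintext bytes per chunk, as suggested above
NONCE_BYTES = 24        # XChaCha20-Poly1305 nonce size
TAG_BYTES = 16          # Poly1305 tag appended to every chunk


def increment_nonce(nonce: bytes) -> bytes:
    """Add one to the nonce, treated as a little-endian counter (wraps)."""
    value = (int.from_bytes(nonce, "little") + 1) % (1 << (8 * len(nonce)))
    return value.to_bytes(len(nonce), "little")


def plan_chunks(total_len: int, chunk_size: int = CHUNK_SIZE):
    """Yield (offset, length, nonce) for each chunk of the plaintext."""
    nonce = bytes(NONCE_BYTES)  # all-zero start: needs a per-file key
    offset = 0
    while offset < total_len:
        length = min(chunk_size, total_len - offset)
        yield offset, length, nonce
        nonce = increment_nonce(nonce)
        offset += length


def encrypted_size(total_len: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Ciphertext size: plaintext plus one tag per chunk."""
    chunks = sum(1 for _ in plan_chunks(total_len, chunk_size))
    return total_len + chunks * TAG_BYTES
```

A real scheme would also need to mark the final chunk somehow, so that an attacker cannot silently truncate the file at a chunk boundary; that part is left out of this sketch.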
If you just want confidentiality of the stored file, you could consider leaving out the authentication tag. In that case you can manually influence the counter used to create the key stream using crypto_stream_xchacha20_xor_ic, which takes the initial value of the block counter (ic) as an extra argument.
This permits direct access to any block without having to compute the previous ones.
Obviously you can still add an authentication tag using HMAC-SHA-2 which is also available in libsodium, but this will be rather slower than using poly1305.
Finally, libsodium is open source. If you're exceedingly brave you could go into the gory details and construct your own context/update/finalize. The algorithm certainly supports it (hint: never buffer the authentication tag or nonce during decryption routines if you go this route - directly decrypt).
I'm writing a C application and I wanna know if there is a way to view .gpg file content (then the encrypted content). The .gpg file in question concerns a simple .txt file that I encrypted.
I know a bit of GPGME; is it possible with its functions? Or are there other ways?
EDIT: One thought: if my application uses the "--armor" option, I get a .gpg file in ASCII mode rather than binary, so the .gpg file can simply be read, right? That would be the easiest way...
libgcrypt
This is a general purpose cryptographic library based on the code from GnuPG. It provides functions for all cryptographic building blocks: symmetric ciphers (AES, DES, Blowfish, CAST5, Twofish, Arcfour), hash algorithms (MD4, MD5, RIPE-MD160, SHA-1, TIGER-192), MACs (HMAC for all hash algorithms), public key algorithms (RSA, ElGamal, DSA), large integer functions, random numbers and a lot of supporting functions.
You can use GnuPG Made Easy library, here is a mini howto on using it.
Are there algorithms for putting a digest into the file being digested?
In other words, are there algorithms or libraries for this, or is it even possible to have a hash/digest of a file contained in the file being hashed/digested? This would be handy for obvious reasons, such as built-in digests of ISOs. I've tried googling things like "MD5 injection" and "digest in a file of a file." No luck (probably for good reason.)
Not sure if it is even mathematically possible. Seems you'd be able to roll through the file but then you'd have to brute the last bit (assuming the digest was the last thing in the file or object.)
Thanks,
Chenz
It is possible in a limited sense:
Non-cryptographically-secure hashes
You can do this with insecure hashes like the CRC family of checksums.
Maclean's gzip quine
Caspian Maclean created a gzip quine, which decompresses to itself. Since the Gzip format includes a CRC-32 checksum (see the spec here) of the uncompressed data, and the uncompressed data equals the file itself, this file contains its own hash. So it's possible, but Maclean doesn't specify the algorithm he used to generate it:
It's quite simple in theory, but the helper programs I used were on a hard disk that failed, and I haven't set up a new working linux system to run them on yet. Solving the checksum by hand in particular would be very tedious.
Cox's gzip, tar.gz, and ZIP quines
Russ Cox created 3 more quines in Gzip, tar.gz, and ZIP formats, and wrote up in detail how he created them in an excellent article. The article explains how he embedded the checksum, namely by brute force:
The second obstacle is that zip archives (and gzip files) record a CRC32 checksum of the uncompressed data. Since the uncompressed data is the zip archive, the data being checksummed includes the checksum itself. So we need to find a value x such that writing x into the checksum field causes the file to checksum to x. Recursion strikes back.
The CRC32 checksum computation interprets the entire file as a big number and computes the remainder when you divide that number by a specific constant using a specific kind of division. We could go through the effort of setting up the appropriate equations and solving for x. But frankly, we've already solved one nasty recursive puzzle today, and enough is enough. There are only four billion possibilities for x: we can write a program to try each in turn, until it finds one that works.
He also provides the code that generated the files.
(See also Zip-file that contains nothing but itself?)
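The brute-force idea quoted above can be tried on a toy scale. The Python sketch below is not Cox's code: it truncates CRC-32 to its low 16 bits so that the search covers only 65,536 candidates, and the file layout (prefix, 2-byte little-endian checksum field, suffix) is invented for the example. It returns every value x for which writing x into the field makes the whole blob checksum to x; for a given prefix and suffix there may be none.

```python
import struct
import zlib


def find_self_checksums(prefix: bytes, suffix: bytes):
    """Return all 16-bit x with crc32(prefix + x + suffix) & 0xFFFF == x."""
    solutions = []
    for x in range(1 << 16):
        # Write the candidate into the checksum field, then checksum the
        # whole blob, field included.
        candidate = prefix + struct.pack("<H", x) + suffix
        if (zlib.crc32(candidate) & 0xFFFF) == x:
            solutions.append(x)
    return solutions
```

The full 32-bit search works the same way with about four billion candidates, which is slow in Python but quite feasible in a compiled language.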
Cryptographically-secure digests
With a cryptographically-secure hash function, this shouldn't be possible without either breaking the hash function (particularly, a secure digest should make it "infeasible to generate a message that has a given hash"), or applying brute force.
But these hashes are much longer than 32 bits, precisely in order to deter that sort of attack. So you can write a brute-force algorithm to do this, but unless you're extremely lucky you shouldn't expect it to finish before the universe ends.
MD5 is broken, so it might be easier
The MD5 algorithm is seriously broken, and a chosen-prefix collision attack is already practical (as used in the Flame malware's forged certificate; see http://www.cwi.nl/news/2012/cwi-cryptanalist-discovers-new-cryptographic-attack-variant-in-flame-spy-malware, http://arstechnica.com/security/2012/06/flame-crypto-breakthrough/). I don't know of anyone having actually done what you want, but there's a good chance it's possible. It's probably an open research question.
For example, this could be done using a chosen-prefix preimage attack, choosing the prefix equal to the desired hash, so that the hash would be embedded in the file. A preimage attack is more difficult than collision attacks, but there has been some progress towards it. See Does any published research indicate that preimage attacks on MD5 are imminent?.
It might also be possible to find a fixed point for MD5; inserting a digest is essentially the same problem. For discussion, see md5sum a file that contain the sum itself?.
Related questions:
Is there any x for which SHA1(x) equals x?
Is a hash result ever the same as the source value?
The only way to do this is if you define your file format so the hash only applies to the part of the file that doesn't contain the hash.
However, including the hash inside a file (like built into an ISO) defeats the whole security benefit of the hash. You need to get the hash from a different channel and compare it with your file.
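One common way to define such a format is to reserve a fixed-size digest field and hash the file with that field zeroed out. The Python sketch below assumes a 32-byte SHA-256 field at a known offset; the offset, field size and function names are choices made for this example only.

```python
import hashlib

DIGEST_LEN = 32  # size of the reserved SHA-256 field


def _with_zeroed_field(data: bytes, offset: int) -> bytes:
    """Return data with the digest field replaced by zero bytes."""
    return data[:offset] + bytes(DIGEST_LEN) + data[offset + DIGEST_LEN:]


def embed_digest(data: bytes, offset: int) -> bytes:
    """Fill the reserved field with the hash of the zeroed-field file."""
    digest = hashlib.sha256(_with_zeroed_field(data, offset)).digest()
    return data[:offset] + digest + data[offset + DIGEST_LEN:]


def verify_embedded(data: bytes, offset: int) -> bool:
    """Recompute the hash with the field zeroed and compare to the field."""
    stored = data[offset:offset + DIGEST_LEN]
    return hashlib.sha256(_with_zeroed_field(data, offset)).digest() == stored
```

As the answer says, this only catches corruption: anyone who can modify the file can recompute the embedded digest as well.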
No, because that would mean that the hash would have to be a hash of itself, which is not possible.
Update2:
Thanks for the input. I have implemented the algorithm and it is available for download at SourceForge. It is my first open source project so be merciful.
Update:
I am not sure I was clear enough or everyone responding to this understands how shells consume #! type of input. A great book to look at is Advanced Unix Programming. It is sufficient to call popen and feed its standard input as demonstrated here.
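To show the popen idea in code, here is a minimal Python sketch that decodes a script in memory and feeds it to an interpreter's standard input, so the plaintext never touches a temporary file. The decode_script function is a stand-in: it uses base64, which is not encryption, purely so the example runs without a crypto library. Both function names and the way the interpreter is chosen are assumptions for this example.

```python
import base64
import subprocess
import sys


def decode_script(blob: bytes) -> bytes:
    """Stand-in for real decryption (base64 is NOT encryption)."""
    return base64.b64decode(blob)


def run_decoded(blob: bytes, interpreter) -> bytes:
    """Pipe the decoded script to the interpreter's stdin; return its stdout."""
    script = decode_script(blob)
    result = subprocess.run(
        interpreter,             # e.g. [sys.executable, "-"] or ["/bin/sh"]
        input=script,            # script text goes in through a pipe
        stdout=subprocess.PIPE,
        check=True,              # raise if the interpreter exits non-zero
    )
    return result.stdout
```

For example, run_decoded(base64.b64encode(b"print(6 * 7)\n"), [sys.executable, "-"]) runs the one-line script through the current Python interpreter. A real wrapper would presumably read the decoded script's own #! line to decide which interpreter to start.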
Original Question:
Our scripts run in highly distributed environment with many users. Using permissions to hide them is problematic for many reasons.
Since the first line can be used to designate the "interpreter" for a script, the initial line can be used to define a decrypter
#!/bin/decryptandrun
*(&(*S&DF(*SD(F*SDJKFHSKJDFHLKJHASDJHALSKJD
SDASDJKAHSDUAS(DA(S*D&(ASDAKLSDHASD*(&A*SD&AS
ASD(*A&SD(*&AS(D*&AS(*D&A(SD&*(A*S&D(A*&DS
Given that I can write the script to encrypt and place the appropriate header I want to decrypt the script (which itself may have an interpreter line such as #!/bin/perl at the top of it) without doing anything dumb like writing it out to a temporary file. I have found some silly commercial products to do this. I think this could be accomplished in a matter of hours. Is there a well known method to do this with pipes rather than coding the system calls? I was thinking of using execvp but is it better to replace the current process or to create a child process?
If your users can execute the decryptandrun program, then they can read it (and any files it needs to read such as decryption keys). So they can just extract the code to decrypt the scripts themselves.
You could work around this by making decryptandrun suid. But then any bug in it could lead to the user getting root privileges (or at least the privileges of the account that holds the decryption keys). So that's probably not a good idea. And of course, if you've gone to all the trouble of hiding the contents or keys of these decryption scripts by making them not readable to the user... then why can't you do the same with the contents of the scripts you're trying to hide?
Also, you can't have a #! interpreted executable as an interpreter for another #! interpreted executable.
And one of the fundamental rules of cryptography is, don't invent your own encryption algorithm (or tools) unless you're an experienced cryptanalyst.
Which leads me to wonder why you feel the need to encrypt scripts that your users will be running. Is there anything wrong with them seeing the contents of the scripts?
Brian Campbell's answer has the right idea, I'll spell it out:
You need to make your script unreadable but executable by the user (jbloggs), and to make decodeandrun setuid. You could make it setuid root, but it would be much safer to make it setgid for some group decodegroup instead, and then set the script file's group to decodegroup. You need to make sure that decodegroup has both read and execute permissions on the script file and that jbloggs is not a member of this group.
Note that decodegroup needs read permission for decodeandrun to be able to read the text of the script file.
With this setup, it is then possible (on Linux at least) for jbloggs to execute the script but not to look at it. But observe that this makes the decryption process itself unnecessary -- the script file might as well be plaintext, since jbloggs can't read it.
[UPDATE: Just realised that this strategy doesn't handle the case where the encrypted contents is itself a script that starts with #!. Oh well.]
You're solving the wrong problem. The problem is that you have data which you don't want your users to access, and that data's stored in a location to which the users have access. Start by attempting to fix the problem of users with more access than they require...
If you can't protect the whole script, you may want to look into just protecting the data. Move it to a separate location and encrypt it. Encrypt the data with a key only accessible by a specific ID (preferably not root), and write a small suid program to access the data. In your setuid program, do your validation of who should be running the program, and compare the name / checksum of the calling program (you can inspect the command line for the process in combination with the calling process's cwd to find the path, use lsof or the /proc filesystem) with the expected value before decrypting.
If it takes more than that, you really need to reevaluate the state of users on the system - they either have too much access or you have too little trust. :)
All of the exec()-family functions you link to accept a filename, not a memory address. I'm not sure at all how you would go about doing what you want, i.e. "hooking" in a decryption routine and then re-directing to the decrypted script's #! interpreter.
This would require you to decrypt the script into a temporary file, and pass that filename to the exec() call, but you (very reasonably) said you didn't want to expose the script by putting it in a temporary file.
If it were possible to tell the kernel to replace a new process with an existing one in memory, you would have a path to follow, but as far as I know, it isn't. So I don't think it will be very easy to do this "chained" #! following.