Computing the key of a symmetric encryption given the plaintext and ciphertext - c

As part of an assignment I need to write an algorithm that takes two files as input, one containing a plaintext and one containing a ciphertext. Given that the encryption scheme is hardcoded/known and is a symmetric cipher, is there a way to use OpenSSL to compute the key that was used to encrypt the provided plaintext into the provided ciphertext?
For convenience I used 5 paragraphs of Lorem Ipsum as the plaintext, and Blowfish as the cipher.
The OpenSSL documentation and Google have proved less than useful.
Thank you!

No, the ability to do that would pretty much defeat the entire purpose of cryptography. There may be tools that can do that sort of thing for trivial systems (the Caesar cipher, for example), but if keys could be computed in reasonable time for current cryptosystems, those systems would be considered broken.
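For illustration, here is how trivially the key falls out of a known plaintext/ciphertext pair for a Caesar cipher; a minimal Python sketch (the sample strings are made up):

# Known-plaintext "attack" on a Caesar cipher: the key is just the
# constant shift between any plaintext letter and its ciphertext letter.
def caesar_key(plaintext: str, ciphertext: str) -> int:
    for p, c in zip(plaintext, ciphertext):
        if p.isalpha():
            return (ord(c.lower()) - ord(p.lower())) % 26
    raise ValueError("no alphabetic characters to compare")

print(caesar_key("attack at dawn", "dwwdfn dw gdzq"))  # -> 3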

What you are looking at is a "Known Plaintext Attack": if the attacker knows both the ciphertext and the plaintext, can the key be found?
All good modern ciphers, including Blowfish, are designed to resist this attack. Hence, as has been said, the answer to your question is, "No, you can't find the key."
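The only generic known-plaintext attack against a modern cipher is exhaustive key search, which is hopeless for real key sizes. A hedged Python sketch using the cryptography package (AES stands in for Blowfish here, and all but 3 key bytes are assumed known so the loop actually terminates):

# Exhaustive key search: the generic known-plaintext attack. It only
# finishes here because the keyspace is artificially cut down to
# 3 unknown bytes; a real 128-bit keyspace is out of reach.
# plain_block and cipher_block are one 16-byte AES block each.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def find_key(plain_block: bytes, cipher_block: bytes) -> bytes | None:
    for n in range(2**24):                       # 3 unknown key bytes
        key = bytes(13) + n.to_bytes(3, "big")   # 13 key bytes assumed zero
        encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        if encryptor.update(plain_block) == cipher_block:
            return key
    return None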

No you can't.
Not for the blowfish algorithm.
The reason, however, is not that every encryption scheme would be broken if it were possible to derive the key from a plaintext/ciphertext pair, even easily.
The rest of this answer is to explain that.
There is at least one encryption scheme which is secure even though it allows deriving the key: the one-time pad, which happens to be the only known truly secure encryption scheme, being provably unbreakable.
The point is that deriving the key of one message only breaks an encryption scheme if knowing the key of one message allows decryption of future messages, which in turn is the case only if the same key is reused.
The one-time pad is special in two ways:
a) each key is used for only a single message and never again
(this is why it is called a "pad", referring to a notepad with many keys, from which the sheet with a used key is easily torn off and destroyed)
b) the key is as long as the message
(otherwise deriving the key for part of the ciphertext from a partially known plaintext would allow decrypting the rest of the message)
With those attributes, encrypting even with the humble XOR is unbreakable, each bit of the message corresponding to its own dedicated bit of the key. This is also as fast as en-/decryption gets and never increases the message length.
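A minimal Python sketch of exactly this: the pad is recovered from one known pair with a single XOR, and that costs the attacker nothing, because the pad is never reused:

# One-time pad with XOR: ciphertext = plaintext XOR key, so a known
# plaintext/ciphertext pair yields the key directly -- and that is fine,
# because the key is never used for any other message.
import os

plaintext = b"attack at dawn"
key = os.urandom(len(plaintext))                      # key as long as the message
ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))

recovered = bytes(p ^ c for p, c in zip(plaintext, ciphertext))
assert recovered == key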
There is of course a huge disadvantage to the one-time pad, namely key logistics. It is hardly ever applicable, because of the need to provide the receiver of a message with many large keys (or better, one very long key which can be used piecewise for messages of any size), and to do so beforehand.
This is the reason the one-time pad is not used, in spite of being safer and faster than everything else in use, and at least as size-efficient.
Other encryption schemes are considered practically secure, otherwise they would of course not be used.
It is, however, necessary to increase key sizes in step with any noticeable progress in cryptanalysis. There is no mathematical proof that any other algorithm is underivable (meaning that it is impossible to derive the key from a plaintext/ciphertext pair). No mathematician accepts "I cannot think of any way to do that" as proof that something is impossible. On top of that, new technologies could cut the time needed for deriving the key, or for finding the plaintext without the key, to a fraction, spelling sudden doom for commonly used key lengths.
The symmetry or asymmetry of the algorithm is irrelevant by the way. Both kinds can be derivable or not.
Only the key size in relation to the message length is important. Even with the one-time pad, a short key (the message length being a multiple of the key length) has to be used more than once. If the first part of a ciphertext has a known plaintext and allows deriving the key, reusing that key allows finding the unknown plaintext for the rest of the message.
This is also true for block cipher schemes which change the key for each block, but in a way that still allows finding the next key from knowledge of the previous one. Hybrid schemes which use one (possibly asymmetric) main key to create multiple (usually symmetric) block keys that cannot be derived from each other are, for the sake of this answer, considered derivable if the main key can be derived. There is of course no widely used algorithm for which this is true.
For any scheme, the risk of being derivable increases with the ratio of the number of key bits to the number of message bits. The more pairs of cipher bits and plain bits relate to each key bit, the more information is available for analysis. Only with a one-to-one relation is it possible to restrict the information gained from one plaintext/ciphertext pair to that single pair.
Because of this, any derivable encryption requires a key as long as the message.
In reverse, this means that only non-derivable encryptions can have short keys. Short keys are of course an advantage, especially since key length tends to imply processing time: most encryption schemes take longer with longer keys. The one-time pad, however, is equally fast for any key length.
So any algorithm with easy key logistics (no need to agree on huge amounts of key bits beforehand) will be non-derivable, as will any algorithm with acceptable speed.
Both are true for every widely used algorithm, including Blowfish.
It is, however, not true for all algorithms, and especially not for the only truly safe one, the one-time pad (XOR).
So the answer to your specific question is indeed:
You can't with blowfish and most algorithms you probably think of. But ...

Related

Encrypt hash map keys while keeping constant lookup speed

I would like to encrypt the keys and values of a hash map with AES256 CBC, individually.
The challenge is to encrypt the keys while maintaining the constant lookup speed and security (mainly against dictionary attacks).
I read about blind indices, but these need some randomness at creation (salt, nonce) and it is impossible for the lookup function to recreate the nonce when searching. At lookup we would need to know where to fetch the nonce from for a particular key, which in the end just moves the vulnerability elsewhere.
So far, I can only think of two options.
First one would be to just not encrypt keys, although I would prefer to do it.
The second one would be to obtain the blind indices by applying a transformation like
blind_index(key) = encrypt(digest(key))
but the problem here is that you need a unique initialisation vector for each key encryption, which brings us again to the problem described above: having a table of IVs used, in order for the lookup function to be able to reconstruct the blind index when searching, which is moving the same problem elsewhere.
For the second approach, my thought was: since I always encrypt unique values (keys are unique and even if they are substrings of one another, e.g. 'awesome' and 'awesome_key', they are hashed before encryption, so they look quite different in their 'hashed & unencrypted' form) I could use a global IV for all encryptions, which can be easily accessible to the lookup function. Since the lookup function requires the encryption key, only the owner will be able to compute the blind index correctly and in the map itself there will be no visible similarities between keys that are similar in plaintext.
The big problem I have with the second approach is that it violates the idea of never using IVs for more than one encryption. I could obfuscate the IV 'to make it more secure,' but that's again a bad idea since IVs are supposed to be plaintext.
More details about the circumstances:
app for mobile
map will be dumped to a file
map will be accessible for lookup through a REST API
Maybe I should use a different algorithm (e.g. ECB)?
Thanks in advance!
This is completely in the realm of Format Preserving Encryption (FPE). However, applying it is hard, and libraries that handle it well are not all that common. FPE takes an amount of bits or even a range and returns an encrypted value of the same size or in the same range. This ciphertext is pseudo-random in the given domain as long as the input values are unique (which, for keys in a hash table, they are by definition).
If you can expand your ciphertext compared to the plaintext then you could also look at SIV modes (AES-SIV or AES-GCM-SIV), which are much easier to handle. These return a byte array, which could be turned into a String, e.g. by using base64 encoding. Otherwise you could wrap the byte array and provide your own equals and hashCode methods. Note that these expand your plaintext relatively significantly; they are authenticated modes. Advantage: the IV gets calculated from the input, and any change in the input will randomize the ciphertext again.
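As a hedged sketch of the SIV idea, using the AESSIV class from Python's cryptography package (your stack may differ): the encryption is deterministic, so the ciphertext itself can serve as the lookup token:

# Deterministic (SIV) encryption: the same key plus the same plaintext
# always yields the same ciphertext, so encrypted hash-map keys still
# support constant-time lookup. The ciphertext is longer than the
# plaintext because it includes the authentication tag.
from cryptography.hazmat.primitives.ciphers.aead import AESSIV

key = AESSIV.generate_key(256)
siv = AESSIV(key)

token = siv.encrypt(b"awesome_key", None)   # deterministic: stable lookup token
assert token == siv.encrypt(b"awesome_key", None)
value = siv.decrypt(token, None)            # original key recoverable by the owner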
Finally, you could of course simply use an IV or nonce to produce your ciphertext and prefix it to the value. However, beware that reencryption of changed values using the same IV would be rather dangerous, as you may leak information through repetition. In some modes this could entirely break the confidentiality offered. So you would have to prevent reuse of the IV.
The use of ECB is certainly not recommended for strings. A single block encrypt would work of course if the input is (or can be expanded to) a single block.

SQL Server hash algorithms

If my input length is less than the hash output length, are there any hashing algorithms that can guarantee no collisions?
I know by nature that a one way hash can have collisions across multiple inputs due to the lossy nature of the hashing, especially when considering input size is often greater than output size, but does that still apply with smaller input sizes?
Use a symmetric block cipher with a randomly chosen static key. Encryption can never produce a duplicate because that would prevent unambiguous decryption.
This scheme will force a certain output length, which is a multiple of the cipher block size. If you can make use of a variable-length output you can use a stream cipher as well.
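A sketch of that scheme in Python with the cryptography package; the one-byte length prefix is my own illustrative padding choice, added to keep the input-to-block encoding injective:

# A block cipher is a keyed permutation over its block: distinct
# single-block inputs can never encrypt to the same output, unlike a hash.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)                  # randomly chosen static key
encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()

def collision_free_tag(data: bytes) -> bytes:
    assert len(data) <= 15            # input shorter than the 16-byte output
    block = bytes([len(data)]) + data.ljust(15, b"\x00")  # injective encoding
    return encryptor.update(block)    # always 16 bytes, never collides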
Your question sounds like you're looking for a perfect hash function. The problem with perfect hash functions is they tend to be tailored towards a specific set of data.
The following assumes you're not trying to hide, secure or encrypt the data...
To think of it another way, the easiest way to "generate" a perfect hash function that will accept your inputs is to map the data you want to store to a table and associate those inputs with a surrogate primary key. You then create a unique constraint for the column (or columns) to ensure the input you're mapping only maps to a single surrogate value.
The surrogate key could be int, bigint or a guid. It all depends on how many rows you're looking to store.
If your input lengths are known to be small, such as 32 bits, then you could actually enumerate through all possible inputs and check the resulting hashes for collisions. That's only going to be 4294967296 possible inputs, and it shouldn't take too terribly long to enumerate all of them. Essentially you'd be building a rainbow table to test for collisions.
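A Python sketch of that enumeration; the input space and the hash truncation are shrunk here so the demo finds a collision quickly, but the same loop over all 2^32 values is what's described above:

# Enumerating a small input space to check a truncated hash for
# collisions. 16-bit inputs and a 16-bit truncated digest keep the
# demo fast; a collision shows up after a few hundred inputs (birthday bound).
import hashlib

seen = {}
for n in range(2**16):
    data = n.to_bytes(4, "big")
    tag = hashlib.sha1(data).digest()[:2]    # truncate to 16 bits
    if tag in seen:
        print(f"collision: {seen[tag]} and {n} -> {tag.hex()}")
        break
    seen[tag] = n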
If there is some security relying on this though, one of the issues is if an attacker knows your input lengths are constrained, it makes it easy for them to also perform the same enumeration to create a map/table that will map hashes back to the original values. "attacker" is a pretty terrible term here though because I have no context of how you are using these hashes and whether you are concerned about being able to reverse them.

How to convert a passphrase into 128bit/256bit WEP key?

I have a passphrase and I want to generate 128bit or 256bit WEP key from that. Any pointers or links will be helpful on how to generate WEP key from a plain text.
Hopefully the original poster has found the answer by now, but I for one found this information SHOCKINGLY difficult to come by, as it's beyond deprecated. As mentioned repeatedly, in this thread and others, WEP is horribly insecure, but for some reason nobody is willing to give a straight answer otherwise, and there are a lot of suggestions to go learn something else.
To the original question, the 128-bit key is an MD5 hash of a 64-byte string. This 64-byte string is the ASCII passphrase repeated over and over, then truncated at 64 bytes. In SQL Server, for instance, this would appear as:
SELECT CAST(HASHBYTES('MD5', LEFT(REPLICATE(@phrase, CEILING(64.0 / LEN(@phrase))), 64)) AS varbinary(13))
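The same derivation as a Python sketch, for readers without SQL Server at hand:

# De facto 128-bit WEP passphrase mapping: repeat the ASCII passphrase
# to 64 bytes, MD5 it, keep the first 13 bytes (the 104 secret key bits;
# the remaining 24 bits of the "128-bit" key are the per-packet IV).
import hashlib

def wep128_key(passphrase: str) -> bytes:
    seed = (passphrase.encode("ascii") * 64)[:64]   # repeat, truncate at 64 bytes
    return hashlib.md5(seed).digest()[:13]

print(wep128_key("secret").hex())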
Start by reading about Key Derivation Functions. High quality modern KDFs are scrypt, bcrypt and PBKDF2. All of them have open source implementations.
For PBKDF2, you can specify the length of the derived key. For scrypt, you can pick the first N bits of the output and use them as the key.
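For example, with PBKDF2 from Python's standard library (the parameter values here are illustrative):

# PBKDF2 from the standard library: dklen picks the derived key length,
# e.g. 13 bytes for a 104-bit WEP key or 32 bytes for AES-256.
import hashlib, os

salt = os.urandom(16)
key = hashlib.pbkdf2_hmac("sha256", b"my passphrase", salt, 100_000, dklen=13)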
The most straightforward way of doing this without using a KDF is to concatenate your passphrase with a 24 bit IV (initialization vector) and form an RC4 key.
Mind that WEP combines your key and IV to seed an RC4 stream which encrypts the data stream; for this reason, WEP has a number of shortcomings which make it unable to provide adequate data confidentiality. You can read more about it by following the links on the Wikipedia page.
Do NOT use a cryptographic hash as a derived key.

Generate an initialization vector without a good source of randomness

For a password storing plugin (written in C) for Rockbox I need to generate initialization vectors.
The problem is that I don't have a good source of randomness. The Rockbox-supplied random() is not a cryptographic RNG. And I've got hardly any sources of randomness I can access (no mouse movements, ... on an iPod running Rockbox).
The key is currently derived via PBKDF2 from a user-supplied password and a salt (which is a constant prefix + some data from random()). I think the pseudo-random data should be good enough for a salt with 10000 iterations of PBKDF2.
However, where do I take my initialization vector from? Is it OK if I take some semi-random data (time + random()) and SHA that, say, 10000 times? Should I take arc4random with a seed taken from random()?
Do I even need an IV if I effectively never use the same key twice (the salt is recomputed every time the stored data is changed)? What's the best way to deal with situations like these?
Edit:
Just a single user (me, owning the iPod), encryption algorithm: AES-CBC 256 bit.
The file just stores a site/account/password list for various websites. It is rarely modified (whenever I create a new account on a website), when that happens a new salt and a new IV is generated.
Generally speaking, with CBC, the IV MUST be random and uniform. "Non-repeating" is not sufficient. To be more precise, the whole point of CBC is to avoid the situation where the same data block is fed twice to the underlying block cipher. Hence, the condition is that if you encrypt two messages with the same key, then the difference of the two IVs must be uniformly random. With a 128-bit block cipher such as the AES, the probability that the same block is obtained twice is sufficiently low as to be neglected -- as long as the IV is randomly chosen with uniform probability over the whole space of 128-bit values. Any structure in the IV selection (such as reusing the same IV, using a counter, or a low-quality random generator) increases that probability, because you are encrypting data which has itself a lot of structure.
There is a bright side to that: if you never use the same key twice, then you can tolerate a fixed IV. But that is a strong "never".
"Non-repeating IV" is not a good enough property with CBC. However, there are some encryption modes which can use non-repeating IV. In particular, have a look at EAX and GCM. The trick here is that those mode use the provided IV in a custom PRNG which uses the encryption key; this transform the non-repeating IV (e.g. a counter, or a "random value" of low quality) into something which, from a cryptographic point of view, looks random enough. Do not try to build your own PRNG ! These things are subtle and there is no sure way to test the quality of the result.
The IV does not need to be random, it just needs to be unique for a given pair of key and data (assuming we are talking about an IV for CBC).
So random() should be okay for this purpose.
GREAT NEWS! The initialization vector doesn't need to be random, it just needs to be different for every encryption. So you can use the user's name as the salt. If you use both the user's name and the time then an attacker won't be able to detect password reuse.

How do you implement truncated sha1 database keys?

I'm working on a multi-tenant application that will be implementing service APIs. I don't want to expose the default auto increment key for security reasons and data migration/replication concerns so I'm looking at alternative keys. GUID/UUID is an obvious choice but they make the URL a bit long and while reading an article about them I saw that Google uses "truncated SHA1" for their URL IDs.
How does this work? It's my understanding that you hash part/all of the object contents to come up with the key. My objects can change over time so hashing the whole object wouldn't work since the key will need to remain the same over time. Could I implement UUIDs and hash those? What limitations/issues are there in using SHA1 for keys (e.g. max records, collision, etc.)?
I've been searching Google but haven't come up with the right search query.
/* edit: more information about environment */
Currently we are a Java shop using Spring/Hibernate with MySQL in back. We are in process to switch core development to Grails which is where this idea will be implemented.
I thought about a similar problem some time ago and ended up implementing Blowfish in the URL. It's not super safe but gives much shorter URLs than for instance SHA256 and also it's completely collision free.
That's actually a pretty solid idea, though it might make key lookups a little tough (unless you hashed the key and kept it inline in the table, I suppose). You'd just have to hash every key you use, though if you're auto-incrementing, that's no problem. You wouldn't even need a GUID - you could even just hash the key, since it's a one-way operation and can't be easily reversed. You could even "salt" your key before you hash it, which would make it virtually unbreakable by making the key unpredictable.
There is a concern about collision, but with SHA1, your hash is 160 bits, or has 1.46 × 10^48 unique values, which should be enough to support some fraction of that many unique keys without worrying about a collision. If you've got enough keys that you're still worried about a collision, you can upgrade to something like SHA256 or even SHA512, which should be plenty long as to avoid any reasonable concern about a collision.
If you need some hashing code, post the language you're using and I can find some, though there's plenty available online if you know what you're looking for.
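For instance, a Python sketch of the salted-and-truncated idea (SALT and the 8-byte truncation are illustrative choices, not a standard):

# Truncated, salted SHA-1 of the auto-increment key: stable over the
# row's lifetime, hard to reverse without the salt, short enough for URLs.
import hashlib

SALT = b"per-application-secret"

def public_id(auto_increment_id: int) -> str:
    digest = hashlib.sha1(SALT + str(auto_increment_id).encode()).digest()
    return digest[:8].hex()                 # 64 bits: short URL, low collision risk

print(public_id(12345))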
