Encrypt hash map keys while keeping constant lookup speed

I would like to encrypt the keys and values of a hash map with AES256 CBC, individually.
The challenge is to encrypt the keys while maintaining the constant lookup speed and security (mainly against dictionary attacks).
I read about blind indices, but these need some randomness at creation (salt, nonce), and the lookup function cannot recreate the nonce when searching. At lookup we would need to know where to fetch the nonce for a particular key, which in the end just moves the vulnerability elsewhere.
So far, I can only think of two options.
First one would be to just not encrypt keys, although I would prefer to do it.
The second one would be to obtain the blind indices by applying a transformation like
blind_index(key) = encrypt(digest(key))
but the problem here is that you need a unique initialisation vector for each key encryption, which brings us again to the problem described above: having a table of IVs used, in order for the lookup function to be able to reconstruct the blind index when searching, which is moving the same problem elsewhere.
For the second approach, my thought was: since I always encrypt unique values (keys are unique and even if they are substrings of one another, e.g. 'awesome' and 'awesome_key', they are hashed before encryption, so they look quite different in their 'hashed & unencrypted' form) I could use a global IV for all encryptions, which can be easily accessible to the lookup function. Since the lookup function requires the encryption key, only the owner will be able to compute the blind index correctly and in the map itself there will be no visible similarities between keys that are similar in plaintext.
The big problem I have with the second approach is that it violates the idea of never using IVs for more than one encryption. I could obfuscate the IV 'to make it more secure,' but that's again a bad idea since IVs are supposed to be plaintext.
More details about the circumstances:
app for mobile
map will be dumped to a file
map will be accessible for lookup through a REST API
Maybe I should use a different mode (e.g. ECB)?
Thanks in advance!

This is squarely in the realm of Format Preserving Encryption (FPE). However, applying it is hard, and libraries that handle it well are not all that common. FPE takes a number of bits, or even a range, and returns an encrypted value of the same size or within the same range. This ciphertext is pseudo-random in the given domain as long as the input values are unique (which, for keys in a hash table, they are by definition).
If you may expand your ciphertext compared to the plaintext, then you could also look at SIV modes (AES-SIV or AES-GCM-SIV), which are much easier to handle. These return a byte array, which could be turned into a String, e.g. by using base64 encoding. Otherwise you could wrap the byte array and provide your own equals and hashCode methods. Note that these expand your plaintext relatively significantly, since they are authenticated modes. Advantage: the IV gets calculated from the input, and any change in the input will randomize the ciphertext again.
Finally, you could of course simply use an IV or nonce to produce your ciphertext and prefix it to the value. However, beware that reencryption of changed values using the same IV would be rather dangerous, as you may leak information through repetition. In some modes this could entirely break the confidentiality offered. So you would have to prevent reuse of the IV.
The use of ECB is certainly not recommended for strings. A single block encrypt would work of course if the input is (or can be expanded to) a single block.

Related

Computing the key of a symmetric encryption given the plaintext and ciphertext

As part of an assignment I need to make an algorithm that takes 2 files as input, one containing a plaintext and one containing a ciphertext. Considering the encryption model is hardcoded/known, and is a symmetric encryption, is there a way to use OpenSSL to compute the key used to encrypt the provided plaintext into the provided ciphertext?
For convenience I used 5 paragraphs of Lorem Ipsum as the plaintext, and Blowfish as the cipher.
The OpenSSL documentation and Google have proved less than useful.
Thank you!
No, the ability to do that would pretty much defeat the entire purpose of cryptography. There might be tools that can do that sort of thing with trivial systems (Caesar cipher for example) but if keys could be computed in reasonable times for current cryptosystems they would be broken.
What you are looking at is a "Known Plaintext Attack": if the attacker knows both the ciphertext and the plaintext, can the key be found?
All good modern ciphers, including Blowfish, are designed to resist this attack. Hence, as has been said, the answer to your question is, "No, you can't find the key."
No, you can't.
Not for the Blowfish algorithm.
The reason, however, is not that an encryption scheme is automatically broken whenever the key can be derived from a plaintext/ciphertext pair, even if deriving it is easy.
The rest of this answer is to explain that.
There is at least one encryption scheme which is secure even though it allows the key to be derived. It is the one-time pad, which happens to be the only known truly secure encryption scheme, being provably unbreakable.
The point is that deriving the key of one message only breaks an encryption scheme if knowing the key of one message allows decryption of all future messages. This in turn only applies if the same key is reused.
The specialty of the one-time-pad encryption is
a) each key is used for only a single message and never again
(this is why it is called "pad", referring to a notepad with many keys, from which the sheet with a used key is easily taken away and destroyed)
b) the key is as long as the message
(otherwise deriving the key for a part of the cipher with a partial known plain text would allow decrypting the rest of the message)
With those attributes, encrypting even with the humble XOR is unbreakable, each bit in the message corresponding to its own dedicated bit in the key. This is also as fast as de-/encryption gets and never increases the message length.
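To make the XOR point concrete, here is a minimal sketch of a one-time pad in Python. The function name `xor_bytes` and the sample message are only illustrative. It shows that a known plaintext/ciphertext pair does reveal the pad, which is harmless only because that pad is never used again.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # fresh key, exactly as long as the message

ciphertext = xor_bytes(message, pad)

# Decryption is the very same operation.
recovered = xor_bytes(ciphertext, pad)

# Anyone holding both plaintext and ciphertext can derive the pad,
# but that pad protects no other message.
derived_pad = xor_bytes(message, ciphertext)
```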
There is of course a huge disadvantage to the one-time-pad encryption, namely key logistics. Using this encryption is hardly ever applicable, because of the need to provide the receiver of a message with many large keys (or better a very long key which can be used partially for any size of message) and to do so beforehand.
This is the reason the one-time pad is hardly used, in spite of being safer and faster than all other schemes in use and at least as size-efficient.
Other encryption schemes are considered practically secure, otherwise they would of course not be used.
It is however necessary to increase the key sizes in parallel with any noticeable progress in cryptanalysis. There is no mathematical proof that any other algorithm is underivable (meaning it is impossible to derive the key from a plain-cipher pair). No math expert accepts "I cannot think of any way to do that" as proof that something is impossible. On top of that, new technologies could reduce the time for key derivation, or for finding the plain text without the key, to a fraction, spelling sudden doom for commonly used key lengths.
The symmetry or asymmetry of the algorithm is irrelevant by the way. Both kinds can be derivable or not.
Only the key size in relation to message length is important. Even with the one-time-pad encryption, a short key (message length being a multiple of key length) has to be used more than once. If the first part of a cipher has a known plain text and allows the key to be derived, reusing that key allows finding the unknown plain text for the rest of the message.
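A small sketch of that failure mode, assuming a 4-byte key that is repeated over a longer message and an attacker who knows only the first 4 plaintext bytes. The key, the message, and the name `xor_repeating` are made up for the example.

```python
from itertools import cycle


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with a key that is repeated as often as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


key = b"\x13\x37\xc0\xde"  # short key, reused for every 4-byte chunk
plaintext = b"HDR:secret payload follows here"
ciphertext = xor_repeating(plaintext, key)

# The attacker knows only the first four plaintext bytes (e.g. a fixed header).
known_prefix = b"HDR:"
derived_key = bytes(p ^ c for p, c in zip(known_prefix, ciphertext))

# Because the key repeats, the derived key decrypts the whole message.
cracked = xor_repeating(ciphertext, derived_key)
```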
This is also true for block cipher schemes, which change the key for each block, but still allow finding the new key with the knowledge of the previous key, if it is the same. Hybrid schemes which use one (possibly asymmetric) main key to create multiple (usually symmetric) block keys which cannot be derived from each other are, for the sake of this answer, considered derivable if the main key can be derived. There is of course no widely used algorithm for which this is true.
For any scheme, the risk of being derivable increases with the ratio of the number of bits in the message to the number of bits in the key. The more pairs of cipher bits and plain bits relate to each key bit, the more information is available for analysis. For a one-to-one relation, restricting the information of one plain-cipher pair to that single pair is possible.
Because of this any derivable encryption requires a key length equal to message length.
In reverse, this means that only non-derivable encryptions can have short keys. And having short keys is of course an advantage, especially if key length implies processing duration. Most encryption schemes take longer with longer keys. The one-time-pad however is equally fast for any key length.
So any algorithm with easy key logistics (no need to agree on huge amounts of keybits beforehand) will be non-derivable. Also any algorithm with acceptable speed will be non-derivable.
Both are true for any widely used algorithm, including Blowfish.
It is however not true for all algorithms, especially not for the only truly safe one, the one-time-pad encryption (XOR).
So the answer to your specific question is indeed:
You can't with blowfish and most algorithms you probably think of. But ...

SQL Server hash algorithms

If my input length is less than the hash output length, are there any hashing algorithms that can guarantee no collisions?
I know by nature that a one way hash can have collisions across multiple inputs due to the lossy nature of the hashing, especially when considering input size is often greater than output size, but does that still apply with smaller input sizes?
Use a symmetric block cipher with a randomly chosen static key. Encryption can never produce a duplicate because that would prevent unambiguous decryption.
This scheme will force a certain output length which is a multiple of the cipher block size. If you can make use of a variable-length output you can use a stream cipher as well.
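The "no duplicates" property comes from encryption being a keyed permutation. As a rough illustration only, the sketch below builds a toy 16-bit Feistel network from keyed BLAKE2 (standard library). It is NOT a vetted cipher and is a stand-in for the real block cipher the answer has in mind; all names and sizes are assumptions chosen so the whole domain can be checked exhaustively.

```python
import hashlib

ROUNDS = 4
HALF_BITS = 8                 # two 8-bit halves make a 16-bit block
MASK = (1 << HALF_BITS) - 1


def round_fn(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: one byte of keyed BLAKE2b output."""
    return hashlib.blake2b(bytes([rnd, half]), key=key, digest_size=1).digest()[0]


def encrypt(key: bytes, value: int) -> int:
    """Toy Feistel encryption of a 16-bit integer (illustration only)."""
    left, right = value >> HALF_BITS, value & MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ round_fn(key, rnd, right)
    return (left << HALF_BITS) | right


def decrypt(key: bytes, value: int) -> int:
    """Inverse of encrypt: undo the rounds in reverse order."""
    left, right = value >> HALF_BITS, value & MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ round_fn(key, rnd, left), left
    return (left << HALF_BITS) | right


static_key = b"randomly chosen static key"
outputs = {encrypt(static_key, v) for v in range(1 << 16)}
```

Every one of the 65536 inputs maps to a distinct output, because each output can be decrypted back to exactly one input.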
Your question sounds like you're looking for a perfect hash function. The problem with perfect hash functions is they tend to be tailored towards a specific set of data.
The following assumes you're not trying to hide, secure or encrypt the data...
To think of it another way, the easiest way to "generate" a perfect hash function that will accept your inputs is to map the data you want to store to a table and associate those inputs with a surrogate primary key. You then create a unique constraint for the column (or columns) to ensure the input you're mapping only maps to a single surrogate value.
The surrogate key could be int, bigint or a guid. It all depends on how many rows you're looking to store.
If your input lengths are known to be small, such as 32 bits, then you could actually enumerate through all possible inputs and check the resulting hashes for collisions. That's only going to be 4294967296 possible inputs, and shouldn't take too terribly long to enumerate all of them. Essentially you'd be building a rainbow table to test for collisions.
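The enumeration could look roughly like this. To keep the run short, the sketch assumes 16-bit inputs instead of 32-bit ones; `find_collisions` is a hypothetical helper, and SHA-256 (full and truncated to one byte) is just an example hash.

```python
import hashlib


def find_collisions(hash_fn, input_bits: int) -> list:
    """Hash every input of the given bit width and report colliding pairs."""
    seen = {}
    collisions = []
    width = (input_bits + 7) // 8
    for value in range(1 << input_bits):
        digest = hash_fn(value.to_bytes(width, "big"))
        if digest in seen:
            collisions.append((seen[digest], value))
        else:
            seen[digest] = value
    return collisions


# Full SHA-256 output: far larger than the input space.
full = find_collisions(lambda b: hashlib.sha256(b).digest(), 16)

# Output truncated to 8 bits: smaller than the input space, so collisions are forced.
truncated = find_collisions(lambda b: hashlib.sha256(b).digest()[:1], 16)
```

The second call illustrates the pigeonhole case from the question: once the output is shorter than the input, collisions are unavoidable.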
If there is some security relying on this though, one of the issues is if an attacker knows your input lengths are constrained, it makes it easy for them to also perform the same enumeration to create a map/table that will map hashes back to the original values. "attacker" is a pretty terrible term here though because I have no context of how you are using these hashes and whether you are concerned about being able to reverse them.

Does knowledge about value length compromise hash integrity

If I am storing a hashed value in a database, but the length of the original value being hashed is fixed (e.g. always 4 characters), does this compromise the one-way nature of the hashing function?
More precisely, I have sensitive strings which I then encrypt and store in a database. In order to search for these strings, I don't want to decrypt every entry in the database, so I also store the hash of the first 4 characters of the string in another column. When I want to search the database I generate the hash of the first 4 characters of the search term and compare it to the stored hashes to find which entries match or could match and then decrypt those entries to check for collisions and get the rest of the data related to that entry.
My worry is that since an attacker would know that the length of the strings being hashed is constant (4 characters), he/she would only need to generate a table of all possible 4 letter strings and their hashes and look-up the hashed values stored in my database (thereby giving away the first 4 characters of the original sensitive string).
You're pretty much right in your conclusion. If an attacker knows that your hash is of a 4-character string, it's pretty trivial to find the plain text via brute force. In addition to giving the attacker knowledge of the first 4 characters of your sensitive data, it could also allow them to gain knowledge of part of the key you are using to encrypt your data (on a simple level encryption is key XOR plaintext, which means plaintext XOR encrypted = key). While it would be challenging to use that information to break the rest of the encryption, cryptographic attacks have been built on less.
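To show how little work the brute force is, here is a sketch that assumes the prefix consists of lowercase ASCII letters and that the stored value is an unsalted SHA-256 digest. Both are assumptions, since the question does not say which hash or alphabet is used, and `crack` is a hypothetical name.

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Optional


def crack(stored_hash: bytes, alphabet: str = ascii_lowercase, length: int = 4) -> Optional[str]:
    """Try every string of the given length until one hashes to stored_hash."""
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if hashlib.sha256(candidate.encode("utf-8")).digest() == stored_hash:
            return candidate
    return None


# What the extra database column would hold for a value starting with "pass".
stored = hashlib.sha256(b"pass").digest()
recovered_prefix = crack(stored)
```

With 26 letters there are only 26^4 = 456,976 candidates, so the search finishes almost immediately on ordinary hardware.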
Depending on the type of search you want to perform there are a couple options you may use to improve upon your scheme:
Search for full strings: You could encrypt the search term and query on that. Or store the hash of the encrypted string, and query on that if your strings are very long.
Search for partial strings: Alter your scheme by using a keyed hash instead of a simple hash.
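A keyed hash for the partial-string case could be sketched with HMAC-SHA256 from the Python standard library. The name `blind_index` and the idea of keeping a dedicated index key apart from the encryption key are assumptions for illustration, not part of the original scheme.

```python
import hashlib
import hmac
import secrets


def blind_index(index_key: bytes, value: str, prefix_len: int = 4) -> bytes:
    """Keyed hash of the first prefix_len characters of value."""
    prefix = value[:prefix_len].encode("utf-8")
    return hmac.new(index_key, prefix, hashlib.sha256).digest()


# Secret key for the index column (assumed to be stored outside the database).
index_key = secrets.token_bytes(32)

stored = blind_index(index_key, "passport number 1234")  # written next to the ciphertext
query = blind_index(index_key, "pass")                   # computed from the search term

matches = hmac.compare_digest(stored, query)
```

Without `index_key`, an attacker can no longer precompute a table of all 4-character strings, because each candidate hash depends on the secret.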

How to convert a passphrase into 128bit/256bit WEP key?

I have a passphrase and I want to generate 128bit or 256bit WEP key from that. Any pointers or links will be helpful on how to generate WEP key from a plain text.
Hopefully the original poster has found the answer by now, but I for one found this information SHOCKINGLY difficult to come by as it's beyond deprecated. As mentioned repeatedly, in this thread and others, WEP is horribly insecure, but for some reason nobody is willing to give a straight answer otherwise and there are a lot of suggestions to go learn something else.
To the original question, the 128-bit key is an MD5 hash of a 64-byte string. This 64-byte string is the ASCII pass phrase repeated over and over then truncated at 64-bytes. In SQL Server for instance, this would appear as,
SELECT CAST(HASHBYTES('MD5', LEFT(REPLICATE(@phrase, CEILING(64.0 / LEN(@phrase))), 64)) AS varbinary(13))
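The same derivation as described above, sketched in Python for comparison. The function name `wep128_key` is hypothetical. The result is cut to 13 bytes (104 bits) like the `varbinary(13)` cast, since a "128-bit" WEP key consists of 104 secret bits plus the 24-bit IV.

```python
import hashlib


def wep128_key(passphrase: str) -> bytes:
    """MD5 of the passphrase repeated and truncated to 64 bytes, cut to 13 bytes."""
    phrase = passphrase.encode("ascii")
    if not phrase:
        raise ValueError("passphrase must not be empty")
    repeats = -(-64 // len(phrase))           # ceiling division
    block = (phrase * repeats)[:64]           # exactly 64 bytes
    return hashlib.md5(block).digest()[:13]   # 104 secret bits


key = wep128_key("my passphrase")
key_hex = key.hex()
```

This is shown only to answer the question as asked; it does nothing to make WEP itself any safer.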
Start by reading about Key Derivation Functions. High quality modern KDFs are scrypt, bcrypt and PBKDF2. All of them have open source implementations.
For PBKDF2, you can specify the length of the derived key. For scrypt, you can pick the first N bits of the output and use them as the key.
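For the PBKDF2 route, a minimal sketch using `hashlib.pbkdf2_hmac` from the Python standard library. The salt size, iteration count, and the helper name `derive_key` are assumptions picked for the example, not recommendations from the answer.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes, key_bits: int, iterations: int = 100_000) -> bytes:
    """Derive a key of the requested size from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        iterations,
        dklen=key_bits // 8,  # requested output length in bytes
    )


salt = secrets.token_bytes(16)  # stored alongside the data; does not need to be secret
key_128 = derive_key("correct horse battery staple", salt, 128)
key_256 = derive_key("correct horse battery staple", salt, 256)
```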
The most straightforward way of doing this without using a KDF is to concatenate your passphrase with a 24 bit IV (initialization vector) and form an RC4 key.
Mind that WEP combines your key and IV to seed an RC4 stream which keys the data stream; for this reason, WEP has a number of shortcomings, which make it unable to provide adequate data confidentiality. You can read more about it by following the Wikipedia page links.
Do NOT use a cryptographic hash as a derived key.

Generate an initialization vector without a good source of randomness

For a password storing plugin (written in C) for Rockbox I need to generate initialization vectors.
The problem is that I don't have a good source of randomness. The Rockbox-supplied random() is not a cryptographic RNG. And I've got hardly any sources of randomness I can access (no mouse movements, ... on an iPod running Rockbox).
The key is currently derived via PBKDF2 from a user-supplied password and a salt (which is a constant prefix + some data from random()). I think the pseudo-random data should be good enough for a salt with 10000 iterations of PBKDF2.
However, where do I take my initialization vector from? Is it OK if I take some semi-random data (time + random()) and SHA that, say, 10000 times? Should I take arc4random with a seed taken from random()?
Do I even need an IV if I effectively never use the same key twice (the salt is recomputed every time the stored data is changed)? What's the best way to deal with situations like these?
Edit:
Just a single user (me, owning the iPod), encryption algorithm: AES-CBC 256 bit.
The file just stores a site/account/password list for various websites. It is rarely modified (whenever I create a new account on a website), when that happens a new salt and a new IV is generated.
Generally speaking, with CBC, the IV MUST be random and uniform. "Non-repeating" is not sufficient. To be more precise, the whole point of CBC is to avoid the situation where the same data block is fed twice to the underlying block cipher. Hence, the condition is that if you encrypt two messages with the same key, then the difference of the two IVs must be uniformly random. With a 128-bit block cipher such as AES, the probability that the same block is obtained twice is sufficiently low as to be neglected -- as long as the IV is randomly chosen with uniform probability over the whole space of 128-bit values. Any structure in the IV selection (such as reusing the same IV, using a counter, or a low-quality random generator) increases that probability, because you are encrypting data which has itself a lot of structure.
There is a bright side to that: if you never use the same key twice, then you can tolerate a fixed IV. But that is a strong "never".
"Non-repeating IV" is not a good enough property with CBC. However, there are some encryption modes which can use a non-repeating IV. In particular, have a look at EAX and GCM. The trick here is that those modes use the provided IV in a custom PRNG which uses the encryption key; this transforms the non-repeating IV (e.g. a counter, or a "random value" of low quality) into something which, from a cryptographic point of view, looks random enough. Do not try to build your own PRNG! These things are subtle and there is no sure way to test the quality of the result.
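For modes such as GCM or EAX that only need a non-repeating IV, a counter is enough. The sketch below assumes a 96-bit nonce and that the counter value is saved together with the encrypted file so it survives restarts; the class name `NonceCounter` is hypothetical. It is not suitable for CBC.

```python
class NonceCounter:
    """Hands out 96-bit nonces that never repeat while the counter state is kept."""

    NONCE_BYTES = 12

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next_nonce(self) -> bytes:
        if self._next >= 1 << (8 * self.NONCE_BYTES):
            raise OverflowError("nonce space exhausted; switch to a new key")
        nonce = self._next.to_bytes(self.NONCE_BYTES, "big")
        self._next += 1
        return nonce

    @property
    def state(self) -> int:
        """Value to persist so that a later run continues where this one stopped."""
        return self._next


counter = NonceCounter()
first = counter.next_nonce()
second = counter.next_nonce()

# A later session restores the saved state instead of starting from zero again.
resumed = NonceCounter(start=counter.state)
third = resumed.next_nonce()
```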
The IV does not need to be random, it just needs to be unique for a given pair of key and data (assuming we are talking about an IV for CBC).
So random() should be okay for this purpose.
GREAT NEWS! The initialization vector doesn't need to be random, it just needs to be different for every encryption. So you can use the user's name as the salt. If you use both the user's name and the time then an attacker won't be able to detect password reuse.

Resources