Does knowledge about value length compromise hash integrity - database

If I am storing a hashed value in a database, but the length of the original value being hashed is fixed (e.g. always 4 characters), does this compromise the one-way nature of the hashing function?
More precisely, I have sensitive strings which I encrypt and store in a database. To search for these strings without decrypting every entry, I also store the hash of the first 4 characters of each string in another column. When I want to search the database, I hash the first 4 characters of the search term and compare it against the stored hashes to find entries that match or could match, then decrypt only those entries to rule out collisions and retrieve the rest of the data for each match.
My worry is that since an attacker would know that the length of the strings being hashed is constant (4 characters), he/she would only need to generate a table of all possible 4-character strings and their hashes and look up the hashed values stored in my database (thereby giving away the first 4 characters of the original sensitive string).

You're pretty much right in your conclusion. If an attacker knows that your hash is of a 4-character string, it's pretty trivial to recover the plaintext via brute force. In addition to giving the attacker the first 4 characters of your sensitive data, it could also let them learn part of the key you are using to encrypt your data (at a simplistic level, encryption is key XOR plaintext, which means plaintext XOR ciphertext = key). While it would be challenging to use that information to break the rest of the encryption, cryptographic attacks have been built on less.
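To make the brute-force point concrete, here is a minimal sketch of the precomputation an attacker could do, assuming SHA-256 and a lowercase a-z alphabet for the 4-character prefixes (both assumptions are mine, purely for illustration):

    // Enumerate every 4-character string over an assumed alphabet, hash each one,
    // and build a reverse lookup table from hash to plaintext. 26^4 = 456,976 entries.
    using System;
    using System.Collections.Generic;
    using System.Security.Cryptography;
    using System.Text;

    class PrefixTableAttack
    {
        static void Main()
        {
            const string alphabet = "abcdefghijklmnopqrstuvwxyz";
            var table = new Dictionary<string, string>();

            using var sha = SHA256.Create();
            foreach (char a in alphabet)
            foreach (char b in alphabet)
            foreach (char c in alphabet)
            foreach (char d in alphabet)
            {
                string candidate = new string(new[] { a, b, c, d });
                byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(candidate));
                table[Convert.ToHexString(digest)] = candidate;
            }

            // Any unsalted, unkeyed hash of a 4-character prefix is now one lookup away.
            byte[] stolen = sha.ComputeHash(Encoding.UTF8.GetBytes("smit"));
            Console.WriteLine(table[Convert.ToHexString(stolen)]); // prints "smit"
        }
    }

Even with a larger alphabet (all printable ASCII) the table is only about 81 million entries, which is entirely feasible on commodity hardware.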

Depending on the type of search you want to perform, there are a couple of options you could use to improve your scheme:
Search for full strings: You could encrypt the search term and query on that. Or store the hash of the encrypted string and query on that if your strings are very long.
Search for partial strings: Alter your scheme to use a keyed hash (e.g. an HMAC) instead of a plain hash, as sketched below.
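As a rough illustration of the keyed-hash option, here is a sketch using HMAC-SHA256; the method and key names are hypothetical, and in practice the key would come from a key store rather than source code:

    // Keyed-hash variant of the prefix index: without the secret key an attacker
    // cannot precompute a table of all possible 4-character prefixes.
    using System;
    using System.Security.Cryptography;
    using System.Text;

    static class PrefixIndex
    {
        // Illustrative only: fetch this from a key store / HSM in a real system.
        private static readonly byte[] IndexKey = Encoding.UTF8.GetBytes("example-secret-key");

        public static string Compute(string value)
        {
            string prefix = value.Substring(0, Math.Min(4, value.Length));
            using var hmac = new HMACSHA256(IndexKey);
            byte[] tag = hmac.ComputeHash(Encoding.UTF8.GetBytes(prefix));
            return Convert.ToBase64String(tag); // store this next to the ciphertext
        }
    }

At search time you compute the same HMAC over the first 4 characters of the search term and compare it against the stored column; without the key, the precomputed-table attack described above no longer works.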

Related

Encrypt hash map keys while keeping constant lookup speed

I would like to encrypt the keys and values of a hash map with AES256 CBC, individually.
The challenge is to encrypt the keys while maintaining the constant lookup speed and security (mainly against dictionary attacks).
I read about blind indices, but these need some randomness at creation (salt, nonce), and it is impossible for the lookup function to recreate the nonce when searching. At lookup time we would need to know where to fetch the nonce from for a particular key, which in the end just moves the vulnerability elsewhere.
So far, I can only think of two options.
First one would be to just not encrypt keys, although I would prefer to do it.
The second one would be to obtain the blind indices by applying a transformation like
blind_index(key) = encrypt(digest(key))
but the problem here is that you need a unique initialisation vector for each key encryption, which brings us back to the problem described above: keeping a table of the IVs used so that the lookup function can reconstruct the blind index when searching, which just moves the same problem elsewhere.
For the second approach, my thought was: since I always encrypt unique values (keys are unique and even if they are substrings of one another, e.g. 'awesome' and 'awesome_key', they are hashed before encryption, so they look quite different in their 'hashed & unencrypted' form) I could use a global IV for all encryptions, which can be easily accessible to the lookup function. Since the lookup function requires the encryption key, only the owner will be able to compute the blind index correctly and in the map itself there will be no visible similarities between keys that are similar in plaintext.
The big problem I have with the second approach is that it violates the idea of never using IVs for more than one encryption. I could obfuscate the IV 'to make it more secure,' but that's again a bad idea since IVs are supposed to be plaintext.
More details about the circumstances:
app for mobile
map will be dumped to a file
map will be accessible for lookup through a REST API
Maybe I should use a different algorithm (e.g. ECB)?
Thanks in advance!
This is completely in the realm of Format Preserving Encryption (FPE). However, applying it is hard, and libraries that handle it well are not all that common. FPE takes an input of a given number of bits, or even a range, and returns an encrypted value of the same size or in the same range. This ciphertext is pseudo-random over the given domain as long as the input values are unique (which, for keys in a hash table, they are by definition).
If you can allow the ciphertext to expand compared to the plaintext, then you could also look at SIV modes (AES-SIV or AES-GCM-SIV), which are much easier to handle. These return a byte array, which you could turn into a String, e.g. by using base64 encoding. Otherwise you could wrap the byte array and provide your own equals and hashCode methods. Note that these expand your plaintext relatively significantly, since they are authenticated modes. Advantage: the IV gets calculated from the input, and any change in the input will randomize the ciphertext again.
Finally, you could of course simply use an IV or nonce to produce your ciphertext and prefix it to the value. However, beware that re-encryption of changed values using the same IV would be rather dangerous, as you may leak information through repetition. In some modes this could entirely break the confidentiality offered, so you would have to prevent reuse of the IV.
The use of ECB is certainly not recommended for strings. A single-block encrypt would work, of course, if the input is (or can be expanded to) a single block.
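As a sketch of that last single-block idea (my own illustration, not a drop-in recommendation): hash the map key down to one 16-byte block, then encrypt that block under a secret key. The truncation of SHA-256 to 128 bits and the helper names are assumptions made here for brevity:

    // Deterministic "blind index" from a single AES block encryption: identical keys
    // always produce identical indices; without the AES key the index looks random.
    using System;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    static class BlindIndex
    {
        public static string Compute(string mapKey, byte[] aesKey)
        {
            using var sha = SHA256.Create();
            // Truncate the 32-byte digest to one 16-byte AES block (collision risk is negligible).
            byte[] block = sha.ComputeHash(Encoding.UTF8.GetBytes(mapKey)).Take(16).ToArray();

            using var aes = Aes.Create();
            aes.Key = aesKey;              // 16/24/32-byte secret key, from secure storage
            aes.Mode = CipherMode.ECB;     // exactly one block, so ECB's usual pattern leak does not apply
            aes.Padding = PaddingMode.None;

            using var enc = aes.CreateEncryptor();
            byte[] index = enc.TransformFinalBlock(block, 0, 16);
            return Convert.ToBase64String(index);
        }
    }

The result is deterministic per key, so lookups stay constant-time, and it is pseudo-random to anyone without the AES key; like any deterministic index, it does reveal when two entries share the same key.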

SQL Server hash algorithms

If my input length is less than the hash output length, are there any hashing algorithms that can guarantee no collisions?
I know that a one-way hash can have collisions across multiple inputs due to the lossy nature of hashing, especially since the input size is often greater than the output size, but does that still apply with smaller input sizes?
Use a symmetric block cipher with a randomly chosen static key. Encryption can never produce a duplicate because that would prevent unambiguous decryption.
This scheme will force a certain output length which is a multiple of the cipher block size. If you need a variable-length output you can use a stream cipher as well.
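A small sketch of why a block cipher cannot collide, using AES on a single block (the input values and padding layout here are illustrative):

    // A block cipher under a fixed key is a bijection on 16-byte blocks, so two
    // different (fixed-layout) inputs can never encrypt to the same output --
    // otherwise decryption would be ambiguous.
    using System;
    using System.Security.Cryptography;

    class CollisionFreeTag
    {
        static void Main()
        {
            using var aes = Aes.Create();   // random key generated automatically
            aes.Mode = CipherMode.ECB;      // single fixed-size block
            aes.Padding = PaddingMode.None;

            byte[] Tag(uint input)
            {
                var block = new byte[16];                       // pad the small input up to one block
                BitConverter.GetBytes(input).CopyTo(block, 0);  // fixed positions + zero fill keeps it injective
                using var enc = aes.CreateEncryptor();
                return enc.TransformFinalBlock(block, 0, 16);
            }

            byte[] t1 = Tag(42);
            byte[] t2 = Tag(43);
            Console.WriteLine(Convert.ToHexString(t1) == Convert.ToHexString(t2)); // always False for distinct inputs
        }
    }

The trade-off is that the output is always 128 bits (or a multiple thereof) regardless of how small the input is.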
Your question sounds like you're looking for a perfect hash function. The problem with perfect hash functions is they tend to be tailored towards a specific set of data.
The following assumes you're not trying to hide, secure or encrypt the data...
To think of it another way, the easiest way to "generate" a perfect hash function that will accept your inputs is to map the data you want to store to a table and associate those inputs with a surrogate primary key. You then create a unique constraint for the column (or columns) to ensure the input you're mapping only maps to a single surrogate value.
The surrogate key could be int, bigint or a guid. It all depends on how many rows you're looking to store.
If your input lengths are known to be small, such as 32 bits, then you could actually enumerate all possible inputs and check the resulting hashes for collisions. That's only 4,294,967,296 possible inputs, and it shouldn't take terribly long to enumerate all of them. Essentially you'd be building a rainbow table to test for collisions.
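A reduced-scale sketch of that enumeration, using 16-bit inputs and SHA-256 so it finishes instantly; the same loop extends to the full 32-bit space, just with 2^32 iterations:

    // Exhaustively hash every possible 16-bit input (65,536 of them) and report
    // any two inputs that produce the same digest.
    using System;
    using System.Collections.Generic;
    using System.Security.Cryptography;

    class CollisionScan
    {
        static void Main()
        {
            var seen = new Dictionary<string, ushort>();
            using var sha = SHA256.Create();

            for (int i = 0; i <= ushort.MaxValue; i++)
            {
                byte[] digest = sha.ComputeHash(BitConverter.GetBytes((ushort)i));
                string key = Convert.ToHexString(digest);
                if (seen.TryGetValue(key, out ushort previous))
                    Console.WriteLine($"collision: {previous} and {i}");
                else
                    seen[key] = (ushort)i;
            }
            Console.WriteLine("scan complete");
        }
    }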
If there is some security relying on this though, one issue is that if an attacker knows your input lengths are constrained, it is easy for them to perform the same enumeration and build a map/table that maps hashes back to the original values. "Attacker" is a pretty terrible term here though, because I have no context for how you are using these hashes and whether you are concerned about them being reversible.

Indexing an encrypted column in sql server

I have patient health information stored in a SQL Server 2012 database. The patients' names are encrypted, so when I search on a patient's name the search is very slow. How can I add an index on an encrypted column?
I am using Symmetric Key encryption (256-bit AES) on varbinary fields.
There are separate encrypted fields for the patient's first name, last name, address, phone number, DOB and SSN. All of these are searchable (including partial matches) except SSN.
To build on the answer that #PhillipH provided: if you are performing an exact search on (say) last name, you can include a computed column defined as CHECKSUM(encrypt(last_name)) (where encrypt is your encryption operation). This is secure in that it does not divulge any information: a checksum of the encrypted value does not reveal anything about the plaintext.
Create an index on this computed column. To search on the name, instead of just doing WHERE encrypted_last_name = encrypt(last_name), add a search on the hash: WHERE encrypted_last_name = encrypt(last_name) AND CHECKSUM(encrypt(last_name)) = hashed_encrypted_last_name. This is much faster because SQL Server only has to search an index for a small integer value, then verify that the name in fact matches, reducing the amount of data to check considerably. Note that no data is decrypted in this scheme, with or without the CHECKSUM -- we search for the encrypted value only. The speedup does not come from reducing the amount of data that is encrypted/decrypted (only the data you pass in is encrypted) but the amount of data that needs to be indexed and compared for equality.
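A hypothetical illustration of that layout (the table and column names are invented, and it assumes the encryption is deterministic so the same plaintext always produces the same varbinary value, as the scheme above requires):

    // One-time setup: a persisted computed checksum column plus an index on it,
    // then a parameterized search that seeks on the checksum and verifies the ciphertext.
    using System.Data.SqlClient;

    class EncryptedNameSearch
    {
        const string Ddl = @"
            ALTER TABLE dbo.Patient
                ADD LastNameChecksum AS CHECKSUM(EncryptedLastName) PERSISTED;
            CREATE INDEX IX_Patient_LastNameChecksum ON dbo.Patient (LastNameChecksum);";

        public static SqlCommand BuildSearch(SqlConnection conn, byte[] encryptedLastName)
        {
            var cmd = new SqlCommand(@"
                SELECT *
                FROM dbo.Patient
                WHERE LastNameChecksum = CHECKSUM(@enc)   -- index seek on a small integer
                  AND EncryptedLastName = @enc            -- then verify the full ciphertext
            ", conn);
            cmd.Parameters.AddWithValue("@enc", encryptedLastName); // ciphertext computed before the query runs
            return cmd;
        }
    }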
The only drawback is that this does not allow partial searches, or even case variation, and indeed, doing that securely is not trivial. Case is relatively simple (hash encrypted(TOUPPER(name)), making sure you use a different key to avoid correlation), but partial matches require specialized indexes. The simplest approach I can think of is to use a separate service like Lucene to do the indexing, but make it use secure storage for its files (i.e. Encrypting File System (EFS) in Windows). Of course, that does mean a separate system that needs to be certified -- but I can't think of any convenient solution that remains entirely in SQL Server and does not require additional code.
If you can still change the database design/storage, you may wish to consider Transparent Data Encryption (TDE) which has the huge advantage that it's, well, transparent and integrated in SQL Server at the engine level. Not only should partial matching be much faster since individual rows don't need decrypting (just whole pages), if it's not fast enough you can create a full-text index which will also be encrypted. I don't know if TDE works with your security requirements, though.
As a programmatic solution, if you don't need a partial match, you could store a hash in the clear in another field, use the same hashing algorithm on the client/app server, and match on the hash. This has the possibility of a false positive match but negates the need to decrypt the data.
If you are using Microsoft SQL Server's EncryptByKey function, there is no benefit to indexing that column, because EncryptByKey produces different output every time for the same input, due to the random IV SQL Server uses internally.

How does salt work in the crypt function in C? [duplicate]

This question already has answers here:
How does password salt help against a rainbow table attack?
(10 answers)
Closed 8 years ago.
I read man crypt and didn't understand what the phrase below means: "salt is a two-character string chosen from the set [a-zA-Z0-9./]. This string is used to perturb the algorithm in one of 4096 different ways."
The primary function of salts is to defend against dictionary attacks versus a list of password hashes and against pre-computed rainbow table attacks.
Salt (cryptography)
Basically, adding a little bit of unknown data into the hash prevents an attacker from precomputing all hashes for a given dictionary and then just looking up the table to find the unhashed value. (As for the 4096: the set [a-zA-Z0-9./] has 64 characters, so a two-character salt gives 64 × 64 = 4096 possibilities.)
Usually, to protect sensitive data, a salt is used.
What this means is that your sensitive data (say, a password) is concatenated with a string (the salt), hashed, and then stored.
This protects it against table attacks, in which an attacker has a table of most dictionary words and their digests under popular algorithms (MD5, SHA-1, etc.). If he were to gain access to the DB, he would be able to recover all of your sensitive data.
Using a salt makes it harder for the attacker: he needs to know how the salt was combined with the data and would need a separate dictionary for that specific salt, making his life harder.
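A tiny illustration of the effect, using SHA-256 rather than crypt()'s own DES-based scheme (the salts and password below are arbitrary):

    // The same password under two different salts yields two unrelated digests,
    // so one precomputed dictionary/rainbow table no longer matches either stored value.
    using System;
    using System.Security.Cryptography;
    using System.Text;

    class SaltDemo
    {
        static string Hash(string salt, string password)
        {
            using var sha = SHA256.Create();
            return Convert.ToHexString(sha.ComputeHash(Encoding.UTF8.GetBytes(salt + password)));
        }

        static void Main()
        {
            Console.WriteLine(Hash("aB", "hunter2")); // differs completely from the line below
            Console.WriteLine(Hash("9x", "hunter2"));
        }
    }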

Preferred Method of Storing Passwords In Database

What is your preferred method/datatype for storing passwords in a database (preferably SQL Server 2005). The way I have been doing it in several of our applications is to first use the .NET encryption libraries and then store them in the database as binary(16). Is this the preferred method or should I be using a different datatype or allocating more space than 16?
I store the salted hash of the password in the database, never the password itself, then always compare the stored hash against the hash generated from what the user passed in.
It's too dangerous to ever store the literal password data anywhere. This makes recovery impossible, but when someone forgets or loses a password you can run through some checks and create a new password.
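A minimal sketch of that store-and-compare flow, assuming SHA-256 with a random per-user salt; as later answers in this thread note, a deliberately slow KDF (bcrypt, PBKDF2) is preferable to a single fast hash, and the names here are illustrative:

    // Store salt + hash, never the password; verification re-hashes the attempt
    // with the stored salt and compares.
    using System;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    static class PasswordStore
    {
        public static (byte[] Salt, byte[] Hash) Create(string password)
        {
            byte[] salt = new byte[16];
            using (var rng = RandomNumberGenerator.Create())
                rng.GetBytes(salt);
            return (salt, HashWithSalt(salt, password));
        }

        public static bool Verify(string attempt, byte[] storedSalt, byte[] storedHash)
        {
            // A constant-time comparison is preferable in production code.
            return HashWithSalt(storedSalt, attempt).SequenceEqual(storedHash);
        }

        private static byte[] HashWithSalt(byte[] salt, string password)
        {
            using var sha = SHA256.Create();
            return sha.ComputeHash(salt.Concat(Encoding.UTF8.GetBytes(password)).ToArray());
        }
    }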
THE preferred method: never store passwords in your DB. Only hashes thereof. Add salt to taste.
I do the same thing you've described, except it is stored as a String. I Base64 encode the encrypted binary value. The amount of space to allocate depends on the encryption algorithm/cipher strength.
I think you are doing it right (given that you use a Salt).
store the hash of the salted password, such as bcrypt(nonce + pwd). You may prefer bcrypt over SHA-1 or MD5 because it can be tuned to be CPU-intensive, making a brute-force attack take far longer.
add a captcha to the login form after a few login errors (to avoid brute-force attacks)
if your application has a "forgot my password" link, make sure it does not send the new password by email, but instead it should send a link to a (secured) page allowing the user to define a new password (possibly only after confirmation of some personal information, such as the user's birth date, for example). Also, if your application allows the user to define a new password, make sure you require the user to confirm the current password.
and obviously, secure the login form (typically with HTTPS) and the servers themselves
With these measures, your user's passwords will be fairly well protected against:
=> offline dictionary attacks
=> live dictionary attacks
=> denial of service attacks
=> all sorts of attacks!
Since the result of a hash function is a series of bytes in the range 0 to 255 (or -128 to 127, depending on the signedness of your 8-bit data type), storing it as a raw binary field makes the most sense, as it is the most compact representation and requires no additional encoding and decoding steps.
Some databases or drivers don't have great support for binary data types, or sometimes developers just aren't familiar enough with them to feel comfortable. In that case, using a binary-to-text encoding like Base-64 or Base-85, and storing the resulting text in a character field is acceptable.
The size of the field necessary is determined by the hash function that you use. MD5 always outputs 16 bytes, SHA-1 always outputs 20 bytes. Once you select a hash function, you are usually stuck with it, as changing requires a reset of all existing passwords. So, using a variable-size field doesn't buy you anything.
Regarding the "best" way to perform the hashing, I've tried to provide many answers to other SO questions on that topic:
Encrypting passwords
Encrypting passwords
Encrypting passwords in .NET
Salt
Salt: Secret or public?
Hash iterations
I use the SHA hash of the username, a GUID in the web.config, and the password, stored as a varchar(40). If they want to brute-force or dictionary-attack it, they'll need to hack the web server for the GUID as well. Including the username means a rainbow table built against one account doesn't work across the whole database. If a user wants to change their username, I just reset the password at the same time.
    // Concatenate username + application-wide salt (the GUID in web.config) + password,
    // then hash; HashPasswordForStoringInConfigFile takes the text and a format name ("SHA1" or "MD5").
    System.Web.Security.FormsAuthentication.HashPasswordForStoringInConfigFile(
        username.ToLower().Trim() +
        ConfigurationManager.AppSettings["salt"] +
        password,
        "SHA1");
A simple hash of the password, or even (salt + password) is not generally adequate.
see:
http://www.matasano.com/log/958/enough-with-the-rainbow-tables-what-you-need-to-know-about-secure-password-schemes/
and
http://gom-jabbar.org/articles/2008/12/03/why-you-should-use-bcrypt-to-store-your-passwords
Both recommend the bcrypt algorithm. Free implementations can be found online for most popular languages.
You can use multiple hashes in your database, it just requires a little bit of extra effort. It's well worth it though if you think there's the remotest chance you'll need to support additional formats in the future. I'll often use password entries like
{hashId}${salt}${hashed password}
where "hashId" is just some number I use internally to recognize that, e.g., I'm using SHA1 with a specific hash pattern; "salt" is a base64-encoded random salt; and "hashed password" is a base64-encoded hash. If you need to migrate hashes you can intercept people with an old password format and make them change their password the next time they log in.
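A small sketch of composing and splitting that record format (the hashId values and method names are arbitrary); note that base64 never contains '$', so the separator is unambiguous:

    // Compose / split the "{hashId}${salt}${hashed password}" record described above.
    // hashId 1 is just an internal tag for "this particular salt + SHA-1 pattern".
    using System;

    static class PasswordRecord
    {
        public static string Compose(int hashId, byte[] salt, byte[] hash) =>
            $"{hashId}${Convert.ToBase64String(salt)}${Convert.ToBase64String(hash)}";

        public static (int HashId, byte[] Salt, byte[] Hash) Parse(string record)
        {
            string[] parts = record.Split('$');
            return (int.Parse(parts[0]),
                    Convert.FromBase64String(parts[1]),
                    Convert.FromBase64String(parts[2]));
        }
    }

At login you parse the stored record, branch on HashId to pick the matching verification routine, and re-hash under the newest scheme when you encounter an outdated id.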
As others have mentioned you want to be careful with your hashes since it's easy to do something that's not really secure, e.g., H(salt,password) is far weaker than H(password,salt), but at the same time you want to balance the effort put into this with the value of the site content. I'll often use H(H(password,salt),password).
Finally, the cost of using base64-encoded passwords is modest when compared to the benefits of being able to use various tools that expect text data. Yeah, they should be more flexible, but are you ready to tell your boss that he can't use his favorite third party tool because you want to save a few bytes per record? :-)
Edited to add one other comment: if I suggested deliberately using an algorithm that burned even a 1/10th of a second hashing each password I would be lucky to just be laughed out of my boss's office. (Not so lucky? He would jot something down to discuss at my next annual review.) Burning that time isn't a problem when you have dozens, or even hundreds, of users. If you're pushing 100k users you'll usually have multiple people logging in at the same time. You need something fast and strong, not slow and strong. The "but what about the credit card information?" is disingenuous at best since stored credit card information shouldn't be anywhere near your regular database, and would be encrypted by the application anyway, not individual users.
If you are working with ASP.Net you can use the built in membership API.
It supports many types of storage options, including: one-way hash, two-way encryption, and MD5 + salt. See http://www.asp.net/learn/security for more info.
If you don't need anything too fancy, this is great for websites.
If you are not using ASP.Net here is a good link to a few articles from 4guys and codeproject
https://web.archive.org/web/20210519000117/http://aspnet.4guysfromrolla.com/articles/081705-1.aspx
https://web.archive.org/web/20210510025422/http://aspnet.4guysfromrolla.com/articles/103002-1.aspx
http://www.codeproject.com/KB/security/SimpleEncryption.aspx
Since your question is about storage method & size I will address that.
The storage type can be either binary or a text representation (base64 is the most common). Binary is smaller, but I find working with text easier. If you are doing per-user salting (a different salt per password), it is easier to store salt+hash as a single combined string.
The size is hash-algorithm dependent. The output of MD5 is always 16 bytes, SHA-1 is always 20 bytes. SHA-256 and SHA-512 are 32 and 64 bytes respectively. If you are using a text encoding you will need slightly more storage depending on the encoding method. I tend to use Base64 because storage is relatively cheap; Base64 requires a roughly 33% larger field.
If you have per-user salting you will need space for the salt also. Putting it all together, a 64-bit salt plus a SHA-1 hash (160 bits), base64 encoded, takes 40 characters, so I store it as char(40).
Lastly, if you want to do it right you shouldn't be using a single hash but a key derivation function like PBKDF2. SHA-1 and MD5 hashes are insanely fast. Even a single-threaded application can hash about 30K to 50K passwords per second; that's up to 200K passwords per second on a quad-core machine. GPUs can hash 100x to 1000x as many passwords per second. With speeds like that, brute-force attacking becomes a viable intrusion method. PBKDF2 allows you to specify the number of iterations to fine-tune how "slow" your hashing is. The point isn't to bring the system to its knees but to pick a number of iterations that caps the upper limit on hash throughput (say 500 hashes per second). A future-proof method would be to include the number of iterations in the password field (iterations + salt + hash). This would allow increasing the iterations in the future to keep pace with more powerful processors. To be even more flexible, use varchar to allow potentially larger/alternative hashes in the future.
The .NET implementation is Rfc2898DeriveBytes:
http://msdn.microsoft.com/en-us/library/system.security.cryptography.rfc2898derivebytes.aspx
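A minimal sketch of the iterations + salt + hash idea using the built-in Rfc2898DeriveBytes; the iteration count, salt size and record layout are illustrative choices, not fixed requirements:

    // PBKDF2 via Rfc2898DeriveBytes, storing "iterations$salt$hash" so the
    // iteration count can be raised later without breaking existing records.
    using System;
    using System.Linq;
    using System.Security.Cryptography;

    static class Pbkdf2Password
    {
        public static string Hash(string password, int iterations = 100_000)
        {
            byte[] salt = new byte[8];
            using (var rng = RandomNumberGenerator.Create())
                rng.GetBytes(salt);

            using var kdf = new Rfc2898DeriveBytes(password, salt, iterations);
            byte[] hash = kdf.GetBytes(20); // 160-bit output, matching the char(40) example above

            return $"{iterations}${Convert.ToBase64String(salt)}${Convert.ToBase64String(hash)}";
        }

        public static bool Verify(string password, string stored)
        {
            string[] parts = stored.Split('$');
            int iterations = int.Parse(parts[0]);
            byte[] salt = Convert.FromBase64String(parts[1]);
            byte[] expected = Convert.FromBase64String(parts[2]);

            using var kdf = new Rfc2898DeriveBytes(password, salt, iterations);
            return kdf.GetBytes(expected.Length).SequenceEqual(expected);
        }
    }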
