MD5 checksums with salt

I thought that when a salt is used, MD5 is computed over the concatenation string + salt. So the word 'aaa' with salt 'aa' should give the same result as 'aaaa' with salt 'a', or 'aaaaa' without salt.
But this is what I got...
md5pass aaa aa
$1$aa$EeTKacbSboHIR0fSp2UVf0
md5pass aaaa a
$1$a$M2jh3iKJcBEuJdTGjNcsh0
Could you please explain why checksums are different?
Thank you,
Martin

I mixed up two different things: an MD5 checksum and an MD5 password hash.
An MD5 checksum is used to check that a file was not modified. No salt is used, and the result is usually written as a hexadecimal number.
An MD5 password hash is used to store passwords in non-readable form. It applies MD5 to the password and salt over many iterations, and the result starts with $1$.
md5pass computes a password hash from the given passphrase and salt. There are many iterations of md5(pass + salt + result_from_previous_iteration), so it is not just MD5(pass+salt) as I thought.
http://en.wikipedia.org/wiki/Crypt_%28Unix%29
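The difference can be sketched in Python with hashlib. With plain MD5 over password + salt, the three cases from the question collapse into the same input, so the expectation in the question does hold for a raw checksum. The second function, `sandwich`, is a hypothetical stand-in for the extra mixing a password hash performs (md5-crypt is understood to combine password and salt in more than one order and over many rounds). It is not the md5-crypt algorithm itself.

```python
import hashlib


def plain(password: str, salt: str) -> str:
    """Raw MD5 over the simple concatenation password + salt."""
    return hashlib.md5((password + salt).encode("ascii")).hexdigest()


def sandwich(password: str, salt: str) -> str:
    """Hypothetical mixing step: MD5 over password + salt + password.

    Only an illustration of why the split between password and salt
    matters once the two are combined in more than one position.
    """
    return hashlib.md5((password + salt + password).encode("ascii")).hexdigest()


# All three inputs concatenate to 'aaaaa', so the raw digests are equal.
print(plain("aaa", "aa") == plain("aaaa", "a") == plain("aaaaa", ""))

# 'aaa'+'aa'+'aaa' is 8 characters, 'aaaa'+'a'+'aaaa' is 9, so these differ.
print(sandwich("aaa", "aa") == sandwich("aaaa", "a"))
```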

Related

How to decrypt a MD5 Password that is exactly 9 digits long?

I have a hash from my school work that I need to decrypt.
The hash is: 68728d8fa7977d2567c6363381eda037.
It looks like it uses either MD4 or MD5 hashing algorithm
How to decrypt it?
You "decrypt" a hash by building a lookup table. In your case, if you know that the password is exactly 9 digits long, you'd compute the hash of every possible 9-digit password and compare each one to the hash you're trying to reverse.
By definition a hash is a one-way function, so you can't decrypt it. You can build a table of all possible combinations, which is a CPU-intensive process. Here is a simple sketch in pseudocode:
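for i from 0 to 999999999
    p = zero_pad(i, 9)          // every 9-digit string, leading zeros kept
    h = md5(p)
    write(file, h + " " + p)    // store the password next to its hash
endfor
line = search(file, inputhash)
print(line)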
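The same idea can be sketched in Python without writing a table to disk, by comparing each candidate directly against the target. The function name `crack_numeric_md5` and the assumption that candidates are zero-padded decimal strings are illustrative choices, not something the question confirms.

```python
import hashlib
from typing import Optional


def crack_numeric_md5(target_hex: str, digits: int) -> Optional[str]:
    """Try every decimal string of the given length against an MD5 hex digest.

    Returns the matching string, or None if no candidate matches.
    """
    target = target_hex.lower()
    for i in range(10 ** digits):
        candidate = str(i).zfill(digits)  # keep leading zeros, e.g. '000000042'
        if hashlib.md5(candidate.encode("ascii")).hexdigest() == target:
            return candidate
    return None


# Small demonstration with 4 digits; 9 digits means up to 10**9 hashes
# and takes correspondingly longer in pure Python.
demo_hash = hashlib.md5(b"0042").hexdigest()
print(crack_numeric_md5(demo_hash, 4))
```

For the 9-digit case in the question the call would be `crack_numeric_md5("68728d8fa7977d2567c6363381eda037", 9)`.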

using HASHBYTES function to hash data

I have data I need to hash.
I have a list of numbers to compare against hashed numbers.
As far as anyone could tell me, the data was hashed with SHA-256.
I have only one example of input and output, and I need to reproduce the hashing logic in SQL Server.
This is the original number: 02229747
And this is the hash number: 4ad54f5b376038f49a44d411e6d551ae4c8dd147c8605a7eec32ba850080b326
I have tried using the following but I can't manage to get the same result.
declare @number bigint = 022529747
DECLARE @HashId varbinary(50) = HashBytes('SHA2_256', cast(@number as varbinary(50)))
select @HashId
Hashbytes can take a number of different algorithms as input. Try each one:
select
a = hashbytes('MD2', '022529747'),
b = hashbytes('MD4', '022529747'),
c = hashbytes('MD5', '022529747'),
d = hashbytes('SHA', '022529747'),
e = hashbytes('SHA1', '022529747'),
f = hashbytes('SHA2_256', '022529747'),
g = hashbytes('SHA2_512', '022529747')
Column f returns the value you are looking for, so the algorithm used was SHA2_256. Note that I am passing the data in as a string (varchar), not an integer (bigint). The bytes which represent 022529747 as a varchar are very different from the bytes which represent 022529747 as a bigint.
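The point about bytes can be illustrated outside SQL Server. The sketch below assumes the string is encoded as single-byte ASCII characters and the integer as 8 big-endian bytes. The exact byte layout SQL Server uses for a bigint cast is an assumption here, but any fixed-width integer encoding shows the same effect.

```python
import hashlib

# The number as text: 9 bytes, one per character, leading zero included.
text_bytes = "022529747".encode("ascii")

# The number as a 64-bit integer: 8 bytes, and the leading zero is gone
# because the integer value is simply 22529747.
int_bytes = (22529747).to_bytes(8, "big")

print(text_bytes.hex())
print(int_bytes.hex())

# Different input bytes give unrelated SHA-256 digests.
text_digest = hashlib.sha256(text_bytes).hexdigest()
int_digest = hashlib.sha256(int_bytes).hexdigest()
print(text_digest == int_digest)
```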
Background:
Hashing and encryption are different.
SHA stands for "Secure Hash Algorithm". It takes some input and produces an output hash. If the input changes, the hash changes (within the limits of the birthday problem). But you can't go backwards: you can't take the output hash and turn it back into the input data. The best you can do is try every possible input and see whether it generates the hash. See this 3Blue1Brown video for an illustrative explanation.
SHA is a family of cryptographic hash functions, but don't let the name fool you. "Cryptographic" doesn't mean the same thing as "encryption". It just means that it's "hard to guess" what the input data might be based on the output, because the output appears random. See this thread for the difference between a hash function and a cryptographic hash function.
AES stands for "advanced encryption standard". This is a symmetric key encryption. Data encrypted with AES can be decrypted back to the original input. The "symmetric" part means one key is used to both encrypt, and decrypt (compared to, e.g., PGP encryption, which uses different keys to encrypt and decrypt).
The SQL hashbytes function can use a number of different algorithms, but none of them are reversible. They are all hashing algorithms, not encryption algorithms.
If you need encryption and decryption in code, the correct SQL functionality to use is EncryptByKey and DecryptByKey.

Are standard hash functions like MD5 or SHA1 guaranteed to be unique for small input (4 bytes)?

Scenario:
I'm writing a web service that will act as an identity provider for a third-party application. I have to send this third-party application some unique identifier of our user. In our database, the unique user identifier is an integer (4 bytes, 32 bits). Per our security rules I can't send those in plain form, so sending them out hashed (through a function like MD5 or SHA1) was my first idea.
Problem:
The result of MD5 is 16 bytes and the result of SHA1 is 40 bytes. I know they can't be unique for larger input sets, but given that my input is only 4 bytes long (smaller than the hashed results), are they guaranteed to be unique, or am I doomed to some poor-man's hash function (like XORing the integer input with some number, shifting bits, adding predefined bits, etc.)?
For what you're trying to achieve (preventing a 3rd party from determining your user identifier), a straight MD5 or SHA1 hash is insufficient. 32 bits = about 4 billion values, so it would take less than 2 hours for the 3rd party to brute force every value (at 1M hashes/sec). I'd really suggest using HMAC-SHA1 instead.
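A minimal sketch of the HMAC-SHA1 suggestion, using Python's standard hmac module. The helper name `pseudonymize`, the little-endian 4-byte encoding of the identifier, and the example key are all assumptions made for illustration. The secret key is what stops the third party from simply hashing all 2^32 values themselves.

```python
import hashlib
import hmac
import struct


def pseudonymize(user_id: int, key: bytes) -> str:
    """Return the HMAC-SHA1 of a signed 32-bit user id as a 40-char hex string."""
    message = struct.pack("<i", user_id)  # 4 bytes, little-endian (assumed layout)
    return hmac.new(key, message, hashlib.sha1).hexdigest()


secret = b"example-key-keep-this-private"  # hypothetical key, never sent to the 3rd party
print(pseudonymize(12345, secret))
```

The same identifier and key always give the same output, so the third party can still use the value as a stable identifier without being able to recompute it.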
As for collisions, this question has an extremely good answer on their likelihood. tl;dr For 32 bits of input, the probability of a collision is extremely small.
If your user identifiers aren't random (they increment by 1 or there is a known algorithm for creating them), then there's no reason you can't generate every hash to make sure that no collision will occur.
This will check the first 10,000,000 integers for a collision with HMAC-SHA1 (will take about 2 minutes to run):
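using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static bool checkCollisionHmacSha1(byte[] key){
    using (HMACSHA1 mac = new HMACSHA1(key)){
        // byte[] is compared by reference in a HashSet, so store each hash as a string
        HashSet<string> values = new HashSet<string>();
        for(int i = 0; i < 10000000; i++){
            byte[] value = BitConverter.GetBytes(i);
            // Add returns false if the hash was already present, i.e. a collision
            if (!values.Add(Convert.ToBase64String(mac.ComputeHash(value))))
                return true;
        }
    }
    return false;
}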
First, SHA1 is 20 bytes not 40 bytes.
Second, although input is very small, there still may be a collision. It is best to test this, but I do not know a feasible way to do that.
In order to prevent any potential collision:
1 - Hash your input and produce the 16/20 bytes of hash
2 - Spray your actual integer onto this hash.
Like put a byte of your int every 4/5 bytes.
This will guarantee the uniqueness by using the input itself.
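One possible reading of the "spray" steps, sketched in Python: take the 20-byte SHA1 of the integer and insert one byte of the integer after every 5 hash bytes, giving 24 bytes. The function names and the exact byte positions are assumptions. Uniqueness follows because two different integers differ in at least one inserted byte. Note that the same property means anyone who knows the layout can read the integer back out of the result, as `recover` shows.

```python
import hashlib
import struct


def spray(user_id: int) -> bytes:
    """Interleave the 4 bytes of user_id into its SHA1 digest (24 bytes total)."""
    id_bytes = struct.pack(">I", user_id)      # 4 bytes, big-endian
    digest = hashlib.sha1(id_bytes).digest()   # 20 bytes
    out = bytearray()
    for k in range(4):
        out += digest[5 * k:5 * k + 5]         # 5 hash bytes
        out.append(id_bytes[k])                # then 1 byte of the integer
    return bytes(out)


def recover(sprayed: bytes) -> int:
    """Read the integer back from positions 5, 11, 17 and 23."""
    return struct.unpack(">I", bytes(sprayed[i] for i in (5, 11, 17, 23)))[0]


value = spray(123456)
print(value.hex(), recover(value))
```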
Also, take a look at Collision Column part

AES: Storing a clear part of the encrypted data for search purpose

I encrypt sensitive text tokens with AES and store them in a database. I would like to do a partial search on these tokens, not only search for exact matches.
Decrypting all tokens would be too slow, so my idea is to store the beginning of each token as clear text in another column of the database.
A token is 90 characters long and is unique for each user. I would store, for example, the first 20 characters.
If someone gets a copy of the database, would this be a security issue? That is, would having a clear part of the token make it easier to reconstruct the complete token?
My AES encryption settings are :
AES-128 with a 32 bytes encryption key.
encryption mode is CBC.
IV are unique for each token.
Create a hash for each value (or for the first part of each value if they are long), such as the first 20 bytes of PBKDF2 with 10,000 iterations, and store that in a separate field/column. To check a value, perform the same operation on the search term and compare the result against the new field. The value in the new field is not reversible, and the comparison operations are cheap (straight binary comparison).
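A sketch of that lookup column in Python, using hashlib.pbkdf2_hmac. The answer only specifies PBKDF2, 10,000 iterations and 20 output bytes. The choice of HMAC-SHA256 as the underlying function, the 20-character prefix, the application-wide salt and the name `search_tag` are assumptions made here. The salt has to be the same for every row, otherwise the stored and queried tags could not be compared.

```python
import hashlib

ITERATIONS = 10_000   # from the answer
TAG_LEN = 20          # first 20 bytes of the PBKDF2 output, from the answer
PREFIX_LEN = 20       # assumed: index the first 20 characters of each token


def search_tag(token: str, salt: bytes) -> bytes:
    """Derive the value stored in (or compared against) the lookup column."""
    prefix = token[:PREFIX_LEN].encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", prefix, salt, ITERATIONS, dklen=TAG_LEN)


app_salt = b"hypothetical-application-salt"
stored = search_tag("A" * 20 + "rest-of-the-90-character-token", app_salt)

# A search for the same 20-character prefix yields the same tag.
print(search_tag("A" * 20, app_salt) == stored)
```

This matches only on the exact indexed prefix. It does not by itself support searching for an arbitrary substring.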

Is there any difference between md5 and sha1 in this situation?

It is known that
1. if ( md5(a) == md5(b) )
2. then ( md5(a.z) == md5(b.z) )
3. but ( md5(z.a) != md5(z.b) )
where the dots concatenate the strings.
EDIT ---
Here you can find a and b:
http://www.mscs.dal.ca/~selinger/md5collision/
Check these links:
hexpaste.com/qzNCBRYb/1 - this is a.md5(a)."kutykurutty"
hexpaste.com/mSXMl13A/1 - this is b.md5(b)."kutykurutty"
They share the same md5 hash, yet they are different. But you can call these strings a' and b', because they have the same md5.
--- EDIT
What happens in the second row if we change all the md5 to sha1? So:
1. if ( sha1(c) == sha1(d) )
2. then ( sha1(c.z) ?= sha1(d.z) )
I couldn't find two different strings with same sha1, that's why I'm asking this. Are there any other interesting "rules" about sha1?
SHA1 will behave exactly like MD5 in this scenario.
The only two references I have found are the following -
http://www.iaik.tugraz.at/content/research/krypto/sha1/MeaningfulCollisions.php
http://www.schneier.com/blog/archives/2005/02/sha1_broken.html#c1654 (See comment by David Schwartz)
From the IAIK website -
Note that for colliding SHA-1 message pairs (as for all other hash functions following a similar design principle) it is always possible to append suffixes to both messages as long as they are the same.
I don't think anybody has found two colliding strings for SHA1, so this is mostly an academic discussion. But from what I understand, when a collision is discovered, it should be possible to create several other collisions by using this property.
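The suffix property comes from the iterated block structure that MD5 and SHA-1 share. The sketch below is not MD5 or SHA-1: it is a toy hash with the same block-by-block design and a deliberately tiny 3-byte state, so that a collision can be found by brute force in a fraction of a second. All names and sizes are made up for the illustration.

```python
import hashlib
from itertools import count

BLOCK = 8                 # toy block size in bytes
STATE = 3                 # toy state size in bytes, tiny so collisions are cheap
IV = b"\x00" * STATE


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: mix the running state with one block."""
    return hashlib.sha256(state + block).digest()[:STATE]


def toy_hash(msg: bytes) -> bytes:
    """Iterated hash: pad, append the length, then fold in block by block."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_collision():
    """Find two different one-block messages that reach the same state."""
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


a, b = find_collision()
z = b"any suffix at all"

print(a != b, toy_hash(a) == toy_hash(b))   # different inputs, same hash
print(toy_hash(a + z) == toy_hash(b + z))   # still equal with a common suffix
print(toy_hash(z + a) == toy_hash(z + b))   # a common prefix almost certainly breaks it
```

After a and b have been processed the internal state is identical, so every later block (the suffix and the padding) is folded into the same state. With a prefix, a and b are mixed into a different starting state and the match is lost.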
The first statement will only hold true for very specific z specifically computed for given a and b. It is true that you can generate an MD5 collision, but this is not trivial - some computational effort is required and certainly you can't expect that any z will do.
Currently SHA-1 is believed to be cryptographically secure, which means no one has come up with a way to generate SHA-1 collisions. It doesn't mean that it is really secure and that collision generation is not possible; there may be a vulnerability that has not been uncovered yet. Even if there is a vulnerability, it's highly unlikely that the same strings will form both an MD5 and a SHA-1 collision at the same time.
SHA-1 isn't as easily cracked as MD5, but some vulnerabilities were found in it back in '05, I believe.
Your example is wrong in my opinion.
Let me show you why:
md5(a) == md5(b)
When both hashes are the same, the corresponding strings have to be the same (they could be collisions, but that's not important for my argument), so we'll have:
a = b
When you now concatenate both strings with a string z, you will have
a.z = b.z
and their md5-hashes will be the same, because they have the same string-input
md5(a.z) == md5(b.z)
and the md5 hashes will be equal a third time, since both string inputs are again the same
md5(z.a) == md5(z.b)
And this is true for md5 and every other hashing algorithm, because they have to be deterministic and side-effect free.
So your example only makes sense when z is a special string which results in a collision. In that case the behaviour of md5 and sha1 will be exactly the same:
The collision string appended will result in a collision, but prepended it will give different hashes (there is a really, really low probability of finding a collision string which results in a collision both prepended and appended, but no such example has yet been found in practice).
You only failed to find two different strings with the same sha1 because collisions are harder to find, as explained by the people before me.
