Is there any possibility of generating the same hash key using MD5?

I am generating unique API keys using this function and storing them in a database:
strtoupper( md5( uniqid(rand(), TRUE ) ) );
Is there any possibility of generating the same hash key with the above function?

uniqid is based on the current time in microseconds; combine that with a random prefix and a repeat is extremely unlikely.
See the PHP manual for more information on uniqid:
PHP: uniqid - Manual

Even though the value generated by the uniqid function is very likely to be unique, hashing it with MD5 maps it into a 128-bit space, so different values can in principle result in the same hash value. For randomly generated inputs like these, an accidental collision is extremely improbable, but MD5 offers no protection against deliberately constructed collisions. In fact, MD5 is considered "cryptographically broken and unsuitable for further use" by US-CERT, and if you're concerned about collisions, you should consider using SHA-256 or SHA-512 instead.
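To put a rough number on the accidental-collision risk, the birthday bound can be sketched in Python (used here only for illustration; the question's code is PHP). The function name is made up for this sketch, and it assumes that MD5 output behaves like a uniformly random 128-bit value for inputs nobody crafted on purpose.

```python
import math

def collision_probability(num_keys: int, hash_bits: int = 128) -> float:
    """Approximate chance that at least two of num_keys random hashes collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = -num_keys * (num_keys - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(exponent)

if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>17,} keys -> p ~ {collision_probability(n):.3e}")
```

Under that assumption, even a trillion keys leave the probability far below anything that would matter in practice. The estimate says nothing about inputs chosen by an attacker.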

Related

Combining two GUID/UUIDs with MD5, any reasons this is a bad idea?

I need to derive a single ID from N IDs. At first I had a complex table in my database with FirstID, SecondID, and a varbinary(MAX) holding the remaining IDs. While this technically works, it is painful, slow, and centralized, so I came up with this:
A simple version in C#:
Guid idA = Guid.NewGuid();
Guid idB = Guid.NewGuid();
byte[] data = new byte[32];
idA.ToByteArray().CopyTo(data, 0);
idB.ToByteArray().CopyTo(data, 16);
byte[] hash = MD5.Create().ComputeHash(data);
Guid newID = new Guid(hash);
A proper version would sort the IDs, support more than two, and probably reuse the MD5 object, but this one should be faster to understand.
Security is not a factor here, since none of the IDs are secret. I mention it only because everyone I talk to reacts badly when I say MD5. MD5 is particularly useful for this because it outputs 128 bits and can therefore be converted directly to a new Guid.
It seems to me that this should be just dandy: while I may increase the odds of a Guid collision, it still seems like I could do this until the sun burns out and be nowhere near running into a practical issue.
However, I have no clue how MD5 is actually implemented and may have overlooked something significant, so my question is this: is there any reason this should cause problems? (Assume fewer than a trillion records; ideally the output IDs should be just as global/universal as the other IDs.)
My first thought is that you would not be generating a true UUID. You would end up with an arbitrary set of 128 bits, but a UUID is not an arbitrary set of bits. See the 'M' and 'N' callouts in the Wikipedia page. I don't know whether this is a concern in practice. Perhaps you could manipulate a few bits (in the 13th and 17th hex digits) inside your MD5 output to transform the hash output into a true UUID, as mentioned in this description of Version 4 UUIDs.
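A minimal Python sketch of that idea follows, assuming the sorted, N-ID variant the question describes. The function name is hypothetical. Note that Python's UUID.bytes uses a different byte order from C#'s Guid.ToByteArray(), so the resulting values would not match the C# code above.

```python
import hashlib
import uuid

def combine_uuids(*ids: uuid.UUID) -> uuid.UUID:
    """Derive one UUID from several; argument order does not matter."""
    # Sort first so that (a, b) and (b, a) give the same result.
    data = b"".join(u.bytes for u in sorted(ids))
    digest = hashlib.md5(data).digest()  # 16 bytes = 128 bits
    # Passing version= makes uuid.UUID overwrite the version and variant bits,
    # i.e. the 'M' and 'N' positions mentioned above.
    return uuid.UUID(bytes=digest, version=4)

if __name__ == "__main__":
    a, b = uuid.uuid4(), uuid.uuid4()
    print(combine_uuids(a, b))
```

Overwriting those bits costs 6 of the 128 hash bits. Version 4 is used only because the answer mentions it; the standard library's own MD5-based uuid.uuid3 marks its results as version 3, which may be the more honest label for a hash-derived ID.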
Another issue: as the Wikipedia article puts it, MD5 is not collision resistant. That is, it is computationally feasible to construct two different inputs that produce the same hash. This matters mainly when someone can choose the inputs deliberately; for inputs that nobody crafted on purpose, the outputs are still spread evenly across the range of possible values.
Nevertheless, as you pointed out, the chance of a collision is probably unrealistic.
I might be tempted to repeat your combined value to create a much longer input to the MD5 function, although repeating the same bytes adds no new entropy, so the benefit is doubtful. In your example code, take that 32-octet value and use it repeatedly to create a value 10 or 1,000 times longer (320 octets, 32,000 octets, or whatever).
In other words, if working with hex strings for my own convenience here instead of the octets of your example, given these two UUIDs:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1
FCF1B8E4-5548-4C43-995A-8DA2555459C8
…instead of feeding this to the MD5 function:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…feed this:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…or something repeated even longer.

hashing function guaranteed to be unique?

In our app we're going to be handed PNG images along with a ~200 character byte array. I want to save the image with a filename corresponding to that byte array, but not the byte array itself, as I don't want 200-character filenames. So what I thought was that I would save the byte array into the database, and then MD5 it to get a short filename. When it comes time to display a particular image, I look up its byte array, MD5 it, then look for that file.
So far so good. The problem is that two different byte arrays could potentially hash down to the same MD5. Then one file would effectively overwrite another. Or could they? I guess my questions are:
Could two ~200 char bytearrays MD5-hash down to the same string?
If they could, is it a once-per-10-ages-of-the-universe sort of deal or something that could conceivably happen in my app?
Is there a hashing algorithm that will produce a (say) 32 char string that's guaranteed to be unique?
It's logically impossible to get a 32-byte code from a 200-byte source that is unique among all possible 200-byte sources, since you can store more information in 200 bytes than in 32 bytes.
The only exception would be if the information stored in these 200 bytes also fit into 32 bytes, in which case your source data format would be extremely inefficient and space-wasting.
When hashing (as opposed to encrypting), you're reducing the information space of the data being hashed, so there's always a chance of a collision.
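The pigeonhole argument is easier to see with a deliberately tiny hash. The sketch below truncates MD5 to 2 bytes (an artificial choice made only for this illustration), so only 65,536 outputs exist and a collision must appear within 65,537 distinct inputs. The function names are made up.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes of MD5: only 65,536 possible outputs."""
    return hashlib.md5(data).digest()[:2]

def find_collision():
    """Hash "0", "1", "2", ... until two inputs share a tiny_hash value."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        i += 1

if __name__ == "__main__":
    a, b = find_collision()
    print(a, b, tiny_hash(a).hex())
```

With the full 128-bit output the same reasoning still holds, but the number of inputs needed before a collision becomes likely is astronomically larger.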
The best you can hope for in a hash function is that all hashes are evenly distributed in the hash space and your hash output is large enough to provide your "once-per-10-ages-of-the-universe sort of deal" as you put it!
So whether a hash is "good enough" for you depends on the consequences of a collision. You could always add a unique id to a checksum/hash to get the best of both worlds.
Why don't you use a unique ID from your database?
The probability that two hashes collide depends on the hash size. MD5 produces a 128-bit hash, so among any 2^128 + 1 distinct inputs there must be at least one collision.
The corresponding number is 2^160 + 1 for SHA-1 and 2^512 + 1 for SHA-512.
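As a quick check of those sizes, the following Python sketch reads the digest width from the standard hashlib module and prints the number of inputs at which a collision becomes unavoidable. The helper name is made up for this example.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Output size in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

if __name__ == "__main__":
    for name in ("md5", "sha1", "sha512"):
        bits = digest_bits(name)
        # 2^bits possible outputs, so 2^bits + 1 inputs force a collision
        print(f"{name}: {bits} bits, guaranteed collision at {2**bits + 1} inputs")
```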
The usual rule applies here: the more output bits, the more uniqueness and the more computation. So there is a trade-off, and you have to choose the balance that suits you.
Could two ~200 char bytearrays MD5-hash down to the same string?
Considering that there are far more 200-byte strings than 32-character MD5 digests, that is guaranteed to be the case.
All hash functions have that problem, but some are more robust than MD5. Try SHA-1. git is using it for the same purpose.
It may happen that two MD5 hashes collide (are the same). In 1996, a flaw was found in the MD5 algorithm, and cryptanalysts advised switching to the SHA-1 hashing algorithm.
So I would advise you to switch to SHA-1 (40 characters). But do not worry: I doubt that two of your pictures will get the same hash. I think you can accept this risk in your application.
As others said before, a hash doesn't give you what you need unless you are fine with the risk of collision.
A database is helpful here.
You get a unique index for each 200-character string. There are no collisions, and if you index the 200-character names, the index will use extra memory but will keep them sorted for you, making searches very fast. You get a unique ID that can easily be used for filenames.
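One way this could look, sketched in Python with the standard sqlite3 module. The table name, column names, and the ".png" naming scheme are assumptions made for this example, not details from the question.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " byte_key BLOB NOT NULL UNIQUE)"  # UNIQUE also creates an index
    )
    return conn

def filename_for(conn: sqlite3.Connection, byte_key: bytes) -> str:
    """Return a short filename; the same byte array always maps to the same name."""
    # INSERT OR IGNORE does nothing if this byte array is already stored
    conn.execute("INSERT OR IGNORE INTO images (byte_key) VALUES (?)", (byte_key,))
    conn.commit()
    (row_id,) = conn.execute(
        "SELECT id FROM images WHERE byte_key = ?", (byte_key,)
    ).fetchone()
    return f"{row_id}.png"

if __name__ == "__main__":
    conn = open_store()
    print(filename_for(conn, b"\x01" * 200))
    print(filename_for(conn, b"\x02" * 200))
```

Because the database assigns the ID, two different byte arrays can never end up with the same filename, at the cost of a lookup each time.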
I haven't worked much with hashing algorithms, but as I understand it there is always a chance of collision in a hashing algorithm, i.e. two different objects may be hashed to the same value, although it is guaranteed that a given object will always be hashed to the same value. There are techniques for handling collisions in hash tables, such as linear probing.

A perfect hash for known values

Say I have some known values, against which I want to create a hash table. For example,
For 0x78409 -> 1
For 0x89934 -> 2
For 0x89834 -> 3
etc...
But these values (0x78409, 0x89934, 0x89834) are only known at runtime, so switch/case cannot be used. However, they become known at the beginning of execution, so maybe we can create a hash function that adapts itself to make a perfect hash table. So my question is: can we create a perfect hash function for such a case?
If the entire domain of inputs is known before the hashmap is created, then this is possible, but it requires some form of runtime code generation, either via a VM or JIT (probably through a scripting language, such as LuaJIT). That would allow you to use gperf and its ilk to create a hash at runtime, compile it, then use it to fill and retrieve from the map.
An easier, more viable solution is to use a hash function with extremely low collisions for the given set of input permutations (i.e. you might only be using lowercase alphabetical characters, for instance), which in practice comes close to a minimal perfect hash.
Murmur3 and crapwow are the ones to look out for (though I'd be cautious with crapwow); Google's CityHash and xxHash are also worth looking at. Bob Jenkins also has a good minimal perfect hash based map available here, which should do just fine as well.
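For a small key set known at startup, a third option is to search for a seed that happens to make a simple hash collision-free on exactly those keys. The Python sketch below is one possible illustration of that idea, not what gperf or the libraries above do; the mixing function, its constant, and all names are arbitrary choices made for this example.

```python
def slot(key: int, seed: int, size: int) -> int:
    """Seeded integer hash reduced to a table index (arbitrary mixing)."""
    x = ((key + seed) * 0x9E3779B1) & 0xFFFFFFFF
    x ^= x >> 15
    return x % size

def build_perfect_table(mapping: dict, max_seed: int = 1_000_000):
    """Try seeds until every key lands in its own slot of a table of len(mapping)."""
    size = len(mapping)
    for seed in range(max_seed):
        table = [None] * size
        for key, value in mapping.items():
            i = slot(key, seed, size)
            if table[i] is not None:
                break  # collision, try the next seed
            table[i] = (key, value)
        else:
            return seed, table
    raise ValueError("no collision-free seed found")

def lookup(key: int, seed: int, table: list):
    entry = table[slot(key, seed, len(table))]
    # Compare the stored key so that unknown keys are rejected
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

if __name__ == "__main__":
    seed, table = build_perfect_table({0x78409: 1, 0x89934: 2, 0x89834: 3})
    print(seed, lookup(0x89934, seed, table))
```

The search is cheap for a handful of keys but slows down quickly as the set grows, which is where dedicated minimal perfect hash generators become worthwhile.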
Wikipedia gives this page. But are you sure you want a perfect hash function? Perhaps a good and fast hash function can be enough?

Is it possible to break md5 hash using genetic algorithms?

With knowledge of how MD5 works, would it be possible to use a population-based algorithm such as genetic programming to break simple passwords?
For example, given an MD5 hash of a string that is between 5 and 10 characters long, we try to get the string back.
If yes, what could be
A good representation for an individual of the population
Selection criteria
Recombination methods
This is to understand the application of genetic algorithms and to know if anyone has done anything of this sort.
Not really.
With just 5 characters, you could brute force it in not too unreasonable amounts of time, but presumably you're asking more about GAs than you are about breaking MD5. The problem is that there's no exploitable structure in an MD5 hash. Strings that are "close together" do not generate hashes that are "close together" under any useful distance relationship. The fitness function will basically be random.
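The lack of structure can be observed directly. This Python sketch (the function name is made up) counts how many of the 128 output bits differ between the MD5 hashes of two strings; a genetic algorithm would need that number to shrink as candidates get closer to the password.

```python
import hashlib

def md5_bit_distance(a: str, b: str) -> int:
    """Number of differing bits between the MD5 digests of two strings."""
    da = int.from_bytes(hashlib.md5(a.encode()).digest(), "big")
    db = int.from_bytes(hashlib.md5(b.encode()).digest(), "big")
    return bin(da ^ db).count("1")

if __name__ == "__main__":
    target = "secret"
    for guess in ("secreu", "secres", "tecret", "zzzzzz", "a"):
        print(f"{guess!r}: {md5_bit_distance(target, guess)} bits differ")
```

In practice the distances hover around half of the 128 bits whether the guess is one character off or completely unrelated, so a fitness function built on them gives the search no direction.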
I think the answer is "no", because you cannot define a useful crossover function, and the fitness function would be boolean. A GA with only a mutation operator and such a fitness function is just brute force.
No, it is highly unlikely.
A genetic algorithm is used, for example, to find a local or global maximum or minimum of some function. In the case of an MD5 hash, if you change the value being hashed, the hash changes completely, so narrowing the range of input values is of no use. MD5 was designed so that the hash changes if the input data changes in any way. The only operator left to you is mutation, but that amounts to checking random input values to see whether they generate the given hash (which, as oxilumin said, is just a brute-force attack).
You can read more about finding value that generated specific md5 hash here (rainbow tables).
Although the answer is probably "no", there is one caveat to consider: The published collisions are strings that only differ by a few key bytes: https://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities
Guessing the plaintext with a genetic algorithm isn't guaranteed, but it may be more efficient to discover a collision that way.
Or if it's in PHP and compares the md5 hash with the == operator... https://eval.in/108854

Is there any difference between md5 and sha1 in this situation?

It is known that
1. if ( md5(a) == md5(b) )
2. then ( md5(a.z) == md5(b.z) )
3. but ( md5(z.a) != md5(z.b) )
where the dots concatenate the strings.
EDIT ---
Here you can find a and b:
http://www.mscs.dal.ca/~selinger/md5collision/
Check these links:
hexpaste.com/qzNCBRYb/1 - this is a.md5(a)."kutykurutty"
hexpaste.com/mSXMl13A/1 - this is b.md5(b)."kutykurutty"
They share the same md5 hash, yet they are different. But you can call these strings a' and b', because they have the same md5.
--- EDIT
What happens in the second row if we change all the md5 to sha1? So:
1. if ( sha1(c) == sha1(d) )
2. then ( sha1(c.z) ?= sha1(d.z) )
I couldn't find two different strings with same sha1, that's why I'm asking this. Are there any other interesting "rules" about sha1?
SHA1 will behave exactly like MD5 in this scenario.
The only two references I have found are the following -
http://www.iaik.tugraz.at/content/research/krypto/sha1/MeaningfulCollisions.php
http://www.schneier.com/blog/archives/2005/02/sha1_broken.html#c1654 (See comment by David Schwartz)
From the IAIK website -
Note that for colliding SHA-1 message pairs (as for all other hash functions following a similar design principle) it is always possible to append suffixes to both messages as long as they are the same.
I don't think anybody has found two colliding strings for SHA1, so this is mostly an academic discussion. But from what I understand, when a collision is discovered, it should be possible to create several other collisions by using this property.
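The suffix property comes from the iterated block-by-block design these hashes share. The Python sketch below uses a deliberately weak toy hash (2-byte state, 4-byte blocks, with MD5 borrowed only as a mixing step) so that an internal-state collision can be found by brute force; it is neither MD5 nor SHA-1, and all names and sizes are made up for this illustration.

```python
import hashlib
from itertools import count

BLOCK = 4         # toy block size in bytes
IV = b"\x00\x00"  # toy 2-byte initial state

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 2-byte state + 4-byte block -> 2-byte state."""
    return hashlib.md5(state + block).digest()[:2]

def toy_hash(msg: bytes) -> bytes:
    """Iterated hash with length padding, in the style of MD5 and SHA-1."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 2) % BLOCK)  # leave room for the length
    padded += (len(msg) % 65536).to_bytes(2, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_state_collision():
    """Find two different single blocks that lead to the same internal state."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block

if __name__ == "__main__":
    a, b = find_state_collision()
    z = b"kutykurutty"
    print(toy_hash(a) == toy_hash(b))          # colliding pair
    print(toy_hash(a + z) == toy_hash(b + z))  # same suffix keeps the collision
    print(toy_hash(z + a) == toy_hash(z + b))  # a prefix usually breaks it
```

Once the internal states agree after a and b, every later block is processed identically, so any common suffix preserves the collision. A common prefix changes the state that a and b start from, so the two hashes would typically differ again.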
The first statement will only hold true for very specific z specifically computed for given a and b. It is true that you can generate an MD5 collision, but this is not trivial - some computational effort is required and certainly you can't expect that any z will do.
Currently SHA-1 is believed to be cryptographically secure, which means no one has come up with a way to generate SHA-1 collisions. That doesn't mean it is really secure and that collision generation is impossible; maybe there is an as yet undiscovered vulnerability. Even if there is a vulnerability, it's highly unlikely that the same strings will form both an MD5 and a SHA-1 collision at the same time.
Sha1 isn't as easily cracked as md5, but they did find some vulnerabilities in it back in '05 I believe.
Your example is wrong in my opinion.
Let me show you why:
md5(a) == md5(b)
When both hashes are the same, the corresponding strings have to be the same (they could be collisions, but that's not important for my argument), so we'll have:
a = b
When you now concatenate both strings with a string z, you will have
a.z = b.z
and their md5-hashes will be the same, because they have the same string-input
md5(a.z) == md5(b.z)
and the MD5 hashes will be equal a third time, since both string inputs are the same
md5(z.a) == md5(z.b)
And this is true for MD5 and every other hashing algorithm, since they have to be deterministic and free of side effects.
So your example only makes sense when z is a special string that results in a collision. In that case the behaviour of MD5 and SHA-1 will be exactly the same:
The collision string appended will result in a collision, but prepended it will give different hashes (there is a really, really low probability of finding a collision string that results in a collision both prepended and appended, but no example has been found in reality yet).
You only failed to find two different strings with the same SHA-1 because collisions are harder to find, as explained by the people before me.
