Is it possible to convert from one hash to another?

If I give you the MD5 checksum of a string, can you generate the SHA checksum?
Or Vice Versa?
If I give you an MD5 checksum and a SHA checksum, can you tell me whether they're generated from the same source string?
(Obviously I'm excluding anything like locating the source string from a rainbow table, etc. etc.)

No. Both are one-way hashes, so the information contained in the original source string is lost in the checksum.
Even if you do have the "original" string as you say, other inputs exist that hash to the same value. This is because MD5 and SHA1 map inputs of arbitrary length to a fixed-size output, so they cannot be perfect (collision-free) hash functions. In the case of MD5, each hash value can correspond to an unbounded number of inputs. So there is no guarantee that the SHA checksum generated from the "original" string is actually the one you seek.
Disclaimer: I have very little experience with the theoretical side, so you may want to verify with other resources.
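To make this concrete, here is a minimal Python sketch using the standard `hashlib` module. It shows that the two digests are computed independently from the source bytes, so checking whether an MD5 and a SHA-1 checksum belong together requires a candidate source string to hash. The function name `digests_match_source` and the sample inputs are illustrative assumptions, not part of the question.

```python
import hashlib


def digests_match_source(candidate: bytes, md5_hex: str, sha1_hex: str) -> bool:
    """Return True if `candidate` hashes to both of the given checksums."""
    return (
        hashlib.md5(candidate).hexdigest() == md5_hex.lower()
        and hashlib.sha1(candidate).hexdigest() == sha1_hex.lower()
    )


source = b"abc"
md5_hex = hashlib.md5(source).hexdigest()
sha1_hex = hashlib.sha1(source).hexdigest()

# With a candidate source string in hand, the pair can be verified.
print(digests_match_source(b"abc", md5_hex, sha1_hex))  # True
# A different candidate fails, and nothing here derives one digest from the other.
print(digests_match_source(b"abd", md5_hex, sha1_hex))  # False
```

Without a candidate input, the sketch has nothing to work with, which is the point of the answer above.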

Related

MD5 collision for known input

Is it possible to create a MD5 collision based on a known input value?
So for example I have input string abc with MD5 900150983cd24fb0d6963f7d28e17f72.
Now I want to add bytes to string def to get the same MD5 900150983cd24fb0d6963f7d28e17f72.
(I know this is possible by bruteforcing and waiting a long time; I want to know if there is a more efficient way in doing this)
Until now, no algorithm has been discovered that allows you to efficiently find a matching input that will generate a given MD5 hash.
What has been shown is that you can create MD5 collisions quite easily, for example with what is known as a chosen-prefix collision: you can create two files yielding the same MD5 hash by appending different data to a specified file. If you want to know more or get the program to try it, look here.

MD5Decrypter.co.uk reverses md5 hashes?

There is a site called http://www.MD5Decrypter.co.uk where, when you give it an MD5 hash, it gives back the original string. How is that possible? As far as I know MD5 is an irreversible hash algorithm, or is it? Secondly, can a salt be used along with MD5?
MD5 is a hash algorithm, so two different words can have the same hash code. If you do not trust me: I can hash a 5-letter word, a 10-letter word, or a 128-letter word with MD5 and it will give me 32 characters every time.
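The fixed output length is easy to check. The following is a small sketch with Python's standard `hashlib`; the helper name `md5_hex` and the sample words are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of the UTF-8 encoding of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# A 5-letter, a 10-letter, and a 128-letter input.
for word in ("apple", "strawberry", "x" * 128):
    digest = md5_hex(word)
    # The digest is always 32 hex characters (128 bits), whatever the input length.
    print(len(word), len(digest), digest)
```

Since there are only 2^128 possible digests but no limit on the number of inputs, some inputs must share a digest.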
The problem is that MD5 is not cryptographically secure. One can analyse it and guess what might have been hashed. But the technique used by the site you posted is the rainbow table.
It can also be a dictionary, but it is less common with md5.
If you use a salt with your md5, this generator will not find anything until the rainbow table with your salt is filled.
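As a rough illustration, the sketch below uses a plain precomputed dictionary as a stand-in for such a site; a real rainbow table uses hash chains to save space, which is not implemented here. The word list, the salt value, and the function names are all hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Hypothetical precomputed table mapping digest -> original word.
wordlist = ["password", "letmein", "qwerty"]
lookup = {md5_hex(word.encode("utf-8")): word for word in wordlist}


def reverse_lookup(digest_hex: str):
    """Return the known word for this digest, or None if it is not in the table."""
    return lookup.get(digest_hex)


unsalted = md5_hex(b"letmein")
salt = b"9f2c-example-salt"  # illustrative value only
salted = md5_hex(salt + b"letmein")

print(reverse_lookup(unsalted))  # letmein
print(reverse_lookup(salted))    # None: the table was not built for this salt
```

The salted digest is only found once someone builds a table that includes the salt, which is what the sentence above describes.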
MD5 is useful to sign a file, a cookie, or the name of a cryptography algorithm. It is not secure for storing passwords. Some languages advise you to use whirlpool, blowfish, salsa20 or sha512 instead of md2/5 or sha 1/2/256.

how to resize string [ group of numbers and char]?

Good Afternoon all,
I am working on RSA encryption and decryption. For more security I am also using padding in the cipher text. For different inputs (e.g. amit) I am getting output of different lengths, like:
plain text- amit
cipher text-10001123A234A987A765A
My problem is: for large plain text, my algorithm generates a large cipher text, and I think
it is a waste of resources to keep such a long string in the database.
Is there any way I can compact the cipher text and convert it back to the real cipher text when I need it?
In order for the algorithm to be encryption and not just hashing, it must be reversible. To be reversible, the output must contain as much information as the input, and so is unlikely to be significantly shorter.
You may compress the data before encryption. There's not a lot else you can do unless you're willing to give up the ability to recover your original text from the ciphertext.
There are a couple of possibilities:
Change your encryption scheme: there are schemes where the output size is the same as the input size.
Compress your data before you encrypt. This will be effective only if you have a large block of text to encrypt, and there is the additional overhead of decompressing after decryption too.
This doesn't apply to RSA specifically, but: any secure cipher will give output close to indistinguishable from a random bit pattern. A random bit pattern has, per definition, maximum information theoretic entropy, since for each bit, both 0 and 1 are equally likely.
Now, you want a lossless compression scheme, since you need to be able to decompress to the exact data you originally compressed. An optimal compression scheme will maximize the entropy of its output. However, we know that the output of our cipher already has maximum entropy, so we can't possibly increase the entropy.
And thus, trying to compress encrypted data is useless.
Note: Depending on your encryption method, compression might be possible, for example, when using a block cipher in ECB mode. RSA is a completely different beast altogether though, and, well, compressing won't do anything (except quite possibly make your final output bigger).
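The entropy argument can be checked empirically. The sketch below compares how well `zlib` compresses repetitive text versus bytes that look random. No actual cipher is used: the "ciphertext" is simulated with SHA-256 over a counter, which is an assumption made only to keep the example dependency-free and deterministic.

```python
import hashlib
import zlib


def pseudo_random_bytes(n: int) -> bytes:
    """Stand-in for cipher output: SHA-256 digests of a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"the quick brown fox jumps over the lazy dog " * 100
random_like = pseudo_random_bytes(len(text))

print("repetitive text:", compression_ratio(text))
print("random-looking :", compression_ratio(random_like))
```

The repetitive text shrinks to a small fraction of its size, while the random-looking data does not shrink in any useful way, which is why compression belongs before encryption rather than after it.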
[Edit] Also, the length of your RSA ciphertext will be on the order of log n, with n your public modulus. This is the reason that, especially for small plaintexts, public key crypto is extremely 'wasteful'. You normally employ RSA to set up a (smaller, e.g. 128-bit) symmetric key between two parties and then encrypt your data with a symmetric key algorithm such as AES. AES has a block size of 128 bits, so if you do straightforward encryption of your data, the padding 'overhead' you incur will be at most one block (128 bits).
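The size arithmetic can be sketched as follows. This is only bookkeeping, not encryption: the function names are hypothetical, PKCS#7-style padding is assumed for the block cipher (the answer does not name a padding scheme), and any IV or authentication tag is ignored.

```python
def rsa_ciphertext_bytes(modulus_bits: int) -> int:
    """Size of one RSA ciphertext: the byte length of the modulus n."""
    return (modulus_bits + 7) // 8


def pkcs7_padded_length(message_bytes: int, block_bytes: int = 16) -> int:
    """Length after PKCS#7 padding, which always adds 1..block_bytes bytes."""
    return message_bytes + (block_bytes - message_bytes % block_bytes)


# A 2048-bit modulus yields 256-byte ciphertexts, even for a 4-byte plaintext.
print(rsa_ciphertext_bytes(2048))  # 256

# With a 16-byte (128-bit) block, the padded length stays close to the message length.
for n in (4, 16, 1000):
    print(n, pkcs7_padded_length(n))  # 4 16, 16 32, 1000 1008
```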
erm ... you wrote in a comment here that you apply RSA encryption to all single characters:
i am using rsa- it perform over numbers to convert amit in cipher text first i do a->97 m->109 i->105..and then apply rsa over 97 ,109 ... then i get different integers for 109, 105 or ... i joined that all as a string...
A word of advice: don't do that, since you will lose the security of RSA.
If you use RSA in this way, your scheme becomes a substitution cipher (with only one substitution alphabet). Given a reasonably long ciphertext or a reasonable number of ciphertexts, this scheme can be broken by analyzing the frequency of the ciphertext characters.
See RSAES-OAEP for a padding scheme to apply to your plaintext before encryption.
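The weakness can be seen with a toy example. The sketch below applies unpadded ("textbook") RSA to each character separately, as described in the quoted comment. The modulus and exponent are tiny illustrative values (p = 61, q = 53), assumed here only for demonstration, and the sample plaintext is made up.

```python
from collections import Counter
from typing import List

# Toy textbook-RSA public key; far too small for any real use.
N = 3233  # 61 * 53
E = 17


def encrypt_per_char(text: str) -> List[int]:
    """Encrypt each character code separately with unpadded RSA (insecure)."""
    return [pow(ord(ch), E, N) for ch in text]


plaintext = "attack at dawn"
ciphertext = encrypt_per_char(plaintext)

# Equal characters always map to equal integers, so the frequency profile
# of the plaintext survives in the ciphertext.
print(sorted(Counter(plaintext).values(), reverse=True))
print(sorted(Counter(ciphertext).values(), reverse=True))
```

Both lines print the same frequency profile, which is exactly what frequency analysis exploits. A randomized padding scheme such as OAEP avoids this because the same plaintext no longer encrypts to the same ciphertext.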

md5 collision database?

I'm writing a file system deduper. The first pass generates md5 checksums, and the second pass compares the files with identical checksums.
Is there a collection of strings which differ but generate identical md5 checksums I can incorporate into my test case collection?
Update: mjv's answer points to these two files, perfect for my test case.
http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate1.cer
http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate2.cer
You can find a couple of different X.509 certificate files with the same MD5 hash at this url.
I do not know of MD5 duplicate files repositories, but you can probably create your own, using the executables and/or the techniques described on Vlastimil Klima's page on MD5 Collision
Indeed MD5 has been known for its weakness with regard to collision resistance; however, I wouldn't disqualify it for a project such as your file system de-duper. You may just want to add a couple of additional criteria (which can be very cheap, computationally speaking) to further decrease the possibility of false duplicates.
Alternatively, for test purposes, you may simply modify your MD5 compare logic so that it deems some MD5 values identical even though they are not (say if the least significant byte of the MD5 matches, or systematically, every 20 comparisons, or at random ...). This may be less painful than having to manufacture effective MD5 "twins".
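A minimal sketch of that test trick, assuming a two-pass deduper shaped like the one in the question. All names here (`md5_key`, `weak_key`, `find_duplicates`, the in-memory `blobs` dictionary) are hypothetical; a real deduper would read files from disk.

```python
import hashlib
from collections import defaultdict


def md5_key(data: bytes) -> bytes:
    """Normal first-pass key: the full MD5 digest."""
    return hashlib.md5(data).digest()


def weak_key(data: bytes) -> bytes:
    """Test-only key: just the last digest byte, so collisions are frequent."""
    return hashlib.md5(data).digest()[-1:]


def find_duplicates(blobs, key_func=md5_key):
    """First pass groups by key; second pass confirms byte for byte."""
    buckets = defaultdict(list)
    for name, data in blobs.items():
        buckets[key_func(data)].append(name)

    duplicates = []
    for names in buckets.values():
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if blobs[first] == blobs[second]:
                    duplicates.append((first, second))
    return sorted(duplicates)


# 300 distinct contents, each stored twice. With only 256 possible weak keys,
# some different contents are guaranteed to share a bucket.
blobs = {}
for i in range(300):
    content = f"content-{i}".encode("utf-8")
    blobs[f"file{i}"] = content
    blobs[f"copy{i}"] = content

print(find_duplicates(blobs, weak_key) == find_duplicates(blobs, md5_key))  # True
```

If the second pass is correct, swapping in the weak key changes only how much comparison work is done, not the reported duplicates.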
http://www.nsrl.nist.gov/ might be what you want.

Will the MD5 cryptographic hash function output be same in all programming languages?

I am basically creating an API in PHP, and one of the parameters that it will accept is an md5 encrypted value. I don't have much knowledge of different programming languages, or of MD5. So my basic question is: if I am accepting md5 encrypted values, will the value remain the same when generated from any programming language like .NET, Java, Perl, Ruby, etc.?
Or would there be some limitations or validations for it?
Yes, a correct implementation of MD5 will produce the same result; otherwise MD5 would not be useful as a checksum. Differences may come from encoding and byte order. You must be sure that the text is encoded to exactly the same sequence of bytes.
It will, but there's a but.
It will because it's spec'd to reliably produce the same result given a repeated series of bytes. The point is that we can then compare the results to check that the bytes haven't changed, or perhaps digitally sign only the MD5 result rather than signing the entire source.
The but is that a common source of bugs is making assumptions about how strings are encoded. MD5 works on bytes, not characters, so if we're hashing a string, we're really hashing a particular encoding of that string. Some languages (and more so, some runtimes) favour particular encodings, and some programmers are used to making assumptions about that encoding. Worse yet, some specs can make assumptions about encodings. This can be a cause of bugs where two different implementations will produce different MD5 hashes for the same string. This is especially so in cases where characters are outside of the range U+0020 to U+007F (and since U+007F is a control, that one has its own issues).
All this applies to other cryptographic hashes, such as the SHA family of hashes.
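The encoding pitfall is easy to reproduce. The following sketch uses Python's standard `hashlib`; the sample strings are arbitrary, and the helper name `md5_of` is an assumption for illustration.

```python
import hashlib


def md5_of(text: str, encoding: str) -> str:
    """Hex MD5 digest of `text` after encoding it with the given codec."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


# Pure ASCII text has the same bytes in UTF-8 and Latin-1, so the digests agree.
print(md5_of("abc", "utf-8") == md5_of("abc", "latin-1"))    # True

# "é" is one byte in Latin-1 but two bytes in UTF-8, so the digests differ.
print(md5_of("café", "utf-8") == md5_of("café", "latin-1"))  # False

# UTF-16 changes the bytes even for ASCII text.
print(md5_of("abc", "utf-8") == md5_of("abc", "utf-16"))     # False
```

An API that accepts MD5 values would therefore do well to state which encoding clients must use before hashing, for example UTF-8.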
Yes. MD5 isn't an encryption function, it's a hash function that uses a specific algorithm.
Yes, md5 hashes will always be the same regardless of their origin - as long as the underlying algorithm is correctly implemented.
A vital point of hash functions such as MD5 is that they always produce the same value for the same input.
However, it does require you to encode the input data into a sequence of bytes (or bits) the same way. For instance, there are many ways to encode a string.
