Retrieve a sha256 from another sha256 - c

I have a project where we are supposed to find a sha256 from another sha256.
For example we have
hash1 = "45f1bc22c29626c6db37d483273afbe0f6c434de02fe86229d50c9e71ed144fc"
and we would have to find
hash0 = "5495a885b7f445a198cc5b67a517a0e0536792ab3e7ead18a12c75f8310a9b89"
hash1 is just the hash0 used in a sha256 function.
Initially, I went on to redo every possibility of sha256 but it may do a lot and take a lot of time.
If you have any idea how I could do this, even see if it's possible?

It's impossible, hashes can't be inverted, regardless if they're hashes of hashes or of other content. You could try all the combinations, sure, knowing that the source is a 256-bit sequence, but this is practically prohibitive, although there's always a possibility of 1 out of 2^256 that you guess it at the first try :-)

Related

MD5 collision for known input

Is it possible to create a MD5 collision based on a known input value?
So for example I have input string abc with MD5 900150983cd24fb0d6963f7d28e17f72.
Now I want to add bytes to string def to get the same MD5 900150983cd24fb0d6963f7d28e17f72.
(I know this is possible by bruteforcing and waiting a long time; I want to know if there is a more efficient way in doing this)
Unitl now no algorithm has been discovered that allows you to find a matching input that will generate a given md5 hash.
What has been proven is that you can create md5 collisions quite easily, for example with what is known as chosen-prefix-collision: you can create two files yielding the same md5 hash by appending different data to a specified file. If you want to know more or get the program to try it, look here.

MD5 hashes and Regular Expressions

I received a MD5 hash and a Regular Expression which have the same plaintext..
How do I use the Regular Expression to crack the MD5 hash and find the text behind the MD5?
b89e49cab317f2681be60fb3d1c0f8f8
[(a|c|d)n-t\|]{8}
The idea would be to use the regex as a template and generate inputs that satisfy it.
You can search for a regex visualizer to see this, but what that one says is any of the characters ()acd| or any character between n and t (inclusive) in any order, repeated eight times. I tested this in hashcat, and the regex is correct despite it looking like it means something else. A shorter way to write that would be [acd|()n-t]{8}.
So you start generating 8 character strings with those values and taking the md5 of them. You can do this in almost any programming language but Python is a good choice. Look up the hashlib library, it has a function md5. You'll call the function hexdigest on that and compare it to the provided hash.
>>> import hashlib
>>> hashlib.md5(b'cybering').hexdigest()
'61e4feebe66ad22349e292d1462afd3a'
Additionally, if you want to use cracking software, look up JohnTheRipper or hashcat. You should be able to provide them a dictionary and have it attempt to break the hash. I was able to solve this with hashcat on my 980ti in ~5 seconds. This tutorial helped me set up the custom charset and mask to perform the attack.
Have fun!
One approach would be to generate all possible eight-character combinations (with repetition) of the 19 characters allowed by the regex. Test each combination by computing the md5 hash and comparing it to the one you were given.
That would be 13^8 = 815,730,721 possible combinations to check. The answer will likely be found before checking all of them.
I was able to whip out a little Node.js program on my laptop that found the solution in about 4 minutes (I split the problem up using workers to take advantage of multiple CPU cores).
Edit: I thought the regex had n-z instead of n-t so the search space was actually much smaller.
You cant crack the md5 hash value it has used one way hashing algorithm.

md5 collision database?

I'm writing a file system deduper. The first pass generates md5 checksums, and the second pass compares the files with identical checksums.
Is there a collection of strings which differ but generate identical md5 checksums I can incorporate into my test case collection?
Update: mjv's answer points to these two files, perfect for my test case.
http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate1.cer
http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate2.cer
You can find a couple of different X.509 certificate files with the same MD5 hash at this url.
I do not know of MD5 duplicate files repositories, but you can probably create your own, using the executables and/or the techniques described on Vlastimil Klima's page on MD5 Collision
Indeed MD5 has been know for its weakness with regards to collision resistance, however I wouldn't disqualify it for a project such as your file system de-duper; you may just want to add a couple of additional criteria (which can be very cheap, computationally speaking) to further decrease the possibility of duplicates.
Alternatively, for test purposes, you may simply modify your MD5 compare logic so that it deems some MD5 values identical even though they are not (say if the least significant byte of the MD5 matches, or systematically, every 20 comparisons, or at random ...). This may be less painful than having to manufacture effective MD5 "twins".
http://www.nsrl.nist.gov/ might be what you want.

Are there algorithms for putting a digest into the file being digested?

Are there algorithms for putting a digest into the file being digested?
In otherwords, are there algorithms or libraries, or is it even possible to have a hash/digest of a file contained in the file being hashed/digested. This would be handy for obvious reasons, such as built in digests of ISOs. I've tried googling things like "MD5 injection" and "digest in a file of a file." No luck (probably for good reason.)
Not sure if it is even mathematically possible. Seems you'd be able to roll through the file but then you'd have to brute the last bit (assuming the digest was the last thing in the file or object.)
Thanks,
Chenz
It is possible in a limited sense:
Non-cryptographically-secure hashes
You can do this with insecure hashes like the CRC family of checksums.
Maclean's gzip quine
Caspian Maclean created a gzip quine, which decompresses to itself. Since the Gzip format includes a CRC-32 checksum (see the spec here) of the uncompressed data, and the uncompressed data equals the file itself, this file contains its own hash. So it's possible, but Maclean doesn't specify the algorithm he used to generate it:
It's quite simple in theory, but the helper programs I used were on a hard disk that failed, and I haven't set up a new working linux system to run them on yet. Solving the checksum by hand in particular would be very tedious.
Cox's gzip, tar.gz, and ZIP quines
Russ Cox created 3 more quines in Gzip, tar.gz, and ZIP formats, and wrote up in detail how he created them in an excellent article. The article covers how he embedded the checksum: brute force—
The second obstacle is that zip archives (and gzip files) record a CRC32 checksum of the uncompressed data. Since the uncompressed data is the zip archive, the data being checksummed includes the checksum itself. So we need to find a value x such that writing x into the checksum field causes the file to checksum to x. Recursion strikes back.
The CRC32 checksum computation interprets the entire file as a big number and computes the remainder when you divide that number by a specific constant using a specific kind of division. We could go through the effort of setting up the appropriate equations and solving for x. But frankly, we've already solved one nasty recursive puzzle today, and enough is enough. There are only four billion possibilities for x: we can write a program to try each in turn, until it finds one that works.
He also provides the code that generated the files.
(See also Zip-file that contains nothing but itself?)
Cryptographically-secure digests
With a cryptographically-secure hash function, this shouldn't be possible without either breaking the hash function (particularly, a secure digest should make it "infeasible to generate a message that has a given hash"), or applying brute force.
But these hashes are much longer than 32 bits, precisely in order to deter that sort of attack. So you can write a brute-force algorithm to do this, but unless you're extremely lucky you shouldn't expect it to finish before the universe ends.
MD5 is broken, so it might be easier
The MD5 algorithm is seriously broken, and a chosen-prefix collision attack is already practical (as used in the Flame malware's forged certificate; see http://www.cwi.nl/news/2012/cwi-cryptanalist-discovers-new-cryptographic-attack-variant-in-flame-spy-malware, http://arstechnica.com/security/2012/06/flame-crypto-breakthrough/). I don't know of what you want having actually been done, but there's a good chance it's possible. It's probably an open research question.
For example, this could be done using a chosen-prefix preimage attack, choosing the prefix equal to the desired hash, so that the hash would be embedded in the file. A
preimage attack is more difficult than collision attacks, but there has been some progress towards it. See Does any published research indicate that preimage attacks on MD5 are imminent?.
It might also be possible to find a fixed point for MD5; inserting a digest is essentially the same problem. For discussion, see md5sum a file that contain the sum itself?.
Related questions:
Is there any x for which SHA1(x) equals x?
Is a hash result ever the same as the source value?
The only way to do this is if you define your file format so the hash only applies to the part of the file that doesn't contain the hash.
However, including the hash inside a file (like built into an ISO) defeats the whole security benefit of the hash. You need to get the hash from a different channel and compare it with your file.
No, because that would mean that the hash would have to be a hash of itself, which is not possible.

Making a perfect hash (all consecutive buckets full), gperf or alternatives?

Let's say I want to build a perfect hash table for looking up an array where the predefined keys are 12 Months, thus I would want
hash("January")==0
hash("December")==11
I run my Month names through gperf and got a nice hash function, but it appears to give out 16 buckets(or rather the range is 16)!
#define MIN_HASH_VALUE 3
#define MAX_HASH_VALUE 18
/* maximum key range = 16, duplicates = 0 */
Looking at the generated gperf code, its hash function code does a simple return of len plus char value lookup from a 256 size table. Somehow, in my head I imagined a fancy looking function... :)
What if I want exactly 12 buckets(that is I do not want to skip over unused buckets)? For small sets as this, it really doesn't matter, but when I have 1000 predefined keys and want exactly 1000 buckets in a row?
Can one find a deterministic way to do this?
I was interested in the answer to this question & came to it via a search for gperf. I tried gperf, but it was very slow on a large input file and thus did not seem suitable. I tried cmph but I wasn't happy with it. It requires building a file which is then loaded into the C program at run time. Further, the program is so fragile (crashes with "segmentation fault" on any kind of mistaken input) that I did not trust it. A further Google search led me to this page, and onward to mph. I downloaded mph and found it is very nice. It has an optional program to generate a C file, called "emitc", and using it like
mph < systemdictionaryfile | emitc > output.c
worked almost instantly (a few seconds with a dictionary of about 200,000 words) and created a working C file which compiles with no problems. My tests also indicate that it works. I haven't tested the performance of the hashing algorithm yet though.
The only alternative to gperf I know is cmph : http://cmph.sourceforge.net/ but, as Jerome said in the comment, having 16 buckets provides you some speed benefit.
When I first looked at minimal perfect hasihing I found very interesting readings on CiteseerX but I resisted the temptation to try coding one of those solutions myself. I know I would end up with an inferior solution respect to gperf or cmph or, even assuming the solution was comparable, I would have to spend a lot of time on it.
There are many MPH solutions and algorithms, gerf doesn't yet do MPH's, but I'm working on it. Esp. for large sets. See https://gitlab.com/rurban/gperf/-/tree/hashfuncs
The classic cmph has a lot of constant overhead and is only recommended for huge key sets.
There's the NetBSD nbperf and my improved variant: https://github.com/rurban/nbperf
which does CHM, CHM3 and BZD, with integer key support, optimizations for smaller key sets and alternate hash functions.
There's Bob Jenkin's generator, and Taj Khattra's mph-1.2.
There are also two perl libraries to generate C lookups, one in PostgresQL (PerfectHash.pm) and one for late perl5 unicode lookups (regen/mph.pl), and a tool to compare various generators: https://github.com/rurban/Perfect-Hash

Resources