MD5 hashes and Regular Expressions

I received an MD5 hash and a regular expression that have the same plaintext.
How do I use the regular expression to crack the MD5 hash and recover the text behind it?
b89e49cab317f2681be60fb3d1c0f8f8
[(a|c|d)n-t\|]{8}

The idea would be to use the regex as a template and generate inputs that satisfy it.
You can run it through a regex visualizer to see this, but what it says is: any of the characters ()acd|, or any character between n and t (inclusive), in any order, repeated eight times. Inside a character class, the parentheses and pipe are literal characters rather than grouping and alternation, which is why the regex means something different from what it first appears to; I verified this against the hash with hashcat. A shorter way to write it would be [acd|()n-t]{8}.
So you start generating eight-character strings from those values and taking the MD5 of each. You can do this in almost any programming language, but Python is a good choice: look up the hashlib library, which has an md5 function; call hexdigest on the result and compare it to the provided hash.
>>> import hashlib
>>> hashlib.md5(b'cybering').hexdigest()
'61e4feebe66ad22349e292d1462afd3a'
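Putting those pieces together, a minimal brute-force sketch (the 13-character alphabet is my expansion of the regex; worst case is 13^8 candidates, so pure Python will take a while):

import hashlib
from itertools import product

TARGET = 'b89e49cab317f2681be60fb3d1c0f8f8'
ALPHABET = 'acd|()nopqrst'  # the characters matched by [acd|()n-t]

for candidate in product(ALPHABET, repeat=8):
    word = ''.join(candidate)
    if hashlib.md5(word.encode()).hexdigest() == TARGET:
        print('found:', word)
        break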
Additionally, if you want to use cracking software, look up John the Ripper or hashcat. You should be able to provide them a dictionary (or, here, a mask) and have them attempt to break the hash. I was able to solve this with hashcat on my 980 Ti in ~5 seconds. This tutorial helped me set up the custom charset and mask to perform the attack.
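For reference, a hashcat invocation along those lines would look something like this (untested sketch: -m 0 selects raw MD5, -a 3 the mask attack, -1 defines a custom charset that each ?1 in the mask draws from):

hashcat -m 0 -a 3 -1 'acd|()nopqrst' b89e49cab317f2681be60fb3d1c0f8f8 '?1?1?1?1?1?1?1?1'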
Have fun!

One approach would be to generate all possible eight-character combinations (with repetition) of the 13 characters allowed by the regex: a, c, d, |, (, ), and n through t. Test each combination by computing its MD5 hash and comparing it to the one you were given.
That is 13^8 = 815,730,721 possible combinations to check. The answer will likely be found before checking all of them.
I was able to whip up a little Node.js program on my laptop that found the solution in about 4 minutes (I split the problem across workers to take advantage of multiple CPU cores).
Edit: I originally read the regex as n-z instead of n-t, so the actual search space is much smaller than the one I searched.
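The Node.js source isn't shown here, but the worker-splitting trick is easy to sketch: give each worker a different first character, so each one scans a keyspace 13 times smaller. A hypothetical Python equivalent using multiprocessing (still slow in pure Python, but it shows the decomposition):

import hashlib
from itertools import product
from multiprocessing import Pool

TARGET = 'b89e49cab317f2681be60fb3d1c0f8f8'
ALPHABET = 'acd|()nopqrst'

def scan(first):
    # Check every eight-character string that starts with `first`.
    for tail in product(ALPHABET, repeat=7):
        word = first + ''.join(tail)
        if hashlib.md5(word.encode()).hexdigest() == TARGET:
            return word
    return None

if __name__ == '__main__':
    with Pool() as pool:  # one task per first character, spread over cores
        for hit in pool.imap_unordered(scan, ALPHABET):
            if hit:
                print('found:', hit)
                break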

You can't reverse an MD5 hash value directly; MD5 is a one-way hashing algorithm, so the only option is to guess candidate inputs and check them, as described above.

Related

Retrieve a sha256 from another sha256

I have a project where we are supposed to find a sha256 from another sha256.
For example we have
hash1 = "45f1bc22c29626c6db37d483273afbe0f6c434de02fe86229d50c9e71ed144fc"
and we would have to find
hash0 = "5495a885b7f445a198cc5b67a517a0e0536792ab3e7ead18a12c75f8310a9b89"
hash1 is just hash0 passed through a SHA-256 function.
Initially I set out to brute-force every possible SHA-256 input, but that could take an enormous amount of time.
Do you have any idea how I could do this, or whether it's even possible?
It's impossible; hashes can't be inverted, regardless of whether they're hashes of hashes or of other content. You could try all the combinations, sure, knowing that the source is a 256-bit sequence, but that is practically prohibitive, although there's always a 1-in-2^256 chance that you guess it on the first try :-)
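There is no shortcut, but checking any individual guess is cheap. A minimal sketch of a candidate check, assuming Python; whether hash0 was hashed as a hex string or as its 32 raw bytes isn't stated, so both interpretations are covered:

import hashlib

hash1 = "45f1bc22c29626c6db37d483273afbe0f6c434de02fe86229d50c9e71ed144fc"

def matches(candidate_hex):
    # Try the candidate both as ASCII hex text and as raw bytes.
    as_text = hashlib.sha256(candidate_hex.encode()).hexdigest()
    as_bytes = hashlib.sha256(bytes.fromhex(candidate_hex)).hexdigest()
    return hash1 in (as_text, as_bytes)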

What is the collision probability of MD5 digests if the input strings only contain alphanumeric characters?

The input strings have the following conditions:
Only contain alphanumeric characters ([a-zA-Z0-9])
The size of a string is always less than 256 bytes
The total number of input strings is less than 1,000,000
So what is the collision probability of MD5 digests if the input strings all meet the above conditions? Can I just assume there will be no collisions?
If the inputs are random, the likelihood of a collision within that input set is vanishingly small. That said, MD5 is a broken algorithm, and a human with off-the-shelf software can easily construct a collision on purpose. So you probably just shouldn't use MD5, though it depends on what you're using it for; it's hard to think of a reason to ever use MD5 anymore. Look into the BLAKE2 family or the newer SHAs (SHA-256 or SHA-512, not SHA-1). If these are passwords, you should definitely be using a hash designed for passwords, such as PBKDF2 or one of the Argon2 variants. To be honest, I'd recommend just using libsodium's defaults for most things.
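To put a number on "vanishingly small": by the birthday bound, n random inputs into a 128-bit hash collide with probability roughly n(n-1)/2 divided by 2^128, which for a million strings is astronomically low:

n = 1_000_000
p = n * (n - 1) / 2 / 2**128  # birthday-bound approximation
print(p)                      # ≈ 1.5e-27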

Case-insensitive, exact substring matching/index for Node.js or C (no full-text search)

What libraries provide case-insensitive, exact substring matching in Node.js against a large corpus of strings? I'm specifically looking for index-based solutions.
As an example, consider a corpus consisting of millions of strings:
"Abc Gef gHi"
"Def Ghi xYz"
…
I need a library such that a search for "C ge" returns the first string above, but a search for "C  ge" (note the multiple spaces) does not. In other words, I'm not looking for fuzzy, intelligent full-text search with stemming and stop words; rather, I want the simplest (and fastest) exact substring matcher with an index that works at large scale.
Solutions in JavaScript are welcome, and so are solutions in C (as they can be turned into a native Node.js module). Alternatively, solutions in other programming languages such as Java are also possible; they can be used through the command-line. Preferably, solutions are disk-space-bound rather than memory-bound (e.g., rather not Redis), and they should write an index to disk so that subsequent startup time is low.
The problem with most solutions I found (such as the ones here) is that they are too intelligent; that is, they apply various kinds of stemming or normalization, so the matches are not exact.
Thanks in advance for your help!
I'll list some of the solutions I found.
The simplest, yet fitting, option would be https://github.com/martijnversluis/JsSuffixTrie
Then, more elaborate, hash based: https://github.com/fergiemcdowall/search-index
I can also suggest http://redis.io/. It's advanced, but still quite low-level, without much fancy packaging.
Finally, this blog post discusses tries in JavaScript, where the problem seems to be mostly loading time: http://ejohn.org/blog/javascript-trie-performance-analysis/
Off the top of my head, I can think of two possible solutions.
One is case-insensitive regex matching, using the string you search for (e.g. "C ge") as the regex.
Another is to store an all-lowercase (or all-uppercase) copy of every string and search those while returning the unmodified original; of course, the search string needs to be lowercased (or uppercased) as well for this to work (see the sketch after this answer).
It depends of course on the size of your dataset and minimum response times.
For many use cases standard Unix tools such as sed and grep are pretty unbeatable when it comes to pattern matching.
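To make the second approach concrete, here is a toy sketch, assuming Python (a real large-scale solution would layer an index such as a suffix trie over the lowered copies instead of scanning linearly):

corpus = ["Abc Gef gHi", "Def Ghi xYz"]
lowered = [s.lower() for s in corpus]  # built once, kept alongside the originals

def search(needle):
    n = needle.lower()
    # Exact substring match: nothing is normalized except case, so
    # "C  ge" (two spaces) will not match "Abc Gef gHi".
    return [corpus[i] for i, low in enumerate(lowered) if n in low]

print(search("C ge"))   # ['Abc Gef gHi']
print(search("C  ge"))  # []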

MD5 collision for known input

Is it possible to create an MD5 collision based on a known input value?
So for example I have input string abc with MD5 900150983cd24fb0d6963f7d28e17f72.
Now I want to append bytes to the string def so that it yields the same MD5, 900150983cd24fb0d6963f7d28e17f72.
(I know this is possible by brute-forcing and waiting a long time; I want to know if there is a more efficient way of doing this.)
Until now, no algorithm has been discovered that allows you to find an input that generates a given MD5 hash (a preimage attack), which is what you are asking for.
What has been shown is that you can create MD5 collisions quite easily. With an identical-prefix collision, you can create two files yielding the same MD5 hash by appending two different computed blocks to the same starting file; chosen-prefix collisions go further and let the two files start with different prefixes of your choosing. If you want to know more or get the program to try it, look here.

Making a perfect hash (all consecutive buckets full), gperf or alternatives?

Let's say I want to build a perfect hash table for looking up an array where the predefined keys are the 12 month names, so I would want
hash("January")==0
hash("December")==11
I ran my month names through gperf and got a nice hash function, but it appears to use 16 buckets (or rather, the range is 16)!
#define MIN_HASH_VALUE 3
#define MAX_HASH_VALUE 18
/* maximum key range = 16, duplicates = 0 */
Looking at the generated gperf code, its hash function simply returns the key length plus character values looked up in a 256-entry table. Somehow, in my head, I had imagined a fancier-looking function... :)
What if I want exactly 12 buckets (that is, I do not want to skip over unused buckets)? For a small set like this it really doesn't matter, but what about when I have 1000 predefined keys and want exactly 1000 buckets in a row?
Can one find a deterministic way to do this?
I was interested in the answer to this question and came to it via a search for gperf. I tried gperf, but it was very slow on a large input file and thus did not seem suitable. I tried cmph, but I wasn't happy with it: it requires building a file which is then loaded into the C program at run time, and the program is so fragile (it crashes with "segmentation fault" on any kind of mistaken input) that I did not trust it. A further Google search led me to this page, and onward to mph. I downloaded mph and found it very nice. It has an optional program to generate a C file, called "emitc", and using it like
mph < systemdictionaryfile | emitc > output.c
worked almost instantly (a few seconds with a dictionary of about 200,000 words) and created a working C file that compiles with no problems. My tests also indicate that it works, though I haven't measured the performance of the hashing algorithm yet.
The only alternative to gperf I know of is cmph: http://cmph.sourceforge.net/ but, as Jerome said in the comment, having 16 buckets provides you some speed benefit.
When I first looked at minimal perfect hashing, I found very interesting reading on CiteSeerX, but I resisted the temptation to try coding one of those solutions myself. I knew I would end up with a solution inferior to gperf or cmph, or, even if the solution were comparable, I would have to spend a lot of time on it.
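For the curious, the simplest MPH construction of all, brute-force searching for a salt under which every key lands in its own bucket, fits in a few lines. With 12 keys it takes about 12^12/12! ≈ 18,600 attempts on average, so it finishes quickly. A minimal sketch, assuming Python; the salting scheme is illustrative, not what gperf or cmph actually do:

import hashlib
from itertools import count

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def bucket(key, salt):
    # Illustrative salted hash; gperf's real scheme is a table lookup, not MD5.
    digest = hashlib.md5(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % len(MONTHS)

for salt in count():
    if len({bucket(m, salt) for m in MONTHS}) == len(MONTHS):
        break  # every key got its own bucket: minimal and perfect

print("salt:", salt)
for m in MONTHS:
    print(bucket(m, salt), m)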
There are many MPH solutions and algorithms. gperf doesn't yet do MPHs, but I'm working on it, especially for large sets. See https://gitlab.com/rurban/gperf/-/tree/hashfuncs
The classic cmph has a lot of constant overhead and is only recommended for huge key sets.
There's the NetBSD nbperf and my improved variant: https://github.com/rurban/nbperf
which does CHM, CHM3 and BPZ, with integer key support, optimizations for smaller key sets, and alternate hash functions.
There's Bob Jenkins's generator, and Taj Khattra's mph-1.2.
There are also two Perl libraries that generate C lookups, one in PostgreSQL (PerfectHash.pm) and one for recent perl5 Unicode lookups (regen/mph.pl), and a tool to compare the various generators: https://github.com/rurban/Perfect-Hash
