Not sure if this is possible, but I want to be able to start with a string and then figure out what the input to crypt() must be in order to get this string out.
Or maybe it's impossible, which would be the whole purpose of the thing anyways?
Yes, there is a salt in the code where I am trying this.
By design intent, crypt() is a one-way hash. As everyone has said, that means that the intent is that it would be computationally infeasible to discover a plaintext string that produces the same hash.
A couple of factors have an effect on that design intent.
Computation is a lot cheaper than it was when crypt() was designed. Worse, the rate at which computation got cheaper was not anticipated, so it is a lot cheaper now than it was ever imagined it could be.
DES hasn't held up as well as it was thought it would. It was probably the best choice given the public state of knowledge at the time, however.
Even if computation isn't yet cheap enough to do your own cracking, the cloud that is the internet has already done a lot of the work for you. People have been computing and publishing Rainbow Tables which make it possible to shortcut a lot of the computation required to reverse a particular hash. (Jeff had a blog post on rainbow tables too.) Salt helps protect against rainbow tables (because you would need a table set for each possible value of the salt), but the size of the salt used in the classic implementation of crypt() is only 12 bits so that isn't as huge a block as might be hoped.
Worse yet, for certain high-value hash functions (like the LM hash invented for old Microsoft LAN Manager passwords but used for short passwords in all versions of Windows before Vista), nearly complete dictionaries of hashes and their inverses exist.
If it's an old implementation of crypt(3), using DES, then you can almost (but not quite) brute-force it.
In that scheme, the input is truncated to 8 characters, and each character to 7 bits, which means there's a 56 bit space of distinct passwords to search.
For DES alone, you can search the whole space in about 18 days on $10K worth of FPGAs (http://en.wikipedia.org/wiki/Data_Encryption_Standard#Brute_force_attack), so the expected time is 9 days. But I'm assuming you don't have $10K to spend on the problem. Give it a few more years, and who knows whether DES crackers will run in plausible time on a PC's GPU.
Even then, crypt(3) traditionally involves 25 rounds of DES, with slight modifications to the algorithm based on the salt, so you'd expect it to be at least 25 times slower to brute-force.
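The arithmetic in this answer can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 18-day figure is the one quoted above for the FPGA machine, and the factor of 25 assumes that each crypt(3) guess costs exactly 25 DES operations on the same hardware, which the answer describes as a lower bound.

```python
# Back-of-the-envelope figures for brute-forcing classic DES-based crypt(3).
# All inputs come from the answer above; nothing here is measured.

CHARS = 8            # input is truncated to 8 characters
BITS_PER_CHAR = 7    # each character is reduced to 7 bits
FULL_SEARCH_DAYS = 18  # quoted time to sweep the whole DES key space
DES_ITERATIONS = 25    # crypt(3) applies DES 25 times per guess

keyspace = 2 ** (CHARS * BITS_PER_CHAR)

# On average the right key turns up halfway through the sweep.
expected_single_des_days = FULL_SEARCH_DAYS / 2

# Assumption: the same hardware is simply 25 times slower per guess.
expected_crypt_days = expected_single_des_days * DES_ITERATIONS

# Guess rate implied by sweeping the key space in the quoted time.
implied_keys_per_second = keyspace / (FULL_SEARCH_DAYS * 86400)

print(f"distinct passwords: {keyspace}")
print(f"expected days, single DES: {expected_single_des_days}")
print(f"expected days, crypt(3) (at least): {expected_crypt_days}")
print(f"implied rate: {implied_keys_per_second:.3g} keys/s")
```

So even on that hardware the expected time grows from days to the better part of a year, which is why the answer says "almost (but not quite)".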
Newer implementations of crypt(3) are way beyond brute force, since they're based on better hash algorithms than the DES-based one that old crypt(3) used.
Of course if the string isn't random (e.g. if it's a password chosen by some human), then you may be able to get a much better expected time than brute force.
No, that's the idea behind one way hash functions, but you can use google to help you in some cases.
To answer a comment on this answer (Google won't help if there's a salt), I say: yes and no. The salt enlarges the solution space, making the creation of a full dictionary harder (because for each word you have to compute and store one encrypted version for each possible two-letter salt). If you think of the internet as a giant database and Google as its index, what Google does is search for an occurrence of the encrypted string somewhere on the web. The presence of salt reduces the chance you will find it, but if you are lucky enough that it does occur, and it appears together with the cleartext, then you have the password.
See also this article on slashdot.
Concluding: the salt will reduce the chance of finding that specific encrypted string around on the web, true, but google is indifferent to any amount of salt, and can still help somehow if you happen to be lucky (as it was for the case I gave).
No.
crypt() is not a reversible algorithm (it uses a one-way function) which is rendered more difficult to brute force by the addition of salt to the encrypted value.
No, it's not possible. Take a look at this page (assuming you are using the GNU C library): http://www.gnu.org/s/libc/manual/html_node/crypt.html
The way that the crypt is salted will pretty much guarantee that what you're trying to do isn't going to work.
That function being one-way is the backbone of every password scheme in the world. If anybody here were to answer "yes, and here's how..", the government would be forced to immediately delete their comment, go burn their house down, and whisk them away to an undisclosed location.
In short, no.
Nope .. it's a one-way function.
Is it possible to optimise the function:
MD5_Update(&ctx_d, buf, num);
if you know that buf contains only zeros?
Or is this mathematically impossible?
Likewise for SHA1.
If you control the input of the hash function, then you could use a simple count instead of all the zeros, maybe using some kind of escape. E.g. 000020 in hex could mean 32 zeros. A (very) basic compression function may be much faster than MD5 or SHA1.
Obviously this solution will only be faster if you save one or more blocks of hash calculations. E.g. it does not matter if you hash 3 bytes or 16 bytes, as the input will be padded and expanded by the hash function before it is used.
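To make the point about padding concrete, here is a small sketch that counts how many 64-byte blocks MD5 or SHA-1 has to process for a message of a given length. It assumes the standard padding shared by both functions (one marker byte, zero fill, then an 8-byte length field); the function name is made up for illustration.

```python
BLOCK_BYTES = 64         # MD5 and SHA-1 both work on 64-byte blocks
MARKER_BYTES = 1         # the single 0x80 byte that starts the padding
LENGTH_FIELD_BYTES = 8   # the message length appended at the end


def blocks_processed(message_len):
    """Number of 64-byte blocks hashed for a message of message_len bytes."""
    minimum_padded = message_len + MARKER_BYTES + LENGTH_FIELD_BYTES
    # Round up to a whole number of blocks.
    return -(-minimum_padded // BLOCK_BYTES)


for n in (3, 16, 55, 56, 64, 1024):
    print(n, "bytes ->", blocks_processed(n), "block(s)")
```

Replacing a run of zeros with a count only pays off when it moves the message across one of these block boundaries.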
Cryptographic hashes are actually supposed to produce significant changes in output for small changes in input, see http://en.wikipedia.org/wiki/Avalanche_effect . It sounds like you're looking for some relationship between some hashed data, and some hashed data pre-padded with zeros. By design this change in your input should produce output that isn't clearly related.
EDIT: To answer your question directly, by design "a small change in either the key or the plaintext should cause a drastic change in the ciphertext" which means it's meant to be mathematically difficult to do.
You'd probably get some speedup, but it'd be relatively minor. The most important thing for high-performance hashing is choosing an optimized implementation and using GPUs (or even FPGAs/ASICs) to exploit parallelism if that's possible.
There is a known speedup for SHA-1 with fixed IV and messages that differ only a little. That speedup is around 21%. See New attack makes some password cracking faster - Ars Technica.
You might get a similar speedup when you have a completely fixed message but a variable IV. But it'd be a lot of work to implement this, especially as a non-expert. Buying additional hardware is probably much cheaper than speeding up your code by a few percent.
If the beginning of your message consists of multiple constant blocks, you can hash them once and cache the intermediate state of the hash function. This might or might not be applicable to your situation.
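A minimal sketch of that caching idea, using Python's hashlib as a stand-in for the C MD5_Init/MD5_Update API from the question (in C the equivalent would be copying the context struct, which is an assumption about your setup). The helper names are hypothetical.

```python
import hashlib

ZERO_BLOCK = bytes(64)  # one 64-byte MD5 input block filled with zeros


def make_zero_prefix_state(num_blocks):
    """Hash num_blocks blocks of zeros once and return the hash object."""
    state = hashlib.md5()
    for _ in range(num_blocks):
        state.update(ZERO_BLOCK)
    return state


def hash_with_cached_prefix(prefix_state, suffix):
    """Finish the hash from a cached prefix without rehashing the prefix."""
    h = prefix_state.copy()  # copy so the cached state stays reusable
    h.update(suffix)
    return h.hexdigest()


# Usage: pay for the zero prefix once, then hash many different tails.
cached = make_zero_prefix_state(1024)
for tail in (b"first", b"second"):
    print(hash_with_cached_prefix(cached, tail))
```

This only helps when the constant part comes first; zeros in the middle or at the end of the message still have to be fed through the hash.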
I know that SHA-256 is favored over MD5 for security, etc., but, if I am to use a method to only check file integrity (that is, nothing to do with password encryption, etc.), is there any advantage of using SHA-256?
Since MD5 is 128-bit and SHA-256 is 256-bit (therefore twice as big)...
Would it take up to twice as long to encrypt?
Where time is not of essence, like in a backup program, and file integrity is all that is needed, would anyone argue against MD5 for a different algorithm, or even suggest a different technique?
Does using MD5 produce a checksum?
Both SHA256 and MD5 are hashing algorithms. They take your input data, in this case your file, and output a 256/128-bit number. This number is a checksum. There is no encryption taking place because an infinite number of inputs can result in the same hash value, although in reality collisions are rare.
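As a rough illustration, this is how a backup tool might compute either checksum over a file's contents. It is a sketch using Python's standard hashlib; the function name and chunk size are arbitrary choices, and an in-memory stream stands in for a real file.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a binary stream, read in fixed-size chunks."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


data = b"example file contents" * 1000

md5_hex = stream_checksum(io.BytesIO(data), "md5")
sha256_hex = stream_checksum(io.BytesIO(data), "sha256")

# Each hex character encodes 4 bits, so the lengths reflect 128 vs 256 bits.
print(len(md5_hex) * 4, md5_hex)
print(len(sha256_hex) * 4, sha256_hex)
```

With a real file you would pass `open(path, "rb")` instead of the `io.BytesIO` object. Reading in chunks keeps memory use flat regardless of file size.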
SHA256 takes somewhat more time to calculate than MD5, according to this answer.
Offhand, I'd say that MD5 would probably be suitable for what you need.
Every answer seems to suggest that you need a secure hash to do the job, but secure hashes trade speed for resistance to attack (password hashes in particular are deliberately slow, to force a brute-force attacker to spend lots of computing power), and depending on your needs this may not be the best solution.
There are algorithms specifically designed to hash files as fast as possible for integrity checks and comparison (MurmurHash, xxHash...). Obviously these are not designed for security, as they don't meet the requirements of a secure hash algorithm (i.e. randomness), but they have low collision rates for large messages. These features make them ideal if you are looking for speed rather than security.
Examples of these algorithms and a comparison can be found in this excellent answer: Which hashing algorithm is best for uniqueness and speed?.
As an example, we at our Q&A site use murmur3 to hash the images uploaded by the users so we only store them once even if users upload the same image in several answers.
To 1):
Yes, on most CPUs, SHA-256 is only about 40% as fast as MD5.
To 2):
I would argue for a different algorithm than MD5 in such a case. I would definitely prefer an algorithm that is considered safe. However, this is more a feeling. Cases where this matters would be rather constructed than realistic, e.g. if your backup system encounters an example case of an attack on an MD5-based certificate, you are likely to have two files in such an example with different data, but identical MD5 checksums. For the rest of the cases, it doesn't matter, because MD5 checksums have a collision (= same checksums for different data) virtually only when provoked intentionally.
I'm not an expert on the various hashing (checksum-generating) algorithms, so I cannot suggest another algorithm. Hence this part of the question is still open.
Suggested further reading is Cryptographic Hash Function - File or Data Identifier on Wikipedia. Also further down on that page there is a list of cryptographic hash algorithms.
To 3):
MD5 is an algorithm to calculate checksums. A checksum calculated using this algorithm is then called an MD5 checksum.
The underlying MD5 algorithm is no longer deemed secure, thus while md5sum is well-suited for identifying known files in situations that are not security related, it should not be relied on if there is a chance that files have been purposefully and maliciously tampered with. In the latter case, the use of a newer hashing tool such as sha256sum is highly recommended.
So, if you are simply looking to check for file corruption or file differences, when the source of the file is trusted, MD5 should be sufficient. If you are looking to verify the integrity of a file coming from an untrusted source, or from a trusted source over an unencrypted connection, MD5 is not sufficient.
Another commenter noted that Ubuntu and others use MD5 checksums. Ubuntu has moved to PGP and SHA256, in addition to MD5, but the documentation of the stronger verification strategies is more difficult to find. See the HowToSHA256SUM page for more details.
No, it's slower, but not by that much.
For a backup program it may be necessary to have something even faster than MD5.
All in all, I'd say that MD5 in addition to the file name is absolutely safe. SHA-256 would just be slower and harder to handle because of its size.
You could also use something less secure than MD5 without any problem. If nobody tries to hack your file integrity this is safe, too.
It is well established that MD5 is faster than SHA-256, so for just verifying file integrity it will be sufficient and better for performance.
You can check out the following resources:
Speed Comparison of Popular Crypto Algorithms
Comparison of cryptographic hash functions
Yes, on most CPUs, SHA-256 is two to three times slower than MD5, though not primarily because of its longer hash. See other answers here and the answers to this Stack Overflow question.
Here's a backup scenario where MD5 would not be appropriate:
Your backup program hashes each file being backed up. It then stores
each file's data by its hash, so if you're backing up the same file
twice you only end up with one copy of it.
An attacker can cause the system to back up files they control.
The attacker knows the MD5 hash of a file they want to remove from the
backup.
The attacker can then use the known weaknesses of MD5 to craft a new
file that has the same hash as the file to remove. When that file is
backed up, it will replace the file to remove, and that file's backed up
data will be lost.
This backup system could be strengthened a bit (and made more efficient)
by not replacing files whose hash it has previously encountered, but
then an attacker could prevent a target file with a known hash from
being backed up by preemptively backing up a specially constructed bogus
file with the same hash.
Obviously most systems, backup and otherwise, do not satisfy the
conditions necessary for this attack to be practical, but I just wanted
to give an example of a situation where SHA-256 would be preferable to
MD5. Whether this would be the case for the system you're creating
depends on more than just the characteristics of MD5 and SHA-256.
Yes, cryptographic hashes like the ones generated by MD5 and SHA-256 are a type of checksum.
Happy hashing!
If I use getenv() to disable some verifications in my program (for instance license checking), will a hacker be able to easily discover the environment variable concerned (using strace or some other tool)?
Example code:
if (! getenv("my_secret_env_variable")) checkLicense();
(If, on the other hand, I checked the presence of a specific file, the hacker would see it immediately with strace)
Let me add to the existing answers to give you a bit of a broader view about software protection.
Hackers won't just use strace, they'll use whatever tools they have in their tool chest but in order of increasing complexity, perhaps merely starting with something as simple as strings in most cases. Most hackers I know are inherently lazy and will therefore take the path of least resistance. (NB: by hacker I mean a technically very skilled person, not a cracker - the latter often has the same skill set, but a different set of ethics).
Generally speaking from the perspective of the reverse engineer, just about anything can be "cracked" or worked around. The question is how much time and/or determination the attacker has. Consider that some student may do this just for giggles, while some "release groups" do this for fame within their "scene".
Let's consider hardware dongles for example. Most software authors/companies think that they somehow magically "buy" security when licensing some dongle. However, if they aren't careful with the implementation of the system it is as simple to work around as your attempt. Even when they are careful enough, it is often still possible to emulate a dongle although it will require some skill to extract the information on the dongle. Some dongles (I will not conceal that fact from you) are therefore "smart", meaning they contain a CPU or even a full-fledged embedded system. If vital parts of a software product are executed on the dongle and all that enters the dongle is the input and all that leaves the dongle is the output, that makes for a pretty good protection. However, it will equally annoy honest customers and attackers for the most part.
Or let's consider encryption as another example. Many developers don't seem to grasp the concept of a public and a secret key and think that "hiding" the secret key inside the code makes it somehow safer. Well, it doesn't. The code contains the algorithm and the secret key now, how convenient is that for the attacker?
The general problem in most of these cases is that on one hand you trust the users (because you sell to them), but on the other hand you don't trust them (because you try to protect your software somehow). When you look at it this way you can see how futile it actually is. Most of the time you will disgruntle the honest customers, while only delaying the attacker a little (software protection is binary: either it gives protection or it doesn't, i.e. it's already cracked).
Consider instead the path the makers of IDA Pro took. They watermark all their binaries before the user gets them. Then, in case those binaries get leaked, legal measures can be taken. And even without taking legal measures into account they can shame (and have shamed) those that leaked their product publicly. Also, if you are responsible for a leak you won't be sold any upgrades to the software and the makers of IDA will not do business with your employer. That's quite an incentive to keep your copy of IDA safe. Now, I get it, IDA is somewhat of a niche product, but still the approach is fundamentally different and doesn't have the same issues as the conventional attempts at protecting software.
Another alternative is of course to offer a service rather than a software. So you could give the user a token that the software sends to your server. The server then offers an update (or whatever the service) based on decoding the token (which we assume to be an encrypted message) and checking validity. In this case the user would only receive the token but never the secret key to decode it, which your server on the other hand would have to validate the token. Call it product key or whatever, there are dozens of ways one can imagine. The point is that you don't end up in that contradiction of trusting and mistrusting the user at the same time. You only mistrust the user and can for example blacklist her token if that has been abused.
Yes. Any hardcoded string is trivially easy to discover inside of a compiled binary. The library call will also be easy to see. It's also possible to change the string inside of the binary to something else.
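To show how little effort this takes, here is a rough Python imitation of what the `strings` utility does: it scans raw bytes for runs of printable ASCII. The byte blob below is a made-up stand-in for a compiled binary, not output from a real compiler.

```python
import re


def find_strings(blob, min_len=4):
    """Return every run of at least min_len printable ASCII bytes in blob."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Hypothetical fragment of a binary: string literals sit between other bytes.
fake_binary = (
    b"\x00\x01\x02my_secret_env_variable\x00"
    b"\xff\x10getenv\x00ab\x00"
)

for s in find_strings(fake_binary):
    print(s)
```

On a real program an attacker would read the file with `open(path, "rb").read()` and look for anything that resembles an environment variable name near a reference to getenv.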
One can also use LD_PRELOAD to interpose on the getenv function and print the argument it receives.
the hacker would see it immediately with strace - maybe you should take a look at ltrace as well?
While you might not be able to hide the variable name, why not require a value? In particular you could use an integer for a valid value (atoi) since they are a lot harder to spot in code, or even a combination of ints and single chars. However remember that the environment block is an easy part of memory to find, especially in a core dump.
Yes - Easy. They just use strings to find out what to try.
I read somewhere that md5 is not 100% secure. Hence, the question.
You seem to be asking 2 separate but related questions.
The probability of a random collision is highly dependent on the size of the data that you're working with; the more strings you're hashing, the more likely a collision is to occur. See the first table at Wikipedia: Birthday Attack for exact probabilities. MD5 uses 128 bits, so to achieve a 50% collision probability, you'll need 2.2E19 strings.
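The 2.2E19 figure can be reproduced with the usual birthday-bound approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible hash values. The sketch below assumes the hash output is uniformly distributed, which is the idealized case for random (non-malicious) input.

```python
import math


def strings_for_collision_probability(bits, p=0.5):
    """Approximate number of random inputs needed for collision probability p."""
    n_outputs = 2.0 ** bits
    return math.sqrt(2.0 * n_outputs * math.log(1.0 / (1.0 - p)))


for bits in (64, 128, 160, 256):
    n = strings_for_collision_probability(bits)
    print(f"{bits}-bit hash: about {n:.2g} strings for a 50% chance")
```

The count scales with the square root of the output space, so every extra 2 bits of hash output doubles the number of strings you can store at the same risk.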
However, while random collisions are suitably rare for small data sets, MD5 has been shown to be completely insecure against intentional collisions. According to the Wikipedia article on MD5, a collision attack exists that can be run in seconds on a 2.6 GHz Pentium 4 processor. For security, MD5 is completely broken, and has been considered so since 2005.
If you need to securely hash something, use one of the more modern hashing algorithms, such as SHA-2, SHA-3 (when its development is finished), or Whirlpool.
I would like to prevent duplicate content. I do not want to keep copies of the content, so I decided to keep just the md5 signatures.
I read that md5 collisions do happen: different content could give the same md5 signature.
Do you think md5 is enough?
Should I use md5 and sha1 together?
People have been able to deliberately produce MD5 collisions under contrived circumstances, but for preventing duplicate content (in the absence of malicious users) it's more than adequate.
Having said that, if you can use SHA-1 (or SHA-2) you should - you'll be fractionally but measurably safer from collisions.
MD5 should be fine, collisions are very rare, but if you're really worried, you can use sha-1 as well.
Though I guess the signatures really aren't that large, so if you have the spare processing cycles and the disk space, you could do both. But if space or speed is limited, I'd just go with one.
Why not simply compare the content byte for byte if there is a hash collision? Hash collisions are very rare, so you're only going to have to do a byte-for-byte check very rarely. That way duplicates will only be detected if the items are actually duplicated.
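A minimal sketch of that approach, assuming the stored content is still available for comparison (which is not the case if you keep only the signatures, as the question proposes). The class and method names are invented for illustration.

```python
import hashlib


class DedupStore:
    """Keeps each distinct piece of content once, keyed by its MD5 digest."""

    def __init__(self):
        # digest -> list of distinct contents that share that digest
        self._by_digest = {}

    def add(self, content):
        """Store content; return True if it was new, False if a duplicate."""
        digest = hashlib.md5(content).digest()
        bucket = self._by_digest.setdefault(digest, [])
        for existing in bucket:
            # Same digest: confirm with a full byte-for-byte comparison.
            if existing == content:
                return False
        # Either a new digest or a genuine collision; keep it either way.
        bucket.append(content)
        return True


store = DedupStore()
print(store.add(b"first upload"))   # new content
print(store.add(b"first upload"))   # exact duplicate
print(store.add(b"second upload"))  # different content
```

The hash acts only as a fast filter here, so a collision costs one extra comparison instead of silently discarding content.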
md5 should be enough. Yes, there can be collisions, but the chances of that happening are so incredibly small that I wouldn't worry about it unless you were literally tracking many billions of pieces of content.
If you're really afraid of accidental collisions just do both MD5 and SHA1 hashes and compare them. If they both match, it's the same content. If either one differs, it's different content.
Combining algorithms serves to only obfuscate, but does not increase security in a hashing algorithm.
MD5 is too broken to use anyway, IMHO. Forging MD5 hashes is proven by researchers, where they demonstrated being able to forge content that generates an MD5 collision, thereby opening the door to generating a forged CSR to buy a cert from RapidSSL for a domain name they don't own. Security Now! episode 179 explains the process.
For me, SHA-based hashes are stronger and most development platforms support it so the choice is easy. The remaining deciding factor is then the block size.
A timestamp + md5 together are safe enough.
MD5 is broken and SHA1 is close to it. Use SHA2.
edit
Based on an update from the OP, it doesn't seem that intentional collisions are a serious concern here. For unintentional ones, any decent hash with at least a 64-bit output would be fine.
I would still avoid MD5 and even SHA1, in general, but there's no reason to be dogmatic about it. If the tool fits here, then by all means use it.