using HASHBYTES function to hash data - sql-server

I have data I need to hash.
I have a list of numbers to compare against hashed numbers.
As far as anyone could tell me, the data was hashed with SHA-256.
I have only one example of input and output, and I need to work out the hashing logic in SQL Server.
This is the original number: 02229747
And this is the hash number: 4ad54f5b376038f49a44d411e6d551ae4c8dd147c8605a7eec32ba850080b326
I have tried using the following but I can't manage to get the same result.
declare @number bigint = 022529747
DECLARE @HashId varbinary(50) = HashBytes('SHA2_256', cast(@number as varbinary(50)))
select @HashId

Hashbytes can take a number of different algorithms as input. Try each one:
select
a = hashbytes('MD2', '022529747'),
b = hashbytes('MD4', '022529747'),
c = hashbytes('MD5', '022529747'),
d = hashbytes('SHA', '022529747'),
e = hashbytes('SHA1', '022529747'),
f = hashbytes('SHA2_256', '022529747'),
g = hashbytes('SHA2_512', '022529747')
Column f returns the value you are looking for, so the algorithm used was SHA2_256. Note that I am putting the data in as a string (varchar), not an integer (bigint). The bytes that represent 022529747 as a varchar are very different from the bytes that represent 022529747 as a bigint.
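The point about bytes can be checked outside SQL Server as well. This is a minimal Python sketch, assuming the varchar holds single-byte ASCII characters and the bigint is laid out as 8 big-endian bytes (the layout is an assumption for illustration; what matters is that the two byte sequences differ):

```python
import hashlib

text = "022529747"

# Bytes of the value as a character string: one ASCII byte per digit,
# leading zero preserved.
as_string = text.encode("ascii")

# Bytes of the value as an 8-byte integer: the leading zero is lost.
# The big-endian layout is an assumption made for illustration.
as_integer = int(text).to_bytes(8, "big")

hash_of_string = hashlib.sha256(as_string).hexdigest()
hash_of_integer = hashlib.sha256(as_integer).hexdigest()

print(as_string.hex(), hash_of_string)
print(as_integer.hex(), hash_of_integer)
```

The two printed hashes have nothing in common, which is why casting the number to varbinary cannot reproduce a hash that was computed over the text.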
Background:
Hashing and encryption are different.
SHA stands for "Secure Hash Algorithm". It takes some input and produces an output hash. If the input changes, the hash changes (within the limits of the birthday problem). But you can't go backwards: you can't take the output hash and turn it back into the input data. The best you can do is try every possible input and see whether that input generates the hash. See this 3Blue1Brown video for an illustrative explanation.
SHA is a family of cryptographic hash functions, but don't let the name fool you. "Cryptographic" doesn't mean the same thing as "encryption". It just means that it's "hard to guess" what the input data might be based on the output, because the output appears random. See This thread for the difference between a hash function and a cryptographic hash function
AES stands for "advanced encryption standard". This is a symmetric key encryption. Data encrypted with AES can be decrypted back to the original input. The "symmetric" part means one key is used to both encrypt, and decrypt (compared to, e.g., PGP encryption, which uses different keys to encrypt and decrypt).
The SQL hashbytes function can use a number of different algorithms, but none of them are reversible. They are all hashing algorithms, not encryption algorithms.
If you need encryption and decryption in code, the correct SQL functionality to use is EncryptByKey and DecryptByKey.

Related

How to decrypt a MD5 Password that is exactly 9 digits long?

I have a hash from my school work that I need to decrypt.
The hash is: 68728d8fa7977d2567c6363381eda037.
It looks like it uses either MD4 or MD5 hashing algorithm
How to decrypt it?
You "decrypt" a hash by making a lookup table. In your case, if you know that the password is exactly 9 digits long, you'd create a hash for every possible 9-digit password and compare each one to the hash you're trying to decrypt.
By definition, a hash is a one-way function, so you can't decrypt it. You can build a table of all possible combinations, which is a CPU-consuming process. Here is some simple code in a meta language:
for i from 0 to 999999999
h=md5(i)
write(file,h)
endoffor
line=search(file,inputhash)
print(line)
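The same idea can be sketched in Python. This is a minimal sketch, assuming the password consists only of decimal digits and was hashed as ASCII text with leading zeros kept; the function name `crack_numeric_md5` is made up for illustration. It compares each candidate directly instead of writing a table to a file first:

```python
import hashlib
from itertools import product


def crack_numeric_md5(target_hex, digits):
    """Return the digit string of the given length whose MD5 matches, or None."""
    target = target_hex.lower()
    # product() yields every combination, including ones with leading zeros.
    for combo in product("0123456789", repeat=digits):
        candidate = "".join(combo)
        if hashlib.md5(candidate.encode("ascii")).hexdigest() == target:
            return candidate
    return None


# Demonstration on a short, known input so it finishes instantly.
known = hashlib.md5(b"0427").hexdigest()
print(crack_numeric_md5(known, 4))
```

For 9 digits the loop covers a billion candidates, so expect a long run in pure Python.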

How can we turn the SHA-256 hash of a passphrase into an EC_KEY private key?

I have a question. I have been practicing C and OpenSSL recently and noticed this is the common way to create an EC_KEY:
EC_KEY *eckey = EC_KEY_new();
EC_GROUP *ecgroup= EC_GROUP_new_by_curve_name(NID_secp192k1);
int set_group_status = EC_KEY_set_group(eckey,ecgroup);
int gen_status = EC_KEY_generate_key(eckey);
This method generates an EC_KEY based on a random integer. Is there any code that lets us take the SHA-256 hash of a passphrase and make it the private key of the EC_KEY we just created? I read that an EC_KEY's private key has the same format as a SHA-256 hash.
//Example (needs <openssl/sha.h> and <string.h>)
const char *exam = "somewhere over the rainbow";
unsigned char output[32];
SHA256((const unsigned char *)exam, strlen(exam), output);
Not directly for that curve.
An ECC private key is actually a random integer less than the order of the base point, or equivalently the order of the group generated by the base point.
Although it is not true for all ECC curves (groups), the X9/Certicom/NIST prime curves were generated so that the generated group order is equal to the curve order (formally, cofactor = 1), and the curve order is always close to the underlying field order which for these curves was chosen very close to 2^N.
Thus a private key for a 256-bit prime curve, like P-256/secp256r1 (commonly used in TLS, and SSH, and some other applications) or secp256k1 (used in Bitcoin and some derivative coins), is almost a random 256-bit string -- close enough that in practice it will work.
Similarly for secp192k1 a random 192-bit string is close enough, and could be generated by taking the first 192 bits of a SHA-256 output (or last, or middle, if you prefer) as long as it was computed on input (your passphrase) having sufficient entropy to provide the desired security.
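The truncation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `candidate_scalar` and `is_valid_private_key` are made up here, and the group order is passed in by the caller rather than hard-coded (in OpenSSL it could be read with `EC_GROUP_get_order`, and the resulting number set with `EC_KEY_set_private_key`).

```python
import hashlib


def candidate_scalar(passphrase: str, bits: int = 192) -> int:
    """Take the first `bits` bits of SHA-256(passphrase) as an integer."""
    digest = hashlib.sha256(passphrase.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def is_valid_private_key(scalar: int, group_order: int) -> bool:
    """A private key must lie in the range 1 .. group_order - 1."""
    return 1 <= scalar < group_order


scalar = candidate_scalar("somewhere over the rainbow")
print(hex(scalar))
```

Because the order is so close to 2^192, a truncated hash almost never fails the range check, but checking costs nothing. The security caveat about where the passphrase comes from still applies.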
If by passphrase you mean a phrase chosen by a person, no. There is abundant evidence that people do not choose randomly even when they try to, and passwords and passphrases chosen by people, and not 'strengthened' cryptographically which your method does not, are regularly broken. As an example, this was tried in the Bitcoin community a few years ago under the name 'brain wallet' -- i.e. your private key, giving access to your bitcoins, is in your brain. Many of these keys were broken and the bitcoins stolen.
If you mean a series of words (not really a meaningful phrase) generated randomly by the computer to have sufficient entropy, or by some other process that actually is random like rolling fair dice, then yes. The current standard in Bitcoin for a 'seed phrase' is 12 words from a list of 2048 giving 128 bits of entropy plus 4 bits of redundancy; for your curve you only need 96 bits of entropy so 9 such words would work (although it isn't standard). Numerous other similar schemes have been developed and used over the years. In practice you will probably have to write this 'phrase' down and/or store it somewhere, and then secure that storage appropriately.
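The word-count arithmetic is easy to check. A small sketch, assuming every word is drawn uniformly and independently from the list (the function names are made up for illustration):

```python
import math


def phrase_bits(words: int, list_size: int = 2048) -> float:
    """Total bits carried by `words` words drawn uniformly from the list."""
    return words * math.log2(list_size)


def words_needed(target_bits: int, list_size: int = 2048) -> int:
    """Smallest number of words that carries at least `target_bits` bits."""
    return math.ceil(target_bits / math.log2(list_size))


print(phrase_bits(12))    # 12 words of 11 bits each: 128 + 4 bits
print(words_needed(96))   # words needed for a 192-bit curve's 96-bit strength
```

Each word from a 2048-word list carries 11 bits, so 12 words give 132 bits and 9 words give 99 bits, just over the 96 needed here.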

Combining two GUID/UUIDs with MD5, any reasons this is a bad idea?

I am faced with the need to derive a single ID from N IDs. At first I had a complex table in my database with FirstID, SecondID, and a varbinary(MAX) with the remaining IDs, and while this technically works it's painful, slow, and centralized, so I came up with this:
simple version in C#:
Guid idA = Guid.NewGuid();
Guid idB = Guid.NewGuid();
byte[] data = new byte[32];
idA.ToByteArray().CopyTo(data, 0);
idB.ToByteArray().CopyTo(data, 16);
byte[] hash = MD5.Create().ComputeHash(data);
Guid newID = new Guid(hash);
now a proper version will sort the IDs and support more than two, and probably reuse the MD5 object, but this should be faster to understand.
Now security is not a factor in this, none of the IDs are secret, just saying this 'cause everyone i talk to react badly when you say MD5, and MD5 is particularly useful for this as it outputs 128 bits and thus can be converted directly to a new Guid.
now it seems to me that this should be just dandy, while i may increase the odds of a collision of Guids it still seems like i could do this till the sun burns out and be no where near running into a practical issue.
However i have no clue how MD5 is actually implemented and may have overlooked something significant, so my question is this: is there any reason this should cause problems? (assume sub trillion records and ideally the output IDs should be just as global/universal as the other IDs)
My first thought is that you would not be generating a true UUID. You would end up with an arbitrary set of 128 bits. But a UUID is not an arbitrary set of bits. See the 'M' and 'N' callouts in the Wikipedia page. I don't know if this is a concern in practice or not. Perhaps you could manipulate a few bits (the 13th and 17th hex digits) inside your MD5 output to transform the hash output to a true UUID, as mentioned in this description of Version 4 UUIDs.
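Here is a rough Python sketch of that bit manipulation, with the sorting the question mentions. It assumes the combined ID should be stamped as version 4 as suggested above; the function name `combine_ids` is made up. Python's `UUID.bytes` uses a different byte order than C#'s `Guid.ToByteArray()`, so the two languages would not produce the same output for the same IDs.

```python
import hashlib
import uuid


def combine_ids(*ids: uuid.UUID) -> uuid.UUID:
    """Derive one UUID-shaped ID from several, independent of argument order."""
    # Sorting makes combine_ids(a, b) equal to combine_ids(b, a).
    data = b"".join(u.bytes for u in sorted(ids))
    digest = bytearray(hashlib.md5(data).digest())
    # 13th hex digit (high nibble of byte 6) carries the version: set it to 4.
    digest[6] = (digest[6] & 0x0F) | 0x40
    # 17th hex digit (top two bits of byte 8) carries the variant: set to 10.
    digest[8] = (digest[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(digest))


id_a, id_b = uuid.uuid4(), uuid.uuid4()
combined = combine_ids(id_a, id_b)
print(combined, combined.version, combined.variant)
```

Overwriting those 6 bits leaves 122 bits of hash output, the same amount of free space a random version 4 UUID has.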
Another issue… MD5 is not collision resistant, as the Wikipedia article puts it: two different inputs that produce the same hash can be constructed deliberately. That weakness matters when someone is trying to cause a collision; it says little about how often unrelated inputs such as random GUIDs collide by accident.
Nevertheless, as you pointed out, probably the chance of a collision is unrealistic.
I might be tempted to try to increase the entropy by repeating your combined value to create a much longer input to the MD5 function. In your example code, take that 32-octet value and use it repeatedly to create a value 10 or 1,000 times longer (320 octets, 32,000 or whatever).
In other words, if working with hex strings for my own convenience here instead of the octets of your example, given these two UUIDs:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1
FCF1B8E4-5548-4C43-995A-8DA2555459C8
…instead of feeding this to the MD5 function:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…feed this:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…or something repeated even longer.

Are standard hash functions like MD5 or SHA1 guaranteed to be unique for small input (4 bytes)?

Scenario:
I'm writing a web service that will act as an identity provider for a 3rd-party application. I have to send this 3rd-party application some unique identifier of our user. In our database, the unique user identifier is an integer (4 bytes, 32 bits). Per our security rules I can't send those in plain form, so sending them out hashed (through a function like MD5 or SHA1) was my first idea.
Problem:
The result of MD5 is 16 bytes, result of SHA1 is 40 bytes. I know they can't be unique for larger input sets, but given that my input set is only 4 bytes long (smaller than the hashed results), are they guaranteed to be unique, or am I doomed to some poor-man's hash function (like XORing the integer input with some number, shifting bits, adding predefined bits, etc.)?
For what you're trying to achieve (preventing a 3rd party from determining your user identifier), a straight MD5 or SHA1 hash is insufficient. 32 bits = about 4 billion values, so it would take less than 2 hours for the 3rd party to brute force every value (at 1M hashes/sec). I'd really suggest using HMAC-SHA1 instead.
As for collisions, this question has an extremely good answer on their likelihood. tl;dr: for 32 bits of input, the chance of a collision is extremely small.
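To put a number on "extremely small", the usual birthday approximation can be evaluated directly. This sketch assumes the hash behaves like a uniformly random function, which is an idealization; the function name is made up for illustration:

```python
import math


def collision_probability(num_inputs: int, hash_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1)/2 / 2^bits)."""
    pairs = num_inputs * (num_inputs - 1) // 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


all_ids = 2 ** 32  # every possible 32-bit user identifier
print(collision_probability(all_ids, 128))  # MD5-sized output
print(collision_probability(all_ids, 160))  # SHA1-sized output
```

Even with all 2^32 identifiers in use, there are about 2^63 pairs against 2^128 possible MD5 outputs, so the estimate is on the order of 2^-65.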
If your user identifiers aren't random (they increment by 1 or there is a known algorithm for creating them), then there's no reason you can't generate every hash to make sure that no collision will occur.
This will check the first 10,000,000 integers for a collision with HMAC-SHA1 (will take about 2 minutes to run):
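// Requires: using System; using System.Collections.Generic; using System.Security.Cryptography;
public static bool checkCollisionHmacSha1(byte[] key){
    using (HMACSHA1 mac = new HMACSHA1(key)){
        // Store hashes as strings: HashSet<byte[]> compares array references,
        // so it would never report two equal hashes as a collision.
        HashSet<string> values = new HashSet<string>();
        for(int i = 0; i < 10000000; i++){
            byte[] value = BitConverter.GetBytes(i);
            // Add returns false when the hash is already in the set.
            if (!values.Add(Convert.ToBase64String(mac.ComputeHash(value))))
                return true;
        }
        return false;
    }
}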
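For comparison, here is the same check as a Python sketch. The names `pseudonym` and `has_collision` are made up, and the 4-byte little-endian encoding is an assumption chosen to mirror what `BitConverter.GetBytes(int)` returns on common platforms. Python `bytes` objects compare by value, so a plain set detects a repeated hash.

```python
import hashlib
import hmac


def pseudonym(key: bytes, user_id: int) -> bytes:
    """HMAC-SHA1 of the 4-byte user identifier under a secret key."""
    return hmac.new(key, user_id.to_bytes(4, "little"), hashlib.sha1).digest()


def has_collision(key: bytes, count: int) -> bool:
    """Check identifiers 0 .. count-1 for two equal pseudonyms."""
    seen = set()
    for user_id in range(count):
        tag = pseudonym(key, user_id)
        if tag in seen:
            return True
        seen.add(tag)
    return False


print(has_collision(b"demo-key-not-for-production", 100_000))
```

The key has to stay secret on the identity provider's side; without it, the 3rd party cannot run the brute-force search described above.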
First, SHA1 is 20 bytes not 40 bytes.
Second, although input is very small, there still may be a collision. It is best to test this, but I do not know a feasible way to do that.
In order to prevent any potential collision:
1 - Hash your input and produce the 16/20 bytes of hash
2 - Spray your actual integer onto this hash.
Like put a byte of your int every 4/5 bytes.
This will guarantee the uniqueness by using the input itself.
Also, take a look at Collision Column part

How to crack a weakened TEA block cipher?

At the moment I am trying to crack the TEA block cipher in C. It is an assignment, and the TEA cipher has been weakened so that the key is two 16-bit numbers.
We have been given the code to encode plaintext using the key and to decode the ciphertext with the key as well.
I have some plaintext examples:
plaintext(1234,5678) encoded (3e08,fbab)
plaintext(6789,dabc) encoded (6617,72b5)
Update
The encode method takes in plaintext and a key, encode(plaintext,key1). This occurs again with another key to create the encoded message, encode(ciphertext1,key), which then creates the encoded (3e08,fbab) or encoded (6617,72b5).
How would I go about cracking this cipher?
At the moment, I encode the known plaintext with every possible key; the key size being hex value ffffffff. I write this to file.
But now I am stuck and in need of direction.
How could I use TEA's weakness of equivalent keys to lower the amount of time it would take to crack the cipher? Also, I am going to use a meet-in-the-middle attack.
When I encode the known plaintext with every possible key1, it will produce all the encrypted texts with their associated keys, which I store in a table.
Then I will decrypt the known ciphertext from my assignment with all the possible values of key2. This will leave me with a table of decrypts that have only been decrypted once.
I can then compare the two tables to see if any of the encrypts with key1 match the decrypts with key2.
I would like to use the equivalent-keys weakness as well. If someone could help me with implementing this in code, that would be great. Any ideas?
This is eerily similar to the Double Crypt problem from the IOI '2001 programming contest. The general solution is shown here, it won't give you the code but might point you in the right direction.
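The table-matching approach described in the question can be sketched in Python. The cipher below is NOT the assignment's weakened TEA: `toy_encrypt` and `toy_decrypt` are a hypothetical stand-in (a tiny Feistel network) so that the sketch is self-contained, and the assignment's own encode/decode routines would take their place.

```python
MASK = 0xFFFF  # each half of the toy block is 16 bits wide


def _mix(half, key, rnd):
    """Hypothetical round function; a Feistel network stays invertible for any choice."""
    x = (half + key + rnd * 0x9E37) & MASK
    return ((x << 4) ^ (x >> 5) ^ key) & MASK


def toy_encrypt(block, key, rounds=8):
    left, right = block
    for rnd in range(rounds):
        left, right = right, left ^ _mix(right, key, rnd)
    return left, right


def toy_decrypt(block, key, rounds=8):
    left, right = block
    for rnd in reversed(range(rounds)):
        left, right = right ^ _mix(left, key, rnd), left
    return left, right


def meet_in_the_middle(plain, cipher, key_bits=16):
    """Return every (key1, key2) with encrypt(encrypt(plain, key1), key2) == cipher."""
    # Forward table: middle value -> all key1 values that produce it.
    table = {}
    for key1 in range(1 << key_bits):
        table.setdefault(toy_encrypt(plain, key1), []).append(key1)
    # Backward pass: undo the second encryption and look the result up.
    matches = []
    for key2 in range(1 << key_bits):
        middle = toy_decrypt(cipher, key2)
        for key1 in table.get(middle, ()):
            matches.append((key1, key2))
    return matches


# Small demonstration with 10-bit keys so it runs instantly.
plain = (0x1234, 0x5678)
secret = (0x123, 0x2A5)
cipher = toy_encrypt(toy_encrypt(plain, secret[0]), secret[1])
print(meet_in_the_middle(plain, cipher, key_bits=10))
```

This costs about 2 × 2^16 cipher calls instead of 2^32 for the full double loop. More than one key pair may match a single plaintext, so each candidate should be confirmed against the second known plaintext/ciphertext pair.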
Don't write your results to a file -- just compare each ciphertext you produce to the known ciphertext, encoding the known plain text with every possible key until one of them produces the right ciphertext. At that point, you've used the correct key. Verify that by encrypting the second known plaintext with the same key to check that it produces the correct output as well.
Edit: the encoding occurring twice is of little consequence. You still get something like this:
for (test_key=0; test_key<max; test_key++)
if (encrypt(plaintext, test_key) == ciphertext)
std::cout << "Key = " << test_key << "\n";
The encryption occurring twice means your encrypt would look something like:
return TEA_encrypt(TEA_encrypt(plaintext, key), key);
Edit2: okay, based on the edited question, you apparently have to do the weakened TEA twice, each with its own 16-bit key. You could do that with a single loop like above, and split up the test_key into two independent 16-bit keys, or you could do a nested loop, something like:
// counters are wider than 16 bits so that <= 0xffff can terminate
for (unsigned long test_key1=0; test_key1<=0xffff; test_key1++)
    for (unsigned long test_key2=0; test_key2<=0xffff; test_key2++)
        if (encrypt(encrypt(plaintext, test_key1), test_key2) == ciphertext)
            // we found the keys.
I am not sure if this property holds for 16-bit keys, but 128-bit keys have the property that four keys are equivalent, reducing your search space by four-fold. I do not off the top of my head remember how to find equivalent keys, only that the key space is not as large as it appears. This means that it's susceptible to a related-key attack.
You tagged this as homework, so I am not sure if there are other requirements here, like not using brute force, which it appears that you are attempting to do. If you were to go for a brute force attack, you would probably need to know what the plaintext should look like (like knowing it is English, for example).
The equivalent keys are easy enough to understand and cut key space by a factor of four. The key is split into four parts. Each cycle of TEA has two rounds. The first uses the first two parts of the key while the second uses the 3rd and 4th parts. Here is a diagram of a single cycle (two rounds) of TEA:
(unregistered users are not allowed to include images so here's a link)
https://en.wikipedia.org/wiki/File:TEA_InfoBox_Diagram.png
Note: green boxes are addition, red circles are XOR.
TEA operates on blocks which it splits into two halves. During each round, one half of the block is shifted by 4, 0 or -5 bits to the left, has a part of the key or the round constant added to it, and then the XOR of the resulting values is added to the other half of the block. Flipping the most significant bit of either key segment flips the same bit in the sums it is used for and by extension the XOR result, but has no other effect. Flipping the most significant bit of both key segments used in a round flips the same bit in the XOR result twice, leaving it unchanged. Flipping those two bits together doesn't change the block cipher result, making the flipped key equivalent to the original. This can be done for both the (first/second) and (third/fourth) key segments, reducing the effective number of keys by a factor of four.
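The equivalence can be checked with a short Python sketch of standard TEA encryption (64-bit block, 128-bit key, 32 cycles). This is the full-size cipher as commonly described, not the weakened assignment version, and the key and block values are arbitrary test data:

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9   # TEA round constant
MSB = 0x80000000     # most significant bit of a 32-bit key segment


def tea_encrypt(block, key, cycles=32):
    """Standard TEA encryption of one block (two 32-bit halves)."""
    v0, v1 = block
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(cycles):
        total = (total + DELTA) & MASK32
        # First round of the cycle uses key segments 1 and 2.
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        # Second round of the cycle uses key segments 3 and 4.
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
equivalents = [
    key,
    (key[0] ^ MSB, key[1] ^ MSB, key[2], key[3]),
    (key[0], key[1], key[2] ^ MSB, key[3] ^ MSB),
    (key[0] ^ MSB, key[1] ^ MSB, key[2] ^ MSB, key[3] ^ MSB),
]
block = (0x12345678, 0x9ABCDEF0)
for candidate in equivalents:
    print([hex(part) for part in tea_encrypt(block, candidate)])
```

All four keys print the same ciphertext, so a brute-force search only needs to try one key from each group of four.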
Given the (modest) size of your encryption key, you can afford to create a pre-calculated table (use the same code given above, and store data in large chunks of memory; if you don't have enough RAM, dump the chunks to disk and keep an addressing scheme so you can look them up in the proper order).
Doing this will let you cover the whole domain and finding a solution will then be done in real-time (one single table lookup).
The same trick (key truncation) was used (not a long time ago) in leading Office software. They now use non-random data to generate the encryption keys, which (at best) leads to the same result. In practice, the ability to know encryption keys before they are generated (because the so-called random generator is predictable) is even more desirable than key truncation (it leads to the same result, but without the hurdle of having to build and store rainbow tables).
This is called the march of progress...
