Algorithm for unique KEY generation

I need to generate unique values from IPv4 and IPv6 addresses. For example, if I input 192.37.4.60 a unique key should be generated, and when I enter 192.60.4.37 a different key should be generated. I can also input IPv6 addresses, so how do I generate a unique value for each input? Can anyone propose an algorithm, or point to an existing one?

Convert the IP into its numerical (decimal) representation:
10.0.0.1 -> 00001010 00000000 00000000 00000001 -> 167772161
This is how a lot of IP addresses are stored internally. It's nice because it only requires 32 bits of storage. You can do this for IPv6 too, but an IPv6 address is 128 bits, so it's going to require something bigger than a uint32.

IP addresses are already pretty unique :) Especially IPv6 addresses.
Also, you can always use a hash algorithm (e.g. MD5, SHA-1 etc.) to create a "key". Distinct inputs will give distinct keys in practice, although a hash cannot strictly guarantee it: collisions are possible in principle.

Output the input IP address: voilà, requirements met!
(If my solution doesn't work for you, it means you need to add more details to your question)

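One possible solution is to use the left-shift operator and add. For example, if a, b, c and d are the four octets, the following code gives a unique value for each address:
uint32_t a=1;
uint32_t b=2;
uint32_t c=3;
uint32_t d=4;
/* uint32_t (from <stdint.h>) rather than int: with a signed int, a<<24 overflows once a >= 128 */
uint32_t value =(a<<24)+(b<<16)+(c<<8)+d;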

There are a couple of possible solutions, depending on what your problem needs.
You could use the IP addresses themselves, but keep in mind that an IP address can be spoofed.
If you intend to use this key among multiple peers to secure a communication channel, you might want to look at symmetric-key or public-key algorithms.
If you only want to use the keys for some static data, you can use any of these: MD5, AES or SHA*.
You might also want to feed multiple sources into your algorithm. Consider combining the IP address with the MAC address and any other hardware-related information that you can obtain from the machine/client on which the application will run.

You don't state any required properties of the keys except that they should be unique, so the obvious solution is to use the canonicalized IP addresses as keys. You can turn the addresses into numbers the obvious way, but be warned that IPv6 addresses make for huge numbers, so you'll need the BigInt implementation in whatever language you use.
(If you didn't actually mean that all 340 undecillion addresses should have unique keys, then of course you should look at normal hash functions instead.)

Another option is to use inet_pton directly.
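As a rough illustration of the inet_pton route, the sketch below turns either an IPv4 or an IPv6 string into a fixed-size binary key. It assumes a POSIX system; the name ip_to_key, the IP_KEY_LEN constant and the 17-byte layout (one family byte followed by the address bytes) are choices made for this example only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IP_KEY_LEN 17   /* 1 family byte + up to 16 address bytes (example layout) */

/* Fill key[] with a fixed-size binary key for an IPv4 or IPv6 string.
 * key[0] is 4 or 6; the address bytes follow in network order, zero-padded.
 * Returns 1 on success, 0 if the text parses as neither family. */
int ip_to_key(const char *text, uint8_t key[IP_KEY_LEN])
{
    struct in_addr v4;
    struct in6_addr v6;

    assert(text != NULL && key != NULL);
    memset(key, 0, IP_KEY_LEN);

    if (inet_pton(AF_INET, text, &v4) == 1) {
        key[0] = 4;
        memcpy(key + 1, &v4, sizeof v4);    /* 4 address bytes */
        return 1;
    }
    if (inet_pton(AF_INET6, text, &v6) == 1) {
        key[0] = 6;
        memcpy(key + 1, &v6, sizeof v6);    /* 16 address bytes */
        return 1;
    }
    return 0;
}
```

Because inet_pton parses the text into binary form, different spellings of the same IPv6 address (for example "::1" and "0:0:0:0:0:0:0:1") produce the same key, which a plain string comparison would not give you.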

Related

Combining two GUID/UUIDs with MD5, any reasons this is a bad idea?

I need to derive a single ID from N IDs. At first I had a complex table in my database with FirstID, SecondID, and a varbinary(MAX) holding the remaining IDs. While this technically works, it's painful, slow, and centralized, so I came up with this:
Simple version in C#:
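// Requires: using System; using System.Security.Cryptography;
Guid idA = Guid.NewGuid();
Guid idB = Guid.NewGuid();
// Concatenate the two 16-byte GUIDs into one 32-byte buffer.
byte[] data = new byte[32];
idA.ToByteArray().CopyTo(data, 0);
idB.ToByteArray().CopyTo(data, 16);
// MD5 outputs 16 bytes, exactly the size of a Guid; dispose the hasher when done.
byte[] hash;
using (MD5 md5 = MD5.Create())
    hash = md5.ComputeHash(data);
Guid newID = new Guid(hash);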
A proper version would sort the IDs, support more than two, and probably reuse the MD5 object, but this should be faster to understand.
Security is not a factor here, since none of the IDs are secret. I only mention it because everyone I talk to reacts badly when you say MD5. MD5 is particularly useful for this because it outputs 128 bits and thus can be converted directly to a new Guid.
It seems to me that this should be just dandy: while I may increase the odds of a Guid collision, it still seems like I could do this until the sun burns out and be nowhere near running into a practical issue.
However, I have no clue how MD5 is actually implemented and may have overlooked something significant, so my question is this: is there any reason this should cause problems? (Assume sub-trillion records, and ideally the output IDs should be just as global/universal as the other IDs.)

My first thought is that you would not be generating a true UUID. You would end up with an arbitrary set of 128 bits, but a UUID is not an arbitrary set of bits. See the 'M' and 'N' callouts in the Wikipedia page. I don't know if this is a concern in practice or not. Perhaps you could manipulate a few bits (the 13th and 17th hex digits) inside your MD5 output to transform the hash output into a true UUID, as mentioned in this description of Version 4 UUIDs.
Another issue: as the Wikipedia article puts it, MD5 is not collision resistant. That means someone can deliberately construct two different inputs with the same hash; it does not mean that ordinary inputs such as random GUIDs are spread unevenly across the range of possible outputs.
Nevertheless, as you pointed out, the chance of an accidental collision is probably unrealistic.
I might be tempted to try to increase the entropy by repeating your combined value to create a much longer input to the MD5 function, although repetition adds no new information to the input, so it is unclear that this helps. In your example code, take that 32-octet value and use it repeatedly to create a value 10 or 1,000 times longer (320 octets, 32,000 octets, or whatever).
In other words, if working with hex strings for my own convenience here instead of the octets of your example, given these two UUIDs:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1
FCF1B8E4-5548-4C43-995A-8DA2555459C8
…instead of feeding this to the MD5 function:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…feed this:
78BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C878BC2A6B-4F03-48D0-BB74-051A6A75CCA1FCF1B8E4-5548-4C43-995A-8DA2555459C8
…or something repeated even longer.

CSV String vs Arrays: Is this too stringly typed?

I came across some existing code in our production environment given to us by our vendor. They use a string of comma-separated values to store filtered results from a DB. Keep in mind that this is for a proprietary scripting language called PowerOn that interfaces with a database residing on an AIX system, but it's a language that supports strings, integers, and arrays.
For example, we have;
Account
----------------
123
234
3456
28390
The pseudocode might look like;
Define accounts As String
For Each Account
accounts=accounts + CharCast(Account) + ","
End
as opposed to something I would expect to see like
Define accounts As Integer Array(99)
Define index as Integer=0
For Each Account
accounts(index)=Account
index=index+1
End
By the time the loop is done, accounts will look like 123,234,3456,28390, (note the trailing comma). The string is later used to test whether a specific account is present, like so:
If CharSearch("28390", accounts) > 0 Then Call DoSomething
In the example, the statement evaluates to true and DoSomething gets called. Given the option of arrays, why would anyone want to store integer values within a string of comma-separated values? In every language I've come across, string-based operations are almost always more expensive than integer-based operations.
Considering I haven't seen this technique before and my experience is somewhat limited, is there a name for this? Is this common practice, or is this just another example of being too stringly typed? To extend the existing code, should I continue using the string method? Did we get cruddy code from our vendor?
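One practical hazard of the substring test above is worth spelling out: a plain substring search also matches partial account numbers, so searching the same string for "390" succeeds because it occurs inside 28390. The C sketch below shows a delimiter-aware check. It is not PowerOn code, and csv_contains is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `value` appears as a complete field in the comma-separated
 * `list` (for example "123,234,3456,28390,"), otherwise 0.
 * csv_contains is a hypothetical helper for illustration. */
int csv_contains(const char *list, const char *value)
{
    size_t len;
    const char *p;

    assert(list != NULL && value != NULL);
    len = strlen(value);
    if (len == 0)
        return 0;

    /* Visit every substring match and accept only whole fields. */
    for (p = strstr(list, value); p != NULL; p = strstr(p + 1, value)) {
        int starts_field = (p == list) || (p[-1] == ',');
        int ends_field = (p[len] == ',') || (p[len] == '\0');
        if (starts_field && ends_field)
            return 1;
    }
    return 0;
}
```

With the example data, csv_contains finds "28390" but rejects "390" and "345", whereas a bare strstr call would report a match for both of the latter. An integer array avoids the question entirely, which is one argument in its favour.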

What I put in the comment still holds, but my real answer is: it's probably a design decision made for compatibility/portability. In your integer-array case (and at a low enough level of the API) you'd typically find yourself asking questions like: what's a safe guess for the size of an integer on today's machines? What about endianness?
The most portable and most flexible of all data formats always has been and always will be the printed representation. It may not be as fast to process, but that's where adapters/converters kick in. I wouldn't be surprised to find a (human-readable) printed representation of something, especially in database APIs like the one you describe.
If you want something fast, just take whatever is given to you, convert it to a more efficient internal format, do your processing and convert it back.

There's nothing inherently wrong with using comma-separated strings instead of arrays. Sure, you can't readily access an arbitrary n-th element of such a collection, but if random access is not needed then there's no penalty for it, right?
As far as I know, Oracle DB stores NUMBER values as strings (and, if my memory is correct, DATEs as well) for very practical reasons.
In your specific example, using strings looks like overkill for passing data around without crossing process boundaries. But could it be that the string data type makes more sense when sending data over the wire or storing it on disk?

hashing function guaranteed to be unique?

In our app we're going to be handed PNG images along with a ~200 character byte array. I want to save the image with a filename corresponding to that byte array, but not the byte array itself, as I don't want 200-character filenames. So what I thought was that I would save the byte array into the database, and then MD5 it to get a short filename. When it comes time to display a particular image, I look up its byte array, MD5 it, then look for that file.
So far so good. The problem is that potentially two different bytearrays could hash down to the same MD5. Then, one file would effectively overwrite another. Or could they? I guess my questions are
Could two ~200 char bytearrays MD5-hash down to the same string?
If they could, is it a once-per-10-ages-of-the-universe sort of deal or something that could conceivably happen in my app?
Is there a hashing algorithm that will produce a (say) 32 char string that's guaranteed to be unique?

It's logically impossible to derive a 32-byte code from a 200-byte source that is unique among all possible 200-byte sources, since you can store more information in 200 bytes than in 32 bytes.
The only exception would be if the information stored in these 200 bytes also fit into 32 bytes, in which case your source data format would be extremely inefficient and space-wasting.

When hashing (as opposed to encrypting), you're reducing the information space of the data being hashed, so there's always a chance of a collision.
The best you can hope for in a hash function is that all hashes are evenly distributed in the hash space and your hash output is large enough to provide your "once-per-10-ages-of-the-universe sort of deal" as you put it!
So whether a hash is "good enough" for you depends on the consequences of a collision. You could always add a unique id to a checksum/hash to get the best of both worlds.

Why don't you use a unique ID from your database?

The probability that two hashes collide depends on the hash size. MD5 produces a 128-bit hash, so among any 2^128+1 distinct inputs there is guaranteed to be at least one collision.
This number is 2^160+1 for SHA-1 and 2^512+1 for SHA-512.
This rule applies here: the more output bits, the more uniqueness and the more computation. So there is a trade-off, and you have to choose whichever suits you best.
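The guaranteed-collision counts above are worst-case bounds. For a realistic number of files, the usual birthday approximation p ≈ n² / 2^(bits+1) gives a better feel for the odds. The sketch below evaluates it; it assumes the hash outputs behave like uniformly random values and that the result is small, and collision_estimate is a name made up for this example.

```c
#include <assert.h>

/* Birthday-bound estimate of the chance that at least two of `n` random
 * `bits`-bit hashes collide: p ~ n^2 / 2^(bits+1).
 * Only meaningful while the result is much smaller than 1. */
double collision_estimate(double n, unsigned bits)
{
    double p;
    unsigned i;

    assert(n >= 0.0);
    p = n * n / 2.0;              /* n^2 / 2 */
    for (i = 0; i < bits; i++)    /* divide by 2^bits without needing libm */
        p /= 2.0;
    return p;
}
```

For a million images and a 128-bit hash such as MD5, collision_estimate(1e6, 128) comes out on the order of 10^-27, which is the "ages of the universe" territory the question asks about, as long as nobody is deliberately crafting colliding inputs.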

Could two ~200 char bytearrays MD5-hash down to the same string?
Considering that there are more 200-byte strings than 32-character strings (MD5 digests written in hex), that is guaranteed to be the case.
All hash functions have that problem, but some are more robust than MD5. Try SHA-1; git uses it for the same purpose.

It can happen that two MD5 hashes collide (are the same). In 1996, a flaw was found in the MD5 algorithm, and cryptanalysts advised switching to the SHA-1 hashing algorithm.
So I would advise you to switch to SHA-1 (40 characters). But do not worry: I doubt that two of your pictures will get the same hash. I think you can accept this risk in your application.

As others said before, a hash doesn't give you what you need unless you are fine with the risk of collision.
A database is helpful here.
You get a unique index for each 200-character string. No collisions here. You need to index your 200-character names; that uses extra memory, but the index keeps them sorted, which makes searches very fast. You get a unique ID which can easily be used for filenames.
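To make the last point concrete, here is a minimal sketch of building a filename from a database-assigned ID. The zero-padded ten-digit format, the .png suffix and the name image_filename are all assumptions for illustration; the original answer does not specify a schema.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<id>.png" from a database-assigned row id (hypothetical schema).
 * The id is zero-padded to ten digits so that names sort naturally.
 * Returns 1 if the name fit into buf, 0 if it was truncated. */
int image_filename(unsigned long id, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "%010lu.png", id);
    return n > 0 && (size_t)n < buflen;
}
```

Since the database hands out each ID exactly once, two different byte arrays can never map to the same file, and the filename stays short regardless of how long the byte array is.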

I haven't worked much with hashing algorithms, but as I understand it there is always a chance of collision in a hashing algorithm, i.e. two different objects may hash to the same value, but it is guaranteed that the same object always hashes to the same value. There are other techniques that may be used to deal with this, like linear probing.

How to crack a weakened TEA block cipher?

At the moment I am trying to crack the TEA block cipher in C. It is an assignment, and the TEA cipher has been weakened so that the key is two 16-bit numbers.
We have been given the code to encode plaintext using the key and to decode the ciphertext with the key.
I have some plaintext examples:
plaintext(1234,5678) encoded (3e08,fbab)
plaintext(6789,dabc) encoded (6617,72b5)
Update
The encode method takes in plaintext and a key, encode(plaintext,key1). This occurs again with another key to create the encoded message, encode(ciphertext1,key), which then creates the encoded (3e08,fbab) or encoded (6617,72b5).
How would I go about cracking this cipher?
At the moment, I encode the known plaintext with every possible key, the largest key being hex value ffffffff. I write this to a file.
But now I am stuck and in need of direction.
How could I use TEA's weakness of equivalent keys to lower the amount of time it would take to crack the cipher? Also, I am going to use a meet-in-the-middle attack.
When I encode the known plaintext with every possible key1, it will create all the encrypted texts with their associated keys, which I store in a table.
Then I will decrypt the known ciphertext from my assignment with all the possible values of key2. This will leave me with a table of decrypts that have only been decrypted once.
I can then compare the two tables to see if any of the encrypts with key1 match the decrypts with key2.
I would like to use the equivalent-key weakness as well; if someone could help me with implementing this in code that would be great. Any ideas?
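The table-matching plan described above can be sketched as follows. The assignment's encode and decode routines are not shown in the question, so toy_encrypt and toy_decrypt below are hypothetical stand-ins (a small 4-round Feistel function with a 16-bit key on a 32-bit block) that exist only to make the sketch runnable; swap in the provided routines. The second known pair is used to weed out false matches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in cipher: NOT the assignment's weakened TEA. */
static uint16_t toy_round(uint16_t half, uint16_t key, unsigned round)
{
    uint32_t x = (uint32_t)half * 0x9E37u + key + round;
    x ^= x >> 7;
    return (uint16_t)(x * 0x2545u);
}

uint32_t toy_encrypt(uint32_t block, uint16_t key)
{
    uint16_t l = (uint16_t)(block >> 16), r = (uint16_t)block;
    unsigned i;
    for (i = 0; i < 4; i++) {
        uint16_t tmp = r;
        r = (uint16_t)(l ^ toy_round(r, key, i));
        l = tmp;
    }
    return ((uint32_t)l << 16) | r;
}

uint32_t toy_decrypt(uint32_t block, uint16_t key)
{
    uint16_t l = (uint16_t)(block >> 16), r = (uint16_t)block;
    unsigned i;
    for (i = 4; i-- > 0; ) {
        uint16_t tmp = l;
        l = (uint16_t)(r ^ toy_round(l, key, i));
        r = tmp;
    }
    return ((uint32_t)l << 16) | r;
}

typedef struct { uint32_t mid; uint16_t key1; } Entry;

static int cmp_entry(const void *a, const void *b)
{
    uint32_t x = ((const Entry *)a)->mid, y = ((const Entry *)b)->mid;
    return (x > y) - (x < y);
}

/* Meet in the middle: tabulate E(p1, key1) for every key1, then look up
 * D(c1, key2) for every key2. Candidates are confirmed with (p2, c2).
 * Returns 1 and fills *k1, *k2 on success, 0 otherwise. */
int mitm_attack(uint32_t p1, uint32_t c1, uint32_t p2, uint32_t c2,
                uint16_t *k1, uint16_t *k2)
{
    const size_t n = 65536;
    Entry *table;
    uint32_t k;
    int found = 0;

    assert(k1 != NULL && k2 != NULL);
    table = malloc(n * sizeof *table);
    if (table == NULL)
        return 0;
    for (k = 0; k <= 0xFFFFu; k++) {
        table[k].mid = toy_encrypt(p1, (uint16_t)k);
        table[k].key1 = (uint16_t)k;
    }
    qsort(table, n, sizeof *table, cmp_entry);

    for (k = 0; k <= 0xFFFFu && !found; k++) {
        uint32_t mid = toy_decrypt(c1, (uint16_t)k);
        size_t lo = 0, hi = n, i;
        while (lo < hi) {                 /* first entry with .mid >= mid */
            size_t m = lo + (hi - lo) / 2;
            if (table[m].mid < mid) lo = m + 1; else hi = m;
        }
        for (i = lo; i < n && table[i].mid == mid && !found; i++) {
            uint16_t cand = table[i].key1;
            if (toy_encrypt(toy_encrypt(p2, cand), (uint16_t)k) == c2) {
                *k1 = cand;
                *k2 = (uint16_t)k;
                found = 1;
            }
        }
    }
    free(table);
    return found;
}
```

This does about 2 × 65,536 cipher operations plus a sort, instead of the roughly 4.3 billion double encryptions a nested brute-force loop needs, and the table fits comfortably in memory, so nothing has to be written to a file.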

This is eerily similar to the Double Crypt problem from the IOI '2001 programming contest. The general solution is shown here, it won't give you the code but might point you in the right direction.

Don't write your results to a file -- just compare each ciphertext you produce to the known ciphertext, encoding the known plain text with every possible key until one of them produces the right ciphertext. At that point, you've used the correct key. Verify that by encrypting the second known plaintext with the same key to check that it produces the correct output as well.
Edit: the encoding occurring twice is of little consequence. You still get something like this:
for (test_key=0; test_key<max; test_key++)
    if (encrypt(plaintext, test_key) == ciphertext)
        std::cout << "Key = " << test_key << "\n";
The encryption occurring twice means your encrypt would look something like:
return TEA_encrypt(TEA_encrypt(plaintext, key), key);
Edit2: okay, based on the edited question, you apparently have to do the weakened TEA twice, each with its own 16-bit key. You could do that with a single loop like above, and split test_key into two independent 16-bit keys, or you could do a nested loop, something like:
// <= so that key 0xffff is tried too; the counters must be wider than 16 bits or the loops never end
for (unsigned long test_key1=0; test_key1<=0xffff; test_key1++)
    for (unsigned long test_key2=0; test_key2<=0xffff; test_key2++)
        if (encrypt(encrypt(plaintext, test_key1), test_key2) == ciphertext)
            // we found the keys.

I am not sure if this property holds for 16-bit keys, but 128-bit keys have the property that four keys are equivalent, reducing your search space by four-fold. I do not off the top of my head remember how to find equivalent keys, only that the key space is not as large as it appears. This means that it's susceptible to a related-key attack.
You tagged this as homework, so I am not sure if there are other requirements here, like not using brute force, which it appears you are attempting. If you were to go for a brute-force attack, you would probably need to know what the plaintext should look like (knowing it is English, for example).

The equivalent keys are easy enough to understand and cut key space by a factor of four. The key is split into four parts. Each cycle of TEA has two rounds. The first uses the first two parts of the key while the second uses the 3rd and 4th parts. Here is a diagram of a single cycle (two rounds) of TEA:
(unregistered users are not allowed to include images so here's a link)
https://en.wikipedia.org/wiki/File:TEA_InfoBox_Diagram.png
Note: green boxes are addition, red circles are XOR.
TEA operates on blocks, which it splits into two halves. During each round, one half of the block is shifted by 4, 0 or -5 bits to the left, has a part of the key or the round constant added to it, and then the XOR of the resulting values is added to the other half of the block. Flipping the most significant bit of either key segment flips the same bit in the sum it is used for, and by extension in the XOR result, but has no other effect. Flipping the most significant bit of both key segments used in a round flips the same bit in the XOR result twice, leaving it unchanged. Flipping those two bits together therefore doesn't change the block cipher result, making the flipped key equivalent to the original. This can be done for both the (first/second) and (third/fourth) key segments, reducing the effective number of keys by a factor of four.

Given the (modest) size of your encryption key, you can afford to create a pre-calculated table (use the same code given above, and store the data in large chunks of memory; if you don't have enough RAM, dump the chunks to disk and keep an addressing scheme so you can look them up in the proper order).
Doing this will let you cover the whole domain, and finding a solution will then be done in real time (one single table lookup).
The same trick (key truncation) was used (not long ago) in leading Office software. They now use non-random data to generate the encryption keys, which (at best) leads to the same result. In practice, the ability to know encryption keys before they are generated (because the so-called random generator is predictable) is even more desirable than key truncation (it leads to the same result, but without the hurdle of having to build and store rainbow tables).
This is called the march of progress...

Store RGB values in database

I never had to do this before and never even thought about it. What is the best way of storing RGB values in the database?
I thought of a couple of options. The most obvious one is three byte columns to store the R, G and B. (I don't want to go this route.)
Another option is to store it in a 32-bit int column. (I am leaning towards this one.)
Or maybe I am just missing something trivial.

The "wasted" space of 32-bit integer column would allow you to store an alpha channel as well, should the need ever arise for it.

First and foremost: what are your requirements?
Do you need to retrieve the color and only the color? Do you ever need to query by components? Do you need to search by colorspace distance? Do you need to store colorspace information (Adobe RGB or sRGB)? See also Best Way to represent a color in SQL.

If you're storing these numbers for web design, I would suggest simply using a char(6) and storing a string of hex triplets.
Sure, that's two bytes "wasted" over a 32-bit integer, but if you're not comparing them mathematically in some way and just regurgitating them to a CSS file, for instance, storing as a string will remove the need to translate back and forth.
Not that hex triplets to integers is a tough translation, but doing the easiest thing possible rather than optimizing for a few bytes may be worth considering.
If you're doing something other than web-related work, you may want to consider building in room for more than 8 bits per channel.
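For a sense of how small the translation between a hex triplet and an integer is, here is a sketch in C. The function names are invented for this example, and it assumes exactly six hex digits with no leading # or 0x.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a 24-bit colour as six upper-case hex digits, e.g. "FF8000".
 * buf must have room for 7 bytes including the terminator. */
void rgb_to_hex(uint32_t rgb, char buf[7])
{
    assert(buf != NULL);
    snprintf(buf, 7, "%06lX", (unsigned long)(rgb & 0xFFFFFFu));
}

/* Parse exactly six hex digits into a 24-bit colour.
 * Returns 1 on success, 0 on any other input. */
int hex_to_rgb(const char *hex, uint32_t *rgb)
{
    size_t i;

    assert(hex != NULL && rgb != NULL);
    if (strlen(hex) != 6)
        return 0;
    for (i = 0; i < 6; i++)
        if (!isxdigit((unsigned char)hex[i]))
            return 0;              /* rejects "0x12ab", spaces, signs */
    *rgb = (uint32_t)strtoul(hex, NULL, 16);
    return 1;
}
```

With a char(6) column the application can skip both functions and pass the stored string straight into a stylesheet; they are only needed when the colour has to be manipulated numerically.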

RGB values are usually described on the web in the format 0xRRGGBB where RR, GG, and BB are the hex values of R, G, and B. While you may be wasting a bit of space with a 32 bit int, I can't imagine it's much compared to the benefit you'll potentially gain from storing the values in a well-known format.
In case you'd like a quick primer on how to go about the conversion, Wikipedia's got you covered!

Just store it as a 32 bit value. There is no point in breaking down into 3 fields since you will most likely want all 3 components together all the time.

My guess is to store a 32 bit integer.
However, if your SQL operations need each component in its own column (for example, to compare the R value of one row against the G value of another), you will have to separate the values into individual columns: R, G and B, each an integer from 0 to 255.
