How can I generate unique, non-sequential serial keys without 3rd party software? - c

I'm working on a project that involves writing low-level C software for a hardware implementation. We want to implement a new feature for our devices that our users can unlock when they purchase an associated license key.
The desired implementation steps are simple. The user calls us up, requests the feature, and sends us a payment. Next, we email them a product key, which they enter into their hardware to unlock the feature.
Our hardware is not connected to the internet. Therefore, an algorithm must be implemented in such a way that these keys can be generated from both the server and from within the device. Seeds for the keys can be derived from the hardware serial number, which is available in both locations.
I need a simple algorithm that can take sequential numbers and generate unique, non-sequential keys of 16-20 alphanumeric characters.
UPDATE
SHA-1 looks to be the best way to go. However, what I am seeing from sample output of SHA-1 keys is that they are pretty long (40 chars). Would I obtain sufficient results if I took the 40 char key and, say, truncated all but the last 16 characters?

You could just concatenate the serial number of the device, the feature name/code and some secret salt and hash the result with SHA1 (or another secure hashing algorithm). The device compares the given hash to the hash generated for each feature, and if it finds a match it enables the feature.
By the way, to keep the character count down I'd suggest encoding the hash output as base64.
SHA-1 looks to be the best way to go. However, what I am seeing from sample output of SHA-1 keys is that they are pretty long (40 chars). Would I obtain sufficient results if I took the 40 char result and, say, truncated all but the last 16 characters?
Generally it's not a good idea to truncate hashes: they are designed to use the full length of the output to provide good security and collision resistance. Still, you could cut down the character count by using base64 instead of hexadecimal characters; the output would go from 40 characters to 27 (with the trailing padding character dropped).
Hex: a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
Base64: qUqP5cyxm6YcTAhz05Hph5gvu9M
---edit---
Actually, @Nick Johnson argues convincingly that hashes can be truncated without big security implications (although each bit you drop obviously doubles the chance of a collision).
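To land in the requested 16-20 character range, one option is to keep only the first 80 bits of the digest and encode them with the RFC 4648 Base32 alphabet, which gives exactly 16 uppercase alphanumeric characters. This is only a sketch and not something proposed in the answers: the name `key_from_digest` is made up for illustration, and the 20-byte digest is assumed to come from whatever SHA-1 (or HMAC-SHA-1) implementation is available on both the server and the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a 20-byte digest into a 16-character key.
 * Only the first 10 bytes (80 bits) are used; 80 / 5 = 16 Base32 symbols. */
static void key_from_digest(const uint8_t digest[20], char out[17])
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    uint32_t acc = 0;  /* bit accumulator */
    int bits = 0;      /* number of not-yet-emitted bits in acc */
    size_t n = 0;

    assert(digest != NULL && out != NULL);

    for (size_t i = 0; i < 10; i++) {
        acc = (acc << 8) | digest[i];
        bits += 8;
        while (bits >= 5) {  /* emit 5 bits at a time */
            out[n++] = alphabet[(acc >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }
    out[n] = '\0';
}
```

Base32 avoids the `+` and `/` characters of base64 and is case-insensitive, which may matter if users type the key in by hand.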
You should also use an HMAC instead of naively prepending or appending the key to the message before hashing. Per Wikipedia:
The design of the HMAC specification was motivated by the existence of
attacks on more trivial mechanisms for combining a key with a hash
function. For example, one might assume the same security that HMAC
provides could be achieved with MAC = H(key ∥ message). However, this
method suffers from a serious flaw: with most hash functions, it is
easy to append data to the message without knowing the key and obtain
another valid MAC. The alternative, appending the key using MAC =
H(message ∥ key), suffers from the problem that an attacker who can
find a collision in the (unkeyed) hash function has a collision in the
MAC. Using MAC = H(key ∥ message ∥ key) is better, however various
security papers have suggested vulnerabilities with this approach,
even when two different keys are used.
For more details on the security implications of both this and length truncation, see sections 5 and 6 of RFC 2104.

One option is to use a hash as Matteo describes.
Another is to use a block cipher (e.g. AES). Just pick a random nonce and invoke the cipher in counter mode using your serial numbers as the counter.
Of course, this will make the keys invertible, which may or may not be a desirable property.

You can use an Xorshift random number generator to generate a unique 64-bit key, and then encode that key using whatever scheme you want. If you use base-64, the key is 11 characters long. If you use hex encoding, the key would be 16 characters long.
The Xorshift RNG is basically just a bit mixer, and there are versions that have a guaranteed period of 2^64 - 1 (every nonzero state). Each step is an invertible mapping, so it's guaranteed to generate a unique value for every input.
The other option is to use a linear feedback shift register, which also will generate a unique number for each different input.
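A minimal sketch of the Xorshift idea, assuming the classic 64-bit variant with shifts 13, 7 and 17; the function names are hypothetical. It only demonstrates uniqueness: the mapping is public and invertible, so anyone who knows the algorithm can compute keys, and low serial numbers come out poorly mixed after a single round.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One round of the xorshift64 step (shifts 13, 7, 17).
 * Each line is an invertible linear map, so the whole function is a
 * bijection on 64-bit values: distinct serials give distinct outputs.
 * Note that 0 maps to 0, so serial numbers should start at 1. */
static uint64_t xorshift64_mix(uint64_t x)
{
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return x;
}

/* Format the mixed value as 16 uppercase hex characters. */
static void serial_to_key(uint64_t serial, char out[17])
{
    snprintf(out, 17, "%016" PRIX64, xorshift64_mix(serial));
}
```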

Related

What is the conflict probability of md5 digestion if input string only contains alphanumericals

The input strings have the following conditions:
Only contain alphanumericals ([a-zA-Z0-9])
The size of a string is always less than 256 bytes
Total number of input strings is less than 1,000,000
So what is the collision probability of MD5 digests if the input strings all satisfy the above conditions? Can I just assume that there are no collisions?
If the inputs are random the likelihood of a collision in that input set is very low. That being said MD5 is a broken algorithm and a human can easily use software to find a collision. So you probably just shouldn't use MD5, but it depends on what you're using it for. I'm not sure why you would ever want to use MD5 anymore. You should look into the blake2 family or the newer SHAs (SHA256, SHA512, not SHA-1). If these are passwords you should pretty much definitely be using a hash designed for passwords like PBKDF2 or one of the Argons. To be honest I'd recommend just using libsodium's defaults for most things.
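To put a number on "very low", the usual birthday approximation can be evaluated directly. This is a rough sketch that assumes the digests behave like uniformly random 128-bit values (which says nothing about deliberately crafted collisions); the function name is made up for illustration.

```c
/* Approximate probability that at least two of n uniformly random
 * hash values of the given bit width collide (birthday bound):
 * p ~= n * (n - 1) / 2 / 2^bits, valid while p is much smaller than 1. */
static double collision_probability(double n, int bits)
{
    double p = n * (n - 1.0) / 2.0;  /* number of distinct pairs */
    for (int i = 0; i < bits; i++)
        p /= 2.0;                    /* divide by 2^bits */
    return p;
}
```

For 1,000,000 inputs and a 128-bit digest this comes out at roughly 1.5 × 10^-27, so accidental collisions can be ignored in practice.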

how to resize string [ group of numbers and char]?

Good Afternoon all,
I am working on RSA encryption and decryption. For more security I am also using padding in the ciphertext. For different inputs I get outputs of different lengths, for example:
plain text- amit
cipher text-10001123A234A987A765A
My problem is that for large plaintexts my algorithm generates large ciphertexts, and I think it is a waste of resources to keep such long strings in the database.
Is there any way to compact the ciphertext and recover the real ciphertext when I need it?
In order for the algorithm to be encryption and not just hashing, it must be reversible. To be reversible, the output must contain as much information as the input, and so is unlikely to be significantly shorter.
You may compress the data before encryption. There's not a lot else you can do unless you're willing to give up the ability to recover your original text from the ciphertext.
There are a couple of possibilities:
Change your encryption scheme: there are schemes where the output size is the same as the input size.
Compress your data before you encrypt. This will be effective only if you have a large block of text to encrypt, and it adds the overhead of decompressing after decryption.
This doesn't apply to RSA specifically, but: any secure cipher will give output close to indistinguishable from a random bit pattern. A random bit pattern has, per definition, maximum information theoretic entropy, since for each bit, both 0 and 1 are equally likely.
Now, you want a lossless compression scheme, since you need to be able to decompress to the exact data you originally compressed. An optimal compression scheme will maximize the entropy of its output. However, we know that the output of our cipher already has maximum entropy, so we can't possibly increase the entropy.
And thus, trying to compress encrypted data is useless.
Note: Depending on your encryption method, compression might be possible, for example, when using a block cipher in ECB mode. RSA is a completely different beast altogether though, and, well, compressing won't do anything (except quite possibly make your final output bigger).
[Edit] Also, the length of your RSA ciphertext will be on the order of log n bits, where n is your public modulus. This is the reason that, especially for small plaintexts, public key crypto is extremely 'wasteful'. You normally employ RSA to set up a (smaller, e.g. 128-bit) symmetric key between two parties and then encrypt your data with a symmetric key algorithm such as AES. AES has a block size of 128 bits, so if you do straightforward encryption of your data, the maximum 'overhead' you incur is the padding needed to reach the next multiple of 128 bits.
erm ... you wrote in a comment here that you apply RSA encryption to all single characters:
i am using rsa- it perform over
numbers to convert amit in cipher text
first i do a->97 m->109 i->105..and
then apply rsa over 97 ,109 ... then i
get different integers for 109, 105 or
... i joined that all as a string...
A word of advice: don't do that, since you will lose the security of RSA.
If you use RSA in this way, your scheme becomes a substitution cipher (with only one substitution alphabet). Given a reasonably long ciphertext or a reasonable number of ciphertexts, this scheme can be broken by analyzing the frequency of the ciphertext characters.
See RSAES-OAEP for a padding scheme to apply to your plaintext before encryption.
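The substitution-cipher effect is easy to reproduce. The sketch below encrypts each character on its own with unpadded RSA, using the well-known toy parameters n = 3233 (61 × 53), e = 17, d = 2753; these are assumptions chosen for illustration, not the asker's actual key, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation: base^exp mod mod. */
static uint32_t modpow(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result = 1, b = base % mod;
    while (exp > 0) {
        if (exp & 1)
            result = result * b % mod;
        b = b * b % mod;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Encrypts every character separately with toy RSA (n = 3233, e = 17).
 * Equal characters always give equal numbers, so the output is just a
 * fixed substitution table that frequency analysis can recover. */
static void encrypt_per_char(const char *text, uint32_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = modpow((unsigned char)text[i], 17, 3233);
}
```

Encrypting "hello" this way produces the same number for both occurrences of "l", which is exactly the pattern an attacker looks for.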

Will the MD5 cryptographic hash function output be same in all programming languages?

I am basically creating an API in PHP, and one of the parameters that it will accept is an md5 encrypted value. I don't have much knowledge of different programming languages or of MD5. So my basic question is: if I am accepting md5 encrypted values, will the value be the same when generated from any programming language, like .NET, Java, Perl, Ruby, etc.?
Or would there be some limitations or validations for it?
Yes, correct implementation of md5 will produce the same result, otherwise md5 would not be useful as a checksum. The difference may come up with encoding and byte order. You must be sure that text is encoded to exactly the same sequence of bytes.
It will, but there's a but.
It will because it's specified to reliably produce the same result given a repeated series of bytes - the point being that we can then compare that result to check the bytes haven't changed, or perhaps only digitally sign the MD5 result rather than signing the entire source.
The but is that a common source of bugs is making assumptions about how strings are encoded. MD5 works on bytes, not characters, so if we're hashing a string, we're really hashing a particular encoding of that string. Some languages (and more so, some runtimes) favour particular encodings, and some programmers are used to making assumptions about that encoding. Worse yet, some specs can make assumptions about encodings. This can be a cause of bugs where two different implementations will produce different MD5 hashes for the same string. This is especially so in cases where characters are outside of the range U+0020 to U+007F (and since U+007F is a control, that one has its own issues).
All this applies to other cryptographic hashes, such as the SHA- family of hashes.
Yes. MD5 isn't an encryption function, it's a hash function that uses a specific algorithm.
Yes, md5 hashes will always be the same regardless of their origin - as long as the underlying algorithm is correctly implemented.
A vital property of hash functions such as MD5 is that they always produce the same value for the same input.
However, it does require you to encode the input data into a sequence of bytes (or bits) the same way. For instance, there are many ways to encode a string.
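To make the encoding point concrete, here is a small sketch of how one character turns into different byte sequences depending on the encoding. The helper names are made up, and only code points below U+0800 (UTF-8) and U+0100 (Latin-1) are handled. Any hash computed over these bytes will differ accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a code point below U+0800 as UTF-8; returns the byte count. */
static size_t encode_utf8(uint32_t cp, uint8_t out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    out[0] = (uint8_t)(0xC0 | (cp >> 6));
    out[1] = (uint8_t)(0x80 | (cp & 0x3F));
    return 2;
}

/* Encode a code point below U+0100 as ISO-8859-1 (Latin-1): one byte. */
static size_t encode_latin1(uint32_t cp, uint8_t out[1])
{
    assert(cp < 0x100);
    out[0] = (uint8_t)cp;
    return 1;
}
```

For U+00E9 (é) the UTF-8 form is the two bytes C3 A9, while Latin-1 gives the single byte E9; for plain ASCII such as "A" both encodings agree, which is why the problem often goes unnoticed in testing.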

Hash function for short strings

I want to send function names from a weak embedded system to the host computer for debugging purposes. Since the two are connected by RS232, which is short on bandwidth, I don't want to send the function's name literally. Some function names are 15 characters long, and I sometimes want to send those names at a pretty high rate.
The solution I thought about, was to find a hash function which would hash those function names to a single byte, and send this byte only. The host computer would scan all the functions in the source, compute their hash using the same function, and then would translate the hash to the original string.
The hash function must be
Collision free for short strings.
Simple (since I don't want too much code in my embedded system).
Fit in a single byte
Obviously, it does not need to be secure by any means, only collision free. So I don't think cryptographic hash functions are worth their complexity.
An example code:
int myfunc() {
sendToHost(hash("myfunc"));
}
The host would then be able to present me with list of times where the myfunc function was executed.
Is there some known hash function which holds the above conditions?
Edit:
I assume I will use much less than 256 function-names.
I can use more than a single byte, two bytes would have me pretty covered.
I prefer to use a hash function instead of using the same function-to-byte map on the client and the server, because (1) I have no map implementation on the client, and I'm not sure I want to add one for debugging purposes, and (2) it requires another tool in my build chain to inject the function-name table into my embedded system code. A hash is better in this regard, even if that means I'll have a collision once in a while.
Try minimal perfect hashing:
Minimal perfect hashing guarantees that n keys will map to 0..n-1 with no collisions at all.
C code is included.
Hmm, with only 256 possible values, and since you will parse your source code to know all possible functions, maybe the best way to do it would be to assign a number to each of your functions?
A real hash function probably won't work, because you have only 256 possible hashes but you want to map at least 26^15 possible values (assuming letter-only, case-insensitive function names).
Even if you restricted the number of possible strings (by applying some mandatory formatting) you would be hard pressed to get both meaningful names and a valid hash function.
You could use a Huffman tree to abbreviate your function names according to the frequency they are used in your program. The most common function could be abbreviated to 1 bit, less common ones to 4-5, very rare functions to 10-15 bits etc. A Huffman tree is not very hard to implement but you will have to do something about the bit alignment.
No, there isn't.
You can't make a collision free hash code, or even close to it, with just an eight bit hash. If you allow strings that are longer than one character, you have more possible strings than there are possible hash codes.
Why not just extract the function names and give each function name an id? Then you only need a lookup table on each side of the wire.
(As others have shown you can generate a hash algorithm without collisions if you already have all the function names, but then it's easier to just assign a number to each name to make a lookup table...)
If you have a way to track the functions within your code (i.e. a text file generated at run-time) you can just use the memory locations of each function. Not exactly a byte, but smaller than the entire name and guaranteed to be unique. This has the added benefit of low overhead. All you would need to 'decode' the address is the text file that maps addresses to actual names; this could be sent to the remote location or, as I mentioned, stored on the local machine.
In this case you could just use an enum to identify functions. Declare function IDs in some header file:
typedef enum
{
FUNC_ID_main,
FUNC_ID_myfunc,
FUNC_ID_setled,
FUNC_ID_soundbuzzer
} FUNC_ID_t;
Then in functions:
int myfunc(void)
{
sendFuncIDToHost(FUNC_ID_myfunc);
...
}
If sender and receiver share the same set of function names, they can build identical hash tables from these. You can then identify an entry by the path taken to reach it, for example {starting position + number of hops}, which would take 2 bytes of bandwidth. For a fixed-size table (linear probing), only the final index is needed to address an entry.
NOTE: when building the two "synchronous" hash tables, the order of insertion is important ;-)
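A rough sketch of the fixed-size, linear-probing variant, assuming a 256-slot table so that the final index fits in one byte. The hash function and all names here are hypothetical; any hash works as long as both sides use the same one and insert the names in the same order.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 256  /* a slot index then fits in a single byte */

static const char *table[TABLE_SIZE];

/* Simple multiplicative string hash reduced to 8 bits (illustrative only). */
static unsigned char hash8(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return (unsigned char)h;
}

/* Insert a name with linear probing and return its slot index,
 * or -1 if the table is full. Inserting an existing name returns
 * the slot it already occupies. */
static int table_insert(const char *name)
{
    unsigned start = hash8(name);
    for (unsigned hop = 0; hop < TABLE_SIZE; hop++) {
        unsigned slot = (start + hop) % TABLE_SIZE;
        if (table[slot] == NULL) {
            table[slot] = name;
            return (int)slot;
        }
        if (strcmp(table[slot], name) == 0)
            return (int)slot;
    }
    return -1;
}
```

The device sends the returned index; the host, having built the same table, looks the name up with `table[index]`.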
Described here is a simple way of implementing it yourself: http://www.devcodenote.com/2015/04/collision-free-string-hashing.html
Here is a snippet from the post:
It derives its inspiration from the way binary numbers are decoded and converted to decimal number format. Each binary string representation uniquely maps to a number in the decimal format.
If, say, we have a character set of capital English letters, then the length of the character set is 26, where A could be represented by the number 0, B by the number 1, C by the number 2 and so on till Z by the number 25. Now, whenever we want to map a string of this character set to a unique number, we perform the same conversion as we did in the case of the binary format.
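The conversion described in the snippet can be sketched as follows, assuming uppercase ASCII input and a 64-bit result; the function name is made up for illustration.

```c
#include <stdint.h>

/* Interpret an uppercase A-Z string as a base-26 number (A = 0 ... Z = 25),
 * as in the quoted snippet. Returns 0 on success, -1 on an invalid
 * character or if the value does not fit in 64 bits. */
static int encode_base26(const char *s, uint64_t *out)
{
    uint64_t value = 0;
    for (; *s != '\0'; s++) {
        if (*s < 'A' || *s > 'Z')
            return -1;
        uint64_t digit = (uint64_t)(*s - 'A');
        if (value > (UINT64_MAX - digit) / 26)
            return -1;  /* value * 26 + digit would overflow */
        value = value * 26 + digit;
    }
    *out = value;
    return 0;
}
```

Two caveats follow directly from the arithmetic: because A is the zero digit, strings that differ only in leading A's ("A" and "AA") map to the same number, and the result grows with the string length, so a 15-letter name no longer fits even in 64 bits, let alone in a single byte.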

DES encryption and cipher modes

I need to encrypt an ISO 8583 message... the problem here is that the message is longer than the key. I need someone to help me work out how to encrypt this string.
For example: I have 300 chars in my string; should I encrypt each 16 chars alone and then concatenate them, since my master key length is 16 bytes?
I appreciate your help...
ISO 8583-1:2003 Financial transaction card originated messages -- Interchange message specifications -- Part 1: Messages, data elements and code values.
DES is a block cipher, and block ciphers have different modes of operation.
The mode you mentioned is known as ECB (Electronic Codebook), and is not very secure (actually, neither is DES, but more on that later).
I'd suggest you use CBC or some other mode.
You can read about block cipher modes of operation here: Block cipher modes of operation
As for the cipher itself, I'd suggest you avoid using DES if this is at all possible. DES is extremely easy to crack nowadays. Please use AES, or at least 3DES if AES is not available.
EDIT: In response to the updated question, yes, you would need to pad the last block if the plaintext size is not a multiple of the block size.
There are many different modes of operation for a block cipher.
If you just need to apply ECB to your plain text, just split the plain text into equally sized blocks of 8 bytes (the DES block size) and encrypt each separately.
Depending on what you want to achieve, you could also use
ECB, which encrypts block-wise with each block being independent of all other (previous) blocks. Drawback: known plain text attacks can reveal patterns in your cipher text, because the same plain text block will always be encrypted to the same cipher text block.
CBC: here you have an initialization vector, and each block depends on all previously encrypted blocks. This is why you need an IV for the first block.
CFB: this is an interesting one because it allows you to turn your block cipher into a stream cipher, which might be useful if you want to encrypt a video stream or whatever (similarly for OFB, which basically generates a key stream).
CTS: cipher text stealing might be of use if you want to encrypt data but want to avoid padding. A usage example might be to encrypt a blob in your database which you cannot resize after it has been written to the DB.
There are still many more modes, but these are the most commonly used ones (imho).
As others have pointed out visit Wikipedia for all the details.
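The structural difference between ECB and CBC can be sketched without committing to a particular cipher. In the code below the single-block primitive is passed in as a function pointer; `toy_block_encrypt` is only a placeholder (XOR with the key is not a cipher) standing in for a real DES/3DES block function, and all names and signatures are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 8  /* DES block size in bytes */

/* Signature assumed for a single-block encryption primitive. */
typedef void (*block_encrypt_fn)(const uint8_t in[BLOCK], uint8_t out[BLOCK],
                                 const uint8_t *key);

/* Placeholder only: XOR with the key is NOT a cipher. It stands in for
 * DES/3DES so that the mode logic below can be compiled and run. */
static void toy_block_encrypt(const uint8_t in[BLOCK], uint8_t out[BLOCK],
                              const uint8_t *key)
{
    for (size_t i = 0; i < BLOCK; i++)
        out[i] = in[i] ^ key[i];
}

/* ECB: every block is encrypted independently.
 * len must be a multiple of BLOCK. */
static void ecb_encrypt(block_encrypt_fn enc, const uint8_t *key,
                        const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t off = 0; off < len; off += BLOCK)
        enc(in + off, out + off, key);
}

/* CBC: each plaintext block is XORed with the previous ciphertext block
 * (the IV for the first block) before it is encrypted. */
static void cbc_encrypt(block_encrypt_fn enc, const uint8_t *key,
                        const uint8_t iv[BLOCK],
                        const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t prev[BLOCK], tmp[BLOCK];
    memcpy(prev, iv, BLOCK);
    for (size_t off = 0; off < len; off += BLOCK) {
        for (size_t i = 0; i < BLOCK; i++)
            tmp[i] = in[off + i] ^ prev[i];
        enc(tmp, out + off, key);
        memcpy(prev, out + off, BLOCK);
    }
}
```

Feeding two identical plaintext blocks through `ecb_encrypt` yields two identical ciphertext blocks, whereas `cbc_encrypt` chains them so the repetition no longer shows.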
Update:
As for the padding, you have different possibilities. I'd recommend the ANSI X.923 standard, which basically requires you to pad the last block with zeroes and store a count in the last byte giving the number of padding bytes that were added. The same idea is used in ISO 10126, but there the padding is done with random bytes.
Note that you can avoid padding at all when using CTS.
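A minimal sketch of ANSI X.923 padding, in which the final byte holds the number of padding bytes added and the bytes before it are zero. The function names and the buffer-capacity convention are assumptions made for illustration; for DES the block size would be 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pad buf (holding len bytes, capacity cap) to a multiple of block bytes.
 * Returns the padded length, or 0 on invalid arguments. */
static size_t pad_x923(uint8_t *buf, size_t len, size_t cap, size_t block)
{
    if (block == 0 || block > 255)
        return 0;
    size_t pad = block - (len % block);  /* 1..block; a full block if aligned */
    if (pad > cap || len > cap - pad)
        return 0;
    memset(buf + len, 0, pad - 1);       /* zero fill */
    buf[len + pad - 1] = (uint8_t)pad;   /* last byte = padding length */
    return len + pad;
}

/* Returns the unpadded length, or (size_t)-1 if the padding is malformed. */
static size_t unpad_x923(const uint8_t *buf, size_t len, size_t block)
{
    if (block == 0 || len == 0 || len % block != 0)
        return (size_t)-1;
    size_t pad = buf[len - 1];
    if (pad == 0 || pad > block)
        return (size_t)-1;
    for (size_t i = len - pad; i < len - 1; i++)
        if (buf[i] != 0)
            return (size_t)-1;
    return len - pad;
}
```

Note that a message whose length is already a multiple of the block size still gets one full block of padding, so that the last byte can always be interpreted as a count.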
Maybe ask yourself if it's actually easier to use a crypto library to do the job for you.
If you're using C++, go for Crypto++ (not so straightforward, but consistent C++ style); Java and .NET have built-in crypto providers. If you want to use plain C, I can recommend LibTomCrypt (very easy to use).
The key length does not impose a limit on the message size. The message can be as long as you want, and your 128-bit key (nonstandard for DES?) will still be good. The DES cipher operates on blocks of bytes, one block at a time. Standard DES uses a 56-bit key (plus 8 parity bits) and 64-bit blocks.
should I encrypt each 16 chars alone
then concat them, since my master key
length is 16 bytes?
Ciphers in general do not require the key and block sizes to be the same; they can define complicated operations taking a given block of cleartext and transforming it with the key to a block of ciphertext (usually of the same size). When multiple blocks need to be encrypted, a mode of operation is specified to describe how one block relates to the next block in the process.
When operating in the electronic codebook (ECB) mode, the message is divided into blocks, and each block of cleartext is encrypted separately with the same key (the resulting blocks of ciphertext are then concatenated). Like other modes of operation for DES (e.g. CBC, CFB, OFB), this approach has its pros and cons. You will need to pick the mode most suitable for your application.
Btw, you should also be aware that DES is now considered insecure.
You need to look up encryption modes - which have names such as Cipher Block Chaining (CBC) and the 'do not use' mode Electronic Code Book (ECB), and even some exotic names like the Infinite Garble Extension (IGE). That page has a beautiful illustration of why the ECB mode should not be used.
CBC is a standard, solid mode of operation. OFB and CFB are also widely used.
You realize that the US Federal Government no longer uses plain DES because it is not secure enough (because it uses a 56-bit key and can be broken by brute force)? Triple-DES is just about tolerated - it has a 112-bit or 168-bit key, depending on which way you use it. The standard, though, is the Advanced Encryption Standard, AES. Unless you have backwards compatibility reasons, you should use AES and not DES in new production code.
Also, you should know the answers to these questions before trying to write production code. I trust this is in the nature of homework or personal interest.
You might want to encrypt for one of the following reasons:
To transmit secured data, that is, the PIN Block (Data Element 52). This one is taken care of by the HSM when you translate from the acquirer key to the issuer key.
Or to MAC the message, in which case you would select some fields to hash and append the result to the end of the message (usually Data Element 124 or 128).
Or you want to store the ISO 8583 message and want to comply with PCI DSS regulations, in which case you would encrypt the following data elements if present:
DE 2 - Card Number
DE 14 - Expiry Date
DE 35 - Track II
DE 45 - Track I
DE 48 - EMV (Chip) Data (MasterCard TLV)
DE 52 - PIN Block
DE 55 - EMV (Chip) Data (Visa)
One more thing: your master key should be 128 bits to comply with Visa mandates (the Triple DES mandates require the LMK to be at least a double-length key, that is, 32 hex digits or a 128-bit key).

Resources