Encrypt file using Hill Cipher and Java?

As we know, the Hill cipher is a classic cipher in cryptography and is mostly used for encrypting text. I need to encrypt a file (such as .doc, .ppt, .jpeg, etc.), and not just the contents of a file. I already searched on the Internet, but I didn't find much research that focuses on file encryption.
What I found: encrypting the text content of a .txt file doesn't encrypt the .txt file itself.
Using Java, .NET, or Python (any one or more of them), how can I implement the Hill cipher to encrypt files as explained above?
As a note, this question is not for my homework or assignment. I am just confused and curious about how one can implement the Hill Cipher to encrypt a file. Thank you.

The Hill cipher, like most classical ciphers from the pre-computer era, was traditionally used to only encrypt letters: that is, the valid inputs would consist only of the 26 letters from A to Z (and, in some variants, possibly a few extra symbols to make the alphabet size a prime number).
That said, there's no reason why you couldn't use a variant of the Hill cipher with, say, an alphabet size of 256, allowing you to directly encrypt input consisting of arbitrary bytes.
For a key, you'd need a random matrix that is invertible modulo 256, that is, a matrix consisting of random values from 0 to 255 chosen such that its determinant is an odd number. An easy way to generate such matrices is to just pick the matrix elements uniformly at random, calculate the determinant, and start over if it happens to be even. You'll only need a few tries on average. The probability that a random n×n matrix has an odd determinant is (1 − 1/2)(1 − 1/4)⋯(1 − 1/2ⁿ). That is 3/8 for a 2×2 matrix, and it never drops below about 0.28, so fewer than four tries are needed on average. Of course, for decryption, you'll also need to actually calculate the inverse matrix.
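Here is a minimal Python sketch of the byte-oriented variant described above. It assumes Python 3.8+, which is needed for `pow(x, -1, m)`. Instead of computing the determinant separately, it runs Gauss-Jordan elimination modulo 256 and retries when no odd pivot exists, which happens exactly when the determinant is even. The function names and the PKCS#7-style padding are illustrative choices, not part of any standard Hill cipher definition.

```python
import secrets

MOD = 256  # alphabet size: one "letter" per possible byte value


def inverse_mod256(m):
    """Invert a square matrix modulo 256 by Gauss-Jordan elimination.

    Only odd numbers are invertible mod 256, so every pivot must be odd.
    Such pivots exist exactly when det(m) is odd; otherwise return None.
    """
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] % 2 == 1), None)
        if pivot is None:
            return None  # determinant is even: not invertible mod 256
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], -1, MOD)  # modular inverse of the odd pivot
        a[col] = [(x * inv) % MOD for x in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % MOD for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]  # right half is now m^-1


def random_key(n):
    """Pick uniformly random n x n matrices until one is invertible mod 256."""
    while True:
        key = [[secrets.randbelow(MOD) for _ in range(n)] for _ in range(n)]
        inv = inverse_mod256(key)
        if inv is not None:
            return key, inv


def apply_matrix(m, data):
    """Multiply each n-byte block of data (a column vector) by m, mod 256."""
    n = len(m)
    out = bytearray()
    for i in range(0, len(data), n):
        block = data[i:i + n]
        for r in range(n):
            out.append(sum(m[r][c] * block[c] for c in range(n)) % MOD)
    return bytes(out)


def encrypt(key, data):
    # Pad with k copies of the byte k (1 <= k <= n) so length % n == 0.
    n = len(key)
    k = n - len(data) % n
    return apply_matrix(key, data + bytes([k]) * k)


def decrypt(inv, ciphertext):
    padded = apply_matrix(inv, ciphertext)
    return padded[:-padded[-1]]  # strip the padding added by encrypt()
```

To encrypt a whole file, read it in binary mode (`open(path, "rb").read()`), pass the bytes to `encrypt`, and write the result back with `"wb"`. The file type doesn't matter, because every byte is just a number from 0 to 255. The padding scheme limits the block size n to at most 255.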
All that said, it's worth noting that the Hill cipher, on its own, is very easy to break. Indeed, even its inventor Lester S. Hill realized this, and only recommended it for use in combination with a substitution cipher, in what we might today consider a primitive substitution-permutation network.
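One reason it is so easy to break is that the encryption is linear. Write plaintext and ciphertext blocks as column vectors, and let m be the alphabet size. Suppose an attacker knows n plaintext blocks. For files this is easy, since many formats start with fixed signature bytes. The attacker stacks these blocks into a matrix P. If P is invertible mod m (for m = 256, det P must be odd), the key follows directly:

```latex
c_i \equiv K\,p_i \pmod{m}\quad (i = 1,\dots,n)
\;\Longrightarrow\;
C \equiv K P \pmod{m}
\;\Longrightarrow\;
K \equiv C\,P^{-1} \pmod{m},
\qquad C = [\,c_1 \cdots c_n\,],\; P = [\,p_1 \cdots p_n\,]
```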
Of course, nowadays we have access to much more efficient and secure ciphers, such as, say, AES. For any practical encryption tasks (as opposed to just learning exercises), you should use those rather than trying to develop your own.

Related

What is the collision probability of MD5 digests if the input strings contain only alphanumeric characters?

The input strings have the following conditions:
They contain only alphanumeric characters ([a-zA-Z0-9])
The size of a string is always less than 256 bytes
The total number of input strings is less than 1,000,000
So what is the collision probability of the MD5 digests if the input strings all satisfy the above conditions? Can I just assume that there are no collisions?
If the inputs are random, the likelihood of a collision in that input set is very low. That being said, MD5 is a broken algorithm, and anyone can easily use software to find a collision deliberately. So you probably just shouldn't use MD5, but it depends on what you're using it for. I'm not sure why you would ever want to use MD5 anymore. You should look into the BLAKE2 family or the newer SHAs (SHA-256, SHA-512; not SHA-1). If these are passwords, you should almost certainly be using a hash designed for passwords, like PBKDF2 or one of the Argon2 variants. To be honest, I'd recommend just using libsodium's defaults for most things.
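To put a number on "very low" for accidental collisions, the usual birthday-bound estimate can be computed directly. This sketch models MD5 as a random function with 2^128 outputs, which is reasonable for ordinary inputs. It says nothing about collisions someone constructs on purpose. For 1,000,000 inputs it gives roughly 1.5 × 10⁻²⁷.

```python
import math


def collision_probability(k: int, bits: int = 128) -> float:
    """Birthday-bound estimate of P(at least one collision) among k hashes.

    Models the hash as a random function with 2**bits possible outputs.
    """
    pairs = k * (k - 1) / 2
    # P(no collision) ~= exp(-pairs / 2**bits); expm1 keeps precision for tiny values.
    return -math.expm1(-pairs / 2**bits)


print(collision_probability(1_000_000))
```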

How can I check whether a string is AES-encrypted in C?

What is the correct way to check whether a char* array is AES-128/192/256 ciphertext or unencrypted text?
Without using OpenSSL, please.
TL;DR: If you want a solution that works 100% of the time, it's completely impossible.
Long version:
First, stop thinking "binary vs. text". That's not how it works.
AES ciphertext is certainly binary data in the computer, but so is "text".
If you want to distinguish AES ciphertext from other, non-AES data, it's impossible because:
AES ciphertext can be unreadable garbage, but it can be a poem by Goethe too.
Every possible piece of data can be ("is") an AES ciphertext of some plaintext under some key.
Non-AES data can be just as much unreadable garbage as AES data. Take (pseudo-)random bytes, for example: AES with proper input is an excellent random byte generator.
The other way round: if you want to distinguish proper, sane, human-readable text from everything else, it's impossible because there is no rule defining what "text" is in your computer.
If you want to search for English letters, consider the following points:
As said above, readable words can be AES ciphertext too.
English letters? What about German, Japanese, ancient Greek, Russian...?
How are letters mapped to bytes? ISO-8859-1, UTF-16LE with BOM, EBCDIC, custom mappings...?
What about file formats like MS Word *.doc? There's text in them too, yet it's binary "garbage" data. Or compression: gzip, RAR, etc. don't make the text any less sane, yet the compressed bytes look like garbage.
If you finally extracted proper letters, how do you know the result isn't something like "miodsjoiusdJf"? Recognizing words and their meaning is a very big topic on its own, and nearly everything in it is guesswork.
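Here is the kind of heuristic people often reach for, shown in Python for brevity: the Shannon entropy of the byte histogram. It ports directly to plain C without OpenSSL. It also shows the problem described above. Random bytes, standing in for AES output here, score close to 8 bits per byte. So does compressed data or any other random-looking data. A high score therefore only means "looks random", never "is AES".

```python
import math
import os
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


text = b"This is ordinary English text. " * 50
noise = os.urandom(4096)  # stand-in for AES ciphertext (or a zip file, or ...)
print(f"text:  {bits_per_byte(text):.2f} bits/byte")
print(f"noise: {bits_per_byte(noise):.2f} bits/byte")
```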

MD5 hashes and Regular Expressions

I received an MD5 hash and a regular expression that both correspond to the same plaintext.
How do I use the regular expression to crack the MD5 hash and find the text behind it?
b89e49cab317f2681be60fb3d1c0f8f8
[(a|c|d)n-t\|]{8}
The idea would be to use the regex as a template and generate inputs that satisfy it.
A regex visualizer will show this. What this one says is: any of the characters ()acd| or any character from n to t (inclusive), in any order, repeated eight times. I tested this in hashcat, and the regex is correct despite looking like it means something else. A shorter way to write it would be [acd|()n-t]{8}.
So you start generating 8-character strings from those characters and taking the MD5 of each one. You can do this in almost any programming language, but Python is a good choice. Look up the hashlib library, which has an md5 function. Call hexdigest() on the result and compare it to the provided hash.
```python
>>> import hashlib
>>> hashlib.md5(b'cybering').hexdigest()
'61e4feebe66ad22349e292d1462afd3a'
```
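A minimal pure-Python sketch of that brute-force search follows. It assumes the 13-character set read off the regex above: `()acd|` plus `n` through `t`. The name `crack_md5` is just illustrative. Pure Python is far slower than hashcat, so the full 13^8 search will take a long time. The parameters let you try it on a smaller space first.

```python
import hashlib
from itertools import product

# The class [(a|c|d)n-t\|] matches ( ) a c d | and the letters n through t.
CHARSET = "()acd|nopqrst"


def crack_md5(target_hex, charset=CHARSET, length=8):
    """Brute-force every `length`-character string over `charset`.

    Returns the matching plaintext, or None if nothing in the space matches.
    """
    target = bytes.fromhex(target_hex)
    for combo in product(charset.encode(), repeat=length):
        candidate = bytes(combo)  # iterating over bytes yields ints
        if hashlib.md5(candidate).digest() == target:
            return candidate.decode()
    return None


# Full search (up to 13**8 = 815,730,721 hashes; slow in pure Python):
# print(crack_md5("b89e49cab317f2681be60fb3d1c0f8f8"))
```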
Additionally, if you want to use cracking software, look up John the Ripper or hashcat. You should be able to give them a dictionary or a mask and have them attempt to break the hash. I was able to solve this with hashcat on my GTX 980 Ti in about 5 seconds. This tutorial helped me set up the custom charset and mask to perform the attack.
Have fun!
One approach would be to generate all possible eight-character combinations (with repetition) of the 13 characters allowed by the regex. Test each combination by computing its MD5 hash and comparing it to the one you were given.
That would be 13^8 = 815,730,721 possible combinations to check. The answer will likely be found before checking all of them.
I was able to whip out a little Node.js program on my laptop that found the solution in about 4 minutes (I split the problem up using workers to take advantage of multiple CPU cores).
Edit: I thought the regex had n-z instead of n-t so the search space was actually much smaller.
You can't crack the MD5 hash value; it was produced by a one-way hashing algorithm.

How can I generate unique, non-sequential serial keys without 3rd party software?

I'm working on a project that involves writing low-level C software for a hardware implementation. We want to implement a new feature for our devices that our users can unlock when they purchase an associated license key.
The desired implementation steps are simple. The user calls us, requests the feature, and sends us a payment. Next, we email them a product key, which they enter into their hardware to unlock the feature.
Our hardware is not connected to the internet. Therefore, the algorithm must be implemented in such a way that the keys can be generated both on the server and within the device. Seeds for the keys can be derived from the hardware serial number, which is available in both locations.
I need a simple algorithm that can take sequential numbers and generate unique, non-sequential keys of 16-20 alphanumeric characters.
UPDATE
SHA-1 looks to be the best way to go. However, what I am seeing from sample output of SHA-1 keys is that they are pretty long (40 chars). Would I obtain sufficient results if I took the 40 char key and, say, truncated all but the last 16 characters?
You could just concatenate the serial number of the device, the feature name/code and some secret salt and hash the result with SHA1 (or another secure hashing algorithm). The device compares the given hash to the hash generated for each feature, and if it finds a match it enables the feature.
By the way, to keep the character count down, I'd suggest using base64 encoding after the hashing pass.
SHA-1 looks to be the best way to go. However, what I am seeing from sample output of SHA-1 keys is that they are pretty long (40 chars). Would I obtain sufficient results if I took the 40 char result and, say, truncated all but the last 16 characters?
Generally, it's not a good idea to truncate hashes. They are designed to use the full length of the output to provide good security and collision resistance. Still, you could cut down the character count by using base64 instead of hexadecimal characters. That takes it from 40 characters to 27 (28 with the trailing "=" padding).
Hex: a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
Base64: qUqP5cyxm6YcTAhz05Hph5gvu9M
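Here is a quick standard-library check that the two strings above are the same 20-byte digest, namely the SHA-1 of the ASCII string `test`. It also confirms that dropping base64's trailing `=` padding leaves 27 characters:

```python
import base64
import hashlib

digest = hashlib.sha1(b"test").digest()  # 20 raw bytes = 160 bits
hex_form = digest.hex()                  # 4 bits per character -> 40 characters
b64_form = base64.b64encode(digest).decode().rstrip("=")  # 6 bits per char -> 27
print(hex_form, len(hex_form))
print(b64_form, len(b64_form))

# The padding can be restored before decoding, since the digest length is fixed.
assert base64.b64decode(b64_form + "=") == digest
```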
---edit---
Actually, @Nick Johnson argues convincingly that hashes can be truncated without big security implications, apart from the obvious one: each bit you drop doubles the chance of a collision.
You should also use an HMAC instead of naively prepending or appending the key to the hash. Per Wikipedia:
The design of the HMAC specification was motivated by the existence of attacks on more trivial mechanisms for combining a key with a hash function. For example, one might assume the same security that HMAC provides could be achieved with MAC = H(key ∥ message). However, this method suffers from a serious flaw: with most hash functions, it is easy to append data to the message without knowing the key and obtain another valid MAC. The alternative, appending the key using MAC = H(message ∥ key), suffers from the problem that an attacker who can find a collision in the (unkeyed) hash function has a collision in the MAC. Using MAC = H(key ∥ message ∥ key) is better, however various security papers have suggested vulnerabilities with this approach, even when two different keys are used.
For more details on the security implications of both this and length truncation, see sections 5 and 6 of RFC2104.
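The following minimal Python sketch combines the suggestions above: an HMAC instead of hashing the serial with a salt, truncation, and a compact encoding. The secret `SECRET`, the `serial:feature` message format, and the function names are all hypothetical. Base32 is used instead of base64, because base64 output can contain `+`, `/`, and `=`, which aren't alphanumeric. The HMAC-SHA1 tag is truncated to 80 bits, which is RFC 2104's recommended minimum for SHA-1 (half the 160-bit output, and no fewer than 80 bits). That length encodes to exactly 16 base32 characters.

```python
import base64
import hashlib
import hmac

# Hypothetical vendor secret; it must be stored on both the server and the device.
SECRET = b"replace-with-a-long-random-secret"


def feature_key(serial: str, feature: str) -> str:
    """Derive a 16-character alphanumeric unlock key for one device and feature.

    HMAC-SHA1 over "serial:feature", truncated to 10 bytes (80 bits), then
    base32-encoded: 10 bytes -> exactly 16 characters from A-Z and 2-7.
    """
    msg = f"{serial}:{feature}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha1).digest()
    return base64.b32encode(tag[:10]).decode()


def check_key(serial: str, feature: str, key: str) -> bool:
    """On the device: recompute the key and compare in constant time."""
    expected = feature_key(serial, feature).encode()
    return hmac.compare_digest(expected, key.strip().upper().encode())
```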
One option is to use a hash as Matteo describes.
Another is to use a block cipher (e.g. AES). Just pick a random nonce and invoke the cipher in counter mode using your serial numbers as the counter.
Of course, this will make the keys invertible, which may or may not be a desirable property.
You can use an Xorshift random number generator to generate a unique 64-bit key, and then encode that key using whatever scheme you want. If you use base-64, the key is 11 characters long. If you use hex encoding, the key would be 16 characters long.
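The Xorshift RNG is basically just a bit mixer, and there are 64-bit versions with the maximal period of 2^64 − 1, meaning every nonzero 64-bit state is visited. Each xorshift step is an invertible map on 64-bit values, so it's guaranteed to produce a different output for every different input.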
The other option is to use a linear feedback shift register, which also will generate a unique number for each different input.
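Here is a minimal Python sketch of the xorshift approach, using Marsaglia's xorshift64 step with the common shift triple (13, 7, 17). It applies a few steps to the serial number and hex-encodes the result as 16 characters. The round count and the function names are illustrative assumptions. This only makes keys non-sequential. Anyone who knows the algorithm can compute valid keys, so on its own it is obfuscation, not security.

```python
MASK64 = (1 << 64) - 1  # keep Python's unbounded ints to 64 bits


def xorshift64(x: int) -> int:
    """One step of Marsaglia's xorshift64 with the shift triple (13, 7, 17).

    Each xor-with-shift is invertible on 64-bit values, so the whole step is
    a bijection: different inputs always give different outputs.
    Note that 0 maps to 0, so don't use 0 as a serial number.
    """
    x ^= (x << 13) & MASK64
    x ^= x >> 7
    x ^= (x << 17) & MASK64
    return x


def serial_to_key(serial: int, rounds: int = 8) -> str:
    """Scramble a serial number (taken mod 2**64) and encode it as 16 hex digits.

    A composition of bijections is a bijection, so keys stay unique.
    The round count is an arbitrary choice; more rounds just mix more.
    """
    x = serial & MASK64
    for _ in range(rounds):
        x = xorshift64(x)
    return format(x, "016X")
```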

How to shrink a string [a group of numbers and characters]?

Good Afternoon all,
I am working on RSA encryption and decryption. For more security I am also using padding in the cipher text. For different inputs (e.g. amit), I am getting outputs of different lengths, like:
plain text- amit
cipher text-10001123A234A987A765A
My problem is that for a big plaintext, my algorithm generates a large cipher text, and I think it is a waste of resources to keep long strings in the database.
Is there any way I can compact the cipher text and convert it back to the real cipher text when I need it?
In order for the algorithm to be encryption and not just hashing, it must be reversible. To be reversible, the output must contain as much information as the input, and so is unlikely to be significantly shorter.
You may compress the data before encryption. There's not a lot else you can do unless you're willing to give up the ability to recover your original text from the ciphertext.
There are a couple of possibilities:
Change your encryption scheme: there are schemes where the output size is the same as the input size.
Compress your data before you encrypt. This will be effective only if you have a large block of text to encrypt, and then there's the additional overhead of decompressing too.
This doesn't apply to RSA specifically, but any secure cipher will give output that is practically indistinguishable from a random bit pattern. A random bit pattern has, by definition, maximum information-theoretic entropy, since for each bit, 0 and 1 are equally likely.
Now, you want a lossless compression scheme, since you need to be able to decompress to exactly the data you originally compressed. An optimal compression scheme maximizes the entropy of its output. However, we know that the output of our cipher already has maximum entropy, so we can't possibly increase it any further.
And thus, trying to compress encrypted data is useless.
Note: Depending on your encryption method, compression might be possible, for example when using a block cipher in ECB mode. RSA is a different beast altogether, though, and compressing its output won't do anything (except quite possibly make your final output bigger).
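A small illustration of both points, using Python's standard `zlib` module. Since no real cipher is involved, random bytes from `os.urandom` stand in for ciphertext. Repetitive plaintext shrinks a lot if you compress before encrypting. Random-looking bytes don't shrink at all; zlib's framing actually makes them slightly larger.

```python
import os
import zlib

text = b"Plain text with plenty of repetition compresses well. " * 200
noise = os.urandom(4096)  # stand-in for ciphertext

# Compress BEFORE encrypting: repetitive plaintext shrinks a lot.
packed = zlib.compress(text, 9)
# ... encrypt `packed` here, store it, decrypt it later ...
assert zlib.decompress(packed) == text  # lossless round trip

# Compressing AFTER encrypting doesn't help: random-looking bytes don't shrink.
print(len(text), len(packed))
print(len(noise), len(zlib.compress(noise, 9)))
```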
[Edit] Also, the length of your RSA ciphertext will be on the order of log₂ n bits, where n is your public modulus. This is the reason that, especially for small plaintexts, public-key crypto is extremely 'wasteful'. You normally employ RSA to set up a (smaller, e.g. 128-bit) symmetric key between two parties and then encrypt your data with a symmetric-key algorithm such as AES. AES has a block size of 128 bits, so straightforward encryption of your data costs at most one block of padding 'overhead'. With the usual PKCS#7 padding, that overhead is 128 − (length(message) mod 128) bits.
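The size arithmetic, as a quick sketch. It assumes PKCS#7 padding for AES, as typically used with CBC mode, and doesn't count the IV. The function names are illustrative.

```python
def rsa_ciphertext_bytes(modulus_bits: int) -> int:
    """An RSA ciphertext is an integer below n, stored in ceil(bits / 8) bytes."""
    return (modulus_bits + 7) // 8


def aes_cbc_padded_bytes(message_bytes: int, block: int = 16) -> int:
    """Length after PKCS#7 padding, which always adds 1..block bytes."""
    return message_bytes + block - message_bytes % block


print(rsa_ciphertext_bytes(2048))           # 256: one RSA-2048 block, even for "amit"
print(aes_cbc_padded_bytes(len(b"amit")))   # 16: a single AES block
```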
Erm... you wrote in a comment here that you apply RSA encryption to each character individually:
i am using rsa- it perform over numbers to convert amit in cipher text first i do a->97 m->109 i->105..and then apply rsa over 97 ,109 ... then i get different integers for 109, 105 or ... i joined that all as a string...
A piece of good advice: don't do that, since you lose the security of RSA.
If you use RSA in this way, your scheme becomes a substitution cipher (with only one substitution alphabet). Given a reasonably long ciphertext or a reasonable number of ciphertexts, this scheme can be broken by analyzing the frequency of the ciphertext characters.
See RSAES-OAEP (from PKCS #1) for a padding scheme to apply to your plaintext before encryption.
