I am writing a program to decrypt text using the Caesar cipher algorithm.
So far my code works and produces all possible decrypted results, but I need to show only the correct one. How can I do this?
Below is the code that gets all the decrypted strings.
For my input, the answer should be "3 hello world".
#include <stdio.h>
#include <string.h>

int main(void)
{
    char input[] = "gourz#roohk";
    for (int key = 1; key < 26; key++)
    {
        printf("%i ", key);                    /* print the shift being tried */
        for (int i = strlen(input) - 1; i >= 0; i--)
        {
            printf("%c", input[i] - key % 26); /* shift each character back */
        }
        printf("\n");
    }
    return 0;
}
Recall that a Caesar Cipher has only 25 possible shifts. Also, for text of non-trivial length, it's highly likely that only one shift will make the input make sense. One possible approach, then, is to see if the result of the shift makes sense; if it does, then it's probably the correct shift (e.g. compare words against a dictionary to see if they're "real" words; not sure if you've done web services yet, but there are free dictionary APIs available).
Consider the following text: 3 uryyb jbeyq. Some possible shifts of this:
3 gdkkn vnqkc (12)
3 xubbe mehbt (3)
3 hello world (13)
3 jgnnq yqtnf (15)
Etc.
As you can see, only the shift of 13 makes this text contain "real" words, so the correct shift is probably 13.
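To make the dictionary idea concrete, here is a minimal Python sketch; the tiny WORDS set is a stand-in for a real word list or dictionary API, purely an assumption for illustration:
# Minimal sketch: score each shift by how many decrypted words appear
# in a dictionary; the best-scoring shift is probably the correct one.
# WORDS stands in for a real dictionary file or API.
WORDS = {"hello", "world"}

def decrypt(text, shift):
    return "".join(chr((ord(c) - ord('a') - shift) % 26 + ord('a'))
                   if c.islower() else c for c in text)

def best_shift(ciphertext):
    return max(range(1, 26),
               key=lambda s: sum(w in WORDS
                                 for w in decrypt(ciphertext, s).split()))

print(best_shift("uryyb jbeyq"))  # 13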
Another possible solution (albeit more complicated) is through frequency analysis (i.e. see if the resulting text has the same - or similar - statistical characteristics as English). For example, in English the most frequent letter is "e," so the correct shift will likely have "e" as the most frequent letter. By way of example, the first paragraph of this answer contains 48 instances of the letter "e", but if you shift it by 15 letters, it only has 8:
Gtrpaa iwpi p Rpthpg Rxewtg wph dcan 25 edhhxqat hwxuih. Pahd, udg
itmi du cdc-igxkxpa atcviw, xi'h wxvwan axztan iwpi dcan dct hwxui
lxaa bpzt iwt xceji bpzt htcht. Dct edhhxqat peegdprw, iwtc, xh id htt
xu iwt gthjai du iwt hwxui bpzth htcht; xu xi sdth, iwtc xi'h egdqpqan
iwt rdggtri hwxui (t.v. rdbepgt ldgsh pvpxchi p sxrixdcpgn id htt xu
iwtn'gt "gtpa" ldgsh; cdi hjgt xu ndj'kt sdct ltq htgkxrth nti, qji
iwtgt pgt ugtt sxrixdcpgn PEXh pkpxapqat).
The key word here is "likely" - it's not at all statistically certain (especially for shorter texts) and it's possible to write text that's resistant to that technique to some degree (e.g. through deliberate misspellings, lipograms, etc.). Note that I actually have an example of an exception above - "3 xubbe mehbt" has more instances of the letter "e" than "3 hello world" even though the second one is clearly the correct shift - so you probably want to apply several statistical tests to increase your confidence (especially for shorter texts).
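Here is a rough Python sketch of the frequency idea. Note that, per the caveat above, counting "e"s alone picks the wrong shift for the short example text, so treat this as one test among several:
# Rough sketch: pick the shift whose decryption contains the most 'e's.
# As noted above, this exact test fails on the short example
# "uryyb jbeyq", so combine it with other tests in practice.
def decrypt(text, shift):
    return "".join(chr((ord(c) - ord('a') - shift) % 26 + ord('a'))
                   if c.islower() else c for c in text)

def shift_by_frequency(ciphertext):
    return max(range(1, 26),
               key=lambda s: decrypt(ciphertext, s).count('e'))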
Hello. A faster way to attack a Caesar cipher is a frequency analysis attack, where you count how often each letter appears in your text and compare those counts against the most frequent letters in English, listed at this link:
( https://www3.nd.edu/~busiforc/handouts/cryptography/letterfrequencies.html )
Then, by applying this table to the letters, you can get the text, or use this link to a Python implementation of letter frequency analysis on GitHub (https://github.com/tombusby/understanding-cryptography-exercises/blob/master/Chapter-01/ex1.2.py).
The last-resort answer is brute force, because it has higher complexity than frequency analysis.
(Note that the 26! search space applies to a general substitution cipher, where each letter you recover shrinks the remaining space; a plain Caesar cipher has only 25 shifts to try.)
If you want to keep your current code, you can build a file of the most popular English words and search it after every decryption attempt, but that costs a lot of time, so letter frequency is the better approach.
I'm used to the sort-by operation that many languages offer: it takes a comparator and sorts by it.
What I want to do is sort the following words first by length and then by letter order. Please help.
I didn't find anything about this in the Phrases or Dictionary on jsoftware.com, apart from sorting and grading numerical values.
words=: >;:'CLOUD USB NETWORK LAN SERVER FIREWIRE CLIENT PEER'
] alpha=: a. {~ (i.26) + a.i.'A'
ABCDEFGHIJKLMNOPQRSTUVWXYZ
;/ words/: alpha i. words
┌────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│CLIENT │CLOUD │FIREWIRE│LAN │NETWORK │PEER │SERVER │USB │
└────────┴────────┴────────┴────────┴────────┴────────┴────────┴────────┘
My first crazy idea is to shift each word to the right boundary of the array, e.g.
ABC
DEFG
XY
Then assign whitespace the extreme rank (in the y argument of the sorting primitive), and then shift each word back :D. It would be highly inefficient; I can't see another J-way.
Update
Here is the Wolfram Language code for my problem:
StringSplit @ "CLOUD USB NETWORK LAN SERVER FIREWIRE CLIENT PEER"
~SortBy~
(Reverse @ ComposeList[{StringLength}, #] &)
If I want to prioritize longer words, I just change StringLength to Minus @* StringLength.
Basically my sorting order here is {{5, "CLOUD"}, {3, "USB"}, {7, "NETWORK"}, ...}.
I can make the same array in J using (,.~ #&.>) applied to boxed words, but how do I use sorting primitives then? Maybe this is the right first step? I'm still not sure, but it sounds much better than my first guess :).
As requested, I have promoted answers suggested in the comments to the main body of the answer.
First of all, I don't think it is a good idea to apply > to the list of words because that will add fill which destroys information about the length of each word. So start with
words=: ;:'CLOUD USB NETWORK LAN SERVER FIREWIRE CLIENT PEER'
Then you need a function foo that turns each word into a value that will sort in the order you wish using \: or /:
words /: foo words
will do the trick. Or, depending on whether you think hooks are pretty or ugly, you could express that as
(/: foo) words
My original suggestion for foo used #. to express each word as a single number:
foo=: (128&#.)@(a.&i.)@>
foo words
18145864388 1403330 345441182148939 1253582 2870553715410 39730390339053893 2322657797972 168911570
(/: foo) words
┌───┬───┬────┬─────┬──────┬──────┬───────┬────────┐
│LAN│USB│PEER│CLOUD│CLIENT│SERVER│NETWORK│FIREWIRE│
└───┴───┴────┴─────┴──────┴──────┴───────┴────────┘
In a comment, Danylo Dubinin pointed out that instead of encoding the word, we can simply catenate the word length to the front of the index vector and sort using that:
(/: (# , a.&i.)@>) words
┌───┬───┬────┬─────┬──────┬──────┬───────┬────────┐
│LAN│USB│PEER│CLOUD│CLIENT│SERVER│NETWORK│FIREWIRE│
└───┴───┴────┴─────┴──────┴──────┴───────┴────────┘
I want a quick and dirty way of determining what language the user is writing in. I know that there is a Google API which will detect the difference between French and Spanish (even though they both use mostly the same alphabet), but I don't want the latency. Essentially, I know that the Latin alphabet has a lot of confusion as to what language it is using. Other alphabets, however, don't. For example, if there is a character using hiragana (part of the Japanese writing system) there is no confusion as to the language. Therefore, I don't need to ask Google.
Therefore, I would like to be able to do something simple like say that שלום uses the Hebrew alphabet and こんにちは uses Japanese characters. How do I get that alphabet string?
"Bonjour", "Hello", etc. should return "Latin" or "English" (Then I'll ask Google for the real language). "こんにちは" should return "Hiragana" or "Japanese". "שלום" should return "Hebrew".
I'd suggest looking at the Unicode "Script" property. The latest database can be found here.
For a quick and dirty implementation, I'd try scanning all of the characters in the target text and looking up the script name for each one. Pick whichever script has the most characters.
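A quick and dirty sketch of that in Python; the handful of code-point ranges below is a hand-picked assumption for illustration, and a real implementation should read the full Scripts.txt database linked above:
from collections import Counter

# Simplified, hand-picked ranges; a real implementation should use the
# full Unicode Scripts.txt database.
SCRIPT_RANGES = [
    (0x0041, 0x024F, "Latin"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
]

def script_of(ch):
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None

def dominant_script(text):
    counts = Counter(s for s in map(script_of, text) if s)
    return counts.most_common(1)[0][0] if counts else "Unknown"

print(dominant_script("Bonjour"))   # Latin
print(dominant_script("こんにちは"))  # Hiragana
print(dominant_script("שלום"))      # Hebrew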
Use an N-gram model trained on a sufficiently large set of training data. A full example describing this technique can be found at this page, among others:
http://phpir.com/language-detection-with-n-grams/
Although the article assumes you are implementing in PHP, and by "language" it means something like English, Italian, etc., the technique may be implemented in C if you require it. And instead of using "language" in the sense of English, etc. for the training, just use your notion of "alphabet" for the training. For example, look at all of your "Latin alphabet" strings together and consider their n-grams for n=2:
Bonjour: "Bo", "on", "nj", "jo", "ou", "ur"
Hello: "He", "el", "ll", "lo"
With enough training data, you will discover dominant combinations that are likely for all Latin text, for example, perhaps "Bo" and "el" are quite probable for text written in the "Latin alphabet". Likewise, these combinations are probably quite rare in text that is written in the "Hiragana alphabet". Similar discoveries will be made with any other alphabet classification for which you can provide sufficient training data.
This technique is closely related to Markov chains (an n-gram model is essentially an order n-1 Markov chain); searching for these keywords will give more ideas for implementation. For "quick and dirty" I would use n=2 and gather just enough training data such that the least common letter from each alphabet is encountered at least once... e.g. at least one 'z' and at least one 'ぅ' (little hiragana u).
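A minimal sketch of the bigram-profile idea in Python; the toy training strings are assumptions, and real use needs far more data:
from collections import Counter

def bigrams(s):
    return [s[i:i + 2] for i in range(len(s) - 1)]

def profile(samples):
    # Bigram counts over all training strings for one alphabet.
    counts = Counter()
    for s in samples:
        counts.update(bigrams(s.lower()))
    return counts

def classify(text, profiles):
    # Score each alphabet by how often it saw the text's bigrams.
    return max(profiles, key=lambda name:
               sum(profiles[name][b] for b in bigrams(text.lower())))

profiles = {
    "Latin": profile(["Bonjour", "Hello", "Hola"]),
    "Hiragana": profile(["こんにちは", "ありがとう"]),
}
print(classify("Bonsoir", profiles))  # Latin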
EDIT:
For a simpler solution than N-grams, use only basic statistical tests -- min, max and average -- to compare your Input (a string given by the user) with an Alphabet (a string of all characters in one of the alphabets you are interested in).
Step 1. Place all the numerical values of the Alphabet (e.g. utf8 codes) in an array. For example, if the Alphabet to be tested against is "Basic Latin", make an array DEF := {32, 33, 34, ..., 122}.
Step 2. Place all the numerical values of the Input into an array, for example, make an array INP := {73, 102, 32, ...}.
Step 3. Calculate a score for the input based on INP and DEF. If INP really comes from the same alphabet as DEF, then I would expect the following statements to be true:
min(INP) >= min(DEF)
max(INP) <= max(DEF)
avg(INP) - avg(DEF) < EPS, where EPS is a suitable constant
If all statements are true, the score should be close to 1.0. If all are false, the score should be close to 0.0. After this "Score" routine is defined, all that's left is to repeat it on each alphabet you are interested in and choose the one which gives the highest score for a given Input.
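A rough Python sketch of that scoring routine; the EPS value and the equal weighting of the three checks are assumptions to tune:
def score(inp, alphabet, eps=20.0):
    # Compare the numerical character values of the Input against the
    # Alphabet using the three checks from Step 3.
    INP = [ord(c) for c in inp]
    DEF = [ord(c) for c in alphabet]
    checks = [
        min(INP) >= min(DEF),
        max(INP) <= max(DEF),
        abs(sum(INP) / len(INP) - sum(DEF) / len(DEF)) < eps,
    ]
    return sum(checks) / len(checks)

basic_latin = "".join(chr(c) for c in range(32, 123))
print(score("If you can read this", basic_latin))  # 1.0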
I was thinking of creating a chatbot using something like Markov chains, but I'm not entirely sure how to get it to work. From what I understand, you create a table from data with a given word and then the words which follow. Is it possible to attach any sort of probability or counter while training the bot? Is that even a good idea?
The second part of the problem is with keywords. Assuming I can already identify keywords from user input, how do I generate a sentence which uses that keyword? I don't always want to start the sentence with the keyword, so how do I seed the Markov chain?
I made a Markov chain chatbot for IRC in Python a few years back and can shed some light on how I did it. The text generated does not necessarily make any sense, but it can be really fun to read. Let's break it down into steps. Assume you have a fixed input, a text file (you can use input from chat text or lyrics, or just use your imagination).
Loop through the text and make a dictionary, meaning a key-value container, and put every pair of words in as a key, with the word that follows as a value.
For example: if you have a text "a b c a b k", you start with "a b" as a key and "c" as a value, then "b c" as a key and "a" as a value... The value should be a list or any collection holding 0..many 'items', as you can have more than one value for a given pair of words. In the example above you have "a b" two times, followed first by "c" and then, at the end, by "k". So in the end you will have a dictionary/hash looking like this: {'a b': ['c','k'], 'b c': ['a'], 'c a': ['b']}
Now you have the structure needed for building your funky text. You can choose to start with a random key or at a fixed place! Given the structure we have, we can start by saving "a b", then randomly taking a following word from the value, c or k, so the first save in the loop is "a b k" (if "k" was the random value chosen). Then you continue by moving one step to the right, which in our case is "b k", and save a random value for that pair if you have one; in our case no, so you break out of the loop (or you can decide other stuff, like starting over again). When the loop is done, you print your saved text string.
The bigger the input, the more values you will have for your keys (pairs of words), and you will then have a "smarter bot", so you can "train" your bot by adding more text (perhaps chat input?). If you have a book as input, you can construct some nice random sentences. Please note that you don't have to take only one word that follows a pair as a value; you can take 2 or 10. The difference is that your text will appear more accurate if you use "longer" building blocks. Start with a pair as a key and the following word as a value.
So you see that you basically have two steps: first make the structure, then randomly choose a key to start with, print a random value of that key, and continue until you do not have a value or some other condition is met. If you want, you can "seed" a pair of words from chat input into your key-value structure to have a start. It's up to your imagination how to start your chain.
Example with real words:
"hi my name is Al and i live in a box that i like very much and i can live in there as long as i want"
"hi my" -> ["name"]
"my name" -> ["is"]
"name is" -> ["Al"]
"is Al" -> ["and"]
........
"and i" -> ["live", "can"]
........
"i can" -> ["live"]
......
Now construct a loop:
Pick a random key, say "hi my", and randomly choose a value; there is only one here, so it's "name"
(SAVING "hi my name").
Now move one step to the right taking "my name" as the next key and pick a random value... "is"
(SAVING "hi my name is").
Now move and take "name is" ... "Al"
(SAVING "hi my name is AL").
Now take "is Al" ... "and"
(SAVING "hi my name is Al and").
...
When you come to "and i" you will randomly choose a value, lets say "can", then the word "i can" is made etc... when you come to your stop condition or that you have no values print the constructed string in our case:
"hi my name is Al and i can live in there as long as i want"
If you have more values, you can jump to any key. The more values you have, the more combinations are possible, and the more random and fun the text will be.
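Here is a minimal Python sketch of the procedure described above, using the same example sentence; a real bot would persist the chain and add better stop conditions:
import random

def build_chain(text):
    # Map each pair of words to the list of words seen after that pair.
    words = text.split()
    chain = {}
    for i in range(len(words) - 2):
        chain.setdefault((words[i], words[i + 1]), []).append(words[i + 2])
    return chain

def generate(chain, start=None, max_words=30):
    key = start if start else random.choice(list(chain))
    out = list(key)
    # Save a random successor, then move one step to the right.
    while key in chain and len(out) < max_words:
        nxt = random.choice(chain[key])
        out.append(nxt)
        key = (key[1], nxt)
    return " ".join(out)

text = ("hi my name is Al and i live in a box that i like very much "
        "and i can live in there as long as i want")
print(generate(build_chain(text), ("hi", "my")))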
The bot chooses a random word from your input and generates a response by choosing another random word that has been seen to be a successor to its held word. It then repeats the process by finding a successor to that word in turn and carrying on iteratively until it thinks it’s said enough. It reaches that conclusion by stopping at a word that was prior to a punctuation mark in the training text. It then returns to input mode again to let you respond, and so on.
It isn’t very realistic, but I hereby challenge anyone to do better in 71 lines of code! This is a great challenge for any budding Pythonistas, and I just wish I could open the challenge to a wider audience than the small number of visitors I get to this blog. To code a bot that is always guaranteed to be grammatical must surely be closer to several hundred lines; I simplified hugely by just trying to think of the simplest rule to give the computer a mere stab at having something to say.
Its responses are rather impressionistic, to say the least! Also, you have to put what you say in single quotes.
I used War and Peace for my “corpus” which took a couple of hours for the training run, use a shorter file if you are impatient…
Here is the trainer:
#lukebot-trainer.py
import pickle

# Read the corpus into a flat list of words.
b = open('war&peace.txt')
text = []
for line in b:
    for word in line.split():
        text.append(word)
b.close()

# For each distinct word, collect every word that follows it, skipping
# occurrences that end in punctuation (treated as sentence boundaries).
textset = list(set(text))
follow = {}
for check in textset:
    working = []
    for w in range(len(text) - 1):
        if check == text[w] and text[w][-1] not in '(),.?!':
            working.append(str(text[w + 1]))
    follow[check] = working

# Save the successor table for the bot to load.
a = open('lexicon-luke', 'wb')
pickle.dump(follow, a, 2)
a.close()
Here is the bot:
#lukebot.py
import pickle, random

# Load the successor table produced by the trainer.
a = open('lexicon-luke', 'rb')
successorlist = pickle.load(a)
a.close()

def nextword(a):
    # Pick a random successor of the given word, or fall back to 'the'.
    if a in successorlist:
        return random.choice(successorlist[a])
    else:
        return 'the'

speech = ''
while speech != 'quit':
    speech = raw_input('>')
    # Seed the chain with a random word from the user's input.
    s = random.choice(speech.split())
    response = ''
    while True:
        neword = nextword(s)
        response += ' ' + neword
        s = neword
        # Stop at a word that preceded punctuation in the corpus.
        if neword[-1] in ',?!.':
            break
    print response
You tend to get an uncanny feeling when it says something that seems partially to make sense.
You could do it like this:
Make an order-1 Markov chain generator, using words and not letters.
Every time someone posts something, what they posted is added to the bot's database.
The bot would also record when it joined the chat and when a user made their first post (in multiples of 10 seconds), and then record how long that same user waited before posting again (in multiples of 10 seconds)...
This second part would be used to decide when the bot should post: it joins the chat and waits an amount of time based on a table of "after how many 10-second intervals a user posted after joining the chat", then keeps posting using a similar table of "how long a user took to think about and write the post that followed a post they spent X seconds on".
At the moment I am trying to crack the TEA block cipher in C. It is an assignment, and the TEA cipher has been weakened so that the key is two 16-bit numbers.
We have been given the code to encode plaintext with the key and to decode the ciphertext with the key.
I have some plaintext examples:
plaintext(1234,5678) encoded (3e08,fbab)
plaintext(6789,dabc) encoded (6617,72b5)
Update
The encode method takes plaintext and a key, encode(plaintext, key1). This occurs again with another key to create the encoded message, encode(ciphertext1, key2), which then produces the encoded (3e08,fbab) or (6617,72b5).
How would I go about cracking this cipher?
At the moment, I encode the known plaintext with every possible key, the key space running up to the hex value ffffffff. I write the results to a file.
But now I am stuck and in need of direction.
How could I use TEA's weakness of equivalent keys to lower the amount of time it would take to crack the cipher? Also, I am going to use a meet-in-the-middle attack.
When I encode the known plaintext with all possible values of key1, it will create all the encrypted texts with their associated keys, stored in a table.
Then I will decrypt the known ciphertext from my assignment with all possible values of key2. This will leave me with a table of decryptions that have each been decrypted once.
I can then compare the two tables to see if any of the encryptions with key1 match the decryptions with key2.
I would like to use the equivalent-key weakness as well; if someone could help me with implementing this in code, that would be great. Any ideas?
This is eerily similar to the Double Crypt problem from the IOI 2001 programming contest. The general solution is shown here; it won't give you the code but might point you in the right direction.
Don't write your results to a file -- just compare each ciphertext you produce to the known ciphertext, encoding the known plain text with every possible key until one of them produces the right ciphertext. At that point, you've used the correct key. Verify that by encrypting the second known plaintext with the same key to check that it produces the correct output as well.
Edit: the encoding occurring twice is of little consequence. You still get something like this:
for (test_key=0; test_key<max; test_key++)
if (encrypt(plaintext, test_key) == ciphertext)
std::cout << "Key = " << test_key << "\n";
The encryption occurring twice means your encrypt would look something like:
return TEA_encrypt(TEA_encrypt(plaintext, key), key);
Edit2: okay, based on the edited question, you apparently have to do the weakened TEA twice, each with its own 16-bit key. You could do that with a single loop like above, and split up the test_key into two independent 16-bit keys, or you could do a nested loop, something like:
for (test_key1=0; test_key1<=0xffff; test_key1++)
for (test_key2=0; test_key2<=0xffff; test_key2++)
if (encrypt(encrypt(plaintext, test_key1), test_key2) == ciphertext)
// we found the keys.
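Since the question mentions a meet-in-the-middle attack, here is a hedged Python sketch of that idea; tea_encrypt and tea_decrypt are placeholders for the weakened single-key routines you were given. It replaces the 2^32 nested loop with roughly 2^17 cipher operations plus a lookup table:
def meet_in_the_middle(plaintext, ciphertext, tea_encrypt, tea_decrypt):
    # Forward pass: encrypt under every key1; map middle value -> key1.
    forward = {}
    for key1 in range(0x10000):
        forward[tea_encrypt(plaintext, key1)] = key1
    # Backward pass: decrypt under every key2 and look for a match.
    for key2 in range(0x10000):
        middle = tea_decrypt(ciphertext, key2)
        if middle in forward:
            return forward[middle], key2
    return None
Any candidate pair should then be verified against the second known plaintext/ciphertext pair, since accidental collisions in the middle value are possible.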
I am not sure if this property holds for 16-bit keys, but 128-bit keys have the property that four keys are equivalent, reducing your search space by four-fold. I do not off the top of my head remember how to find equivalent keys, only that the key space is not as large as it appears. This means that it's susceptible to a related-key attack.
You tagged this as homework, so I am not sure if there are other requirements here, like not using brute force, which it appears that you are attempting to do. If you were to go for a brute force attack, you would probably need to know what the plaintext should look like (like knowing it English, for example).
The equivalent keys are easy enough to understand and cut key space by a factor of four. The key is split into four parts. Each cycle of TEA has two rounds. The first uses the first two parts of the key while the second uses the 3rd and 4th parts. Here is a diagram of a single cycle (two rounds) of TEA:
(unregistered users are not allowed to include images so here's a link)
https://en.wikipedia.org/wiki/File:TEA_InfoBox_Diagram.png
Note: green boxes are addition, red circles are XOR.
TEA operates on blocks, which it splits into two halves. During each round, one half of the block is shifted by 4, 0, or -5 bits to the left; has a part of the key or the round constant added to it; and then the XOR of the resulting values is added to the other half of the block. Flipping the most significant bit of either key segment flips the same bit in the sums it is used for, and by extension in the XOR result, but has no other effect. Flipping the most significant bit of both key segments used in a round flips the same bit in the XOR product twice, leaving it unchanged. Flipping those two bits together therefore doesn't change the block cipher's result, making the flipped key equivalent to the original. This can be done for both the (first/second) and (third/fourth) key segments, reducing the effective number of keys by a factor of four.
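To make that concrete, here is a small Python sketch of the standard 128-bit-key TEA cycle that demonstrates the equivalent-key property; the key and block values are arbitrary examples, and whether the property carries over to the weakened 16-bit variant depends on how the key was truncated:
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def tea_encrypt(block, key, cycles=32):
    # One cycle = two rounds; k0,k1 drive the first, k2,k3 the second.
    v0, v1 = block
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + DELTA) & MASK
        v0 = (v0 + ((((v1 << 4) + k0) & MASK) ^ ((v1 + s) & MASK)
                    ^ (((v1 >> 5) + k1) & MASK))) & MASK
        v1 = (v1 + ((((v0 << 4) + k2) & MASK) ^ ((v0 + s) & MASK)
                    ^ (((v0 >> 5) + k3) & MASK))) & MASK
    return v0, v1

key = (0x1234, 0x5678, 0x9ABC, 0xDEF0)
# Flipping the top bit of both segments used in a round flips the same
# bit in two of the XORed terms, so the flips cancel out.
flipped = tuple(k ^ 0x80000000 for k in key)
print(tea_encrypt((1, 2), key) == tea_encrypt((1, 2), flipped))  # True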
Given the (modest) size of your encryption key, you can afford to create a pre-calculated table (use the same code given above, and store the data in large chunks of memory; if you don't have enough RAM, dump the chunks to disk and keep an addressing scheme so you can look them up in the proper order).
Doing this will let you cover the whole domain, and finding a solution will then happen in real time (one single table lookup).
The same trick (key truncation) was used (not long ago) in leading Office software. They now use non-random data to generate the encryption keys, which (at best) leads to the same result. In practice, the ability to know encryption keys before they are generated (because the so-called random generator is predictable) is even more desirable than key truncation (it leads to the same result, but without the hurdle of having to build and store rainbow tables).
This is called the march of progress...
I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do similarly in other contexts.
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a Tetris game state may be described by the list of occupied cells. A string of bits describing this information would be a suitable input to that stage of the learning algorithm. Actually training on that is still challenging, though; how do you know whether the results are useful? I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed the successive states of play and the output is just the block placements, with higher-scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources, such as recorded plays from human players or a hand-crafted AI, and select the algorithms whose outputs bear a strong correlation to some interesting fact or another from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose M selected features there are 2^M subsets, so there is a lot to look at.
I would do the following:
For each subset S:
    run your code to optimize the weights W
    save S and the corresponding W
Then, for each pair S-W, you can run G games and save the score L for each one. Now you have a table like this:
feature1  feature2  feature3  featureM  subset_code  game_number  scoreL
1         0         1         1         S1           1            10500
1         0         1         1         S1           2            6230
...
0         1         1         0         S2           G + 1        30120
0         1         1         0         S2           G + 2        25900
Now you can run some component selection algorithm (PCA, for example) and decide which features are worth keeping to explain scoreL.
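As a rough illustration of the loop above in Python; optimize_weights and play_game are hypothetical placeholders for your own GA training step and Tetris simulator:
from itertools import combinations

def optimize_weights(subset):
    # Placeholder: your genetic-algorithm weight training goes here.
    return {f: 1.0 for f in subset}

def play_game(subset, weights):
    # Placeholder: your Tetris simulator goes here.
    return 0

FEATURES = ["gaps", "avg_height", "bumpiness", "lines_cleared"]  # example names
G = 5  # games per subset

results = []  # rows of (subset, game_number, scoreL), as in the table above
game_number = 0
for r in range(1, len(FEATURES) + 1):
    for subset in combinations(FEATURES, r):
        W = optimize_weights(subset)
        for _ in range(G):
            game_number += 1
            results.append((subset, game_number, play_game(subset, W)))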
A tip: When running the code to optimize W, seed the random number generator, so that each different 'evolving brain' is tested against the same piece sequence.
I hope this helps!