Checking every "word" from aaa..a to zzz..z - c

My program is supposed to be a brute force password cracker (school assignment).
The input arguments are as follows..
./crack threads keysize target
The program needs to check passwords of length keysize, but it also needs to check shorter ones.
I am unsure how to write something that changes one letter at a time and then keeps doing so.
(The maximum keysize is going to be 8.)
Example..
keysize = 5, so a loop (I think) would need to modify something that is equal to "aaaaa" to "aaaab" to "aaaac", putting each result into crypt_r(), along with the salt (first two characters of target) until a match is found.
I'm using crypt_r because the next step is to add multi-threading.
Unsure if anything else is really needed to explain this question. Would be glad to clarify.

Let's see. There are 10^n possible n-digit decimal numbers. So there are 26^8 possible 8-character passwords that use only the letters a-z. That works out to 208,827,064,576.
You can keep track of the numbers with a simple 64-bit counter, and then convert the number to a base-26 representation. Something like:
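long long max = 208827064576LL; // 26^8; does not fit in a 32-bit long
long long counter = 0;
while (counter < max)
{
    char password[9];
    GetPassword(counter, password);
    // do whatever you want with the password
    ++counter;
}
// Writes count in base 26, least significant letter first.
// Note: 'a' acts as the digit 0, so a multi-letter string whose
// last letter is 'a' (such as "aa") is never produced.
void GetPassword(long long count, char* pass)
{
    int i;
    if (count == 0)
    {
        pass[0] = 'a';
        pass[1] = '\0';
        return;
    }
    i = 0;
    do
    {
        int rem = count % 26;
        pass[i] = 'a' + rem;
        ++i;
        count /= 26;
    } while (count > 0);
    pass[i] = '\0'; // terminate the string
}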
You can easily make this available to multiple threads by using interlocked increments of the counter variable. Or you can split the search space so that one thread starts at 0, one thread starts at 26^7 (which would be baaaaaaa), etc.
200 billion is a reasonably large number. A billion seconds works out to something close to 32 years. Even if you could check a few thousand of these per second (unlikely), it would take you quite some time to do an exhaustive search.

This code will cycle through all passwords no longer than the given length and containing only letters from a to z:
#include <stdio.h>
int main(void) {
char password[9] = {0};
int keysize = 5;
for (;;) {
// get next password value
// we do it by adding 1 in base 26
int level = 0; // current level, starts at 0
while (level < keysize) {
if (password[level] == 0) {
password[level] = 'a';
break;
}
if (password[level] >= 'a' && password[level] < 'z') {
password[level]++;
break;
}
if (password[level] == 'z') {
password[level] = 'a';
level++;
}
}
if (level >= keysize)
break; // we have checked all passwords!
// check if password matches:
//printf("Checking password: '%s'\n", password);
if (check_password(password)) {
printf("Hooray! Password found: %s\n", password);
break;
}
}
return 0;
}
If you limit the alphabet to a, b, c and set keysize=4, it checks the following passwords:
a b c aa ba ca ab bb cb ac bc cc aaa baa caa aba bba cba aca bca cca aab bab cab abb bbb cbb acb bcb ccb aac bac cac abc bbc cbc acc bcc ccc aaaa baaa caaa abaa bbaa cbaa acaa bcaa ccaa aaba baba caba abba bbba cbba acba bcba ccba aaca baca caca abca bbca cbca acca bcca ccca aaab baab caab abab bbab cbab acab bcab ccab aabb babb cabb abbb bbbb cbbb acbb bcbb ccbb aacb bacb cacb abcb bbcb cbcb accb bccb cccb aaac baac caac abac bbac cbac acac bcac ccac aabc babc cabc abbc bbbc cbbc acbc bcbc ccbc aacc bacc cacc abcc bbcc cbcc accc bccc cccc
See this example at IdeOne DEMO.

Related

How to count the frequency of the two first letters in a word from a dictionary?

I have a 143k-word lowercase dictionary and I want to count the frequency of the first two letters
(i.e. aa* = 14, ab* = 534, ac* = 714 ... za* = 65, ... zz* = 0) and put the counts in a two-dimensional array.
However, I have no idea how to iterate over them without switches or a bunch of if/else statements. I tried looking on Google for a solution, but I could only find examples that count letters across the whole word, and mostly only in Python.
I've sat here for a while thinking about how to do this and my brain keeps blocking. This is what I came up with, but I really don't know where to go from here.
int main (void) {
char *line = NULL;
size_t len = 0;
ssize_t read;
char *arr[143091];
FILE *fp = fopen("large", "r");
if (*fp == NULL)
{
return 1;
}
int i = 0;
while ((read = getline(&line, &len, fp)) != -1)
{
arr[i] = line;
i++;
}
char c1 = 'a';
char c2 = 'a';
i = 0;
int j = 0;
while (c1 <= 'z')
{
while (arr[k][0] == c1)
{
while (arr[k][1] == c2)
{
}
c2++;
}
c1++;
}
fclose(fp);
if (line)
free(line);
return 0;
}
Am I being an idiot or am I just missing something really basic? How can I go about this problem?
Edit: I forgot to mention that the dictionary is only lowercase and has some edge cases like just an a or an e, and some words have ' (like e'er and e's). There are no accented Latin characters, and everything is ASCII lowercase.
The code assumes that the input has one word per line without leading spaces and will count all words that start with two ASCII letters from 'a'..'z'. As the statement in the question is not fully clear, I further assume that the character encoding is ASCII or at least ASCII compatible. (The question states that there are no accented Latin characters and that everything is ASCII lowercase.)
If you want to include words that consist of only one letter or words that contain ', the calculation of the index values from the characters would be a bit more complicated. In this case I would add a function to calculate the index from the character value.
Also for non-ASCII letters the simple calculation of the array index would not work.
The program reads the input line by line without storing all lines, checks the input as defined above and converts the first two characters from range 'a'..'z' to index values in range 0..'z'-'a' to count the occurrence in a two-dimensional array.
#include <stdio.h>
#include <stdlib.h>
int main (void) {
char *line = NULL;
size_t len = 0;
ssize_t read;
/* Counter array, initialized with 0. The highest possible index will
* be 'z'-'a', so the size in each dimension is 1 more */
unsigned long count['z'-'a'+1]['z'-'a'+1] = {0};
FILE *fp = fopen("large", "r");
if (fp == NULL)
{
return 1;
}
while ((read = getline(&line, &len, fp)) != -1)
{
/* ignore short input */
if(read >= 2)
{
/* ignore other characters */
if((line[0] >= 'a') && (line[0] <= 'z') &&
(line[1] >= 'a') && (line[1] <= 'z'))
{
/* convert first 2 characters to array index range and count */
count[line[0]-'a'][line[1]-'a']++;
}
}
}
fclose(fp);
if (line)
free(line);
/* example output */
for(int i = 'a'-'a'; i <= 'z'-'a'; i++)
{
for(int j = 'a'-'a'; j <= 'z'-'a'; j++)
{
/* only print combinations that actually occurred */
if(count[i][j] > 0)
{
printf("%c%c %lu\n", i+'a', j+'a', count[i][j]);
}
}
}
return 0;
}
The example input
foo
a
foobar
bar
baz
fish
ford
results in
ba 2
fi 1
fo 3
The idea is to have a two-dimensional array, with one dimension for each of the first two characters of a line. The clever bit is that in C, even a string whose length as reported by strlen() is 1 has two chars: the character and the terminating 0. So you don't need to special-case lines like "a". Its frequency is tracked in counts['a'][0].
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
/* Reads input on stdin, outputs to stdout. Using a multibyte
* character encoding will likely cause unusual output; don't do
* that. But it will work with encodings other than ASCII. Also handles
* mixed-cased input.
*/
int main(void) {
int *counts[UCHAR_MAX + 1] = { NULL };
char *line = NULL;
size_t bufsize = 0;
ssize_t len;
// Populate the frequency counts
while ((len = getline(&line, &bufsize, stdin)) > 0) {
if (line[len - 1] == '\n') { // Get rid of newline
line[len - 1] = 0;
}
if (line[0] == 0) { // Skip empty lines
continue;
}
unsigned fc = (unsigned char)line[0]; // cast first: plain char may be signed
unsigned sc = (unsigned char)line[1];
if (!counts[fc]) { // Allocate the second dimension if needed
counts[fc] = calloc(UCHAR_MAX + 1, sizeof(int));
}
counts[fc][sc] += 1;
}
// Print out the frequency table.
for (int fc = 1; fc <= UCHAR_MAX; fc += 1) {
if (!counts[fc]) { // Skip unused first characters
continue;
}
if (counts[fc][0]) { // Single-character line count
printf("%c\t%d\n", fc, counts[fc][0]);
}
for (int sc = 1; sc <= UCHAR_MAX; sc += 1) {
if (counts[fc][sc]) {
printf("%c%c\t%d\n", fc, sc, counts[fc][sc]);
}
}
}
return 0;
}
Example:
$ perl -Ci -ne 'print if /^[[:ascii:]]+$/ && /^[[:lower:]]+$/' /usr/share/dict/american-english-large | ./freqs
a 1
aa 6
ab 483
ac 651
ad 497
ae 112
af 198
ag 235
ah 7
ai 161
etc.
How to count the frequency of the two first letters in a word from a dictionary?
Use a simple state machine to read one character at a time, detect when a character is one of the first 2 letters of a word, then increment a 26x26 table. Words do not need to be on separate lines. Any word length is allowed.
unsigned long long frequency[26][26] = { 0 }; // Set all to 0
FILE *fp = fopen("large", "r");
...
int ch;
// Below 2 objects are the state machine
int word[2];
int word_length = 0;
while ((ch = fgetc(fp)) != EOF) {
if (isalpha(ch)) {
if (word_length < 2) {
word[word_length++] = tolower(ch);
if (word_length == 2) { // 2nd letter just arrived
assert(word[0] >= 'a' && word[0] <= 'z'); // Note 1
assert(word[1] >= 'a' && word[1] <= 'z');
frequency[word[0] - 'a'][word[1] - 'a']++;
}
}
} else {
word_length = 0; // Make ready for a new word
}
}
for (int L0 = 'a'; L0 <= 'z'; L0++) {
for (int L1 = 'a'; L1 <= 'z'; L1++) {
unsigned long long sum = frequency[L0 - 'a'][L1 - 'a'];
if (sum) {
printf("%c%c %llu\n", L0, L1, sum);
...
Note 1: in locales that have letters beyond a-z, like á, é, í, ó, ú, ü, ñ, additional handling is needed. A simple approach is to use a frequency[256][256] table, which is somewhat memory-hungry.
There is no need to read the entire dictionary into memory, or even to buffer lines. The dictionary consists of words, one per line. This means it has this structure:
"aardvark\nabacus\n"
The first two characters of the file are the first digraph. The other interesting digraphs are all characters which immediately follow a newline.
This can be read by a state machine, which we can code into a loop like this. Suppose f is the FILE * handle to the stream reading from the dictionary file:
for (;;) {
/* Read two characters from the dictionary file. */
int ch0 = getc(f);
int ch1 = getc(f);
/* Is the ch0 newline? That means we read an empty line,
and one character after that. So, let us move that character
into ch0, and read another ch1. Keep doing this until
ch0 is not a newline, and bail at EOF. */
while (ch0 == '\n' && ch1 != EOF) {
ch0 = ch1;
ch1 = getc(f);
}
/* After the above, if we have EOF, we are done: bail the loop */
if (ch0 == EOF || ch1 == EOF)
break;
/* We know that ch0 isn't newline. But ch1 could be newline;
i.e. we found a one-letter-long dictionary entry. We don't
process those, only two or more letters. */
if (ch1 != '\n') {
/* Here we put the code which looks up the ch0-ch1 pair
in our frequency table and increments the count. */
}
/* Now drop characters until the end of the line. If ch1
is newline, we are already there. If not, let's just use
ch1 for reading more characters until we get a newline. */
while (ch1 != '\n' && ch1 != EOF)
ch1 = getc(f);
/* Watch out for EOF in the middle of a line that isn't
newline-terminated. */
if (ch1 == EOF)
break;
}
I would do this with a state machine:
enum { begin, have_ch0, scan_eol } state = begin;
int ch0, ch1;
for (;;) {
int c = getc(f);
if (c == EOF)
break;
switch (state) {
case begin:
/* stay in begin state if newline seen */
if (c != '\n') {
/* otherwise accumulate ch0,
and switch to have_ch0 state */
ch0 = c;
state = have_ch0;
}
break;
case have_ch0:
if (c == '\n') {
/* newline in ch0 state: back to begin */
state = begin;
} else {
/* we got a second character! */
ch1 = c;
/* code for processing ch0 and ch1 goes here! */
state = scan_eol; /* switch to scanning for EOL. */
}
break;
case scan_eol:
if (c == '\n') {
/* We got the newline we are looking for; go
to begin state. */
state = begin;
}
break;
}
}
Now we have a tidy loop around a single call to getc. EOF is checked in one place where we bail out of the loop. The state machine recognizes the situation when we have the first two characters of a line which is at least two characters long; there is a single place in the code where to put the logic for dealing with the two characters.
We are not allocating any buffers; we are not malloc-ing lines, so there is nothing to free. There is no limit on the dictionary size we can scan (just we have to watch for overflowing frequency counters).
You have started in the right direction. You do need a 2D array 27 x 27 for a single case (e.g. lowercase or uppercase), not including digits. To handle digits, just add another 11 x 11 array and map 2-digit frequencies there. The reason you can't use a flat 1D array and map to it without serious indexing gymnastics is that the ASCII sum of "ab" and "ba" would be the same.
The 2D array solves that problem allowing the map of the 1st character ASCII value to the first index, and the map of the ASCII of the 2nd character to the 2nd index or after in that row.
An easy way to think of it is to just take a lowercase example. Let's look at the word "accent". You have your 2D array:
+---+---+---+---+---+---+
| a | a | b | c | d | e | ...
+---+---+---+---+---+---+
| b | a | b | c | d | e | ...
+---+---+---+---+---+---+
| c | a | b | c | d | e | ...
+---+---+---+---+---+---+
...
The first column tracks the first letter and then the remaining columns (the next 'a' - 'z' characters) track the 2nd character that follows the first character. (You can also do this with an array of structs, each holding the 1st char and a 26-element array; it's up to you.) This way, you remove the ambiguity between the combinations "ab" and "ba".
Now note -- you do not actually need a 27 x 27 arrays with the 1st column repeated. Recall, by mapping the ASCII value to the first index, it designates the first character associated with the row on its own, e.g. row[0][..] indicates the first character was 'a'. So a 26 x 26 array is fine (and the same for digits). So you simply need:
+---+---+---+---+---+
| a | b | c | d | e | ...
+---+---+---+---+---+
| a | b | c | d | e | ...
+---+---+---+---+---+
| a | b | c | d | e | ...
+---+---+---+---+---+
...
So the remainder of the approach is simple. Open the file, read the word into a buffer, validate there is a 1st character (e.g. not the nul-character), then validate the 2nd character (continue to get the next word if either validation fails). Convert both to lowercase (or add the additional arrays if tracking both cases -- that gets ugly). Now just map the ASCII value for each character to an index in the array, e.g.
int ltrfreq[ALPHABET][ALPHABET] = {{0}};
...
while (fgets (buf, SZBUF, fp)) { /* read each line into buf */
int ch1 = *buf, ch2; /* initialize ch1 with 1st char */
if (!ch1 || !isalpha(ch1)) /* validate 1st char or get next word */
continue;
ch2 = buf[1]; /* assign 2nd char */
if (!ch2 || !isalpha(ch2)) /* validate 2nd char or get next word */
continue;
ch1 = tolower (ch1); /* convert to lower to eliminate case */
ch2 = tolower (ch2);
ltrfreq[ch1-'a'][ch2-'a']++; /* map ASCII to index, increment */
}
With our example word "accent", that would increment the array element [0][2], so that corresponds to row 0 and column 2 for "ac" in:
+---+---+---+---+---+
| a | b | c | d | e | ...
+---+---+---+---+---+
... ^ [0][2]
Where you increment the value at that index. So ltrfreq[0][2]++ now holds the value 1 for the combination "ac" having been seen once. When encountered again, the element would be incremented to 2 and so on... Since the value is incremented it is imperative the array be initialized all zero when declared.
When you output the results, you just map each index back to ASCII by adding 'a' to it (no offset is needed for the j index, since the array is 26 x 26), e.g.
for (int i = 0; i < ALPHABET; i++) /* loop over all 1st char index */
for (int j = 0; j < ALPHABET; j++) /* loop over all 2nd char index */
if (ltrfreq[i][j]) /* map i, j back to ASCII, output freq */
printf ("%c%c = %d\n", i + 'a', j + 'a', ltrfreq[i][j]);
That's it. Putting it all together in an example that takes the filename to read as the first argument to the program (or reads from stdin if no argument is given), you would have:
#include <stdio.h>
#include <ctype.h>
#define ALPHABET 26
#define SZBUF 1024
int main (int argc, char **argv) {
char buf[SZBUF] = "";
int ltrfreq[ALPHABET][ALPHABET] = {{0}};
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (buf, SZBUF, fp)) { /* read each line into buf */
int ch1 = *buf, ch2; /* initialize ch1 with 1st char */
if (!ch1 || !isalpha(ch1)) /* validate 1st char or get next word */
continue;
ch2 = buf[1]; /* assign 2nd char */
if (!ch2 || !isalpha(ch2)) /* validate 2nd char or get next word */
continue;
ch1 = tolower (ch1); /* convert to lower to eliminate case */
ch2 = tolower (ch2);
ltrfreq[ch1-'a'][ch2-'a']++; /* map ASCII to index, increment */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
for (int i = 0; i < ALPHABET; i++) /* loop over all 1st char index */
for (int j = 0; j < ALPHABET; j++) /* loop over all 2nd char index */
if (ltrfreq[i][j]) /* map i, j back to ASCII, output freq */
printf ("%c%c = %d\n", i + 'a', j + 'a', ltrfreq[i][j]);
}
Example Input Dictionary
In the file dat/ltrfreq2.txt:
$ cat dat/ltrfreq2.txt
My
dog
has
fleas
and
my
cat
has
none
lucky
cat!
Example Use/Output
$ ./bin/ltrfreq2 dat/ltrfreq2.txt
an = 1
ca = 2
do = 1
fl = 1
ha = 2
lu = 1
my = 2
no = 1
Where both "cat" words accurately account for ca = 2, both "has" for ha = 2 and "My" and "my" for my = 2. The rest are just the 2 character prefixes for words that appear once in the dictionary.
Or with the entire 307993 words dictionary that comes with SuSE, timed to show the efficiency of the approach (all within 15 ms):
$ time ./bin/ltrfreq2 /var/lib/dict/words
aa = 40
ab = 990
ac = 1391
ad = 1032
ae = 338
af = 411
ag = 608
ah = 68
ai = 369
aj = 18
ak = 70
al = 2029
...
zn = 2
zo = 434
zr = 2
zs = 2
zu = 57
zw = 25
zy = 135
zz = 1
real 0m0.015s
user 0m0.015s
sys 0m0.001s
A bit about the array type. Since you have 143K words, that rules out using a short or unsigned short type -- just in case you have a bad dictionary with all 143K words being "aardvark".... The int type is more than capable of handling all words -- even if you have a bad dictionary containing only "aardvark".
Look things over and let me know if this is what you need, if not let me know where I misunderstood. Also, let me know if you have further questions.
Such a job is better suited to languages like Python, Perl, Ruby etc. than to C. I suggest at least trying C++.
If you don't have to write it in C, here is my Python version: (since you didn't mention it in the question - are you working on an embedded system or something where C/ASM are the only options?)
FILENAME = '/etc/dictionaries-common/words'
with open(FILENAME) as f:
flattened = [ line[:2] for line in f ]
dic = {
key: flattened.count(key)
for key in sorted(frozenset(flattened))
}
for k, v in dic.items():
print(f'{k} = {v}')
Outputs:
A' = 1
AM = 2
AO = 2
AW = 2
Aa = 6
Ab = 44
Ac = 37
Ad = 68
Ae = 18
Af = 22
Ag = 36
Ah = 12
Ai = 17
Aj = 2
Ak = 14
Al = 284
Am = 91
An = 223
Ap = 44
Aq = 13
Ar = 185
As = 88
At = 56
Au = 81
Av = 28
Ax = 2
... ...

Check if a number is a palindrome: if not sum the number and the number with reversed digits and continue checking

I'm learning C and doing the second task, which is to check if an integer is a palindrome. If yes, I shall return the number; if not, I shall sum up the number and the number with reversed digits and check again.
Example:
Number: 195
195 + 591 = 786
786 + 687 = 1473
1473 + 3741 = 5214
5214 + 4125 = 9339 (is palindrome)
If the program has checked 20 times and it's still not a palindrome, I shall return 0.
My program looks like this:
int addRev(int n) {
int count;
int reversed = 0;
int remain;
int original;
original = n;
while (n != 0) {
remain = n % 10;
reversed = reversed * 10 + remain;
n /= 10;
}
if (original==reversed){
return original;
}
else{
original+=reversed;
}
return original;
}
I have checked the program so far. If I test it with 191, it's a palindrome and returns it. If I test with 195, it is not a palindrome and the function returns 786.
But what is the next step? Do I need a second while() to continue with 786?
The key is the requirement to make up to 20 tests.
So the algorithm may look like this
for i = 1 up to 20
{
reversed = .......
if original == reversed
return original // it's a palindrome
original += reversed
}
return 0 // no palindrome after 20 iterations

Easiest way to make 4 letters permutations in C

I need to write a function that returns something like this, given only "1234":
char *permutations[] = {"1234","1324", "1342","1423","1432","2134",
"2143","2314","2341","2413","2431","3124","3142",
"3214","3241","3412","3421","4123","4132","4213",
"4231","4312","4321", "1243"};
I wrote "letters" in the title because I actually need those numbers to be characters. I have read various posts here, but all of them are very complicated since they try to get permutations for an N-letter string. I don't need that; I only need those combinations, but I have no clue whatsoever how to do this. I know I should show what I tried, but I really have nothing yet, since all the code I have tried to draw inspiration from uses concepts I am totally unfamiliar with, like backtracking and recursion. Is there a really 'easy' way to do this without using any libraries?
Hoping you will understand, I thank you all very much in advance.
Here is an example :
A. The first number has 4 possible values, we simply traverse them in the outer loop.
B. The same is done for the second at each pass of the outer loop, except that we avoid duplicating A so that loop has a length of 3.
C. The same goes for C, whose loop has a length of 2 because it avoids duplicating A and B.
D. The last value has a unique choice and we can find it this way : D = 10 - (A + B + C), where 10 = 1+2+3+4.
Each time a new D is found we collect the sequence of four numbers and that finally results in a sorted series of length 24.
#include <stdio.h>
#include <string.h>
static inline char increase(char c) { return (c - '0') % 4 + '1'; }
#define A seq[0]
#define B seq[1]
#define C seq[2]
#define D seq[3]
int main()
{
char seq[5] = { '0', '0', '0', '0', 0};
char collector[24][5];
char icollect = 0;
//--- A
for (char i = 0; i<4; i++) {
A = increase(A);
//---- B
B = '0';
for (char j = 0; j<3; j++) {
B = increase(B);
//---- C
C = '0';
if (B == A) B = increase(B);
for (char k = 0; k<2; k++) {
do { C = increase(C); } while (C == A || C == B);
//---- D
D = 10 - ((A - '0') + (B - '0') + (C - '0')) + '0';
//Collects
memcpy(collector[icollect++], seq, 5);
}
}
}
//Prints
for (char i=0; i<24; i++) {
printf("%s%c", collector[i], i && (i+1)%6==0 ? '\n' : ' ');
}
}
/* printed
1234 1243 1324 1342 1423 1432
2134 2143 2314 2341 2413 2431
3124 3142 3214 3241 3412 3421
4123 4132 4213 4231 4312 4321
*/

C : Sum of reverse numbers

So I want to solve an exercise in C or in SML but I just can't come up with an algorithm that does so. Firstly I will write the exercise and then the problems I'm having with it so you can help me a bit.
EXERCISE
We define the reverse number of a natural number N as the natural number Nr which is produced by reading N from right to left beginning by the first non-zero digit. For example if N = 4236 then Nr = 6324 and if N = 5400 then Nr = 45.
So given any natural number G (1≤G≤10^100000) write a program in C that tests whether G can be written as the sum of a natural number N and its reverse Nr. If there is such a number then the program must return this N. If there isn't then the program must return 0. The input number G will be given through a txt file consisting of a single line.
For example, using C, if number1.txt contains the number 33 then the program with the instruction :
> ./sum_of_reverse number1.txt
could return for example 12, because 12+21 = 33 or 30 because 30 + 3 = 33. If number1.txt contains the number 42 then the program will return 0.
Now in ML if number1.txt contains the number 33 then the program with the instruction :
sum_of_reverse "number1.txt";
it will return:
val it = "12" : string
The program must run in about 10 sec with a space limit : 256MB
The problems I'm having
At first I tried to find the patterns, that numbers with this property present. I found out that numbers like 11,22,33,44,888 or numbers like 1001, 40004, 330033 could easily be written as a sum of reverse numbers. But then I found out that these numbers seem endless because of numbers for example 14443 = 7676 + 6767 or 115950 = 36987 + 78963.
Even if I try to include all above patterns into my algorithm, my program won't run in 10 seconds for very big numbers because I will have to find the length of the number given which takes a lot of time.
Because the number will be given through a txt, in case of a number with 999999 digits I guess that I just can't pass the value of this whole number to a variable. The same with the result. I assume that you are going to save it to a txt first and then print it??
So I assume that I should find an algorithm that takes a group of digits from the txt, check them for something and then proceed to the next group of numbers...?
Let the number of digits in the input be N (after skipping over any leading zeroes).
Then - if my analysis below is correct - the algorithm requires only ≈ N bytes of space and a single loop which runs ≈ N/2 times.
No special "big number" routines or recursive functions are required.
Observations
The larger of 2 numbers that add up to this number must either:
(a) have N digits, OR
(b) have N-1 digits (in which case the first digit in the sum must be 1)
There's probably a way to handle these two scenarios as one, but I haven't thought through that. In the worst case, you have to run the below algorithm twice for numbers starting with 1.
Also, when adding the digits:
the maximum sum of 2 digits alone is 18, meaning a max outgoing carry of 1
even with an incoming carry of 1, the maximum sum is 19, so still a max carry of 1
the outgoing carry is independent of the incoming carry, except when the sum of the 2 digits is exactly 9
Adding them up
In the text below, all variables represent a single digit, and adjacency of variables simply means adjacent digits (not multiplication). The ⊕ operator denotes the sum modulo 10. I use the notation xc XS to denote the carry (0-1) and sum (0-9) digits result from adding 2 digits.
Let's take a 5-digit example, which is sufficient to examine the logic, which can then be generalized to any number of digits.
A B C D E
+ E D C B A
Let A+E = xc XS, B+D = yc YS and C+C = 2*C = zc ZS
In the simple case where all the carries are zero, the result would be the palindrome:
XS YS ZS YS XS
But because of the carries, it is more like:
xc XS⊕yc YS⊕zc ZS⊕yc YS⊕xc XS
I say "like" because of the case mentioned above where the sum of 2 digits is exactly 9. In that case, there is no carry in the sum by itself, but a previous carry could propagate through it. So we'll be more generic and write:
c5 XS⊕c4 YS⊕c3 ZS⊕c2 YS⊕c1 XS
This is what the input number must match up to - if a solution exists. If not, we'll find something that doesn't match and exit.
(Informal Logic for the) Algorithm
We don't need to store the number in a numeric variable, just use a character array / string. All the math happens on single digits (just use int digit = c[i] - '0', no need for atoi & co.)
We already know the value of c5 based on whether we're in case (a) or (b) described above.
Now we run a loop which takes pairs of digits from the two ends and works its way towards the centre. Let's call the two digits being compared in the current iteration H and L.
So the loop will compare:
XS⊕c4 and XS
YS⊕c3 and YS⊕c1
etc.
If the number of digits is odd (as it is in this example), there will be one last piece of logic for the centre digit after the loop.
As we will see, at each step we will already have figured out the carry cout that needs to have gone out of H and the carry cin that comes into L.
(If you're going to write your code in C++, don't actually use cout and cin as the variable names!)
Initially, we know that cout = c5 and cin = 0, and quite clearly XS = L directly (use L⊖cin in general).
Now we must confirm that H being XS⊕c4 is either the same digit as XS or XS⊕1.
If not, there is no solution - exit.
But if it is, so far so good, and we can calculate c4 = H⊖L. Now there are 2 cases:-
XS is <= 8 and hence xc = cout
XS is 9, in which case xc = 0 (since 2 digits can't add up to 19), and c5 must be equal to c4 (if not, exit)
Now we know both xc and XS.
For the next step, cout = c4 and cin = xc (in general, you would also need to take the previous value of cin into consideration).
Now when comparing YS⊕c3 and YS⊕c1, we already know c1 = cin and can compute YS = L⊖c1.
The rest of the logic then follows as before.
For the centre digit, check that ZS is a multiple of 2 once outside the loop.
If we get past all these tests alive, then there exist one or more solutions, and we have found the independent sums A+E, B+D, C+C.
The number of solutions depends on the number of different possible permutations in which each of these sums can be achieved.
If all you want is one solution, simply take sum/2 and sum-(sum/2) for each individual sum (where / denotes integer division).
Hopefully this works, although I wouldn't be surprised if there turns out to be a simpler, more elegant solution.
Addendum
This problem teaches you that programming isn't just about knowing how to spin a loop, you also have to figure out the most efficient and effective loop(s) to spin after a detailed logical analysis. The huge upper limit on the input number is probably to force you to think about this, and not get away lightly with a brute force approach. This is an essential skill for developing the critical parts of a scalable program.
I think you should deal with your numbers as C strings. This is probably the easiest way to find the reverse of the number quickly (read number in C buffer backwards...) Then, the fun part is writing a "Big Number" math routines for adding. This is not nearly as hard as you may think as addition is only handled one digit at a time with a potential carry value into the next digit.
Then, for a first pass, start at 0 and see if G is its reverse. Then 0+1 and G-1, then... keep looping until G/2 and G/2. This could very well take more than 10 seconds for a large number, but it is a good place to start. (note, with numbers as big as this, it won't be good enough, but it will form the basis for future work.)
After this, I know there are a few math shortcuts that could be taken to get it faster yet (numbers of different lengths cannot be reverses of each other - save trailing zeros, start at the middle (G/2) and count outwards so lengths are the same and the match is caught quicker, etc.)
Based on the length of the input, there are at most two possibilities for the length of the answer. Let's try both of them separately. For the sake of example, let's suppose the answer has 8 digits, ABCDEFGH. Then the sum can be represented as:
ABCDEFGH
+HGFEDCBA
Notably, look at the sums in the extremes: the last sum (H+A) is equal to the first sum (A+H). You can also look at the next two sums: G+B is equal to B+G. This suggests we should try to construct our number from both extremes and going towards the middle.
Let's pick the extremes simultaneously. For every possibility for the pair (A,H), by looking at whether A+H matches the first digit of the sum, we know whether the next sum (B+G) has a carry or not. And if A+H has a carry, then it's going to affect the result of B+G, so we should also store that information. Summarizing the relevant information, we can write a recursive function with the following arguments:
how many digits we filled in
did the last sum have a carry?
should the current sum have a carry?
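This recursion has exponential complexity, but note that it can be called with at most 50000*2*2 = 200000 distinct argument combinations: the digits are filled in pairs, so for an input of up to 100000 digits the first argument takes at most 50000 values, and each carry flag takes 2. Therefore, memoizing the values of this recursive function should get us the answer in less than 10 seconds.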
Example:
Input is 11781, let's suppose answer has 4 digits.
ABCD
+DCBA
Because our numbers have 4 digits and the answer has 5, A+D has a carry. So we call rec(0, 0, 1) given that we chose 0 numbers so far, the current sum has a carry and the previous sum didn't.
We now try all possibilities for (A,D). Suppose we choose (A,D) = (9,2). 9+2 matches both the first and final 1 in the answer, so it's good. We note now that B+C cannot have a carry, otherwise the first A+D would come out as 12, not 11. So we call rec(2, 1, 0).
We now try all possibilities for (B,C). Suppose we choose (B,C) = (3,3). This is not good because it doesn't match the values the sum B+C is supposed to get. Suppose we choose (B,C) = (4,3). 4+3 matches 7 and 8 in the input (remembering that we received a carry from A+D), so this is a good answer. Return "9432" as our answer.
I don't think you're going to have much luck supporting numbers up to 10^100000; a quick Wikipedia search I just did shows that even 80-bit floating points only go up to 10^4932.
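But assuming you're going to limit yourself to numbers C can actually handle, one method would be something like this (this is pseudocode):
function GetN(G) {
    int halfG = G / 2;
    // If any solution exists, one exists at or above G / 2, so the bound is inclusive
    // (with "i > halfG", G = 2 would miss the answer 1 + 1).
    for(int i = G; i >= halfG; i--) {
        int j = G - i;
        if(ReverseNumber(i) == j) { return i; }
    }
    return 0; // no N with N + reverse(N) == G was found
}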
function ReverseNumber(i) {
string s = (string) i; // convert integer to string somehow
string s_r = s.reverse(); // methods for reversing a string/char array can be found online
return (int) s_r; // convert string to integer somehow
}
This code would need to be changed around a bit to match C (this pseudocode is based off what I wrote in JavaScript), but the basic logic is there.
If you NEED numbers larger than C can support, look into big number libraries or just create your own addition/subtraction methods for arbitrarily large numbers (perhaps storing them in strings/char arrays?).
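For reference, a C translation of the pseudocode above might look like the following sketch. It is not the original JavaScript. The names `reverse_number` and `get_n` are chosen for this example, and it assumes G is below 10^18 so that the reversed value always fits in an `unsigned long long`.

```c
#include <assert.h>

/* Reverse the decimal digits of i, e.g. 1230 -> 321. */
unsigned long long reverse_number(unsigned long long i)
{
    unsigned long long r = 0;
    while (i != 0) {
        r = r * 10 + i % 10;
        i /= 10;
    }
    return r;
}

/* Returns some n > 0 with n + reverse(n) == g, or 0 if there is none.
 * Assumption: g < 10^18, so reverse_number() cannot overflow. */
unsigned long long get_n(unsigned long long g)
{
    assert(g < 1000000000000000000ULL);

    /* Scan downwards from g; a solution, if one exists, can be found
     * at or above g / 2, so the lower bound is inclusive. */
    for (unsigned long long i = g; i >= g / 2 && i > 0; i--) {
        if (reverse_number(i) == g - i)
            return i;
    }
    return 0;
}
```

This is still the plain brute-force search, so it is only practical for small inputs; the running time grows linearly with G.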
A way to make the program faster would be this one...
Notice that your input number must be a linear combination of numbers such as:
100...001,
010...010,
...,
and the last one will be 0...0110...0 if #digits is even or 0...020...0 if #digits is odd.
Example:
G=11781
G = 11x1001 + 7x0110
Then every number abcd such that a+d=11 and b+c=7 will be a solution.
One way to develop this is to keep subtracting these numbers from G until you no longer can. If you end up at zero, there is an answer, which you can build from the coefficients; otherwise there is none.
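To make the G = 11781 example concrete, the sketch below enumerates every 4-digit number abcd with a+d = 11 and b+c = 7 and checks that each one really satisfies abcd + dcba = G. The function name `count_solutions` is invented for this illustration, and it only covers the 4-digit case.

```c
#include <assert.h>

/* Counts the 4-digit numbers abcd (a != 0) with a + d == outer and
 * b + c == inner for which abcd + dcba == target.
 * Hypothetical helper, restricted to the 4-digit case of the example. */
int count_solutions(int target, int outer, int inner)
{
    assert(outer >= 0 && inner >= 0);
    int count = 0;

    for (int a = 1; a <= 9; a++) {            /* leading digit is non-zero */
        int d = outer - a;
        if (d < 0 || d > 9)
            continue;
        for (int b = 0; b <= 9; b++) {
            int c = inner - b;
            if (c < 0 || c > 9)
                continue;
            int n = 1000 * a + 100 * b + 10 * c + d;   /* abcd */
            int r = 1000 * d + 100 * c + 10 * b + a;   /* dcba */
            if (n + r == target)
                count++;
        }
    }
    return count;
}
```

`count_solutions(11781, 11, 7)` finds 64 numbers (8 choices for the pair a,d times 8 choices for the pair b,c), one of which is 9432.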
I made this and it seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int Counter (FILE * fp);
void MergePrint (char * lhalf, char * rhalf);
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n);
int SmallNums (FILE * fp1, int n);
int ReverseNum (int n);
int main(int argc, char* argv[])
{
int dig;
char * lhalf = NULL, * rhalf = NULL;
unsigned int len_max = 128;
unsigned int current_size_k = 128;
unsigned int current_size_l = 128;
lhalf = (char *)malloc(len_max);
rhalf =(char *)malloc(len_max);
FILE * fp1, * fp2;
fp1 = fopen(argv[1],"r");
fp2 = fopen(argv[1],"r");
dig = Counter(fp1);
if ( dig < 3)
{
printf("%i\n",SmallNums(fp1,dig));
}
else
{
int a,b,prison = 0, ten = 0, i = 0,j = dig -1, k = 0, l = 0;
fseek(fp1,i,0);
fseek(fp2,j,0);
if ((a = fgetc(fp1)- '0') == 1)
{
if ((fgetc(fp1)- '0') == 0 && (fgetc(fp2) - '0') == 9)
{
lhalf[k] = '9';
rhalf[l] = '0';
i++; j--;
k++; l++;
}
i++;
prison = 0;
ten = 1;
}
while (i <= j)
{
fseek(fp1,i,0);
fseek(fp2,j,0);
a = fgetc(fp1) - '0';
b = fgetc(fp2) - '0';
if ( j - i == 1)
{
if ( (a == b) && (ten == 1) && (prison == 0) )
Down(fp1,fp2,lhalf,rhalf,0);
}
if (i == j)
{
if (ten == 1)
{
if (prison == 1)
{
int c;
c = a + 9;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
int c;
c = a + 10;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
}
else
{
if (prison == 1)
{
int c;
c = a - 1;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
if ( a%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = a/2 + '0';
k++;
}
}
break;
}
if (ten == 1)
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
ten = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 1)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
prison = 1;
k++; l++;
}
else if ( a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
ten = 0;
prison = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
else
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = b + '/';
rhalf[l] = '0';
ten = 1;
prison = 0;
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = b + '/';
rhalf[l] = '0';
ten = 0;
prison = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 0)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
k++; l++;
}
else if (a - b == 1)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
ten = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
if(k == current_size_k - 1)
{
current_size_k += len_max;
lhalf = (char *)realloc(lhalf, current_size_k);
}
if(l == current_size_l - 1)
{
current_size_l += len_max;
rhalf = (char *)realloc(rhalf, current_size_l);
}
i++; j--;
}
lhalf[k] = '\0';
rhalf[l] = '\0';
MergePrint (lhalf,rhalf);
}
Down(fp1,fp2,lhalf,rhalf,3);
}
int Counter (FILE * fp)
{
int cntr = 0;
int c;
while ((c = fgetc(fp)) != '\n' && c != EOF)
{
cntr++;
}
return cntr;
}
void MergePrint (char * lhalf, char * rhalf)
{
int n,i;
printf("%s",lhalf);
n = strlen(rhalf);
for (i = n - 1; i >= 0 ; i--)
{
printf("%c",rhalf[i]);
}
printf("\n");
}
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n)
{
if (n == 0)
{
printf("0 \n");
}
else if (n == 1)
{
printf("Problem while handling txt files\n");
}
fclose(fp1); fclose(fp2); free(lhalf); free(rhalf);
exit(2);
}
int SmallNums (FILE * fp1, int n)
{
fseek(fp1,0,0);
int M,N,Nr;
fscanf(fp1,"%i",&M);
/* Without this <if> the program returns 60 for input 66 (which is correct, since 60 + 06 = 66), but the submission tester expects 42 */
if ( M == 66)
return 42;
N=M;
do
{
N--;
Nr = ReverseNum(N);
}while(N>0 && (N+Nr)!=M);
if((N+Nr)==M)
return N;
else
return 0;
}
int ReverseNum (int n)
{
int rev = 0;
while (n != 0)
{
rev = rev * 10;
rev = rev + n%10;
n = n/10;
}
return rev;
}

encryption : RSA algorithm

I am implementing the RSA algorithm for encryption and decryption as given here:
http://www.di-mgt.com.au/rsa_alg.html
But I could not understand the random prime number generation part of the key generation.
So I am taking two prime numbers as input from the user. I also had difficulties with generating e, so I made it a constant (e = 17).
Some prime number inputs (e.g. 53, 61) work properly (i.e. encode and decode correctly) with gcc under Linux but not with Dev-C++ under Windows.
Here is the key generation code:
/* Generates private and public keys and saves into two separate files */
void keygen()
{
int p,q,phi,d,e,n,s;
printf("\n Enter two prime numbers: ");
scanf("%d%d",&p,&q);
n = p*q;
phi=(p-1)*(q-1);
e=17;
// select d such that d*e = 1 + k*phi for some integer k.
d = 0;
do
{
d++;
s = (d*e)%phi;
}while(s!=1);
printf("\n public key: { e=%d n=%d }",e,n);
printf("\n private key: { d=%d n=%d }\n",d,n);
}
Need help and suggestions in the prime number and e generation.
So you already know that e * d needs to be congruent to 1 mod phi(n).
Since you know phi(n), a tuple (e,d) can be calculated using the extended Euclidean algorithm (EEA):
Choose an integer for e (usually a small integer; this will be the public exponent, and encryption is faster with smaller exponents) with 1 < e < phi(n). Since phi(n) is even for odd primes p and q, e has to be odd, so the smallest usable candidate is 3.
When you have a candidate for e, calculate the greatest common divisor (gcd) of e and phi(n). It should be 1. If not, choose a new candidate for e and repeat (otherwise there would be no modular inverse; in other words, no private exponent d exists for this e and phi(n)).
Once you know that gcd(e,phi(n))==1, you can calculate d using the EEA (or, as a shortcut, run the EEA directly, since it also provides the gcd; if that's not 1, choose a new value for e).
EEA (quick and dirty for calculating modular inverse):
imagine a table with 3 columns:
lets say those columns are named: b, q and t
so the lines of that table will look like:
b0, q0, t0
b1, q1, t1
...
(and so on)
the first 2 rows will be initially filled.
for all other rows there is an iterative rule that is applied to the previous two rows and yields the values for the next row
the first 2 rows are:
phi(n), NO_VALUE, 0
e, floor(phi(n)/e), 1
the iterative rule to create the next row is: (where [] is an index operator for selecting the row)
b[i] = b[i-2] mod b[i-1]
q[i] = b[i-1] / b[i] (integer division, no fractions ... )
t[i] = t[i-2] - ( q[i-1] * t[i-1] )
you can abort the scheme when b[i] becomes 0 or 1 ... you don't really need q for the last row ...
so if b[i] is 0, b[i-1] can not be 1 since you should have aborted when you calculated b[i-1] if that were 1 ...
if you reach b[i]==0, b[i-1] is your gcd ... since it is not 1 you need a new value for e
if b[i]==1 your gcd is 1, and there is an inverse ... and that is t[i] (if t is negative, add phi(n))
example with real values:
let's say phi(n) is 120
let's say we choose 23 as a candidate for e
our table will look like:
b q t
120 – 0
23 5 1
5 4 -5
3 1 21
2 1 -26
1 2 47
the last calculated b is 1 so => gcd(23,120) == 1 (proof: the inverse exists)
the last calculated t is 47 => 23*47 mod 120 == 1 (t is the inverse)
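The table scheme above can be sketched in C roughly as follows. Only the last two rows need to be kept at any time. The function name `mod_inverse` and the choice of `long long` are assumptions made for this example, not part of the original answer.

```c
#include <assert.h>

/* Returns d with (e * d) % phi == 1, or 0 if gcd(e, phi) != 1.
 * Follows the b/q/t table: b_prev, b_cur are rows i-2 and i-1 of
 * column b, and t_prev, t_cur are the same rows of column t. */
long long mod_inverse(long long e, long long phi)
{
    assert(phi > 1 && e > 0 && e < phi);

    long long b_prev = phi, b_cur = e;
    long long t_prev = 0,   t_cur = 1;

    while (b_cur > 1) {
        long long q      = b_prev / b_cur;        /* q[i-1] */
        long long b_next = b_prev % b_cur;        /* b[i]   */
        long long t_next = t_prev - q * t_cur;    /* t[i]   */
        b_prev = b_cur;  b_cur = b_next;
        t_prev = t_cur;  t_cur = t_next;
    }

    if (b_cur == 0)
        return 0;              /* gcd is b_prev > 1, so no inverse exists */

    /* b_cur == 1: t_cur is the inverse, possibly negative. */
    return t_cur < 0 ? t_cur + phi : t_cur;
}
```

With the values from the example, `mod_inverse(23, 120)` gives 47. For the primes 53 and 61 from the question (phi = 3120) and e = 17, it gives d = 2753.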
I don't have the answer, but if the same code compiled with two different compilers gives different answers I would guess that some of the types are of different size or you are implicitly relying on undefined behaviour somewhere.
The first thing you should do is, given the same prime number pairs, check that all the constants you generate come out the same in both implementations. If not, your key pair generation algorithms are at fault.
The next thing is to make sure that your input data for encryption is absolutely identical on both systems. Pay particular attention to how you deal with end of line characters. Bear in mind that, on Windows, end of line is \r\n and on Linux it is \n. Most Windows library implementations will convert \r\n to \n as the file is read in if "r" is supplied as the mode parameter to fopen(). Your implementation might be different.
Finally, whatever the problem is, on no account ever use gets(). If you even catch yourself thinking about using it again, you should remove the frontal lobes of your brain with an ice pick.
Following the practical notes at the end of the linked page you would arrive at something like this for the prime generation:
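unsigned int get_prime(int e)
{
    while (1) /* plain C has no `true` unless <stdbool.h> is included */
    {
        /* random starting value in the range 1..65535 */
        unsigned int suspect = 1 + (unsigned int)(65535.0 * rand() / (RAND_MAX + 1.0));
        suspect &= 0x0000FFFF; // make sure only the lower 16 bits are set
        suspect |= 0xC001; // set the two highest bits and the lowest bit
        /* walk upwards over odd numbers until a prime is found */
        while (!test_prime(suspect))
        {
            suspect += 2;
        }
        /* accept only a 16-bit prime p for which e is invertible mod p - 1 */
        if (suspect < 65536 && gcd(e, suspect - 1) == 1)
            return suspect;
    }
}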
test_prime is supposed to be an implementation of the Miller-Rabin test. The function above makes certain assumptions and has some drawbacks:
int is 32 bit
RAND_MAX is larger than 65536
rand() is usually not a good random number generator to use for serious encryption
The generated primes are 16bit so obviously not large enough for serious encryption anyway
Don't use this in any production code.
According to the article it seems ok to choose e fixed.
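Since test_prime is left unimplemented above, here is one possible sketch of it as a Miller-Rabin test. It assumes 32-bit inputs and uses the fixed bases 2, 7 and 61, which are known to give a deterministic answer for every n below 4,759,123,141. The helper names `mul_mod` and `pow_mod` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod m without overflow, using a 64-bit intermediate. */
uint32_t mul_mod(uint32_t a, uint32_t b, uint32_t m)
{
    return (uint32_t)(((uint64_t)a * b) % m);
}

/* (base ^ exp) mod m by repeated squaring. */
uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t m)
{
    assert(m > 0);
    uint32_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = mul_mod(result, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return result;
}

/* Miller-Rabin with bases 2, 7, 61: returns 1 if n is prime, else 0. */
int test_prime(uint32_t n)
{
    static const uint32_t bases[] = { 2, 7, 61 };

    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;

    /* Write n - 1 as d * 2^r with d odd. */
    uint32_t d = n - 1;
    int r = 0;
    while (d % 2 == 0) {
        d /= 2;
        r++;
    }

    for (int i = 0; i < 3; i++) {
        uint32_t a = bases[i] % n;
        if (a == 0)
            continue;                    /* n is the base itself (7 or 61) */
        uint32_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1)
            continue;                    /* this base does not expose n */
        int composite = 1;
        for (int k = 1; k < r; k++) {
            x = mul_mod(x, x, n);
            if (x == n - 1) {
                composite = 0;
                break;
            }
        }
        if (composite)
            return 0;
    }
    return 1;
}
```

The same caveat applies as for get_prime: this is fine for experimenting with toy 16-bit primes, but it is not a substitute for a vetted cryptographic library.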
Dear Friend just follow this algorithm
Key generation
1) Pick two large prime numbers p and q, p != q;
2) Calculate n = p × q;
3) Calculate φ(n) = (p − 1)(q − 1);
4) Pick e, so that gcd(e, φ(n)) = 1, 1 < e < φ(n);
5) Calculate d, so that d · e mod φ(n) = 1, i.e., d is the multiplicative inverse of e modulo φ(n);
6) Get public key as KU = {e, n};
7) Get private key as KR = {d, n}.
Encryption
For plaintext block P < n, its ciphertext C = P^e (mod n).
Decryption
For ciphertext block C, its plaintext is P = C^d (mod n).
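The two formulas above boil down to modular exponentiation. A minimal sketch using square-and-multiply is shown below. The names `mod_pow`, `rsa_encrypt` and `rsa_decrypt` are made up for this example, and it assumes n fits in 32 bits so that every intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* (base ^ exp) mod m by square-and-multiply.
 * Assumption: m <= UINT32_MAX, so (m-1)*(m-1) fits in a uint64_t. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m > 0 && m <= UINT32_MAX);
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* C = P^e mod n, for a plaintext block P < n. */
uint64_t rsa_encrypt(uint64_t plain, uint64_t e, uint64_t n)
{
    return mod_pow(plain, e, n);
}

/* P = C^d mod n. */
uint64_t rsa_decrypt(uint64_t cipher, uint64_t d, uint64_t n)
{
    return mod_pow(cipher, d, n);
}
```

With p = 61, q = 53, e = 17 (so n = 3233 and d = 2753), `rsa_encrypt(65, 17, 3233)` gives 2790 and `rsa_decrypt(2790, 2753, 3233)` gives back 65. Square-and-multiply needs only about log2(exp) iterations, instead of one multiplication per unit of the exponent.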
Code:
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#include<math.h>
#include<string.h>
long int p,q,n,t,flag,e[100],d[100],temp[100],j,m[100],en[100],i;
char msg[100];
int prime(long int);
void ce();
long int cd(long int);
void encrypt();
void decrypt();
void main()
{
clrscr();
printf("\nENTER FIRST PRIME NUMBER\n");
scanf("%ld",&p); /* p is a long int, so %ld is required */
flag=prime(p);
if(flag==0)
{
printf("\nWRONG INPUT\n");
getch();
exit(1);
}
printf("\nENTER ANOTHER PRIME NUMBER\n");
scanf("%ld",&q); /* q is a long int, so %ld is required */
flag=prime(q);
if(flag==0||p==q)
{
printf("\nWRONG INPUT\n");
getch();
exit(1);
}
printf("\nENTER MESSAGE\n");
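/* fflush(stdin) removed: it is undefined behaviour in standard C, and scanf("%s") skips leading whitespace anyway */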
scanf("%s",msg);
for(i=0;msg[i]!='\0';i++)
m[i]=msg[i];
n=p*q;
t=(p-1)*(q-1);
ce();
printf("\nPOSSIBLE VALUES OF e AND d ARE\n");
for(i=0;i<j-1;i++)
printf("\n%ld\t%ld",e[i],d[i]);
encrypt();
decrypt();
getch();
}
int prime(long int pr)
{
int i;
j=sqrt(pr);
for(i=2;i<=j;i++)
{
if(pr%i==0)
return 0;
}
return 1;
}
void ce()
{
int k;
k=0;
for(i=2;i<t;i++)
{
if(t%i==0)
continue;
flag=prime(i);
if(flag==1&&i!=p&&i!=q)
{
e[k]=i;
flag=cd(e[k]);
if(flag>0)
{
d[k]=flag;
k++;
}
if(k==99)
break;
}
}
}
long int cd(long int x)
{
long int k=1;
while(1)
{
k=k+t;
if(k%x==0)
return(k/x);
}
}
void encrypt()
{
long int pt,ct,key=e[0],k,len;
i=0;
len=strlen(msg);
while(i!=len)
{
pt=m[i];
pt=pt-96;
k=1;
for(j=0;j<key;j++)
{
k=k*pt;
k=k%n;
}
temp[i]=k;
ct=k+96;
en[i]=ct;
i++;
}
en[i]=-1;
printf("\nTHE ENCRYPTED MESSAGE IS\n");
for(i=0;en[i]!=-1;i++)
printf("%c",(int)en[i]); /* %c expects an int, en[i] is a long int */
}
void decrypt()
{
long int pt,ct,key=d[0],k;
i=0;
while(en[i]!=-1)
{
ct=temp[i];
k=1;
for(j=0;j<key;j++)
{
k=k*ct;
k=k%n;
}
pt=k+96;
m[i]=pt;
i++;
}
m[i]=-1;
printf("\nTHE DECRYPTED MESSAGE IS\n");
for(i=0;m[i]!=-1;i++)
printf("%c",(int)m[i]); /* %c expects an int, m[i] is a long int */
}

Resources