I'm pretty new to C, and I'm trying to write a function that takes a user input RAM size in B, kB, mB, or gB, and determines the address length. My test program is as follows:
int bitLength(char input[6]) {
char nums[4];
char letters[2];
for(int i = 0; i < (strlen(input)-1); i++){
if(isdigit(input[i])){
memmove(&nums[i], &input[i], 1);
} else {
//memmove(&letters[i], &input[i], 1);
}
}
int numsInt = atoi(nums);
int numExponent = log10(numsInt)/log10(2);
printf("%s\n", nums);
printf("%s\n", letters);
printf("%d", numExponent);
return numExponent;
}
This works correctly as it is, but only because I have that one line commented out. When I try to alter the 'letters' character array with that line, it changes the 'nums' character array to '5m2'
My string input is '512mB'
I need the letters to be able to tell if the user input is in B, kB, mB, or gB.
I am confused as to why the commented out line alters the 'nums' array.
Thank you.
In your input 512mB, the characters "mB" are not digits, so they are supposed to be handled by the commented-out line. When those characters are handled, i is 3 and 4. But because letters has only 2 elements, executing memmove(&letters[i], &input[i], 1); accesses letters[i] out of bounds, which is undefined behaviour. In this case it happens to write into the memory of the nums array.
To fix it, keep a separate index for letters. Better still, keep one index for nums and one for letters, since i is the index into input.
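A minimal sketch of the separate-index idea, under some assumptions: the function name split_size and its size parameters are made up for illustration, and the caller is expected to pass buffers with room for the terminating '\0'.

```c
#include <ctype.h>
#include <stddef.h>

/* Splits input such as "512mB" into its digits and its letters.
 * n indexes nums, l indexes letters, i indexes input.
 * Returns 0 on success, -1 if either part would not fit. */
int split_size(const char *input, char *nums, size_t nums_sz,
               char *letters, size_t letters_sz)
{
    size_t n = 0, l = 0;

    if (nums_sz == 0 || letters_sz == 0)
        return -1;

    for (size_t i = 0; input[i] != '\0'; i++) {
        if (isdigit((unsigned char)input[i])) {
            if (n + 1 >= nums_sz)      /* keep one byte for '\0' */
                return -1;
            nums[n++] = input[i];
        } else {
            if (l + 1 >= letters_sz)   /* keep one byte for '\0' */
                return -1;
            letters[l++] = input[i];
        }
    }
    nums[n] = '\0';
    letters[l] = '\0';
    return 0;
}
```

With char nums[5] and char letters[3], the input "512mB" should leave "512" in nums and "mB" in letters, and an over-long input is rejected instead of overflowing.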
There are several problems in your code. #MarkSolus has already pointed out that you access letters out of bounds, because you use i as the index and i can be greater than 1 when you do the memmove.
In this answer I'll address some of the other problems.
string size and termination
Strings in C need a zero-termination. Therefore an array must be 1 element larger than the longest string you expect to store in it. So
char nums[4]; // Can only hold a 3 char string
char letters[2]; // Can only hold a 1 char string
Most likely you want to increase both arrays by 1.
Further, your code never adds the zero-termination. So your strings are invalid.
You need code like:
nums[some_index] = '\0'; // Add zero-termination
Alternatively you can start by initializing the whole array to zero. Like:
char nums[5] = {0};
char letters[3] = {0};
Missing bounds checks
Your loop is a for-loop using strlen as stop-condition. Now what would happen if I gave the input "123456789BBBBBBBB" ? Well, the loop would go on and i would increment to values ..., 5, 6, 7, ... Then you would index the arrays with a value bigger than the array size, i.e. out-of-bounds access (which is real bad).
You need to make sure you never access the array out-of-bounds.
No format check
Now what if I gave an input without any digits, e.g. "HelloWorld" ? In this case nothing would be written to nums so it will be uninitialized when used in atoi(nums). Again - real bad.
Further, there should be a check to make sure that the non-digit input is one of B, kB, mB, or gB.
Performance
This is not that important but... using memmove for copy of a single character is slow. Just assign directly.
memmove(&nums[i], &input[i], 1); ---> nums[i] = input[i];
How to fix
There are many, many different ways to fix the code. Below is a simple solution. It's not the best way but it's done like this to keep the code simple:
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#define DIGIT_LEN 4
#define FORMAT_LEN 2
int bitLength(char *input)
{
    char nums[DIGIT_LEN + 1] = {0}; // Max allowed number is 9999
    char letters[FORMAT_LEN + 1] = {0}; // Allow at max two non-digit chars
    if (input == NULL) exit(1); // error - illegal input
    if (!isdigit(input[0])) exit(1); // error - input must start with a digit
    // parse digits (at max 4 digits)
    int i = 0;
    while(i < DIGIT_LEN && isdigit(input[i]))
    {
        nums[i] = input[i];
        ++i;
    }
    // parse memory format, i.e. rest of string must be one of B, kB, mB, gB
    if ((strcmp(&input[i], "B") != 0) &&
        (strcmp(&input[i], "kB") != 0) &&
        (strcmp(&input[i], "mB") != 0) &&
        (strcmp(&input[i], "gB") != 0))
    {
        // error - illegal input
        exit(1);
    }
    strcpy(letters, &input[i]);
    // Now nums and letters are ready for further processing
    ...
    ...
}
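One possible way to finish the job, sketched under assumptions that are not in the code above: the helper name address_bits is made up, kB/mB/gB are taken to mean 2^10/2^20/2^30 bytes, sizes that are not a power of two are rounded up, and errors return -1 instead of calling exit. It parses the digits directly and uses integer arithmetic instead of log10, which avoids floating-point rounding.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of address bits needed for a size such as "512mB",
 * or -1 if the input is malformed. */
int address_bits(const char *input)
{
    if (input == NULL || !isdigit((unsigned char)input[0]))
        return -1;

    /* Parse at most 4 digits by hand (max value 9999). */
    unsigned long long value = 0;
    int i = 0;
    while (i < 4 && isdigit((unsigned char)input[i])) {
        value = value * 10 + (unsigned)(input[i] - '0');
        ++i;
    }

    /* The rest of the string must be exactly one of the known units. */
    int shift;
    if (strcmp(&input[i], "B") == 0)       shift = 0;
    else if (strcmp(&input[i], "kB") == 0) shift = 10;
    else if (strcmp(&input[i], "mB") == 0) shift = 20;
    else if (strcmp(&input[i], "gB") == 0) shift = 30;
    else return -1;

    if (value == 0)
        return -1;

    unsigned long long bytes = value << shift;

    /* Smallest n with 2^n >= bytes, computed with integers only. */
    int bits = 0;
    while ((1ULL << bits) < bytes)
        ++bits;
    return bits;
}
```

For "512mB" this gives 29, since 512 * 2^20 = 2^29.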
Can someone explain to me how the calculation works?
What I don't understand is:
1. The getch() function: what does it do?
2. Can someone explain to me how int decimal_binary(int n) operates mathematically?
#include<stdio.h>
int decimal_binary (int n);
void main()
{
int n;
printf("Enter decimal number: ");
scanf("%d", &n);
printf("\n%d", decimal_binary(n));
getch();
}
int decimal_binary(int n)
{
int rem, i = 1, binary = 0;
while(n!=0)
{
rem = n % 2;
n = n/2;
binary = binary + rem*i;
i = i*10;
}
return binary;
}
if for example the n = 10
and this is how i calculate it
I'm not going to explain the code in the question, because I fundamentally (and rather vehemently) disagree with its implementation.
When we say something like "convert a number to base 2", it's useful to understand that we are not really changing the number. All we're doing is changing the representation. An int variable in a computer program is just a number (although deep down inside it's already in binary). The base matters when we print the number out as a string of digit characters, and also when we read it in as a string of digit characters. So any sensible "convert to base 2" function should have as its output a string, not an int.
Now, when you want to convert a number to base 2, and in fact when you want to convert to base b, for any base "b", the basic idea is to repeatedly divide by b.
For example, if we wanted to determine the base-10 digits of a number, it's easy. Consider the number 12345. If we divide it by 10, we get 1234, with a remainder of 5. That remainder 5 is precisely the last digit of the number 12345. And the remaining digits are 1234. And then we can repeat the procedure, dividing 1234 by 10 to get 123 remainder 4, etc.
Before we go any further, I want you to study this base-10 example carefully. Make sure you understand that when we split 12345 up into 1234 and 5 by dividing it by 10, we did not just look at it with our eyes and pick off the last digit. The mathematical operation of "divide by 10, with remainder" really did do the splitting up for us, perfectly.
So if we want to determine the digits of a number using a base other than 10, all we have to do is repeatedly divide by that other base. Suppose we're trying to come up with the binary representation of eleven. If we divide eleven by 2, we get five, with a remainder of 1. So the last bit is going to be 1.
Next we have to work on five. If we divide five by 2, we get two, with a remainder of 1. So the next-to-last bit is going to be 1.
Next we have to work on two. If we divide two by 2, we get one, with a remainder of 0. So the next bit is going to be 0.
Next we have to work on one. If we divide one by 2, we get zero, with a remainder of 1. So the next bit is going to be 1.
And now we have nothing left to work with -- the last division has resulted in 0. The binary bits we've picked off were, in order, 1, 1, 0, and 1. But we picked off the last bit first. So rearranging into conventional left-to-right order, we have 1011, which is the correct binary representation of the number eleven.
So with the theory under our belt, let's look at some actual C code to do this. It's perfectly straightforward, except for one complication. Since the algorithm we're using always gives us the rightmost bit of the result first, we're going to have to do something special in order to end up with the bits in conventional left-to-right order in the final result.
I'm going to write the new code as a function, sort of like your decimal_binary. This function will accept an integer, and return the binary representation of that integer as a string. Because strings are represented as arrays of characters in C, and because memory allocation for arrays can be an issue, I'm going to also have the function accept an empty array (passed by the caller) to build the return string in. And I'm also going to have the function accept a second integer giving the size of the array. That's important so that the function can make sure not to overflow the array.
If it's not clear from the explanation so far, here's what a call to the new function is going to look like:
#include <stdio.h>
char *integer_binary(int n, char *str, int sz);
int main()
{
    int n;
    char result[40];
    printf("Enter decimal number: ");
    scanf("%d", &n);
    char *str = integer_binary(n, result, 40);
    printf("%s\n", str);
}
As I said, the new function, integer_binary, is going to create its result as a string, so we have to declare an array, result, to hold that string. We're declaring it as size 40, which should be plenty to hold any 32-bit integer, with some left over.
The new function returns a string, so we're printing its return value using %s.
And here's the implementation of the integer_binary function. It's going to look a little scary at first, but bear with me. At its core, it's using the same algorithm as the original decimal_binary function in the question did, repeatedly dividing by 2 to pick off the bits of the binary number being generated. The differences have to do with constructing the result in a string instead of an int. (Also, it's not taking care of quite everything yet; we'll get to one or two more improvements later.)
char *integer_binary(int n, char *binary, int sz)
{
    int rem;
    int j = sz - 2;
    do {
        if(j < 0) return NULL;
        rem = n % 2;
        n = n / 2;
        binary[j] = '0' + rem;
        j--;
    } while(n != 0);
    binary[sz-1] = '\0';
    return &binary[j+1];
}
You can try that, and it will probably work for you right out of the box, but let's explain the possibly-confusing parts.
The new variable j keeps track of where in the array result we're going to place the next bit value we compute. And since the algorithm generates bits in right-to-left order, we're going to move j backwards through the array, so that we stuff new bits in starting at the end, and move to the left. That way, when we take the final string and print it out, we'll get the bits in the correct, left-to-right order.
But why does j start out as sz - 2? Partly because arrays in C are 0-based, partly to leave room for the null character '\0' that terminates strings in C. Here's a picture that should make things clearer. This will be the situation after we've completely converted the number eleven:
          0   1   2          31  32  33  34  35  36  37  38  39
        +---+---+---+-- ~ --+---+---+---+---+---+---+---+---+---+
result: |   |   |   |  ...  |   |   |   |   | 1 | 0 | 1 | 1 |\0 |
        +---+---+---+-- ~ --+---+---+---+---+---+---+---+---+---+
          ^                               ^   ^           ^
          |                               |   |           |
        binary                          final return   initial
                                          j   value       j
The result array in the caller is declared as char result[40];, so it has 40 elements, from 0 to 39. And sz is passed in as 40. But if we want j to start out "at the right edge" of the array, we can't initialize j to sz, because the rightmost element is 39, not 40. And we can't initialize j as sz - 1, either, because we have to leave room for the terminating '\0'. That's why we initialize j to sz - 2, or 38.
The next possibly-confusing aspect of the integer_binary function is the line
binary[j] = '0' + rem;
Here, rem is either 0 or 1, the next bit of our binary conversion we've converted. But since we're creating a string representation of the binary number, we want to fill the binary result in with one of the characters '0' or '1'. But characters in C are represented by tiny integers, and you can do arithmetic on them. The constant '0' is the value of the character 0 in the machine's character set (typically 48 in ASCII). And the bottom line is that '0' + 1 turns into the character '1'. So '0' + rem turns into '0' if rem is 0, or '1' if rem is 1.
Next to talk about is the loop I used. The original decimal_binary function used while(n != 0) {...}, but I'm using do { ... } while(n != 0). What's the difference? It's precisely that the do/while loop always runs once, even if the controlling expression is false. And that's what we want here, so that the number 0 will be converted to the string "0", not the empty string "". (That wasn't an issue for decimal_binary, because it returned the integer 0 in that case, but that was a side effect of its otherwise-poor choice of int as its return value.)
Next we have the line
binary[sz-1] = '\0';
We've touched on this already: it simply fills in the necessary null character which terminates the string.
Finally, there's the last line,
return &binary[j+1];
What's going on there? The integer_binary function is supposed to return a string, or in this case, a pointer to the first character of a null-terminated array of characters. Here we're returning a pointer (generated by the & operator) to the element binary[j+1] in the result array. We have to add one to j because we always subtract 1 from it in the loop, so it always indicates the next cell in the array where we'd store the next character. But we exited the loop because there was no next character to generate, so the last character we did generate was at j's previous value, which is j+1.
(This integer_binary function is therefore mildly unusual in one respect. The caller passes in an empty array, and the function builds its result string in the empty array, but the pointer it returns, which points to the constructed string, does not usually point to the beginning of the passed-in array. It will work fine as long as the caller uses the returned pointer, as expected. But it's unusual, and the caller would get confused if accidentally using its own original result array as if it would contain the result.)
One more thing: that line if(j < 0) return NULL; at the top of the loop is a double check that the caller gave us a big enough array for the result we're generating. If we run out of room for the digits we're generating, we can't generate a correct result, so we return a null pointer instead. (That's likely to cause problems in the caller unless explicitly checked for, but that's a story for another day.)
So integer_binary as discussed so far will work, although I'd like to make three improvements to address some remaining deficiencies:
The integer_binary function as shown won't handle negative numbers correctly.
The way the integer_binary function uses the j variable is a bit clumsy. (Evidence of the clumsiness is the fact that I had to expend so many words explaining the j = sz-2 and return &binary[j+1] parts.)
The integer_binary function as shown only handles, obviously, binary, but what I really want (although you didn't ask for it) is a function that can convert to any base.
So here's an improved version. Based on the integer_binary function we've already seen, there are just a few small steps to achieve the desired improvements. I'm calling the new function integer_base, because it converts to any base (well, any base up to 10, anyway). Here it is:
char *integer_base(int n, int base, char *result, int sz)
{
    int rem;
    int j = sz - 1;
    int negflag = 0;
    if(n < 0) {
        n = -n;
        negflag = 1;
    }
    result[j] = '\0';
    do {
        j--;
        if(j < 0) return NULL;
        rem = n % base;
        n = n / base;
        result[j] = '0' + rem;
    } while(n != 0);
    if(negflag) {
        j--;
        result[j] = '-';
    }
    return &result[j];
}
As mentioned, this is just like integer_binary, except:
I've changed the way j is used. Before, it was always the index of the next element of the result array we were about to fill in. Now, it's always one to the right of the next element we're going to fill in. This is a less obvious choice, but it ends up being more convenient. Now, we initialize j to sz-1, not sz-2. Now, we do the decrement j-- before we fill in the next character of the result, not after. And now, we can return &result[j], without having to remember to add 1 at that spot.
I've moved the insertion of the terminating null character '\0' up to the top. Since we're building the whole string right-to-left, it makes sense to put the terminator in first.
I've handled negative numbers, in a kind of brute-force but expedient way. If we receive a negative number, we turn it into a positive number (n = -n) and use our regular algorithm on it, but we set a flag negflag to remind us that we've done so and, when we're all done, we tack a '-' character onto the beginning of the string.
Finally, and this is the biggie, the new function works in any base. It can create representations in base 2, or base 3, or base 5, or base 7, or any base up to 10. And what's really neat is how few modifications were required in order to achieve this. In fact, there were just two: In two places where I had been dividing by 2, now I'm dividing by base. That's it! This is the realization of something I said back at the very beginning of this too-long answer: "The basic idea is to repeatedly divide by b."
(Actually, I lied: There was a fourth change, in that I renamed the result parameter from "binary" to "result".)
Although you might be thinking that this integer_base function looks pretty good, I have to admit that it still has at least three problems:
It won't work for bases greater than 10.
It can occasionally overflow its result buffer.
It has an obscure problem when trying to convert the largest negative number.
The reason it only works for bases up to 10 is the line
result[j] = '0' + rem;
This line only knows how to create ordinary digits in the result. For (say) base 16, it would also have to be able to create hexadecimal digits A - F. One quick but obfuscated way to achieve this is to replace that line with
result[j] = "0123456789ABCDEF"[rem];
This answer is too long already, so I'm not going to get into a side discussion on how this trick works.
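For the curious, here is the same trick spelled out as a small sketch. The helper name digit_char and the '?' error value are made up for illustration; the point is only that a string literal is an array of char and can be indexed like any other array.

```c
/* Returns the digit character for a value 0..15, or '?' if out of range. */
char digit_char(int rem)
{
    /* digits[10] is 'A', digits[15] is 'F', and so on. */
    const char *digits = "0123456789ABCDEF";

    if (rem < 0 || rem > 15)
        return '?';
    return digits[rem];   /* same as "0123456789ABCDEF"[rem] */
}
```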
The second problem is hiding in the lines I added to handle negative numbers:
if(negflag) {
j--;
result[j] = '-';
}
There's no check here that there's enough room in the result array for the minus sign. If the array was just barely big enough for the converted number without the minus sign, we'll hit this part of the code with j being 0, and we'll subtract 1 from it, and fill the minus sign in to result[-1], which of course doesn't exist.
Finally, on a two's complement machine, if you pass the most negative integer, INT_MIN, in to this function, it won't work. On a 16-bit 2's complement machine, the problem number is -32768. On a 32-bit machine, it's -2147483648. The problem is that +32768 can't be represented as a signed integer on a 16-bit machine, nor will +2147483648 fit in 32 signed bits. So a rewrite of some kind will be necessary in order to achieve a perfectly general function that can also handle INT_MIN.
To convert a decimal number to a binary number, there is a simple iterative algorithm to apply to that number (iterative = the same steps are repeated until some condition is met):
take that number and divide it by 2
take the remainder
then repeat with the quotient as the current number (this is integer division, so 2.5 becomes 2) until that number reaches 0
take all the remainders and read them from the last to the first; that is the binary form of the number
The function does exactly this:
takes the number and divides it by 2
takes the remainder and adds it to the variable binary, multiplied by i, where i is multiplied by 10 on each pass, so that the first remainder becomes the least significant digit and the last one the most significant digit; this is the same as taking all the remainders and reading them from the last to the first
saves n/2 as the new n
and then repeats until the current number n reaches 0
Also getch() is sometimes used in Windows in order to hold the command prompt open, but is not that recommended
getch() pauses your program in the console until a key is pressed. The maths behind the function looks like this:
n=7:
7%2=1; //rem=1
7/2=3; //n=3
binary=1;
next loop
n=3:
3%2=1;
3/2=1; //n=1;
binary=11 //1 + 1* 10
final loop
n=1:
1%2=1;
1/2=0; //n=0;
binary=111 //11+1*100
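The same steps as an annotated sketch of the question's function. The arithmetic is unchanged; only the variable i is renamed to place here for readability. One caveat worth stating: assuming a 32-bit int, this is only safe for n up to 511, because for larger inputs the place value grows past what an int can hold.

```c
/* Builds a decimal number whose digits spell the binary form of n,
 * e.g. 10 -> 1010. Intended for small non-negative n only. */
int decimal_binary(int n)
{
    int rem;          /* current binary digit, 0 or 1 */
    int place = 1;    /* decimal place value: 1, 10, 100, ... */
    int binary = 0;   /* result built so far */

    while (n != 0) {
        rem = n % 2;                    /* lowest remaining bit */
        n = n / 2;                      /* drop that bit */
        binary = binary + rem * place;  /* write the bit at this place */
        place = place * 10;             /* move one place to the left */
    }
    return binary;
}
```

For the question's example, n = 10 goes through the remainders 0, 1, 0, 1 and the function returns 1010.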
I'm trying to solve this problem. I was able to solve it using brute force, but
the following optimised algorithm gives me incorrect results for some of the test cases. I tried but couldn't find the problem with the code. Can anybody help me?
Problem :
Given a string S and an integer K, find the integer C which equals the number of pairs of substrings (S1,S2) such that S1 and S2 have equal length and Mismatch(S1, S2) <= K where the mismatch function is defined below.
The Mismatch Function
Mismatch(s1,s2) is the number of positions at which the characters in S1 and S2 differ. For example mismatch(bag,boy) = 2 (there is a mismatch in the second and third position), mismatch(cat,cow) = 2 (again, there is a mismatch in the second and third position), Mismatch(London,Mumbai) = 6 (since the character at every position is different in the two strings). The first character in London is ‘L’ whereas it is ‘M’ in Mumbai, the second character in London is ‘o’ whereas it is ‘u’ in Mumbai - and so on.
int main() {
int k;
char str[6000];
cin>>k;
cin>>str;
int len=strlen(str);
int i,j,x,l,m,mismatch,count,r;
count=0;
for(i=0;i<len-1;i++)
for(j=i+1;j<len;j++)
{ mismatch=0;
for(r=0;r<len-j+i;r++)
{
if(str[i+r]!=str[j+r])
{ ++mismatch;
if(mismatch>=k)break;
}
if(mismatch<=k)++count;
}
}
cout<<count;
return 0;
}
Sample test cases
Test case (passing for above code)
**input**
0
abab
**output**
3
Test case (failing for above code)
**input**
3
hjdiaceidjafcchdhjacdjjhadjigfhgchadjjjbhcdgffibeh
**expected output**
4034
**my output**
4335
You have two errors. First,
for(r=1;r<len;r++)
should be
for(r=1;r<=len-j;r++)
since otherwise,
str[j+r]
would at some point begin comparing characters past the null-terminator (i.e. beyond the end of the string). The greatest r can be is the remaining number of characters from the jth index to the last character.
Second, writing
str[i+r]
and
str[j+r]
skips the comparison of the ith and jth characters since r is always at least 1. You should write
for(r=0;r<len-j;r++)
You have two basic errors. You are quitting when mismatches>=k instead of mismatches>k (mismatches==k is an acceptable number) and you are letting r get too large. These skew the final count in opposite directions but, as you see, the second error "wins".
The real inner loop should be:
for (r=0; r<len-j; ++r)
{
    if (str[i+r] != str[j+r])
    {
        ++mismatch;
        if (mismatch > k)
            break;
    }
    ++count;
}
r is an index into the substring, and j+r MUST be less than len to be valid for the right substring. Since i<j, if str[j+r] is valid, then so is str[i+r], so there's no need to have i involved in the upper limit calculation.
Also, you want to break on mismatch>k, not on >=k, since k mismatches are allowed.
Next, if you test for too many mismatches after incrementing mismatch, you don't have to test it again before counting.
Finally, the upper limit of r<len-j (instead of <=) means that the trailing '\0' character won't be compared as part of the str[j+r] substring. You were comparing that and more when j+r >= len, but mismatches was less than k when that first happened.
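Putting the corrected inner loop into a complete function gives something like the sketch below. The function name count_pairs and the long long return type are choices made here for illustration, written in plain C instead of the question's iostream style.

```c
#include <string.h>

/* Counts pairs of equal-length substrings starting at i < j that differ
 * in at most k positions. Brute force over all start pairs. */
long long count_pairs(const char *str, int k)
{
    int len = (int)strlen(str);
    long long count = 0;

    for (int i = 0; i < len - 1; i++) {
        for (int j = i + 1; j < len; j++) {
            int mismatch = 0;
            /* r is the offset into both substrings; j + r must stay < len */
            for (int r = 0; r < len - j; r++) {
                if (str[i + r] != str[j + r]) {
                    ++mismatch;
                    if (mismatch > k)   /* k mismatches are still allowed */
                        break;
                }
                ++count;                /* length r + 1 is acceptable */
            }
        }
    }
    return count;
}
```

On the passing sample from the question (k = 0, "abab") this returns 3.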
Note: You asked about a faster method. There is one, but the coding is more involved. Make the outer loop on the difference delta between starting index values. (0<delta<len) Then, count all acceptable matches with something like:
count = 0;
for delta = 1 to len-1
    set i=0; j=delta; mismatches=0; r=0;
    while j < len
        .. find k'th mismatch, or end of str:
        while mismatches < k and j+r<len
            if str[i+r] != str[j+r] then mismatches=mismatches+1
            r = r+1
        end while
        .. extend r to cover any trailing matches:
        while j+r<len and str[i+r]==str[j+r]
            r = r+1
        end while
        .. arrive here with r being the longest string pair starting at str[i]
        .. and str[j] with no more than k mismatches. This loop will add (r)
        .. to the count and advance i,j one space to the right without recounting
        .. the character mismatches inside. Rather, if a mismatch is dropped off
        .. the front, then mismatches is decremented by 1.
        repeat
            count = count + r
            if str[i] != str[j] then mismatches=mismatches-1
            i = i+1, j = j+1, r = r-1
        until mismatches < k
    end while
end for
That's pseudocode, and also pseudocorrect. The general idea is to compare all substrings with starting indices differing by (delta) in one pass, starting at the left, and increasing the substring length r until the end of the source string is reached or k+1 mismatches have been seen. That is, str[j+r] is either the end of the string, or the camel's-back-breaking mismatch position in the right substring. That makes r substrings that had k or fewer mismatches starting at str[i] and str[j].
So count those r substrings and move to the next positions i=i+1,j=j+1 and new length r=r-1, reducing the mismatch count if unequal characters were dropped off the left side.
It should be pretty easy to see that on each loop either r increases by 1 or j increases by 1 and (j+r) stays the same. Both j and (j+r) will reach len in O(n) time, so the whole thing is O(n^2).
Edit: I fixed the handling of r, so the above should be even more pseudocorrect. The improvement to O(n^2) runtime might help.
Re-edit: Fixed comment bugs.
Re-re-edit: More typos in algorithm, mostly mismatches misspelled and incremented by 2 instead of 1.
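For comparison, here is a sketch of the same per-delta idea written as a standard two-pointer window. This is a reformulation and not a line-by-line translation of the pseudocode: for each delta it looks at the mismatch flags str[p] != str[p+delta] and counts the windows [left, right] that contain at most k of them. The name count_pairs_fast is made up here.

```c
#include <string.h>

/* O(n^2) count of equal-length substring pairs with at most k mismatches. */
long long count_pairs_fast(const char *s, int k)
{
    int len = (int)strlen(s);
    long long count = 0;

    if (k < 0)
        return 0;

    for (int delta = 1; delta < len; delta++) {
        int m = len - delta;      /* number of comparable positions */
        int left = 0;             /* left edge of the current window */
        int mism = 0;             /* mismatches inside [left, right] */

        for (int right = 0; right < m; right++) {
            if (s[right] != s[right + delta])
                mism++;
            /* shrink from the left until the window is acceptable again */
            while (mism > k) {
                if (s[left] != s[left + delta])
                    mism--;
                left++;
            }
            /* every start in [left, right] gives a valid pair ending at right */
            count += right - left + 1;
        }
    }
    return count;
}
```

Each of left and right only moves forward within one delta, so the work per delta is linear and the total is O(n^2). For k = 0 and "abab" it returns 3, matching the sample.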
#Mike I have some modifications in your logic and here is the correct code for it...
#include<iostream>
#include<string>
using namespace std;
int main()
{
    long long int k,c=0;
    string s;
    cin>>k>>s;
    int len = s.length();
    for(int gap = 1 ; gap < len; gap ++)
    {
        int i=0,j=gap,mm=0,tmp_len=0;
        while (mm <=k && (j+tmp_len)<len)
        {
            if (s[i+tmp_len] != s[j+tmp_len])
                mm++;
            tmp_len++;
        }
        // while (((j+tmp_len)<len) && (s[i+tmp_len]==s[j+tmp_len]))
        //     tmp_len++;
        if(mm>k){tmp_len--;mm--;}
        do{
            c = c + tmp_len ;
            if (s[i] != s[j]) mm--;
            i++;
            j++;
            tmp_len--;
            while (mm <=k && (j+tmp_len)<len)
            {
                if (s[i+tmp_len] != s[j+tmp_len])
                    mm++;
                tmp_len++;
            }
            if(mm>k){tmp_len--;mm--;}
        }while(tmp_len>0);
    }
    cout<<c<<endl;
    return 0;
}
I really don't know how to implement this function:
The function should take a pointer to an integer, a pointer to an array of strings, and a string for processing. The function should write to array all variations of exchange 'ch' combination to '#' symbol and change the integer to the size of this array. Here is an example of processing:
choker => {"choker","#oker"}
chocho => {"chocho","#ocho","cho#o","#o#o"}
chachacha => {"chachacha","#achacha","cha#acha","chacha#a","#a#acha","cha#a#a","#acha#a","#a#a#a"}
I am writing this in C standard 99. So this is sketch:
int n;
char **arr;
char *string = "chacha";
func(&n,&arr,string);
And function sketch:
int func(int *n,char ***arr, char *string) {
}
So I think I need to create another function, which counts the number of 'ch' combinations and allocates memory for this one. I'll be glad to hear any ideas about this algorithm.
You can count the number of combinations pretty easily:
char * tmp = string;
int i;
for(i = 0; *tmp != '\0'; i++){
if(!(tmp = strstr(tmp, "ch")))
break;
tmp += 2; // Skip past the 2 characters "ch"
}
// i contains the number of times ch appears in the string.
int num_combinations = 1 << i;
// num_combinations contains the number of combinations. Since this is 2 to the power of the number of occurrences of "ch"
First, I'd create a helper function, e.g. countChs that would just iterate over the string and return the number of 'ch'-s. That should be easy, as no string overlapping is involved.
When you have the number of occurrences, you need to allocate space for 2^count strings. No string is longer than the original (every replacement makes it one character shorter), so strlen(original) + 1 bytes per string is always enough. You also set your n variable to that 2^count.
After you have your space allocated, just iterate over all indices in your new table and fill them with copies of the original string (strcpy() or strncpy() to copy), then replace 'ch' with '#' in them (there are loads of ready snippets online, just look for "C string replace").
Finally make your arr pointer point to the new table. Be careful though - if it pointed to some other data before, you should think about freeing it or you'll end up having memory leaks.
If you would like to have all variations of the replaced string, the array will have 2^n elements, where n is the number of "ch" substrings. So, counting them will be:
int i = 0;
int n = 0;
while(string[i] != '\0')
{
if(string[i] == 'c' && string[i + 1] == 'h')
n++;
i++;
}
Then we can use the binary representation of a number. Note that as an integer i runs from 0 to 2^n - 1, the binary representation of i tells us which "ch" occurrences to change. So:
for(long long unsigned int i = 0; i < (1ULL << n); i++)
{
    long long unsigned int number = i;
    int k = 0;
    while(number > 0)
    {
        if(number % 2 == 1)
        {
            // Replace k-th occurrence of "ch"
        }
        number /= 2;
        k++;
    }
    // Add replaced string to array
}
This code checks every bit in the binary representation of the number and changes the k-th occurrence if the k-th bit is 1. Changing the k-th "ch" is pretty easy, and I leave it to you.
This code works only for fewer than 64 occurrences, because an unsigned long long int is typically 64 bits wide.
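A sketch of how the bitmask idea could be completed. Everything beyond the question's func(int *n, char ***arr, char *string) shape is an assumption: the name ch_variations, the 0/-1 return convention, and the arbitrary cap of 20 occurrences (2^20 strings is already a lot). Instead of copying and then replacing, it builds each variant in one left-to-right pass.

```c
#include <stdlib.h>
#include <string.h>

/* Stores all "ch" -> '#' variants of string in a newly allocated array.
 * Bit k of the loop counter decides whether occurrence k is replaced.
 * Returns 0 on success, -1 on error. The caller frees every string
 * and then the array itself. */
int ch_variations(int *n, char ***arr, const char *string)
{
    size_t len = strlen(string);

    /* Count the "ch" occurrences (they cannot overlap). */
    int count = 0;
    for (const char *p = strstr(string, "ch"); p != NULL; p = strstr(p + 2, "ch"))
        count++;
    if (count > 20)
        return -1;

    int total = 1 << count;
    char **out = malloc((size_t)total * sizeof *out);
    if (out == NULL)
        return -1;

    for (int mask = 0; mask < total; mask++) {
        char *s = malloc(len + 1);   /* a variant is never longer than the input */
        if (s == NULL) {
            while (mask-- > 0)
                free(out[mask]);
            free(out);
            return -1;
        }
        size_t w = 0;                /* write position in s */
        int k = 0;                   /* index of the current "ch" occurrence */
        for (size_t r = 0; r < len; ) {
            if (string[r] == 'c' && string[r + 1] == 'h') {
                if (mask & (1 << k)) {
                    s[w++] = '#';
                } else {
                    s[w++] = 'c';
                    s[w++] = 'h';
                }
                k++;
                r += 2;
            } else {
                s[w++] = string[r++];
            }
        }
        s[w] = '\0';
        out[mask] = s;
    }

    *n = total;
    *arr = out;
    return 0;
}
```

For "chocho" this produces "chocho", "#ocho", "cho#o", "#o#o" in that order. For three or more occurrences the order follows the binary counter, so it differs from the order shown in the question's chachacha example.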
There are two sub-problems that you need to solve for your original problem:
allocating space for the array of variations
calculating the variations
For the first problem, you need to find the mathematical function f that takes the number of "ch" occurrences in the input string and returns the number of total variations.
Based on your examples: f(1) = 2, f(2) = 4 and f(3) = 8. This should give you a good idea of where to start, but it is important to prove that your function is correct. Induction is a good way to make that proof.
Since your replace process ensures that the results have either the same or a lower length than the original, you can allocate space for each individual result equal to the length of the original (plus one byte for the terminating '\0').
As for the second problem, the simplest way is to use recursion, like in the example provided by nightlytrails.
You'll need another function which take the array you allocated for the results, a count of results, the current state of the string and an index in the current string.
When called, if there are no further occurrences of "ch" beyond the index then you save the result in the array at position count and increment count (so the next time you don't overwrite the previous result).
If there are any "ch" beyond index then call this function twice (the recurrence part). One of the calls uses a copy of the current string and only increments the index to just beyond the "ch". The other call uses a copy of the current string with the "ch" replaced by "#" and increments the index to beyond the "#".
Make sure there are no memory leaks. No malloc without a matching free.
After you make this solution work you might notice that it plays loose with memory. It is using more than it should. Improving the algorithm is an exercise for the reader.
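A sketch of the recursive step, with one deliberate difference from the description above: instead of copying the current string for each call, both branches share a single scratch buffer and only the finished results are copied. The function name expand and its parameter layout are made up for illustration; the caller is assumed to provide out with room for every result and buf with room for strlen(original) + 1 bytes.

```c
#include <stdlib.h>
#include <string.h>

/* out:   array with room for all results
 * count: number of results stored so far
 * buf:   scratch buffer, at least strlen(original) + 1 bytes
 * w:     number of characters already written to buf
 * src:   the part of the input not yet processed
 * Returns 0 on success, -1 if an allocation fails. */
int expand(char **out, int *count, char *buf, size_t w, const char *src)
{
    /* Copy characters until the next "ch" or the end of the input. */
    while (*src != '\0' && !(src[0] == 'c' && src[1] == 'h'))
        buf[w++] = *src++;

    if (*src == '\0') {            /* no further "ch": save this result */
        buf[w] = '\0';
        char *copy = malloc(w + 1);
        if (copy == NULL)
            return -1;
        memcpy(copy, buf, w + 1);
        out[(*count)++] = copy;
        return 0;
    }

    /* First call: keep the "ch" and continue just beyond it. */
    buf[w] = 'c';
    buf[w + 1] = 'h';
    if (expand(out, count, buf, w + 2, src + 2) != 0)
        return -1;

    /* Second call: write '#' instead and continue beyond it. */
    buf[w] = '#';
    return expand(out, count, buf, w + 1, src + 2);
}
```

Called with count = 0, w = 0 and src = "chocho", it stores "chocho", "cho#o", "#ocho", "#o#o". Each stored string still needs a matching free in the caller.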
I am trying to develop a program in C that will "crack" the crypt(3) encryption used by UNIX.
The most naive way to do it is brute forcing I guess. I thought I should create an array containing all the symbols a password can have and then get all possible permutations of them and store them in a two-dimensional array (where all the 1 character passwords get saved in the first row etc.) through for loops. Is there any better way to do this? It's pretty messy with the loops.
Assuming only 62 different characters can be used, there are 62^8 (about 2.18 × 10^14) possible 8-character passwords. Storing them all at 8 bytes each would take roughly 1.7 petabytes.
To answer your loop question, here is some code to loop over all the possible passwords of a given length, using the characters from a given set:
#include <stdio.h>
int len = 3;
char letters[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
int nbletters = sizeof(letters)-1;
int main() {
    int i, entry[len];
    for(i=0 ; i<len ; i++) entry[i] = 0;
    do {
        for(i=0 ; i<len ; i++) putchar(letters[entry[i]]);
        putchar('\n');
        for(i=0 ; i<len && ++entry[i] == nbletters; i++) entry[i] = 0;
    } while(i<len);
}
The main part is the last for loop. In most cases, it only increments the first entry, and stops there, as this entry has not reached nbletters. If the entry reaches nbletters, it means it has to return to zero, and it's the turn of the next entry to be incremented. It is indeed an unusual loop condition: the loop continues until there is no overflow. The looping only occurs in the worst case: when several entries are on the last element.
Imagine the case where the current word is "zzzc". In turn, each entry is incremented, its overflow is detected, it is reset to 0, and the next entry is considered, until the last entry which does not overflow, to give "000d".
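The increment step can be pulled out into its own function to make the carry behaviour easier to see. This is the same for loop as above, wrapped in a hypothetical helper named next_entry that reports whether the whole counter wrapped around.

```c
/* Advances entry[0..len-1] to the next combination, carrying like an
 * odometer whose first position is the fastest one.
 * Returns 1 if a new combination was produced, 0 if every position
 * overflowed and the counter is back at all zeros. */
int next_entry(int *entry, int len, int nbletters)
{
    int i;

    /* Increment position i; if it overflows, reset it and carry on. */
    for (i = 0; i < len && ++entry[i] == nbletters; i++)
        entry[i] = 0;

    return i < len;
}
```

With the 62-character set, "zzzc" corresponds to the indices {61, 61, 61, 38}; one call turns them into {0, 0, 0, 39}, which is "000d".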
As the commenters on the question point out - you don't have the necessary RAM, and you don't need to store it all.
Covering the permutations in sort sequence is not the most efficient approach to password cracking, although it will ultimately be effective.
An approach to achieving full coverage is to iterate 0 through the number of permutations and encode the value with the size of your character set as a base. This can be scaled to the size of your character set quite easily.
(pseudocode, but you get the idea)
passChars = '[all characters used in this attempt]'
permutationCount = len(passChars)^8 # crypt(3) only uses 8 chars
for looper = 0 to permutationCount - 1
    output = ''
    localTemp = looper
    while localTemp > 0
        output += passChars[localTemp % len(passChars)] # % being modulus
        localTemp = floor(localTemp / len(passChars))
    # output is now the next candidate to try
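In C, the base-N encoding could be sketched as below. The function name index_to_candidate is made up, and it differs from the pseudocode in one respect: it always emits exactly len characters, so candidates whose higher positions are the first character of the set are not cut short.

```c
#include <stddef.h>
#include <string.h>

/* Writes candidate number `index` as `len` characters taken from charset.
 * Position 0 is the least significant "digit".
 * out must have room for len + 1 bytes. */
void index_to_candidate(unsigned long long index, const char *charset,
                        size_t len, char *out)
{
    size_t base = strlen(charset);

    if (base == 0) {              /* nothing to encode with */
        out[0] = '\0';
        return;
    }
    for (size_t pos = 0; pos < len; pos++) {
        out[pos] = charset[index % base];   /* current base-N digit */
        index /= base;                      /* move to the next digit */
    }
    out[len] = '\0';
}
```

Looping index from 0 to base^len - 1 then visits every candidate of that length once, without storing any of them.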