Need help with brute force code for crypt(3) - c

I am trying to develop a program in C that will "crack" the crypt(3) encryption used by UNIX.
The most naive way to do it is brute forcing I guess. I thought I should create an array containing all the symbols a password can have and then get all possible permutations of them and store them in a two-dimensional array (where all the 1 character passwords get saved in the first row etc.) through for loops. Is there any better way to do this? It's pretty messy with the loops.

Assuming only 62 different characters can be used, there are 62^8 (about 2.18 x 10^14) possible 8-character passwords. Even at a single byte per password that would be about 198 TiB; storing all 8 characters of each one takes roughly 1.7 petabytes.
To answer your loop question, here is some code to loop over all the possible passwords of a given length, using the characters from a given set:
#include <stdio.h>
int len = 3;
char letters[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
int nbletters = sizeof(letters)-1;
int main() {
    int i, entry[len];   /* entry[i] is an index into letters */
    for(i=0 ; i<len ; i++) entry[i] = 0;
    do {
        /* print the current candidate */
        for(i=0 ; i<len ; i++) putchar(letters[entry[i]]);
        putchar('\n');
        /* advance to the next candidate, carrying on overflow */
        for(i=0 ; i<len && ++entry[i] == nbletters; i++) entry[i] = 0;
    } while(i<len);
}
The main part is the last for loop. In most cases, it only increments the first entry, and stops there, as this entry has not reached nbletters. If the entry reaches nbletters, it means it has to return to zero, and it's the turn of the next entry to be incremented. It is indeed an unusual loop condition: the loop continues until there is no overflow. The looping only occurs in the worst case: when several entries are on the last element.
Imagine the case where the current word is "zzzc". In turn, each entry is incremented, its overflow is detected, it is reset to 0, and the next entry is considered, until the last entry which does not overflow, to give "000d".
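To make the carry step concrete, here is a minimal sketch of the same increment written as a helper that works directly on a string. The name next_word is my own invention, and the sketch assumes every character of the word comes from letters; it is an illustration, not a replacement for the loop above.

```c
#include <stddef.h>
#include <string.h>

static const char letters[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: advance word (len characters, all taken from letters)
 * to the next candidate. The leftmost position is incremented first.
 * Returns 1 if a new candidate was produced, 0 if every position overflowed
 * (or a character outside letters was found). */
int next_word(char *word, int len)
{
    const int nbletters = (int)sizeof(letters) - 1;

    for (int i = 0; i < len; i++) {
        const char *pos = (word[i] != '\0') ? strchr(letters, word[i]) : NULL;
        if (pos == NULL)
            return 0;                 /* not a character we iterate over */
        if (pos - letters + 1 < nbletters) {
            word[i] = pos[1];         /* no overflow: bump this entry and stop */
            return 1;
        }
        word[i] = letters[0];         /* overflow: reset, carry to next entry */
    }
    return 0;                         /* wrapped around to all '0' */
}
```

Calling it on "zzzc" resets the three leading positions and bumps the last one, which gives "000d" as described.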

As the commenters on the question point out - you don't have the necessary RAM, and you don't need to store it all.
Covering the permutations in sorted order is not the most efficient approach to password cracking, although it will ultimately be effective.
An approach to achieving full coverage is to iterate 0 through the number of permutations and encode the value with the size of your character set as a base. This can be scaled to the size of your character set quite easily.
(pseudocode, but you get the idea)
passChars = '[all characters used in this attempt]'
passLen = 8  # traditional crypt(3) only uses the first 8 chars
permutationCount = len(passChars)^passLen
for looper = 0 to permutationCount - 1
    output = ''
    localTemp = looper
    for position = 1 to passLen
        output += passChars[localTemp % len(passChars)]  # % being modulus
        localTemp = floor(localTemp / len(passChars))
    # output now holds one candidate password to try
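The base-conversion idea can be sketched in C roughly like this. The function name index_to_candidate and its parameters are assumptions of mine; it always produces a fixed-length candidate, so small indexes are padded with the first character of the set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the index-th candidate of length len into out,
 * treating index as a number written in base strlen(charset).
 * out must have room for len + 1 bytes; charset must not be empty. */
void index_to_candidate(unsigned long long index, const char *charset,
                        int len, char *out)
{
    const size_t base = strlen(charset);

    for (int pos = 0; pos < len; pos++) {
        out[pos] = charset[index % base];   /* least significant digit first */
        index /= base;
    }
    out[len] = '\0';
}
```

With the character set "abc" and len 3, index 0 gives "aaa", index 1 gives "baa" and index 26 gives "ccc". Since 62^8 fits in an unsigned long long, a plain counting loop can drive it.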

Related

intermix command line strings in C

This is homework, so I am not looking for a direct answer I am more-so looking for the logic behind this. I do not believe the question is stated very well for novice C devs, and I cannot find any resources to help me out here. I am new to C much more a Java guy so this may seem totally and utterly noobish. The instructions are below
$ ./mixedupecho HELLO!
.H/EmLiLxOe!dHuEpLeLcOh!oH
*
For this program, you can ignore any command-line arguments beyond the first two (including the program name itself):
$ ./mixedupecho HELLO! morestuff lalala
.H/EmLiLxOe!dHuEpLeLcOh!oH
*
Notice how "HELLO!" is shorter than "./mixedupecho", and so the program "wraps around"
and starts over again at 'H' whenever it reaches the end of the string.
*
How can you implement that? The modulo % operator is your friend here.
Specifically, note that "HELLO!"[5] yields '!', and "HELLO!"[6] is beyond the bounds of the array.
But "HELLO!"[6 % 6] evaluates to "HELLO!"[0], which yields 'H'.
And "HELLO!"[7 % 6] evaluates to "HELLO!"[1] ...
Below is the code I have so far. This iterates through every character of the argv strings which I get. What I don't get is how to print it off so instead of the sequence [0][0], [0][1], [0][2]... I get [0][0], [1][0], [0][1]... etc.
Can someone take a crack at explaining this to me?
int main(int argc, string argv[])
{
for(int i = 0; i < argc; i++)
{
for(int j = 0, n = strlen(argv[i]); j < n; j++)
{
printf("%c", argv[i][j]);
}
}
printf("\n");
}
THANKS SO MUCH! THIS IS DRIVING ME INSANE!
You want to keep the index i incrementing until it is equal to the index of the null terminator in the longest string. Meanwhile, you'll use the % operator to ensure that i stays within the boundaries of the shorter string.
Here's how I'd do it:
Set the initial (unsigned) lengths to -1U to avoid calculating lengths unnecessarily. I'll use LIMIT for the rest of this example as if I did #define LIMIT -1U.
Iterate through the strings, checking to ensure that argv[N][i % len[N]] is not a null terminator. If it is a null terminator and len[N] == LIMIT, set len[N] = i.
When the expression len[0] != LIMIT && len[1] != LIMIT is true, the loop ends since both strings will have the correct length, meaning all characters in each string have been enumerated.
The only thing left is printing the character for each string, which I'm sure you can handle. I would have used 0 as the initial length, except that complicates things since you can't do x % 0. The reason for unsigned length is that -1U results in an unsigned int value (e.g. 4294967295 or 65535); plain -1 results in x % -1, which makes no sense because dividing by -1 yields no remainder.
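Here is a minimal sketch of those steps, writing into a buffer instead of printing so it is easy to check. The function name mix is hypothetical, and I use size_t with (size_t)-1 as the LIMIT sentinel rather than -1U; the skip for empty strings is my own addition to avoid x % 0.

```c
#include <stddef.h>

#define LIMIT ((size_t)-1)   /* "length not known yet" sentinel */

/* Hypothetical helper: interleave a and b, wrapping the shorter string.
 * out must hold 2 * max(strlen(a), strlen(b)) + 1 bytes. */
void mix(const char *a, const char *b, char *out)
{
    const char *str[2] = { a, b };
    size_t len[2] = { LIMIT, LIMIT };
    size_t n = 0;

    for (size_t i = 0; ; i++) {
        /* record a length the first time we reach that terminator */
        for (int s = 0; s < 2; s++)
            if (len[s] == LIMIT && str[s][i] == '\0')
                len[s] = i;
        /* both lengths known: every character has been enumerated */
        if (len[0] != LIMIT && len[1] != LIMIT)
            break;
        for (int s = 0; s < 2; s++)
            if (len[s] != 0)                     /* skip empty strings */
                out[n++] = str[s][i % len[s]];   /* wraps once len is known */
    }
    out[n] = '\0';
}
```

While a length is still LIMIT, i % LIMIT is just i, so the same indexing expression works before and after the terminator is found.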

How does this C for-loop print text-art pyramids?

This is my first time posting in here, hopefully I am doing it right.
Basically I need help trying to figure out some code that I wrote for class using C. The purpose of the program is to ask the user for a number between 0 and 23. Then, based on the number the user inputted, a half pyramid will be printed (like the ones in the old school Mario games). I am a beginner at programming and got the answer to my code by mere luck, but now I can't really tell how my for-loops provide the pyramid figure.
#include <stdio.h>
int main ( void )
{
int user_i;
printf ( "Hello there and welcome to the pyramid creator program\n" );
printf ( "Please enter a non negative INTEGER from 0 to 23\n" );
scanf ( "%d", &user_i );
while ( user_i < 0 || user_i > 23 )
{
scanf ( "%d", &user_i );
}
for ( int tall = 0; tall < user_i; tall++ )
{
// this are the two for loops that happened by pure magic, I am still
// trying to figure out why are they working they way they are
for ( int space = 0; space <= user_i - tall; space++ )
{
printf ( " " );
}
for ( int hash = 0; hash <= tall; hash++ )
{
printf ( "#" );
}
// We need to specify the printf("\n"); statement here
printf ( "\n" );
}
return 0;
}
Being new to programming I followed what little I know about pseudocode, I just can't seem to understand why the for-loop section is working the way it is. I understand the while loop perfectly (although corrections and best practices are welcomed), but the for-loop logic continues to elude me and I would like to fully understand this before moving on. Any help will be greatly appreciated.
I will explain the process of how I would come to understand this code to the point where I would be comfortable using it myself. I will pretend that I did not read your description, so I am starting from scratch. The process is divided into stages which I will number as I go. My goal will be to give some general techniques which will make programs easier to read.
Stage 1: Understand coarse structure
The first step will be to understand in general terms what the program does without getting bogged down in details. Let's start reading the body of the main function.
int main(void) {
int user_i;
printf("Hello there and welcome to the pyramid creator program\n");
printf("Please enter a non negative INTEGER from 0 to 23\n");
scanf("%d", &user_i);
So far, we have declared an integer, and told the user to enter a number, and then used the scanf function to set the integer equal to what the user entered. Unfortunately it is unclear from the prompt or the code what the purpose of the integer is. Let's keep reading.
while (user_i < 0 || user_i > 23) {
scanf("%d", &user_i);
}
Here we possibly ask the user to enter additional integers. Judging by the prompt, it seems like a good guess that the purpose of this statement is to make sure that our integer is in the appropriate range and this is easy to check by examining the code. Let's look at the next line
for (int tall = 0; tall < user_i; tall++) {
This is the outer for loop. Our enigmatic integer user_i appears again, and we have another integer tall which is going between 0 and user_i. Let's look at some more code.
for (int space = 0; space <= user_i - tall; space++) {
printf(" ");
}
This is the first inner for loop. Let's not get bogged down in the details of what is going on with this new integer space or why we have user_i - tall appearing, but let's just note that the body of the for loop just prints a space. So this for loop just has the effect of printing a bunch of spaces. Let's look at the next inner for loop.
for (int hash = 0; hash <= tall; hash++) {
printf("#");
}
This one looks similar. It just prints a bunch of hashes. Next we have
printf("\n");
This prints a new line. Next is
}
return 0;
}
This means that the outer for loop ends, and after the outer for loop ends, the program ends.
Notice we have found two major pieces to the code. The first piece is where a value to user_i is obtained, and the second piece, the outer for loop, must be where this value is used to draw the pyramid. Next let's try to figure out what user_i means.
Stage 2: Discover the meaning of user_i
Now since a new line is printed for every iteration of the outer loop, and the enigmatic user_i controls how many iterations of the outer loop there are, and therefore how many new lines are printed, it would seem that user_i controls the height of the pyramid that is created. To get the exact relationship, let's assume that user_i had the value 3, then tall would take on the values 0, 1, and 2, and so the loop would execute three times and the height of the pyramid would be three. Notice also that if user_i would increase by one, then the loop would execute once more and the pyramid would be higher by one. This means that user_i must be the height of the pyramid. Let's rename the variable to pyramidHeight before we forget. Our main function now looks like this:
int main(void) {
int pyramidHeight;
printf("Hello there and welcome to the pyramid creator program\n");
printf("Please enter a non negative INTEGER from 0 to 23\n");
scanf("%d", &pyramidHeight);
while (pyramidHeight < 0 || pyramidHeight > 23) {
scanf("%d", &pyramidHeight);
}
for (int tall = 0; tall < pyramidHeight; tall++) {
for (int space = 0; space <= pyramidHeight - tall; space++) {
printf(" ");
}
for (int hash = 0; hash <= tall; hash++) {
printf("#");
}
printf("\n");
}
return 0;
}
Stage 3: Make a function that gets the pyramid height
Since we understand the first part of the code, we can move it into a function and not think about it anymore. This will make the code easier to look at. Since this part of the code is responsible for obtaining a valid height, let's call the function getValidHeight. After doing this, notice the pyramid height will not change in the main function, so we can declare it as a const int. Our code now looks like this:
#include <stdio.h>
int getValidHeight(void) {
int pyramidHeight;
printf("Hello there and welcome to the pyramid creator program\n");
printf("Please enter a non negative INTEGER from 0 to 23\n");
scanf("%d", &pyramidHeight);
while (pyramidHeight < 0 || pyramidHeight > 23) {
scanf("%d", &pyramidHeight);
}
return pyramidHeight;
}
int main(void) {
const int pyramidHeight = getValidHeight();
for (int tall = 0; tall < pyramidHeight; tall++) {
for (int space = 0; space <= pyramidHeight - tall; space++) {
printf(" ");
}
for (int hash = 0; hash <= tall; hash++) {
printf("#");
}
printf("\n");
}
return 0;
}
Stage 4: understand the inner for loops.
We know the inner for loops print a character repeatedly, but how many times? Let's consider the first inner for loop. How many spaces are printed? You might think that by analogy with the outer for loop there are pyramidHeight - tall spaces, but here we have space <= pyramidHeight - tall where the truly analogous situation would be space < pyramidHeight - tall. Since we have <= instead of <, we get an extra iteration where space equals pyramidHeight - tall. Thus we see that in fact pyramidHeight - tall + 1 spaces are printed. Similarly tall + 1 hashes are printed.
Stage 5: Move printing of multiple characters into their own functions.
Since printing a character multiple times is easy to understand, we can move this code into their own functions. Let's see what our code looks like now.
#include <stdio.h>
int getValidHeight(void) {
    int pyramidHeight;
    printf("Hello there and welcome to the pyramid creator program\n");
    printf("Please enter a non negative INTEGER from 0 to 23\n");
    scanf("%d", &pyramidHeight);
    while (pyramidHeight < 0 || pyramidHeight > 23) {
        scanf("%d", &pyramidHeight);
    }
    return pyramidHeight;
}
void printSpaces(const int numSpaces) {
    for (int i = 0; i < numSpaces; i++) {
        printf(" ");
    }
}
void printHashes(const int numHashes) {
    for (int i = 0; i < numHashes; i++) {
        printf("#");
    }
}
int main(void) {
    const int pyramidHeight = getValidHeight();
    for (int tall = 0; tall < pyramidHeight; tall++) {
        printSpaces(pyramidHeight - tall + 1);
        printHashes(tall + 1);
        printf("\n");
    }
    return 0;
}
Now when I look at the main function, I don't have to worry about the details of how printSpaces actually prints the spaces. I have already forgotten if it uses a for loop or a while loop. This frees my brain up to think about other things.
Stage 6: Introducing variables and choosing good names for variables
Our main function is now easy to read. We are ready to start thinking about what it actually does. Each iteration of the for loop prints a certain number of spaces followed by a certain number of hashes followed by a new line. Since the spaces are printed first, they will all be on the left, which is what we want to get the picture it gives us.
Since new lines are printed below old lines on the terminal, a value of zero for tall corresponds to the top row of the pyramid.
With these things in mind, let's introduce two new variables, numSpaces and numHashes for the number of spaces and hashes to be printed in an iteration. Since the value of these variables does not change in a single iteration, we can make them constants. Also let's change the name of tall (which is an adjective and hence a bad name for an integer) to distanceFromTop. Our new main method looks like this
int main(void) {
const int pyramidHeight = getValidHeight();
for (int distanceFromTop = 0; distanceFromTop < pyramidHeight; distanceFromTop++) {
const int numSpaces = pyramidHeight - distanceFromTop + 1;
const int numHashes = distanceFromTop + 1;
printSpaces(numSpaces);
printHashes(numHashes);
printf("\n");
}
return 0;
}
Stage 7: Why are numSpaces and numHashes what they are?
Everything is coming together now. The only thing left to figure out is the formulas that give numSpaces and numHashes.
Let's start with numHashes because it is easier to understand. We want numHashes to be one when the distance from the top is zero, and we want numHashes to increase by one whenever the distance from the top does, so the correct formula is numHashes = distanceFromTop + 1.
Now for numSpaces. We know that every time the distance from the top increases, a space turns into a hash, and so there is one less space. Thus the expression for numSpaces should have a -distanceFromTop in it. But how many spaces should the top row have? Since the top row already has a hash, there are pyramidHeight - 1 hashes that need to be made, so there must be at least pyramidHeight - 1 spaces so they can be turned into hashes. In the code we have chosen pyramidHeight + 1 spaces in the top row, which is two more than pyramidHeight - 1, and so has the effect of moving the whole picture right by two spaces.
Conclusion
You were only asking how the two inner loops worked, but I gave a very long answer. This is because I thought the real problem was not that you didn't understand how for loops work sufficiently well, but rather that your code was difficult to read and so it was hard to tell what anything was doing. So I showed you how I would have written the program, hoping that you would think it is easier to read, so you would be able to see what is going on more clearly, and hopefully so you could learn to write clearer code yourself.
How did I change the code? I changed the names of the variables so it was clear what the role of each variable was; I introduced new variables and tried to give them good names as well; and I moved some of the lower level code involving input and output and the logic of printing characters a certain number of times into their own functions. This last change greatly decreased the number of lines in the main function, got rid of the nesting of for loops in the main function, and made the key logic of the program easy to see.
First off, let's get the body of the loops out of the way. The first one just prints spaces, and the second one prints hash marks.
We want to print a line like this, where _ is a space:
______######
So the magic question is, how many spaces and #s do we need to print?
At each line, we want to print 1 more # than the line before, and 1 fewer spaces than the line before. This is the purpose "tall" serves in the outer loop. You can think of this as "the number of hash marks that should be printed on this line."
All of the remaining characters to print on the line should be spaces. As a result, we can take the total line length (which is the number the user entered), and subtract the number of hash marks on this line, and that's the number of spaces we need. This is the condition in the first for loop:
for ( int space = 0; space <= user_i - tall; space++ )
// ~~~~~~~~~~~~~
// Number of spaces == number_user_entered - current_line
Then we need to print the hash marks. Their number follows the current line: because the loop uses <=, line tall (counting from 0) gets tall + 1 of them:
for ( int hash = 0; hash <= tall; hash++ )
// ~~~~
// Remember, "tall" is the current line
This whole thing is in a for loop that just repeats once per line.
Renaming some of the variables and introducing some new names can make this whole thing much easier to understand:
#include <stdio.h>
int main ( void )
{
int userProvidedNumber;
printf ( "Hello there and welcome to the pyramid creator program\n" );
printf ( "Please enter a non negative INTEGER from 0 to 23\n" );
scanf ( "%d", &userProvidedNumber );
while ( userProvidedNumber < 0 || userProvidedNumber > 23 )
{
scanf ( "%d", &userProvidedNumber );
}
for ( int currentLine = 0; currentLine < userProvidedNumber; currentLine++ )
{
int numberOfSpacesToPrint = userProvidedNumber - currentLine + 1;
int numberOfHashesToPrint = currentLine + 1;
for ( int space = 0; space < numberOfSpacesToPrint; space++ )
{
printf ( " " );
}
for ( int hash = 0; hash < numberOfHashesToPrint; hash++ )
{
printf ( "#" );
}
// We need to specify the printf("\n"); statement here
printf ( "\n" );
}
return 0;
}
A few things:
Consider "bench checking" this. That is, trace through the loops yourself and draw out the spaces and hash marks. Consider using graph paper for this. I've been doing this for 15 years and I still trace things out on paper from time to time when they're hairy.
The magic in this is the value "user_i - tall" and "hash <= tall". Those are the conditions on the two inner for loops, the middle values in the parentheses. Note what they are doing:
Because tall is "going up" from the outer-most loop, by subtracting it from user_i, the loop that prints the spaces is "going down". That is, printing fewer and fewer spaces as it goes.
Because tall is "going up", and because the hash loop is just basically using it as-is, it's going up too. That is, printing more hash marks as you go.
So really ignore most of this code. It's generic: the outer loop just counts up, and most of the inner loops just do basic initialization (i.e. space = 0 and hash = 0), or basic incrementation (space++ and hash++).
It is only the center parts of the inner loops that matter, and they use the movement of tall from the outermost loop to increment themselves down and up respectively, as noted above, thus making the combination of spaces and hash marks needed to make a half-pyramid.
Hope this helps!
Here's your code, just reformatted and stripped of comments:
for(int tall = 0;tall<user_i;tall++)
{
for(int space=0; space<=user_i-tall; space++){
printf(" ");
}
for(int hash=0; hash<=tall; hash++){
printf("#");
}
printf("\n");
}
And the same code, with some explanation interjected:
// This is a simple loop, it will loop user_i times.
// tall will be [0,1,...,(user_i - 1)] inclusive.
// If we peek at the end of the loop, we see that we're printing a newline character.
// In fact, each iteration of this loop will be a single line.
for(int tall=0; tall<user_i; tall++)
{
// "For each line, we do the following:"
//
// This will loop (user_i - tall + 1) times, each time printing a single space.
// As we've seen, tall starts at 0 and increases by 1 per line.
// On the first line, tall = 0 and this will loop (user_i + 1) times, printing that many spaces
// On the last line, tall = (user_i - 1) and this will loop 2 times, printing 2 spaces
for(int space=0; space<=user_i-tall; space++){
printf(" ");
}
// This will loop (tall + 1) times, each printing a single hash
// On the first line, tall = 0 and this will loop 1 time, printing 1 hash
// On the last line, tall = (user_i - 1) and this will loop user_i times, printing that many hashes
for(int hash=0; hash<=tall; hash++){
printf("#");
}
// Finally, we print a newline
printf("\n");
}
Small programs like this are the ones that kept fascinating me in my earlier days. I think they play a vital role in building logic and in understanding how, when, where and in which situation which loop will be the perfect one.
The best way to understand what is happening is to manually debug each and every statement.
But to understand it better, let's see how to build the logic for it. I am making some minor changes so that we can follow it more easily:
n is the number of lines to print in the pyramid; here n = 5
Replacing empty spaces ' ' with '-' (dash symbol)
Now pyramid will look something like,
----#
---##
--###
-####
#####
Now steps to design the loop,
First of all we have to print n lines i.e. 5 lines so first loop will be running 5 times.
for (int rowNo = 0; rowNo < n; rowNo++ ) the Row number rowNo is similar to tall in your loop
In each line we have to print 5 characters but such that we get our desired figure, if we closely look at what logic is there,
rowNo=0 (Which is our first row) we have 4 dashes and 1 hash
rowNo=1 we have 3 dashes and 2 hash
rowNo=2 we have 2 dashes and 3 hash
rowNo=3 we have 1 dashes and 4 hash
rowNo=4 we have 0 dashes and 5 hash
Little inspection can reveal that for each row denoted by rowNo we have to print n - rowNo - 1 dashes - and rowNo + 1 hashes #,
Therefore inside our first for loop we have to have two loops, one to print dashes and one to print hashes.
Dash loop will be for (int dashes= 0; dashes < n - rowNo - 1; dashes ++ ), here dashes is similar to space in your original program
Hash loop will be for (int hash = 0; hash < rowNo + 1; hash++ ),
The last step after every line we have to print a line break so that we can move to next line.
Hope, the above explanation provides a clear understanding how the for loops that you have written will be formulated.
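As a quick check of the two loops, here is a small sketch that builds one row into a string instead of printing it. The function name dash_row and the output buffer are my own additions for illustration.

```c
/* Hypothetical helper: build row rowNo (0-based) of an n-line pyramid,
 * using '-' in place of spaces. out must have room for n + 1 bytes,
 * and rowNo must satisfy 0 <= rowNo < n. */
void dash_row(int n, int rowNo, char *out)
{
    int pos = 0;

    for (int dashes = 0; dashes < n - rowNo - 1; dashes++)
        out[pos++] = '-';          /* n - rowNo - 1 dashes */
    for (int hash = 0; hash < rowNo + 1; hash++)
        out[pos++] = '#';          /* rowNo + 1 hashes */
    out[pos] = '\0';
}
```

For n = 5 this gives "----#" for rowNo 0 and "#####" for rowNo 4, matching the figure.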
Your program differs from my explanation only in minor ways: in my loops I have used the less-than operator < and you have used the <= operator, which makes a difference of one iteration.
You should use indentation to make your code more readable and therefore easier to understand.
What your code does is print user_i lines that consist of user_i-tall+1 spaces and then tall+1 hashes. Because the iteration index tall increases with every pass this means that one more hash is printed. They are right aligned because a space is left out as well. Subsequently you have 'growing' rows of hashes that form a pyramid.
What you are doing is equivalent to this pseudo code:
for every i between 0 and user_i do:
print " " (user_i-i+1) times
print "#" (i+1) times
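The same pseudo code, sketched in C with the original loop conditions kept as they are. The helper name build_row is hypothetical, and it fills a buffer rather than calling printf so the counts can be checked.

```c
/* Hypothetical helper: build the row for a given tall exactly as the
 * original loops print it. out must have room for user_i + 3 bytes. */
void build_row(int user_i, int tall, char *out)
{
    int pos = 0;

    /* <= makes this run user_i - tall + 1 times */
    for (int space = 0; space <= user_i - tall; space++)
        out[pos++] = ' ';
    /* <= makes this run tall + 1 times */
    for (int hash = 0; hash <= tall; hash++)
        out[pos++] = '#';
    out[pos] = '\0';
}
```

Every row comes out user_i + 2 characters wide, which is why the hashes line up on the right.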
Analyzing your loops:
The first loop
for ( int tall = 0; tall < user_i; tall++ ){...}
is controlling the row. The second loop
for ( int space = 0; space <= user_i - tall; space++ ){...}
for the column to be filled by spaces.
For each row, it fills user_i - tall + 1 columns with spaces (the <= makes the loop run one extra time).
Now the remaining columns are filled by # by the loop
for ( int hash = 0; hash <= tall; hash++ ){...}
#include <stdio.h>
// Please note spacing of
// - functions braces
// - for loops braces
// - equations
// - indentation
int main(void)
{
// Char holds all the values we want
// Also, declare all your variables at the top
unsigned char user_i;
unsigned char tall, space, hash;
// One call to printf is more efficient
printf("Hello there and welcome to the pyramid creator program\n"
"Please enter a non negative INTEGER from 0 to 23\n");
// This is suited for a do-while. Exercise to the reader for adding in a
// print when user input is invalid.
do scanf("%hhu", &user_i); // %hhu matches unsigned char; %d would write past it
while (user_i > 23); // user_i is unsigned, so it can never be below 0
// For each level of the pyramid (starting from the top)...
// Goes from 0 to user_i - 1
for (tall = 0; tall < user_i; tall++) {
// We are going to make each line user_i + 2 characters wide
// At tall = 0, this will be user_i + 1 characters worth of spaces
// At tall = 1, this will be user_i + 1 - 1 characters worth of spaces
// ...
// At tall = user_i - 1, this will be user_i + 1 - (user_i - 1) characters worth of spaces
for (space = 0; space <= user_i - tall; space++)
printf(" "); // no '\n', so characters print right next to one another
// because of using '<=' inequality
// \_
// At tall = 0, this will be 0 + 1 characters worth of hashes
// At tall = 1, this will be 1 + 1 characters worth of hashes
// ...
// At tall = user_i - 1, this will be user_i - 1 + 1 characters worth of hashes
for (hash = 0; hash <= tall; hash++)
printf("#");
// Level complete. Add a newline to start the next level
printf("\n");
}
return 0;
}
The simplest answer to why that is happening is that one loop prints spaces like this (represented here by -):
----------
---------
--------
-------
------
-----
and so on.
The hashes are printed like this
------#
-----##
----###
---####
--#####
-######
To complete a full pyramid, the hash loop has to print the same number of hashes again on the right-hand side. That can be done in two ways: by copying and pasting the hash loop a second time, or by making the existing loop run about twice as long with the following modification:
for ( int hash = 0; hash <= tall*2; hash++ )
{
printf ( "#" );
}
The logic to build such loops is simple: the outermost loop prints one line feed per iteration to separate the lines, and the inner loops are responsible for each line's contents.
The space loop puts the spaces, and the hash loop appends hashes after them.
[My answer may be redundant because I did not read the other answers very carefully; they were long.]
Result:
       #
      ###
     #####
    #######
   #########
  ###########
for ( int tall = 0; tall < user_i; tall++ ) { ... }
I think of this in natural language like this:
int tall = 0; // Start with an initial value of tall = 0
tall < user_i; // While tall is less than user_i, do the stuff between the curly braces
tall++; // At the end of each loop, increment tall and then recheck against the condition in step 2 before doing another loop
And with nice indentation in your code now it's much easier to see how the for loops are nested. The inner loops are going to be run for every iteration of the outer loop.

Can't find most frequent word

Hey, this is my first post here. I have been assigned an exercise to find the most frequent word, in the C programming language. First and foremost I need to read a number which tells me how many words I will have to read. Then I need to use calloc with a max element size of 50. After that I read the strings. My original idea was to create a one-dimensional array which I would later sort alphabetically, and then counting and printing the most frequent word would be easy. But after some hours of research I found out I need to use a two-dimensional array, and things went out of control. I've been studying computer science for 3 months now and this exercise seems tough. Do you have any other suggestions? The example was this:
10
hello
world
goodbye
world
thanks
for
all
hello
the
fish
hello
my code so far is
int main()
{
int i, n, j, temp;
int *a;
printf("Eisagete to plhthos twn leksewn:");
scanf("%d",&n);
a = (int*)calloc(n,50);
printf("Eisagete tis %d lekseis:\n",n);
for( i=0 ; i < n ; i++ )
{
scanf("%d",&a[i]);
}
for (i = 0 ; i < ( n - 1 ); i++)
{
for (j = 0 ; j < n - i - 1; j++)
{
if (a[j] > a[j+1])
{
temp = a[j];
a[j] = a[j+1];
a[j+1] = temp;
}
}
}
dont mind the printfs they are in greek and they are just there to make it look better. i also want to point out that this version is used for integers and not for strings just to start off.
im currently trying a linear search but im not sure if it will help
As you point out, the code you show is related to reading and sorting integers; it is only loosely related to the word counting problem.
How would you count the occurrences of each number? You'd have to
Read the next number;
If you already have a tally for the number, you add one to the tally for that number;
If you've not seen the number before, you create a tally for it and set its count to one.
When all the numbers are read, you search through the set of tallies, looking for the one with the largest count.
Record the number and count of the first entry.
For each subsequent entry:
If the count is larger than the current maximum, record the new maximum count and the entry.
Print the information about the number with the largest count and what that count is.
Replace numbers with words and the general outline will be very similar. You might allocate storage for each string (distinct word) separately.
It is easy to count the number of distinct words or the total number of words as you go. Note that you do not need to store all the words; you only need to store the distinct words. And the count at the front of the list is computer science education gone astray; you don't need the count to make it work (but you probably have to live with it being in the data; the simplest thing is to ignore the first line of input since it really doesn't help very much at all). The next simplest thing is to note that unless they're fibbing to you, the maximum number of distinct words will be the number specified, so you can pre-allocate all the space you need in one fell swoop.
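A minimal sketch of that tally outline for words that are already in memory. The function name most_frequent and its signature are my own assumptions; it stores only pointers to the distinct words and pre-allocates n slots, as suggested above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the most frequent of the n words and store
 * its count in *best_count. Ties go to the word seen first.
 * Returns NULL (count 0) if n <= 0 or allocation fails. */
const char *most_frequent(const char *words[], int n, int *best_count)
{
    const char *result = NULL;
    const char **distinct;
    int *tally;
    int ndistinct = 0;

    *best_count = 0;
    if (n <= 0)
        return NULL;
    distinct = malloc((size_t)n * sizeof *distinct);
    tally = calloc((size_t)n, sizeof *tally);
    if (distinct == NULL || tally == NULL) {
        free(distinct);
        free(tally);
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        int j;
        for (j = 0; j < ndistinct; j++)          /* already have a tally? */
            if (strcmp(distinct[j], words[i]) == 0)
                break;
        if (j == ndistinct)                      /* first time seen */
            distinct[ndistinct++] = words[i];
        tally[j]++;
    }
    for (int j = 0; j < ndistinct; j++) {        /* find the largest tally */
        if (tally[j] > *best_count) {
            *best_count = tally[j];
            result = distinct[j];
        }
    }
    free(distinct);
    free(tally);
    return result;
}
```

The lookup is a linear search, so this is quadratic in the worst case; for a homework-sized input that is unlikely to matter.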
Very simple. Store your words in a 2D array, then loop through it, and each time you take a word, loop through it again starting from the current index and check whether there is an equal word. Each time the inner loop ends, check whether the number of occurrences is bigger than the last maximum.
#include <stdio.h>
#include <string.h>
int main()
{
    int i, j, occurrence=0, maximum = 0;
    char *index_max = NULL;
    char wl[10][10] = {"hello","world","goodbye","world","thanks","for","all","hello","the","world"};
    for (i=0; i<10; i++){
        occurrence = 0;
        /* count how often wl[i] appears from position i onwards */
        for (j=i; j<10; j++){
            if (!strcmp(*(wl+i), *(wl+j))){
                occurrence++;
            }
        }
        if (occurrence>maximum){
            maximum = occurrence;
            index_max = *(wl+i);
        }
    }
    if (index_max != NULL){
        printf("The most frequent word is \"%s\" with %d occurrences.\n", index_max, maximum);
    }
    return 0;
}

atmost K mismatch substrings?

I'm trying to solve this problem. I was able to solve it using brute force, but
the following optimised algorithm is giving me incorrect results for some of the test cases. I tried but couldn't find the problem with the code. Can anybody help me?
Problem :
Given a string S and an integer K, find the integer C which equals the number of pairs of substrings (S1,S2) such that S1 and S2 have equal length and Mismatch(S1, S2) <= K where the mismatch function is defined below.
The Mismatch Function
Mismatch(s1,s2) is the number of positions at which the characters in S1 and S2 differ. For example mismatch(bag,boy) = 2 (there is a mismatch in the second and third position), mismatch(cat,cow) = 2 (again, there is a mismatch in the second and third position), Mismatch(London,Mumbai) = 6 (since the character at every position is different in the two strings). The first character in London is ‘L’ whereas it is ‘M’ in Mumbai, the second character in London is ‘o’ whereas it is ‘u’ in Mumbai - and so on.
int main() {
int k;
char str[6000];
cin>>k;
cin>>str;
int len=strlen(str);
int i,j,x,l,m,mismatch,count,r;
count=0;
for(i=0;i<len-1;i++)
for(j=i+1;j<len;j++)
{ mismatch=0;
for(r=0;r<len-j+i;r++)
{
if(str[i+r]!=str[j+r])
{ ++mismatch;
if(mismatch>=k)break;
}
if(mismatch<=k)++count;
}
}
cout<<count;
return 0;
}
Sample test cases
Test case (passing for above code)
**input**
0
abab
**output**
3
Test case (failing for above code)
**input**
3
hjdiaceidjafcchdhjacdjjhadjigfhgchadjjjbhcdgffibeh
**expected output**
4034
**my output**
4335
You have two errors. First,
for(r=1;r<len;r++)
should be
for(r=1;r<=len-j;r++)
since otherwise,
str[j+r]
would at some point begin comparing characters past the null-terminator (i.e. beyond the end of the string). The greatest r can be is the remaining number of characters from the jth index to the last character.
Second, writing
str[i+r]
and
str[j+r]
skips the comparison of the ith and jth characters since r is always at least 1. You should write
for(r=0;r<len-j;r++)
You have two basic errors. You are quitting when mismatches>=k instead of mismatches>k (mismatches==k is an acceptable number) and you are letting r get too large. These skew the final count in opposite directions but, as you see, the second error "wins".
The real inner loop should be:
for (r=0; r<len-j; ++r)
{
if (str[i+r] != str[j+r])
{
++mismatch;
if (mismatch > k)
break;
}
++count;
}
r is an index into the substring, and j+r MUST be less than len to be valid for the right substring. Since i<j, if str[j+r] is valid, then so is str[i+r], so there's no need to have i involved in the upper limit calculation.
Also, you want to break on mismatch>k, not on >=k, since k mismatches are allowed.
Next, if you test for too many mismatches after incrementing mismatch, you don't have to test it again before counting.
Finally, the upper limit of r<len-j (instead of <=) means that the trailing '\0' character won't be compared as part of the str[j+r] substring. You were comparing that and more when j+r >= len, but mismatches was less than k when that first happened.
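Putting those fixes together, a complete brute-force counter might look like the sketch below. It is written as a plain C function rather than with the question's iostream calls, and the name count_pairs_brute is only an illustrative choice, not something from the question.

```c
#include <string.h>

/* Count pairs of equal-length substrings (starting at i < j) that differ
   in at most k positions. Brute force: O(n^3) in the worst case. */
long long count_pairs_brute(const char *str, int k)
{
    int len = (int)strlen(str);
    long long count = 0;

    for (int i = 0; i < len - 1; i++) {
        for (int j = i + 1; j < len; j++) {
            int mismatch = 0;
            /* r is the offset into both substrings; j + r must stay below len */
            for (int r = 0; r < len - j; r++) {
                if (str[i + r] != str[j + r] && ++mismatch > k)
                    break;      /* every longer pair from (i, j) also exceeds k */
                count++;        /* the pair of length r + 1 qualifies */
            }
        }
    }
    return count;
}
```

For the first sample in the question (k = 0, string abab) this returns 3.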
Note: You asked about a faster method. There is one, but the coding is more involved. Make the outer loop on the difference delta between starting index values. (0<delta<len) Then, count all acceptable matches with something like:
count = 0;
for delta = 1 to len-1
    set i=0; j=delta; mismatches=0; r=0;
    while j < len
        .. find k'th mismatch, or end of str:
        while mismatches < k and j+r<len
            if str[i+r] != str[j+r] then mismatches=mismatches+1
            r = r+1
        end while
        .. extend r to cover any trailing matches:
        while j+r<len and str[i+r]==str[j+r]
            r = r+1
        end while
        .. arrive here with r being the longest string pair starting at str[i]
        .. and str[j] with no more than k mismatches. This loop will add (r)
        .. to the count and advance i,j one space to the right without recounting
        .. the character mismatches inside. Rather, if a mismatch is dropped off
        .. the front, then mismatches is decremented by 1.
        repeat
            count = count + r
            if str[i] != str[j] then mismatches=mismatches-1
            i = i+1, j = j+1, r = r-1
        until mismatches < k
    end while
end for
That's pseudocode, and also pseudocorrect. The general idea is to compare all substrings with starting indices differing by (delta) in one pass, starting at the left, and increasing the substring length r until the end of the source string is reached or k+1 mismatches have been seen. That is, str[j+r] is either the end of the string, or the camel's-back-breaking mismatch position in the right substring. That makes r substrings that had k or fewer mismatches starting at str[i] and str[j].
So count those r substrings and move to the next positions i=i+1,j=j+1 and new length r=r-1, reducing the mismatch count if unequal characters were dropped off the left side.
It should be pretty easy to see that on each loop either r increases by 1, or j increases by 1 and (j+r) stays the same. Both j and (j+r) will reach len in O(n) time, so the whole thing is O(n^2).
Edit: I fixed the handling of r, so the above should be even more pseudocorrect. The improvement to O(n^2) runtime might help.
Re-edit: Fixed comment bugs.
Re-re-edit: More typos in algorithm, mostly mismatches misspelled and incremented by 2 instead of 1.
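For comparison, here is one way the delta-based idea could be written in C. It is a sketch rather than a transcription of the pseudocode: it tracks the window as a half-open range [i, end) of aligned positions instead of a length r, and count_pairs_fast, end and m are names made up for this example. It assumes k >= 0.

```c
#include <string.h>

/* For every offset delta between the two start indices, slide a window over
   the aligned character pairs (p, p + delta). O(n^2) overall. */
long long count_pairs_fast(const char *str, int k)
{
    int len = (int)strlen(str);
    long long count = 0;

    for (int delta = 1; delta < len; delta++) {
        int m = len - delta;    /* number of aligned character pairs */
        int end = 0;            /* window is [i, end), end exclusive */
        int mismatches = 0;     /* mismatches currently inside the window */

        for (int i = 0; i < m; i++) {
            if (end < i) {      /* previous window was empty: restart at i */
                end = i;
                mismatches = 0;
            }
            /* grow the window while it stays within k mismatches */
            while (end < m) {
                int diff = (str[end] != str[end + delta]);
                if (mismatches + diff > k)
                    break;
                mismatches += diff;
                end++;
            }
            count += end - i;   /* one valid pair for each length 1 .. end - i */
            /* drop position i from the window before advancing */
            if (end > i && str[i] != str[i + delta])
                mismatches--;
        }
    }
    return count;
}
```

Both i and end only move forward for a given delta, which is where the O(n) work per delta comes from. On the first sample (k = 0, abab) the function returns 3.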
@Mike I made some modifications to your logic, and here is the corrected code for it:
#include<iostream>
#include<string>
using namespace std;
int main()
{
long long int k,c=0;
string s;
cin>>k>>s;
int len = s.length();
for(int gap = 1 ; gap < len; gap ++)
{
int i=0,j=gap,mm=0,tmp_len=0;
while (mm <=k && (j+tmp_len)<len)
{
if (s[i+tmp_len] != s[j+tmp_len])
mm++;
tmp_len++;
}
// while (((j+tmp_len)<len) && (s[i+tmp_len]==s[j+tmp_len]))
// tmp_len++;
if(mm>k){tmp_len--;mm--;}
do{
c = c + tmp_len ;
if (s[i] != s[j]) mm--;
i++;
j++;
tmp_len--;
while (mm <=k && (j+tmp_len)<len)
{
if (s[i+tmp_len] != s[j+tmp_len])
mm++;
tmp_len++;
}
if(mm>k){tmp_len--;mm--;}
}while(j<len); // not tmp_len>0: with k == 0 the window can be empty before the end of the string
}
cout<<c<<endl;
return 0;
}

Algorithm for processing the string

I really don't know how to implement this function:
The function should take a pointer to an integer, a pointer to an array of strings, and a string to process. It should write to the array every variation obtained by replacing occurrences of the 'ch' combination with the '#' symbol, and set the integer to the size of that array. Here is an example of processing:
choker => {"choker","#oker"}
chocho => {"chocho","#ocho","cho#o","#o#o"}
chachacha => {"chachacha","#achacha","cha#acha","chacha#a","#a#acha","cha#a#a","#acha#a","#a#a#a"}
I am writing this in C99. Here is a sketch of the call:
int n;
char **arr;
char *string = "chacha";
func(&n,&arr,string);
And function sketch:
int func(int *n,char ***arr, char *string) {
}
So I think I need to create another function, which counts the number of 'ch' combinations and allocates memory for this one. I'll be glad to hear any ideas about this algorithm.
You can count the number of combinations pretty easily:
char * tmp = string;
int i;
for(i = 0; *tmp != '\0'; i++){
if(!(tmp = strstr(tmp, "ch")))
break;
tmp += 2; // Skip past the 2 characters "ch"
}
// i contains the number of times ch appears in the string.
int num_combinations = 1 << i;
// num_combinations contains the number of combinations. Since this is 2 to the power of the number of occurrences of "ch"
First, I'd create a helper function, e.g. countChs that would just iterate over the string and return the number of 'ch'-s. That should be easy, as no string overlapping is involved.
When you have the number of occurrences, you need to allocate space for 2^count strings. No variation is longer than the original, so strlen(original) + 1 bytes per string is always enough (a string with one replacement has length strlen(original) - 1). You also set your n variable to that 2^count.
After you have your space allocated, just iterate over all indices in your new table and fill them with copies of the original string (strcpy() or strncpy() to copy), then replace 'ch' with '#' in them (there are loads of ready snippets online, just look for "C string replace").
Finally make your arr pointer point to the new table. Be careful though - if it pointed to some other data before, you should think about freeing it or you'll end up having memory leaks.
If you would like to have all variations of the replaced string, the array will have 2^n elements, where n is the number of "ch" substrings. So, calculating n will be:
int i = 0;
int n = 0;
while(string[i] != '\0')
{
if(string[i] == 'c' && string[i + 1] == 'h')
n++;
i++;
}
Then we can use the binary representation of a number. Note that as an integer i is incremented from 0 to 2^n - 1, the binary representation of i tells us which "ch" occurrences to change. So:
for(long long unsigned int i = 0; i < (1ULL << n); i++)
{
long long unsigned int number = i;
int k = 0;
while(number > 0)
{
if(number % 2 == 1)
{
// Replace k-th occurrence of "ch"
}
number /= 2;
k++;
}
// Add replaced string to array
}
This code checks every bit in the binary representation of the number and changes the k-th occurrence if the k-th bit is 1. Changing the k-th "ch" is pretty easy, and I leave it to you.
This approach works only for fewer than 64 occurrences, because the 2^n loop bound has to fit in an unsigned long long int, which holds only 2^64 distinct values.
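A fuller version of the bitmask idea might look like the following sketch. The name ch_variations, its return convention (0 on success, -1 on failure) and the cap of 20 occurrences are assumptions made for this example; only the parameter list follows the question's sketch. Each variation is built in a single pass over the input, so no separate "replace the k-th occurrence" helper is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores every "ch" -> "#" variation of `string` in a
   newly allocated array. Returns 0 on success, -1 on failure. */
int ch_variations(int *n, char ***arr, const char *string)
{
    size_t len = strlen(string);
    int occurrences = 0;

    for (const char *p = string; (p = strstr(p, "ch")) != NULL; p += 2)
        occurrences++;
    if (occurrences > 20)               /* arbitrary cap for this sketch */
        return -1;

    int total = 1 << occurrences;       /* 2^occurrences variations */
    char **out = malloc((size_t)total * sizeof *out);
    if (out == NULL)
        return -1;

    for (int mask = 0; mask < total; mask++) {
        char *dst = malloc(len + 1);    /* a variation is never longer than the input */
        if (dst == NULL) {
            while (mask-- > 0)          /* release what was built so far */
                free(out[mask]);
            free(out);
            return -1;
        }
        out[mask] = dst;
        int k = 0;                      /* index of the next "ch" occurrence */
        for (const char *src = string; *src != '\0'; ) {
            if (src[0] == 'c' && src[1] == 'h') {
                if (mask & (1 << k)) {
                    *dst++ = '#';       /* bit k set: replace this "ch" */
                } else {
                    *dst++ = 'c';
                    *dst++ = 'h';
                }
                src += 2;
                k++;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }
    *n = total;
    *arr = out;
    return 0;
}
```

The caller owns the result and should free each arr[i] and then arr itself. For "chocho" the array comes out as "chocho", "#ocho", "cho#o", "#o#o".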
There are two sub-problems that you need to solve for your original problem:
allocating space for the array of variations
calculating the variations
For the first problem, you need to find the mathematical function f that takes the number of "ch" occurrences in the input string and returns the number of total variations.
Based on your examples: f(1) = 2, f(2) = 4 and f(3) = 8. This should give you a good idea of where to start, but it is important to prove that your function is correct. Induction is a good way to make that proof.
Since your replace process ensures that the results have either the same or a lower length than the original, you can allocate space for each individual result equal to the length of the original (plus one byte for the terminating '\0').
As for the second problem, the simplest way is to use recursion, like in the example provided by nightlytrails.
You'll need another function which takes the array you allocated for the results, a count of results, the current state of the string and an index in the current string.
When called, if there are no further occurrences of "ch" beyond the index then you save the result in the array at position count and increment count (so the next time you don't overwrite the previous result).
If there are any "ch" beyond index then call this function twice (the recurrence part). One of the calls uses a copy of the current string and only increments the index to just beyond the "ch". The other call uses a copy of the current string with the "ch" replaced by "#" and increments the index to beyond the "#".
Make sure there are no memory leaks. No malloc without a matching free.
After you make this solution work you might notice that it plays loose with memory. It is using more than it should. Improving the algorithm is an exercise for the reader.
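As a rough illustration of the recursive approach described above, here is a minimal sketch. The function name expand and its return convention are assumptions for this example. It expects the caller to have allocated out with room for 2^(number of "ch") pointers and to start with *count at 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical recursive helper. `cur` is the current state of the string and
   `idx` is where to resume looking for "ch". Finished variations are appended
   to `out`, and *count tracks how many are stored. Returns 0 on success. */
int expand(char **out, int *count, const char *cur, size_t idx)
{
    const char *hit = strstr(cur + idx, "ch");
    if (hit == NULL) {                  /* no further "ch": store a copy */
        char *copy = malloc(strlen(cur) + 1);
        if (copy == NULL)
            return -1;
        strcpy(copy, cur);
        out[(*count)++] = copy;
        return 0;
    }

    size_t pos = (size_t)(hit - cur);

    /* Branch 1: keep this "ch" and continue just beyond it. */
    if (expand(out, count, cur, pos + 2) != 0)
        return -1;

    /* Branch 2: replace this "ch" with '#' and continue just beyond the '#'. */
    char *replaced = malloc(strlen(cur));   /* one char shorter, plus '\0' */
    if (replaced == NULL)
        return -1;
    memcpy(replaced, cur, pos);
    replaced[pos] = '#';
    strcpy(replaced + pos + 1, cur + pos + 2);
    int rc = expand(out, count, replaced, pos + 1);
    free(replaced);                     /* the stored results are separate copies */
    return rc;
}
```

Calling expand(out, &count, "chocho", 0) fills out with "chocho", "cho#o", "#ocho", "#o#o" in that order. On failure the caller still has to free the out[0] .. out[count - 1] entries that were already stored.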

Resources