Compare character by character of strings from an array efficiently - c

Palavras = (char***)malloc(maxLen*sizeof(char**));
matrizAdj = (int***)malloc(maxLen*sizeof(int**));
for(i=0; i<maxLen; i++){
Palavras[i]= (char**)malloc(NMax_Palavras[i]*sizeof(char*));
matrizAdj[i]=(int**)malloc((NMax_Palavras[i])*sizeof(int*));
Palavras_Atuais[i]=0;
}
for(i=0; i<maxLen; i++){
for(j=0; j<NMax_Palavras[i];j++){
Palavras[i][j]=(char*)malloc((i+1)*sizeof(char)+1);
matrizAdj[i][j]=(int*)malloc((NMax_Palavras[i])*sizeof(int)+1);
}
}
rewind(fpIn1);
while(fscanf(fpIn1, "%s", novaPal1) == 1){
size_t len = strlen(novaPal1);
strcpy(Palavras[len-1][Palavras_Atuais[len-1]], novaPal1);
Palavras_Atuais[len-1]+=1;
}
for(i=0; i<maxLen;i++){ /* a different array for each word length */
Palavras_Atuais[i]=0; /* index into the string array for this length */
for(k=0;k<NMax_Palavras[i];k++){ /* NMax_Palavras[i] is the number of words of that particular length */
Palavras_Atuais[i]=aux3; /* starts at 1 so a word is not compared with itself */
while(Palavras_Atuais[i]<NMax_Palavras[i]){ /* until it has been compared with all the other words */
for(j=0;j<i;j++){
if((Palavras[i][k][j])==(Palavras[i][Palavras_Atuais[i]][j])){ /* compare one word with all the others */
continue;
}
else{
matrizAdj[i][k][Palavras_Atuais[i]]+=1; /* adjacency matrix holding the number of differing characters */
matrizAdj[i][Palavras_Atuais[i]][k]+=1;
}
}
Palavras_Atuais[i]+=1; /* go to the next word */
}
aux3+=1; /* incremented each cycle so that pairs already compared are skipped, since the matrix is symmetric */
}
}
Hello, based on a given dictionary with words of various lengths, I need to build a weighted adjacency matrix whose weights are the number of differing characters between two words.
So far I have stored and organized all the words in n arrays (one per word length), which in the code is Palavras[i][k].
My code works for small dictionaries, but as the size increases (for example, 200 000 words) it no longer works, since the complexity is O(n^2).
Any tips on how to do this more efficiently?

Related

Shuffle words from a 1D array

I've been given this sentence and I need to shuffle the words of it:
char array[] = "today it is going to be a beautiful day.";
A correct output would be: "going it beautiful day is a be to today"
I've tried many things like turning it into a 2D array and shuffling the rows, but I can't get it to work.
Your instinct to create a 2D array is solid. However, in C that's more involved than you might expect:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
int main()
{
char array[] = "today it is going to be a beautiful day.";
char out_array[sizeof(array)];
char words[sizeof(array)][46];
int word_count = 0;
int letter_count = 0;
int on_word = 0;
int count = 0;
int i = 0;
int j = 0;
srand(time(NULL));
// parse words into 2D array
for (i = 0; i < sizeof(array); i++) {
if (array[i] == ' ') {
if (on_word) {
words[word_count++][letter_count] = '\0';
letter_count = 0;
on_word = 0;
}
} else if (array[i] == '\0' || array[i] == '.') {
break;
} else {
on_word = 1;
words[word_count][letter_count++] = array[i];
}
}
words[word_count++][letter_count] = '\0';
// randomly swap around words
for (i = 0; i < word_count; i++) {
char temp[46];
int idx = rand() % word_count;
if (idx != i) {
strcpy(temp, words[idx]);
strcpy(words[idx], words[i]);
strcpy(words[i], temp);
}
}
// output words into out_array
for (i = 0; i < word_count; i++) {
for (j = 0; words[i][j] != '\0'; j++) {
out_array[count++] = words[i][j];
}
out_array[count++] = ' ';
}
out_array[count - 1] = '\0';
printf("%s", out_array);
return 0;
}
You need two basic algorithms to solve this problem.
Split the input string into a list of words.
Randomly sample your list of words until there are no more.
1. Split the input string into a list of words.
This is much simpler than you may think. You don’t need to actually copy any words, just find where each one begins in your input string.
today it is going to be a beautiful day.
^---- ^- ^- ^---- ^- ^- ^ ^-------- ^--
There are all kinds of ways you can store that information, but the two most useful would be either an array of integer indices or an array of pointers.
For your example sentence, the following would be a list of indices:
0, 6, 9, 12, 18, 21, 24, 26, 36
To do this, just create an array with a reasonable upper limit on words:
int words[100]; // I wanna use a list of index values
int nwords = 0;
 
char * words[100]; // I wanna use a list of pointers
int nwords = 0;
If you do it yourself either structure is just as easy.
If you use strtok life is much easier with a list of pointers.
All you need at this point is a loop over your input to find the words and populate your list. Remember, a word is any run of alphabetic or numeric characters (and maybe hyphens, if you want to go that far). Everything else is not a word. If you #include <ctype.h> you get a very handy function for classifying a character as “word” or “not-word”:
if (isalnum( input[n] )) its_a_word_character;
else its_not_a_word_character_meaning_we_have_found_the_end_of_the_word;
Now that you have a list of words, you can:
2. Randomly sample your list of words until there are no more.
There are, again, a number of ways you could do this. Already suggested above is to randomly shuffle the list of words (array of indices or array of pointers), and then simply rebuild the sentence by taking the words in order.
→ Beware, Etian’s example is not a correct shuffle, though it would probably go unnoticed or ignored by everyone at your level of instruction as it will appear to work just fine. Google around “coding horror fisher yates” for more.
The other way would be to just select and remove a random word from your array until there are no words left.
The random sampling is not difficult, but it does require some precise thinking, which actually makes it the most difficult part of your project.
To start you first need to get a proper random number. There is a trick to this that people are generally not taught. Here you go:
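// Note: POSIX <stdlib.h> also declares a random(); rename this function if your compiler reports conflicting types.
int random( int N ) // Return an UNBIASED pseudorandom value in [0, N-1].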
{
int max_value = (RAND_MAX / N) * N;
int result;
do result = rand(); while (result >= max_value);
return result % N;
}
And in main() the very first thing you should do is initialize the random number generator:
#include <stdlib.h>
#include <time.h>
int main()
{
srand( (unsigned)time( NULL ) );
Now you can sample / shuffle your array properly. You can google "Fisher-Yates Shuffle" (or follow the link in the comment below your question). Or you can just select the next word:
while (nwords)
{
int index = random( nwords );
// do something with words[index] here //
// Remove the word we just printed from our list of words
// • Do you see what trick we use to remove the word?
// • Do you also know why this does not affect our random selection?
words[index] = words[--nwords];
}
Hopefully you can see that both of these methods are essentially the same thing. Whichever you choose is up to you. I personally would use the latter because of the following consideration:
Output
You can create a new string and then print it, or you can just print each word directly. As the homework (as you presented it) does not require generation of a new string, I would just print the output directly. This makes life simpler in the sense that you do not have to mess with another string array.
As you print each word (or append it to a new string), remember how you separated them to begin with. If you use strtok you can just use something like:
printf( "%s", words[index] ); // print word directly to stdout
 
strcat( output, words[index] ); // append word to output string
If you found the beginnings of each word yourself, you will have to again loop until you find the end of the word:
// Print word, character by character, directly to stdout
// (words[index] is the position in input[] where the word starts)
for (int n = words[index]; isalnum( (unsigned char)input[n] ); n++)
{
putchar( input[n] );
}

// Append word, character by character, to output string
for (int n = words[index]; isalnum( (unsigned char)input[n] ); n++)
{
char * p = strchr( output, '\0' ); // (Find end of output[])
*p++ = input[n]; // (Add char)
*p = '\0'; // (Add null terminator)
}
All that’s left is to pay attention to spaces and periods in your output.
Hopefully this should be enough to get you started.

How to reduce the time complexity of searching for the same character in two different strings in C?

The input starts with a number n, which indicates the total number of strings.
The strings are then read with scanf.
The task is to find out whether two strings have characters in common.
If so, they are in the same group.
Two strings belong to the same group if:
1. there exists a character that appears in both strings, or
2. there exists a character in both strings A and B,
and there exists another character in both strings B and C;
then A, B and C belong to the same group.
for example
>>input
5
abbbb
a
c
ddca
fgg
Here "abbbb", "a", "c" and "ddca" are in the same group.
The output is the total number of groups,
which in this example is
2
The strings contain only the characters 'a'~'z'.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef struct str //'ini' is used to save the initial strings 'convert' recorded the characters converted to ascii code then -'a'
{
char ini[1001];
int convert[1001];
} STR;
STR a[2002];
int visited[2002];// record the index that has been traverse
int count;//record the total number of groups
int record[26];//there is 26 characters from'a'~'z'
int check(int index1,int index2)
{
int len_1=strlen(a[index1].ini);
int len_2=strlen(a[index2].ini);
for(int i=0; i<26; i++)//reset
{
record[i]=0;
}
for(int i=0; i<len_1; i++)//traverse the first string and recorded the characters in "record"array
{
record[a[index1].convert[i]]=1;
}
for(int i=0; i<len_2; i++)
{
if(record[a[index2].convert[i]]==1)//if they have same characters
{
return 1;
}
}
return 0;
}
void dfs(int now,int n)//now record index ,n record the total index
{
visited[now]=1;
for(int i=0; i<n; i++)
{
if(i==now) continue;
if(visited[i]==1) continue;
if(check(i,now))
{
dfs(i,n);
}
}
}
int main()
{
int n;
scanf("%d",&n);
for(int i=0; i<n; i++)
{
scanf("%s",a[i].ini);
int len=strlen(a[i].ini);//recording the length of the input string
for(int j=0; j<len; j++) //convert every characters in the ini string to ascii then -'a', recording in 'convert' int array
{
char ch=a[i].ini[j];
a[i].convert[j]=ch-'a';
}
}
for(int i=0; i<n; i++)//dfs
{
if(visited[i]==0)
{
count++;
dfs(i,n);
}
}
printf("%d\n",count);
return 0;
}
I tried to use DFS to search, treating every string as a node.
Is there a better way to reduce the time complexity?
Can it be made faster?
You iterate over each string 2 times (strlen + the for loop). That's rather inefficient.
When done looking through the string(s), there shouldn't be a need to search for anything, if you place the results in a sensible way.
Ideally keep track of the results from each string separately.
Something like this (naive implementation):
#include <stdbool.h>
#include <stdio.h>
#define LETTERS 26
void str_common (const char* s1, const char* s2)
{
bool record1 [LETTERS]={false};
bool record2 [LETTERS]={false};
for(size_t i=0; s1[i]!='\0'; i++)
{
record1[s1[i]-'a']=true;
}
for(size_t i=0; s2[i]!='\0'; i++)
{
record2[s2[i]-'a']=true;
}
for(size_t i=0; i<LETTERS; i++)
{
if(record1[i] && record2[i])
{
putchar('a' + i);
}
}
}
This is naive code because it has no error handling and also assumes that all letters in the symbol table are adjacent, which C makes no guarantees for. Still it gives something to start with.
You can rewrite this function to work with only one string at a time and, instead of printing, return the record array to the caller. Finding which letters exist in your 3 different strings then becomes trivial: just change the if(record1[i] && record2[i]) check to contain as many strings and conditions as you need.
There is room for some optimization. First, drop the convert arrays and replace them with bool usedLetters[256];. Fill it after reading each string with a simple loop over the string:
scanf("%s",a[i].ini);
for(int j=0; a[i].ini[j] != 0; j++) //scan the string till its end
{
unsigned char ch=a[i].ini[j]; // get a next character
a[i].usedLetters[ch] = true; // and note it's used
}
Then checking whether two strings share a letter requires a parallel scanning of both corresponding usedLetters[]:
int check(int index1,int index2)
{
bool *used1 = a[index1].usedLetters;
bool *used2 = a[index2].usedLetters;
for(int i=0; i < 256; i++) // iterate over all possible chars
{
if(used1[i] && used2[i]) // char(i) used in both strings?
return 1;
}
return 0;
}
However, the main cost is in your scanning routine. It's not DFS (unless you consider the whole set of strings as a full graph) – for each string visited you try to compare it to all strings in a set, then discarding those already visited. As a result, your routine performs up to N^2 iterations of a loop and (in a worst case of all strings disjoint) up to ~(N^2)/2 calls to check() (it will need to identify N one-string 'groups', each time checking a starting string against N/2 other strings on average).

How do I keep the rest of an array of strings in C from filling with junk?

I'm working on a practice program where the user inputs a list of names. I've got the array of strings set to 50 entries to give the user plenty of space, but if they are done, they can type 'quit' to stop typing. How can I keep the rest of the array from filling with junk, or possibly shrink it to fit only the entered list?
#include <stdio.h>
#include <string.h>
int main()
{
char list[50][11];
char temp[11];
int index;
printf("Input a list of names type 'quit' to stop\n");
for(index = 0; index < 50; index++)
{
scanf(" %10s", temp);
if(strcmp(temp, "quit") != 0)
{
strcpy(list[index], temp);
}
else
{
index = 50;
}
}
for(int index = 0; index < 50; index++)
{
puts(list[index]);
}
return 0;
}
IMO this is a Zen of Programming question, and UnholySheep is prodding you to think in the right direction.
What is Junk? You have told the computer you need a list of 50 things, but you didn't tell it what to put in all of those list entries. So the computer just uses whatever memory it has lying around, and the odds of a particular byte being whatever value you decide is Not Junk is something like 1:256.
Of course, the Zen here is not the answer to the question "What is Junk", but rather understanding that there is Junk and Not Junk, and the only Not Junk is that which you have arranged for to exist.
So, if you don't know that a memory address does not contain Junk, then it does.
The solution to your programming question then, is to keep track of how many list entries are Not Junk. There are two common approaches used in C for this:
keep track of the length of your list, or
put a special value at the end of your list
how can i keep the rest of the array from filling with junk (?)
1) Use index. Simply keep track of how much was used. Do not access the unused portion of the array
for(int i = 0; i < index; i++) {
puts(list[i]);
}
2) Mark the next unused with a leading null character.
if (index < 50) list[index][0] = '\0';
for(int i = 0; i < 50 && list[i][0]; i++) {
puts(list[i]);
}
3) Re-architect: use a right-sized allocated array (below), a linked list, etc.
or possibly shrink it to fit only the entered list (?)
Once an array is defined, its size cannot change.
Yet a pointer to an allocated memory can be re-allocated.
Here list is a pointer to array 11 of char
char (*list)[11] = malloc(sizeof *list * 50); // error checking omitted for brevity
....
// fill up to 50
....
list = realloc(list, sizeof *list * index); // error checking omitted for brevity
Just keep a count of the entered values:
int count = 0;
printf("Input a list of names type 'quit' to stop\n");
for(index = 0; index < 50; index++)
{
scanf(" %10s", temp);
if(strcmp(temp, "quit") != 0)
{
strcpy(list[index], temp);
count++;
}
else
{
index = 50;
}
}
for(int index = 0; index < count; index++)
{
puts(list[index]);
}

Copying elements from array by file

I am reading data from a file word by word and storing each word in the wrd array.
I want to copy the whole word into another array: for example, if wrd holds assssh in the current iteration, I want all of it to be copied into the arr array.
What happens instead is that only the first character is copied into arr[i] in each iteration, which is not what I want.
I want the whole word to be copied at each index. After that, I sort the words by their first letter. Please help.
while (fscanf(file, " %1023s", wrd) == 1) {
printf("%s\n", wrd);
//Pushing the result into vector
//strcpy(arr,wrd);
arr[i]=wrd[0];
i++;
counter++;
}
bubbleSortAWriteToB(arr, s_arr);
Assuming 'arr' is a two-dimensional array and 'wrd' is a character array, your code should look something like the following to achieve what you want:
while (fscanf(file, " %1023s", wrd) == 1) {
printf("%s\n", wrd);
// length of the word currently stored in wrd, excluding the terminator (needs <string.h>)
size_t wrd_length = strlen(wrd);
size_t idx = 0;
// copy every character plus the terminating '\0'
while(idx <= wrd_length) {
arr[i][idx] = wrd[idx];
idx++;
}
i++;
counter++;
}

Array manipulation in C

I am about 3 weeks into writing C code, so I am a newbie just trying some examples from a Harvard course video hosted online. I am trying to write some code that will encrypt a file based on a keyword.
The point is each letter of the alphabet will be assigned a numerical value from 0 to 25, so 'A' and 'a' will be 0, and likewise 'z' and 'Z' will be 25. If the keyword is 'abc' for example, I need to be able to convert it to its numerical form which is '012'. The approach I am trying to take (having learned nothing yet about many c functions) is to assign the alphabet list in an array. I think in the lecture he hinted at a multidimensional array but not sure how to implement that. The problem is, if the alphabet is stored as an array then the letters will be the actual values of the array and I'd need to know how to search an array based on the value, which I don't know how to do (so far I've just been returning values based on the index). I'd like some pseudo code help so I can figure this out. Thanks
In C, a char is a small integer type (8 bits on virtually every platform), so, assuming your letters are in order, you can actually use the char value to get the index by using the first letter (a) as an offset:
char offset = 'a';
char value = 'b';
int index = value - offset; /* index = 1 */
This is hard to answer, not knowing what you've learned so far, but here's a hint to what I would do: the chars representing letters are bytes representing their ASCII values, and occur sequentially, from a to z and A to Z though they don't start at zero. You can cast them to ints and get the ascii values out.
Here's the pseudo code for how I'd write it:
Cast the character to a number
IF it's between the ASCII values of A and Z, subtract the ASCII value of A from it
ELSE IF it's between the ASCII values of a and z, subtract the ASCII value of a from it
Output the result.
For what it's worth, I don't see an obvious solution to the problem that involves multidimensional arrays.
char '0' is the value 48
char 'A' is the value 65
char 'a' is the value 97
You said you want to learn how to search in the array:
char foo[26]; //your character array
...
...
//here is initialization of the array
for(int biz=0;biz<26;biz++)
{
foo[biz]=65+biz; // capital alphabet
}
...
...
//here is searching 1 by 1 iteration(low-yield)
char baz=67; //means we will find 'C'
for(int bar=0;bar<26;bar++)
{
if(foo[bar]==baz) {printf("we found C at the index: %i ",bar);break;}
}
//since this is a sorted array, you can use more efficient search algorithms.
Binary search algorithm (you may use it in later chapters):
http://en.wikipedia.org/wiki/Binary_search_algorithm
The use of a multidimensional array is to store both the lower case and upper case alphabets in an array so that they can be mapped. An efficient way is using their ASCII code, but since you are a beginner, I guess this example will introduce you to handle for loops and multidimensional arrays, which I think is the plan of the instructor as well.
Let us first set up the array for the letters. We will have two rows with 26 letters in each row:
char alphabetsEnglish[2][26] = {{'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'},
{'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'}};
Now we can map elements of both cases.
#include <stdio.h>
#include <string.h>
int main()
{
int c,i,j;
char word[10];
printf("Enter a word:");
scanf("%9s",word); //read at most 9 characters so word[10] cannot overflow
c=strlen(word);
printf("Your word has %d letters ", c);
for (i = 0; i < c; i++) //loop for the length of your word
{
for (j = 0; j <= 25; j++) //second loop to go through your alphabet list
{
if (word[i] == alphabetsEnglish[0][j] || word[i] == alphabetsEnglish[1][j]) //check for both cases of your alphabet
{
printf("Your alphabet %c translates to %d: ", word[i], j);
}
}
}
return 0;
}
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int *conv(char* str){
static const char* table = "abcdefghijklmnopqrstuvwxyz";
int size, *ret, *p;
if(NULL==str || *str == '\0') return NULL;
size = strlen(str);
ret=p=(int*)malloc(size*sizeof(int));
while(*str){
char *pos;
pos=strchr(table, tolower((unsigned char)*str++));
*p++ = pos == NULL ? -1 : pos - table;
}
return ret;
}
int main(void){
char *word = "abc";
int i, size = strlen(word), *result;
result = conv(word);
for(i=0;i<size;++i){
printf("%d ", result[i]);//0 1 2
}
free(result);
return 0;
}

Resources