Check for a word in a dictionary in C - c

I have a project in which I have a file (.dic) with many words of different lengths, and another file (.pal) with some words. For each word in the .pal file, I have to find its position in a list of words with the same number of letters, ordered alphabetically, built from the .dic file.
For example,
in the .dic file:
car
banana
dab
dog
flower
tar
So the dictionary would be something like:
3 letters: [car->dab->dog->tar]
6 letters: [banana->flower]
in the .pal file:
dog
flower
So the output would be:
dog in position 3
flower in position 2
My question is: What is the best data structure to implement this in C, so that it takes the least memory and time?
I was thinking of having a matrix in which the first index (index1) corresponds to the number of letters in the word, and the second index (index2) corresponds to the first letter of the word I'm looking for. Each element of that matrix is a list of words with index1 letters and starting with letter index2.
Example:
| A | B | C | .....
_______________
1|list|list|list|
2|list|....|....|
3|...
.
.
So "dog" would be in a list inside matrix[3][D].
Problem 1: the matrix will have holes if some combinations of word length and first letter have no words -> too much memory wasted?
Problem 2: to know the position I asked about before, I would have to sum up the number of elements of each list before the one I'm using.
Example: "dog" position would be
number of elements in list [3][A] + number of elements in list [3][B] + number of elements in list [3][C] + "dog" position in its own list
So whenever I inserted a word in a list, I would have to update the element counts of the lists in the following matrix elements -> time consuming?
So what do you think of this method? Do you have better ideas?

What is the best data structure to implement this in C, so that it takes the least memory and time?
It's difficult to get both the least memory and the least time. If you want to keep memory usage as low as possible, you'll need dynamic memory allocation, which is expensive in terms of time.
To get low memory usage, you could go for the following data structure:
#define MAX_WORD_LEN 50
char** dic[MAX_WORD_LEN];
You use it like this:
index 0: -----> char*, char*, char*, ... // Words with length 1
| | |
| | ------> string (i.e. char, '\0')
| |
| ------> string (i.e. char, '\0')
|
------> string (i.e. char, '\0')
index 1: -----> char*, char*, ... // Words with length 2
| |
| ------> string (i.e. char, char, '\0')
|
------> string (i.e. char, char, '\0')
This allows you to store a variable number of words for each length, and you don't allocate more memory than needed for each string. It is like a matrix, but the benefit is that each row can have a different number of columns.
You will however need quite some dynamic memory handling, i.e. malloc, realloc and strdup.
To save some execution time you should grow the "char*, char*, char*, ..." array by some N larger than 1 and set the unused entries to NULL. That will save a lot of realloc calls, but you'll need to keep track of the number of allocated elements in each row. That could call for something like:
struct x
{
char** data;
int number_allocated;
};
#define MAX_WORD_LEN 50
struct x dic[MAX_WORD_LEN];
If memory usage is really critical, you can avoid the "char*, char*, ..." array and just use one big char array for each word length. Like:
index 0: -----> 'a', '\0', 'I', '\0', ...
index 1: -----> 'b', 'e', '\0', 't', 'o', '\0', ....
You can do this because all words in a given char array have the same length.
In this case you would have something like:
struct x
{
char* data;
int bytes_allocated;
int number_of_words;
};
#define MAX_WORD_LEN 50
struct x dic[MAX_WORD_LEN];
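As a rough sketch of the pointer-array layout described above, the following keeps one growable array per word length, sorts each array once after loading, and uses binary search to find a word's position. The names struct bucket, dic_add, dic_sort and dic_position are made up for this illustration, and doubling the capacity is just one possible growth policy, not something the answer prescribes.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 50

/* One bucket per word length: a growable array of string pointers. */
struct bucket {
    char **data;
    size_t used;       /* number of words stored */
    size_t allocated;  /* number of slots allocated */
};

/* Index 0 is unused so that dic[len] holds the words of length len. */
static struct bucket dic[MAX_WORD_LEN + 1];

static int cmp_words(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Append a copy of word to the bucket for its length; 0 on success. */
int dic_add(const char *word)
{
    size_t len = strlen(word);
    if (len == 0 || len > MAX_WORD_LEN)
        return -1;
    struct bucket *b = &dic[len];
    if (b->used == b->allocated) {
        size_t n = b->allocated ? b->allocated * 2 : 16;
        char **p = realloc(b->data, n * sizeof *p);
        if (p == NULL)
            return -1;
        b->data = p;
        b->allocated = n;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len + 1);
    b->data[b->used++] = copy;
    return 0;
}

/* Sort every bucket once, after all words have been loaded. */
void dic_sort(void)
{
    for (size_t i = 1; i <= MAX_WORD_LEN; i++)
        if (dic[i].used > 1)
            qsort(dic[i].data, dic[i].used, sizeof *dic[i].data, cmp_words);
}

/* 1-based position among words of the same length, or 0 if absent. */
size_t dic_position(const char *word)
{
    size_t len = strlen(word);
    if (len == 0 || len > MAX_WORD_LEN || dic[len].used == 0)
        return 0;
    const struct bucket *b = &dic[len];
    char **hit = bsearch(&word, b->data, b->used, sizeof *b->data, cmp_words);
    return hit ? (size_t)(hit - b->data) + 1 : 0;
}
```

Loading all words first and sorting once keeps each insertion cheap; after that, each lookup is a binary search within a single bucket, and the position falls out of the pointer difference without any per-letter counters to maintain.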

Related

How should you add elements to a multi dimensional array? (In C)

I'm working on a table football cup program (in C), where I have 16 people facing off to get to the final. I'm having trouble putting elements into the different elements of the array (which has sort of stopped my progress until I figure it out). I've searched on the internet (not extensively) about pointers, but I can't find anything on multi dimensional arrays.
I have 8 games, each with 2 participants, who each play 5 matches against each other. Hopefully that means I define the array as int lastSixteen[8][2][5]. All participants have a unique ID.
Assuming I have declared my arrays correctly... On to the main question.
This is what I'm currently doing:
int i;
for(i=0; i<MAX_PLAYERS/2;i++){
roundOne[i] = i;
}
I want to set the first dimension of my array to be the numbers 1 through 8 incl. but I run into 'error: incompatible types in assignment'.
I tried setting the line with the assignment to be roundOne[i][][] = i; but as I expected, that didn't work either.
Later on in the program I need to set the second dimension to hold each game's participants, drawn from the 16 participants (to keep it simple I'm doing it in ascending numerical order), so Game 1 is Player 1 and Player 2, Game 2 is Player 3 and Player 4, etc.
for(i=0; i<16; i++){
if(i % 2 != 0){
roundOne[(MAX_PLAYERS/2)-1][0] = i; /* puts 1,3,5,7,9,11,13,15 */
}
else{
roundOne[(MAX_PLAYERS/2)-1][1] = i; /* puts 2,4,6,8,10,12,14,16 */
}
}
I'm assuming the second part will be fixed by the answer to the first part since they return the same error, but I included it because I don't know.
Here is a Minimal, Complete, and Verifiable example:
#include <stdio.h>
#define MAX_PLAYERS 16
int main(void){
int i;
int roundOne[8][2][5];
/* seeded in numerical order.*/
for(i=0; i<MAX_PLAYERS/2;i++){
roundOne[i] = i;
}
for(i=0; i<MAX_PLAYERS; i++){
if(i % 2 != 0){
roundOne[(MAX_PLAYERS/2)-1][0] = i;
}
else{
roundOne[(MAX_PLAYERS/2)-1][1] = i;
}
}
return 0;
}
Thanks in Advance,
Rinslep
You can't just use a multidimensional array of ints; it doesn't do what you want. Here is why: let's say you have 8 games and 2 players (forget that there are 5 matches for a second). That means your multidimensional array would have 16 spots:
Player
0 1
+---+---+
0 | | |
+---+---+
1 | | |
+---+---+
2 | | |
G +---+---+
a 3 | | |
m +---+---+
e 4 | | |
+---+---+
5 | | |
+---+---+
6 | | |
+---+---+
7 | | |
+---+---+
Now you want to put the game number in there AND you want to put the unique player IDs in there AND you might want to put other stuff in there (like who won and the score)? How are you going to do that? There are a couple choices:
The game number is the index into the array, not a value you store in the array. Now you can store the player IDs for each game in the array. But this still doesn't address storing other stuff (like who won and the score).
If the game number needs to be stored in the array (or other things like who won and the score), you will need to store more than one thing in each element, so the array cannot hold ints: you need an array of structures.
It is hard to guess what the right data structure is because it depends on what your program is going to do, but I think I would do this:
typedef struct match
{
    int score[2]; /* index 0 is player 1, index 1 is player 2 */
    int winner; /* index into the players and score arrays (either 0 or 1) */
} match;
typedef struct game
{
    int players[2]; /* index 0 is player 1, index 1 is player 2 */
    match matches[5];
} game;
game games[8];
Now the game number (1-8) is just the index into games plus 1, the match number (1-5) is just the index into matches plus 1, and if you want unique player numbers that go from 1 to 16 you can do this:
int i = 1;
for (int g = 0; g < 8; g++)
    for (int p = 0; p < 2; p++) /* both players of each game */
        games[g].players[p] = i++;
You need to initialize the array with dynamic allocation.
How do I work with dynamic multi-dimensional arrays in C?
Think of it this way
A = [
[B],
[C],
[D],
...
]
So let's say we need an array round with 10 rows and each row has 20 columns. They will all be filled with integer values.
Option One - Dynamically Allocating
Define the number of buckets the array will have. We are taking the size of the pointer because each bucket will contain a pointer that represents the inner array.
int** round;
round = malloc(10 * sizeof(int*));
Now that we have allocated the space for the buckets, go through and allocate space for the elements of each row. Each element is just a normal integer, so we take sizeof(int).
for (int i = 0; i < 10; i++) {
    round[i] = malloc(20 * sizeof(int));
}
Option Two
We can define the size of the multidimensional array in a bit of an easier way. We know the number of rows and the number of columns. So alternatively we can allocate the space like this:
int* round;
round = malloc (10 * 20 * sizeof(int));
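Both of these allocate memory for 10 x 20 integers. With option one you can index the result as round[i][j]. With option two you get a single flat block, so element (i, j) lives at round[i * 20 + j]. In C you can't add elements to an array on the fly when its size isn't known in advance; in my experience linked lists are better for this.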
Edit: I see that you updated the question, this code can be used with a 3D array also. You can easily use option two as 3D like round = malloc(x * y * z * sizeof(int)), where x, y, and z are equal to the dimensional values. You can also modify option one to work with this also.
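To make the 3D case concrete, here is a small sketch for the 8 x 2 x 5 shape from the question. It shows option two with a hand-written index helper and, as an alternative not mentioned in the answer, a pointer-to-array allocation that keeps ordinary [g][p][m] indexing. The names idx3, make_flat, make_typed and game_rows are invented for the example.

```c
#include <stdlib.h>

enum { GAMES = 8, PLAYERS = 2, MATCHES = 5 };

/* Offset of element (g, p, m) inside one flat GAMES*PLAYERS*MATCHES block. */
size_t idx3(size_t g, size_t p, size_t m)
{
    return (g * PLAYERS + p) * MATCHES + m;
}

/* Option two: one flat block, indexed by hand through idx3(). */
int *make_flat(void)
{
    int *flat = malloc(GAMES * PLAYERS * MATCHES * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t g = 0; g < GAMES; g++)
        for (size_t p = 0; p < PLAYERS; p++)
            for (size_t m = 0; m < MATCHES; m++)
                flat[idx3(g, p, m)] = (int)g + 1;  /* store the game number */
    return flat;
}

/* One "row" of the typed version: the [PLAYERS][MATCHES] part of a game. */
typedef int game_rows[PLAYERS][MATCHES];

/* Alternative: a pointer to arrays, so t[g][p][m] works as usual. */
game_rows *make_typed(void)
{
    game_rows *t = malloc(GAMES * sizeof *t);
    if (t == NULL)
        return NULL;
    for (size_t g = 0; g < GAMES; g++)
        for (size_t p = 0; p < PLAYERS; p++)
            for (size_t m = 0; m < MATCHES; m++)
                t[g][p][m] = (int)g + 1;
    return t;
}
```

Both versions use a single malloc and a single free. The typed version only works when every dimension except the first is a compile-time constant (or a C99 variable-length array type).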

How can I map a character to a pointer of another type?

I have a char*. I want to parse it, character by character, and store the location of each in an int*.
With the dummy-string "abbcdc", the content should be as follows
char int*
-------------
'a' -> 0
'b' -> 1,2
'c' -> 3,5
'd' -> 4
I want this to be accessible through a char* containing the entire alphabet, so that each character in the alphabet-pointer points to each separate integer-pointer. This is where I'm lost.
I know I can point to a pointer using the double-asterisk syntax
int **a = &aLocations;
But I don't really know how to refer to the locations-pointer by using a character as a reference. I am pretty new to C, so all pointers (pun intended) are appreciated.
Update 1:
int *aLocations = malloc(3*sizeof(int));
aLocations[0] = 13;
aLocations[1] = 9;
aLocations[2] = 57;
int **a = &aLocations;
This seems to work as expected, but a obviously remains an integer, not a char. I was thinking of writing a function something along the lines of
int *getCharLocations(char c) {
// returns a pointer to the stored character locations
}
but I don't know how to proceed with implementing it.
Ok then.
Although it would be possible, it would be pretty ugly and complicated.
So if you do not mind, I would suggest dropping char and using integers exclusively.
This is possible since char is in fact just a small integer type.
So first you would need to create your two dimensional alphabet array:
int *alphabet[26]; // this will create 26 element array of integer pointers
Now we will fill it:
int i = 0;
for(i = 0; i < 26; i++) {
    alphabet[i] = malloc(100 * sizeof(int)); //alloc memory for 100 integers (should be enough for human language if we're talking about single words here)
    alphabet[i][0] = 'a' + i; // this will put a letter as first element of the array
    alphabet[i][1] = 2; // here we will keep index of first available position in our indices array
}
So now we have array like this:
'a', 2, ... // '...' means here that we have space to fill here
'b', 2, ...
...
'z', 2, ...
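And you can add the indices of a letter's occurrences to this construction like this (with index_of_letter being letter - 'a' and position_in_string being where the letter was found in the parsed string):
alphabet[index_of_letter][alphabet[index_of_letter][1]] = position_in_string; // put the position at the end of the array
alphabet[index_of_letter][1]++; // increment our index of available position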
That's pretty much it.
I didn't test it so it may need some polishing but such approach should do the trick.
PS.
Someone in the comments above mentioned uppercase letters. In that case you would need to extend the array to 52 entries to also store occurrences of uppercase letters (and fill the first element with the uppercase letter in the for loop for those records). But I guess you will manage from here.
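For comparison, here is a hedged sketch of a slightly different layout from the one in the answer: a table indexed directly by character code, where each entry holds an exactly sized list of positions. It makes two passes over the string (count, then fill) instead of reserving 100 slots per letter. The names struct positions, build_locations and free_locations are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct positions {
    int *at;       /* indices at which the character occurs */
    size_t count;  /* number of entries in at */
};

/* Releases every list in the table and resets it to empty. */
void free_locations(struct positions table[UCHAR_MAX + 1])
{
    for (size_t c = 0; c <= UCHAR_MAX; c++) {
        free(table[c].at);
        table[c].at = NULL;
        table[c].count = 0;
    }
}

/* Fills table so that table[(unsigned char)ch] lists where ch occurs in s.
   Returns 0 on success, -1 if an allocation fails. */
int build_locations(const char *s, struct positions table[UCHAR_MAX + 1])
{
    size_t len = strlen(s);

    for (size_t c = 0; c <= UCHAR_MAX; c++) {
        table[c].at = NULL;
        table[c].count = 0;
    }
    /* First pass: count occurrences so each list can be sized exactly. */
    for (size_t i = 0; i < len; i++)
        table[(unsigned char)s[i]].count++;
    for (size_t c = 0; c <= UCHAR_MAX; c++) {
        if (table[c].count == 0)
            continue;
        table[c].at = malloc(table[c].count * sizeof *table[c].at);
        if (table[c].at == NULL) {
            free_locations(table);
            return -1;
        }
        table[c].count = 0;  /* reused as the fill cursor below */
    }
    /* Second pass: record each position in its character's list. */
    for (size_t i = 0; i < len; i++) {
        struct positions *p = &table[(unsigned char)s[i]];
        p->at[p->count++] = (int)i;
    }
    return 0;
}
```

With this table, the getCharLocations idea from the question reduces to looking up table[(unsigned char)c], and uppercase letters or digits need no special handling because every character code has its own slot.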

Filter out a list of strings

Given a list of strings such as [boo,koo,kool]
You want to keep only the characters that occur in all strings and, among those, only the ones that occur the same number of times in every string, so in the case above you would return oo.
In my approach I was thinking of first making a struct for all unique letters in the first string, keeping a count of them, and then comparing with every other string. I think it might be overkill in terms of run time. Can anyone suggest a better approach?
You already have a key of sorts to use: the type char, which typically holds values from -128 to 127 (or 0 to 255 when unsigned). Perhaps an array of ints for each word would do the trick, where you index the array with the char you have found in the string (converted to unsigned char so the index is never negative).
#define NUMWORDS 10 // assuming 10 words in list
int CountOfChars[NUMWORDS][256];
for each string n in list
{
for each char c in string n
{
CountOfChars[n][c]++;
}
}
THEN analyze each CountOfChars array to find the counts >=2
You can use a loop like this:
char SetFlag;
for each char c in the system // a - z, A - Z
{
    SetFlag = 0;
    if (CountOfChars[0][c]>1)
    {
        SetFlag = c;
        for each string n in list except the first word
        {
            if (CountOfChars[n][c]<=1)
            {
                SetFlag = 0; // one word lacks a repeat, so stop checking
                break;
            }
        }
    }
    if (SetFlag)
        printf("%c",SetFlag); // prints a char found at least twice in all words
}
I have left this as pseudocode as it sounds like homework; I hope this gets you started.
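A compact C version of the counting idea might look like the sketch below. It reads the question as "keep each character that occurs the same non-zero number of times in every word", which is an interpretation rather than something the thread settles, and the function name common_equal is made up. Only one count table per word is live at a time, so the number of words does not need to be fixed in advance.

```c
#include <limits.h>
#include <stddef.h>

/* Writes to out every character that occurs the same non-zero number of
   times in each of the n words, repeated that many times, in character-code
   order. out must have room for strlen(words[0]) + 1 bytes.
   Returns the length of the result. */
size_t common_equal(const char *const *words, size_t n, char *out)
{
    size_t first[UCHAR_MAX + 1] = {0};
    size_t len = 0;

    if (n == 0) {
        out[0] = '\0';
        return 0;
    }
    /* Count the characters of the first word. */
    for (const char *p = words[0]; *p != '\0'; p++)
        first[(unsigned char)*p]++;

    /* Drop every character whose count differs in any later word. */
    for (size_t w = 1; w < n; w++) {
        size_t cur[UCHAR_MAX + 1] = {0};
        for (const char *p = words[w]; *p != '\0'; p++)
            cur[(unsigned char)*p]++;
        for (size_t c = 0; c <= UCHAR_MAX; c++)
            if (cur[c] != first[c])
                first[c] = 0;
    }
    /* Emit the survivors. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        for (size_t k = 0; k < first[c]; k++)
            out[len++] = (char)c;
    out[len] = '\0';
    return len;
}
```

The running time is proportional to the total number of characters plus 256 per word, which is hard to beat for this problem since every character has to be looked at once anyway.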
You can solve this problem using tries.
For example consider your given scenario:
+1 +2
b k
\ |
\ |
o +3 ----|
| | ==> this has contiguous max frequency.
o +3 ----|
|
l +1
Your array has only three strings, and the contiguous run "oo" has a count of 3, so it occurs in all of them. Therefore the answer will be oo.

Dynamic Allocation Two dimensional array of fixed-length strings

In a program, I have to scan from the input
scanf("%s",currWord);
a non-defined number of words, which come in a non-defined number of lines.
I want to put the words in a 2 dimensional array of strings.
Length of the strings is fixed [MAX_WORD_LEN+1]
My idea is:
int row=10; //10 lines for starting
int col=5; //5 words in each line for starting
int i;
typedef char word[MAX_WORD_LEN+1]; //new type: an 11-char string
word** matrix; //2 dimensional array (pointers) with no memory
matrix = malloc(row*sizeof(word*)); //allocate row number of word* (word pointer) size
for(i=0;i<row;i++)
{
matrix[i] = malloc(col*sizeof(word)); //allocate col number of words to each row
}
So, I have no idea if that is right.
I'll be happy for some help and tips..
EDIT:
When receiving the words from input I have to increase the memory (number of rows and words in each row) if needed. How do I do that? (realloc?)
I need to do the following:
Without going into details, the easiest way is to represent the matrix as a linked list of linked lists.
struct matrix_field
{
    char data[11];
    struct matrix_field *next_field;
};
struct matrix_row
{
    struct matrix_field *first_field;
    struct matrix_row *next_row;
};
struct matrix
{
    struct matrix_row *first_row;
};
Your data will look like this in memory:
[+]
|
v
[x]-->[a]-->[b]-->[c]-->
|
v
[y]-->[d]-->[e]-->[f]-->
|
v
[z]-->[g]-->[h]-->[i]-->
|
v
---------------
[+] matrix
[x] .. [z] matrix_row
[a] .. [i] matrix_field
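A minimal sketch of appending a word to one row of such a structure could look like this. The last_field tail pointer and the row_append function are additions made for the example (they are not part of the answer's structs); the tail pointer simply avoids walking the whole row on every append. MAX_WORD_LEN is assumed to be 10, matching the 11-byte data field.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 10  /* assumed, so that data holds 11 bytes */

struct matrix_field {
    char data[MAX_WORD_LEN + 1];
    struct matrix_field *next_field;
};

struct matrix_row {
    struct matrix_field *first_field;
    struct matrix_field *last_field;  /* hypothetical tail pointer */
    struct matrix_row *next_row;
};

/* Appends a copy of w to the end of row.
   Returns 0 on success, -1 if w is too long or memory runs out. */
int row_append(struct matrix_row *row, const char *w)
{
    if (strlen(w) > MAX_WORD_LEN)
        return -1;
    struct matrix_field *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    strcpy(f->data, w);  /* safe: length was checked above */
    f->next_field = NULL;
    if (row->last_field != NULL)
        row->last_field->next_field = f;
    else
        row->first_field = f;
    row->last_field = f;
    return 0;
}
```

Because each word gets its own node, no realloc is needed when a row grows; the cost is one pointer of overhead per word and the loss of direct indexing by column.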

Get all the divisors for a number in C

I'm trying to build a function that takes 1 param: the number as char[] and returns a char** with the divisors as strings.
I have come up with the following function, which works only for some numbers.
char** calc_div(char nr[100])
{
int nri,i,ct=0;
char **a = (char**)malloc(sizeof(char*));
nri = atoi(nr);
for(i=0;i<sizeof(char*);i++)
a[i] = (char*)malloc(sizeof(char));
for(i=1;i<=nri;i++)
if(nri % i == 0)
{
sprintf(a[ct++],"%d",i);
}
return a;
}
This works for numbers like 22, 33, 77 but not for 66 or 88 (it just gets stuck somewhere). Could anyone help me?
So many problems in such a small space...oh dear!
Let's think about the interface first...how does the calling code know how many values are returned? Presumably, there must be a null pointer at the end of the array of pointers. Also, for each number bigger than 1, we know that 1 and the number itself will be divisors, so we are going to need an array of at least 3 pointers returned. If a number is not prime or one, then there will be more values to push into the array. Therefore, one of the things we'll need to do is keep tabs on how many values are in the array. Also, the memory release code will need to step through the returned array, releasing each string before releasing the array overall.
So, we get some ideas about what the code should do. How does your code fare against this?
char** calc_div(char nr[100])
{
int nri,i,ct=0;
char **a = (char**)malloc(sizeof(char*));
This allocates one entry in the return array. We now know we need at least 3 times as much space, and we also have to keep a record of how much space was allocated.
nri = atoi(nr);
for(i=0;i<sizeof(char*);i++)
a[i] = (char*)malloc(sizeof(char));
This allocates 4 or 8 strings of size 1 byte each, assigning them to successive elements of the array of size 1 previously allocated. This is a guaranteed buffer overflow on the array a. Plus, because the strings are only big enough to hold the null at the end of string, you can't put any answers in there. You should probably be allocating strlen(nr)+1 bytes since nr is one of the numbers you'll need. It is not remotely clear that numbers are limited to either 3 or 7 factors (since you also need to allow for the terminating null pointer).
for(i=1;i<=nri;i++)
if(nri % i == 0)
{
sprintf(a[ct++],"%d",i);
}
The code inside the body of the if statement will have to be ready to do memory allocation for the new factor and for the array as and when necessary.
return a;
}
After
char **a = (char**)malloc(sizeof(char*));
a has space for 1 pointer to char ...
for(i=0;i<sizeof(char*);i++)
a[i] = (char*)malloc(sizeof(char));
but you try to write to more than that single element (unless sizeof(char*) happens to be 1).
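Putting the points from both answers together, one possible corrected version is sketched below. It returns a NULL-terminated array, grows the array with realloc as divisors are found, and sizes each string with snprintf before allocating it. The const char * parameter, the doubling growth policy and the free_div helper are choices made for this sketch and not taken from the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Frees a NULL-terminated array returned by calc_div. */
void free_div(char **a)
{
    if (a == NULL)
        return;
    for (char **p = a; *p != NULL; p++)
        free(*p);
    free(a);
}

/* Returns a NULL-terminated array of the divisors of nr as strings,
   or NULL if memory runs out. The caller releases it with free_div. */
char **calc_div(const char *nr)
{
    int n = atoi(nr);
    size_t count = 0, cap = 4;
    char **a = malloc(cap * sizeof *a);

    if (a == NULL)
        return NULL;
    for (int i = 1; i <= n; i++) {
        if (n % i != 0)
            continue;
        if (count + 1 == cap) {  /* always keep a slot for the terminator */
            char **tmp = realloc(a, cap * 2 * sizeof *a);
            if (tmp == NULL)
                goto fail;
            a = tmp;
            cap *= 2;
        }
        int len = snprintf(NULL, 0, "%d", i);  /* digits needed for i */
        a[count] = malloc((size_t)len + 1);
        if (a[count] == NULL)
            goto fail;
        snprintf(a[count], (size_t)len + 1, "%d", i);
        count++;
    }
    a[count] = NULL;
    return a;

fail:
    a[count] = NULL;  /* terminate what was built so far, then release it */
    free_div(a);
    return NULL;
}
```

A caller can walk the result until it reaches the NULL pointer, which answers the interface question raised above about how the calling code knows how many values were returned.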

Resources