I just started to learn C and came across the Insertion Sort algorithm. I am now trying to apply it to the alphabet letters. However, the challenge I face is that I want the program to sort out the letters independently of whether or not they are caps or not.
For example, 'd' should come before 'E' in the sorted list. Here's what I have so far.
Thank you for the help.
#include <stdio.h>
#define MAX_NUMS 10
void InsertionSort(char list[]);
int main()
{
int index = 0; /* iteration variable */
char letters[MAX_NUMS]; /* list of number to be sorted */
/* Get input */
printf("Enter a word: ");
scanf("%s", letters);
printf("%s\n", letters);
InsertionSort(letters); /* Call sorting routine */
printf("pass");
/* Print sorted list */
printf("\nThe input set, in ascending order:\n");
while (letters[index] != '\0') {
printf("%c\n", letters[index]);
index += 1;
}
/* printf("%s", letters);
for (index = 0; index < MAX_NUMS; index++)
printf("%c\n", letters[index]); */
}
void InsertionSort(char list[])
{
int unsorted; /* index for unsorted list items */
int sorted; /* index for sorted items */
char unsortedItem; /* Current item to be sorted */
char lowUnsortedItem;
/* This loop iterates from 1 thru MAX_NUMS */
for (unsorted = 0; list[unsorted] != '\0'; unsorted++) {
unsortedItem = list[unsorted];
if (unsortedItem >= 'A' && unsortedItem <= 'Z') {
lowUnsortedItem = unsortedItem + 32;
}
/* This loop iterates from unsorted thru 0, unless
we hit an element smaller than current item */
for (sorted = unsorted - 1;
(sorted >= 0) && (list[sorted] > unsortedItem);
sorted--)
list[sorted + 1] = list[sorted];
list[sorted + 1] = unsortedItem; /* Insert item */
}
}
I think you need to use tolower() from ctype.h to obtain lowercase variant of each character and then compare those lowercase variants.
i.e. like:
lowUnsortedItem = tolower(unsortedItem)
...
for ( ...; tolower(list[sorted]) > lowUnsortedItem)
...
Improved answer: My previous attempt contained a bug and did not include the (later) requirement that, e.g., A should come before a. I have rewritten therefore my answer.
Since we have a more complex sorting condition than merely "less than" according to the ASCII character set, let's make the code more modular by writing a function that sorts and a function that returns whether two characters are sorted correctly (the function isLessThan below). If, at a later stage, we'd like to compare different types or use a different ordering, we can simply use a different comparison function. Alternative, we could also use this comparison function with another sorting algorithm. Also, see the C standard library's qsort which uses a similar modular approach.
To use boolean values, include the stdbool.h header. It is more convenient to use library functions to detect if a character is uppercase or lowercase, and to change between the two. Therefore, include ctype.h. Note that these functions accept int's! Be sure to only give them characters as input, otherwise their behavior is undefined.
Modified code snippet:
bool isLessThan(char lhs, char rhs)
{
if (tolower(lhs) == tolower(rhs))
return isupper(lhs) || islower(rhs);
return tolower(lhs) < tolower(rhs);
}
void InsertionSort(char list[])
{
for (size_t unsortedIdx = 1U; list[unsortedIdx] != '\0'; ++unsortedIdx)
{
char unsortedItem = list[unsortedIdx];
size_t sortedIdx = unsortedIdx;
for ( ; sortedIdx > 0U && !isLessThan(list[sortedIdx - 1U], unsortedItem); --sortedIdx)
list[sortedIdx] = list[sortedIdx - 1U];
list[sortedIdx] = unsortedItem; // Insert item
}
}
The comparison function first checks for the special property we'd like to have, e.g. A comes before a. Otherwise, we make sure both characters are lowercase and compare them.
The insertion sort algorithm now only contains the "insertion sorting logic". The indices used have been changed slightly to accommodate using the unsigned type size_t for indices, which is a good practice.
Some additional advice: scanf can be unsafe w.r.t. buffer overflows. It is best replaced by fgets and sscanf.
Related
I wrote a program that counts and prints the number of occurrences of elements in a string but it throws a garbage value when i use fgets() but for gets() it's not so.
Here is my code:
#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main() {
char c[1005];
fgets(c, 1005, stdin);
int cnt[26] = {0};
for (int i = 0; i < strlen(c); i++) {
cnt[c[i] - 'a']++;
}
for (int i = 0; i < strlen(c); i++) {
if(cnt[c[i]-'a'] != 0) {
printf("%c %d\n", c[i], cnt[c[i] - 'a']);
cnt[c[i] - 'a'] = 0;
}
}
return 0;
}
This is what I get when I use fgets():
baaaabca
b 2
a 5
c 1
32767
--------------------------------
Process exited after 8.61 seconds with return value 0
Press any key to continue . . . _
I fixed it by using gets and got the correct result but i still don't understand why fgets() gives wrong result
Hurray! So, the most important reason your code is failing is that your code does not observe the following inviolable advice:
Always sanitize your inputs
What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:
Exploits of a Mom #xkcd.com
It is always worth reading the explanation even if you get it already #explainxkcd.com
medium.com wrote an article about “How Little Bobby Tables Ruined the Internet”
Heck, Bobby’s even got his own website — bobby-tables.com
Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.
For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” #makeuseof.com — a good number of which were caused by failure to process bad input!
Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!
Where your code went wrong
Using gets() is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?
But it does remove the Enter key from the end of the user’s input!
fgets(), in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of of text by not removing that Enter key.
Hence, assuming the user types: b a n a n a Enter
gets() returns the string "banana"
fgets() returns the string "banana\n"
That newline character '\n' (what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!
The Fix
The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:
// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";
Now we want to create a mapping from each letter in letters[] to an array of integers (its name doesn’t matter, but we’re calling it count[]). Let’s wrap that up in a little function:
// Here is our mapping of letters[] ←→ integers[]
// • supply a valid input → get an integer unique to that specific input
// • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
static int fooey; // number of invalid inputs
static int count[sizeof(letters)] = {0}; // numbers of each valid input 'a'..'z'
const char * p = strchr(letters, c); // find the valid input, else NULL
if (p) {
int index = p - letters; // 'a'=0, 'b'=1, ... (same order as in letters[])
return &count[index]; // VALID INPUT → the corresponding integer in count[]
}
else return &fooey; // INVALID INPUT → returns a dummy integer
}
For the more astute among you, this is rather verbose: we can totally get rid of those fooey and index variables.
“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”
Easy. Just check that your character is in range:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
return &fooey;
}
“But EBCDIC...!”
Fine. The following will work with both EBCDIC and ASCII:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
return &fooey;
}
You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'.Prove me wrong.
Back to main()
Before we forget, stick the required magic at the top of your program:
#include <stdio.h>
#include <string.h>
Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.
int main() {
// Ask for and get user input
char s[1005];
printf("s? ");
fgets(s, 1005, stdin);
// Histogram the input
for (int i = 0; i < strlen(s); i++) {
*histogram(s[i]) += 1;
}
// Print out the histogram, not printing zeros
for (int i = 0; i < strlen(letters); i++) {
if (*histogram(letters[i])) {
printf("%c %d\n", letters[i], *histogram(letters[i]));
}
}
return 0;
}
We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[] array! Win-win!
s? a - ba na na !
a 4
b 1
n 2
But wait, there’s more!
We can totally reuse our histogram. Check out this little function:
// Reset the histogram to all zeros
//
void clear_histogram(void) {
for (const char * p = letters; *p; p++)
*histogram(*p) = 0;
}
All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!
Other ways you could handle input is to transform things into acceptable values. For example you can use tolower() to convert any majuscule letters to your histogram’s input set.
s? ba na NA!
a 3
b 1
n 2
But I digress again...
Hang in there!
I'm trying to create a complete C program to read ten alphabets and display them on the screen. I shall also have to find the number of a certain element and print it on the screen.
#include <stdio.h>
#include <conio.h>
void listAlpha( char ch)
{
printf(" %c", ch);
}
int readAlpha(){
char arr[10];
int count = 1, iterator = 0;
for(int iterator=0; iterator<10; iterator++){
printf("\nAlphabet %d:", count);
scanf(" %c", &arr[iterator]);
count++;
}
printf("-----------------------------------------");
printf("List of alphabets: ");
for (int x=0; x<10; x++)
{
/* I’m passing each element one by one using subscript*/
listAlpha(arr[x]);
}
printf("%c",arr);
return 0;
}
int findTotal(){
}
int main(){
readAlpha();
}
The code should be added in the findTotal() element. The output is expected as below.
Output:
List of alphabets : C C C A B C B A C C //I've worked out this part.
Total alphabet A: 2
Total alphabet B: 2
Total alphabet C: 6
Alphabet with highest hit is C
I use an array to count the number of the existence of each character,
I did this code but the display of number of each character is repeated in the loop
int main()
{
char arr[100];
printf("Give a text :");
gets(arr);
int k=strlen(arr);
for(int iterator=0; iterator<k; iterator++)
{
printf("[%c]",arr[iterator]);
}
int T[k];
for(int i=0;i<k;i++)
{
T[i]=arr[i];
}
int cpt1=0;
char d;
for(int i=0;i<k;i++)
{int cpt=0;
for(int j=0;j<k;j++)
{
if(T[i]==T[j])
{
cpt++;
}
}
if(cpt>cpt1)
{
cpt1=cpt;
d=T[i];
}
printf("\nTotal alphabet %c : %d \n",T[i],cpt);
}
printf("\nAlphabet with highest hit is : %c\n",d,cpt1);
}
There is no way to get the number of elements You write in an array.
Array in C is just a space in the memory.
C does not know what elements are actual data.
But there are common ways to solve this problem in C:
as mentioned above, create an array with one extra element and, fill the element after the last actual element with zero ('\0'). Zero means the end of the actual data. It is right if you do not wish to use '\0' among characters to be processed. It is similar to null-terminated strings in C.
add the variable to store the number of elements in an array. It is similar to Pascal-strings.
#include <stdio.h>
#include <string.h>
#define ARRAY_SIZE 10
char array[ARRAY_SIZE + 1];
int array_len(char * inp_arr) {
int ret_val = 0;
while (inp_arr[ret_val] != '\0')
++ret_val;
return ret_val;
}
float array_with_level[ARRAY_SIZE];
int array_with_level_level;
int main() {
array[0] = '\0';
memcpy(array, "hello!\0", 7); // 7'th element is 0
printf("array with 0 at the end\n");
printf("%s, length is %d\n", array, array_len(array));
array_with_level_level = 0;
const int fill_level = 5;
int iter;
for (iter = 0; iter < fill_level; ++iter) {
array_with_level[iter] = iter*iter/2.0;
}
array_with_level_level = iter;
printf("array with length in the dedicated variable\n");
for (int i1 = 0; i1 < array_with_level_level; ++i1)
printf("%02d:%02.2f ", i1, array_with_level[i1]);
printf(", length is %d", array_with_level_level);
return 0;
}
<conio.h> is a non-standard header. I assume you're using Turbo C/C++ because it's part of your course. Turbo C/C++ is a terrible implementation (in 2020) and the only known reason to use it is because your lecturer made you!
However everything you actually use here is standard. I believe you can remove it.
printf("%c",arr); doesn't make sense. arr will be passed as a pointer (to the first character in the array) but %c expects a character value. I'm not sure what you want that line to do but it doesn't look useful - you've listed the array in the for-loop.
I suggest you remove it. If you do don't worry about a \0. You only need that if you want to treat arr as a string but in the code you're handling it quite validly as an array of 10 characters without calling any functions that expect a string. That's when it needs to contain a 0 terminator.
Also add return 0; to the end of main(). It means 'execution successful' and is required to be conformant.
With those 3 changes an input of ABCDEFGHIJ produces:
Alphabet 1:
Alphabet 2:
Alphabet 3:
Alphabet 4:
Alphabet 5:
Alphabet 6:
Alphabet 7:
Alphabet 8:
Alphabet 9:
Alphabet 10:-----------------------------------------List of alphabets: A B C D E F G H I J
It's not pretty but that's what you asked for and it at least shows you've successfully read in the letters. You may want to tidy it up...
Remove printf("\nAlphabet %d:", count); and insert printf("\nAlphabet %d: %c", count,arr[iterator]); after scanf(" %c", &arr[iterator]);.
Put a newline before and after the line of minus signs (printf("\n-----------------------------------------\n"); and it looks better to me.
But that's just cosmetics. It's up to you.
There's a number of ways to find the most frequent character. But at this level I recommend a simple nested loop.
Here's a function that finds the most common character (rather than the count of the most common character) and if there's a tie (two characters with the same count) it returns the one that appears first.
char findCommonest(const char* arr){
char commonest='#'; //Arbitrary Bad value!
int high_count=0;
for(int ch=0;ch<10;++ch){
const char counting=arr[ch];
int count=0;
for(int c=0;c<10;++c){
if(arr[c]==counting){
++count;
}
}
if(count>high_count){
high_count=count;
commonest=counting;
}
}
return commonest;
}
It's not very efficient and you might like to put some printfs in to see why!
But I think it's at your level of expertise to understand. Eventually.
Here's a version that unit-tests that function. Never write code without a unit test battery of some kind. It might look like chore but it'll help debug your code.
https://ideone.com/DVy7Cn
Footnote: I've made minimal changes to your code. There's comments with some good advice that you shouldn't hardcode the array size as 10 and certainly not litter the code with that value (e.g. #define ALPHABET_LIST_SIZE (10) at the top).
I have used const but that may be something you haven't yet met. If you don't understand it and don't want to learn it, remove it.
The terms of your course will forbid plagiarism. You may not cut and paste my code into yours. You are obliged to understand the ideas and implement it yourself. My code is very inefficient. You might want to do something about that!
The only run-time problem I see in your code is this statement:
printf("%c",arr);
Is wrong. At this point in your program, arr is an array of char, not a single char as expected by the format specifier %c. For this to work, the printf() needs to be expanded to:
printf("%c%c%c%c%c%c%c%c%c%c\n",
arr[0],arr[1],arr[2],arr[3],arr[4],
arr[5],arr[6],arr[7],arr[8],arr[9]);
Or: treat arr as a string rather than just a char array. Declare arr as `char arr[11] = {0};//extra space for null termination
printf("%s\n", arr);//to print the string
Regarding this part of your stated objective:
"I shall also have to find the number of a certain element and print it on the screen. I'm new to this. Please help me out."
The steps below are offered to modify the following work
int findTotal(){
}
Change prototype to:
int FindTotal(char *arr);
count each occurrence of unique element in array (How to reference)
Adapt above reference to use printf and formatting to match your stated output. (How to reference)
I have a few questions about an assignment that i need to do. It might seem that what im looking for is to get the code, however, what im trying to do is to learn because after weeks of searching for information im lost. Im really new atC`.
Here is the assignment :
Given 3 files (foo.txt , bar.txt , foo2.txt) they all have a different amount of words (I need to use dynamic memory).
Create a program that ask for a word and tells you if that word is in any of the documents (the result is the name of the document where it appears).
Example :
Please enter a word: dog
"dog" is in foo.txt and bar.txt
(I guess i need to load the 3 files, create a hash table that has the keyvalues for every word in the documents but also has something that tells you which one is the document where the word is at).
I guess i need to implement:
A Hash Function that converts a word into a HashValue
A Hash Table that stores the HashValue of every word (But i think i should also store the document index?).
Use of dynamic allocation.
Check for collisions while im inserting values into the hash table (Using Quadratic Probing and also Chaining).
Also i need to know how many times the word im looking for appears in the text.
I've been searching about hashmaps implementations, hash tables , quadratic probing, hash function for strings...but my head is a mess right now and i dont really now from where i should start.
so far i`ve read :
Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?
Implementing with quadratic probing
Does C have hash/dictionary data structure?
https://gist.github.com/tonious/1377667
hash function for string
http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)HashTables.html?highlight=(CategoryAlgorithmNotes)
https://codereview.stackexchange.com/questions/115843/dictionary-implementation-using-hash-table-in-c
Sorry for my english in advance.
Hope you can help me.
Thanks.
FIRST EDIT
Thanks for the quick responses.
I'm trying to put all together and try something, however #Shasha99 I cannot use the TRIE data structure, i'm checking the links you gave me.
#MichaelDorgan Thanks for posting a solution for beginners however i must use Hashing (It's for Algorithms and Structures Class) and the teacher told us we MUST implement a Hash Function , Hash Table and probably another structure that stores important information.
After thinking for an hour I tried the following :
A Structure that stores the word, the number of documents where it appears and the index of those documents.
typedef struct WordMetadata {
char* Word;
int Documents[5];
int DocumentsCount;
} WordMetadata;
A function that Initializes that structure
void InitTable (WordMetadata **Table) {
Table = (WordMetadata**) malloc (sizeof(WordMetadata) * TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
Table[i] = (WordMetadata*) NULL;
}
}
A function that Loads to memory the 3 documents and index every word inside the hash table.
A function that index a word in the mentioned structure
A function that search for the specific word using Quadratic Probing (If i solve this i will try with the chaining one...).
A function that calculates the hash value of a word (I think i will use djb2 or any of the ones i found here http://www.cse.yorku.ca/~oz/hash.html) but for now :
int Hash (char *WordParam) {
for (int i = 0; *WordParam != '\0';) {
i += *WordParam++;
}
return (i % TABLESIZE);}
EDIT 2
I tried to implement something, its not working but would take a look and tell me what is wrong (i know the code is a mess)
EDIT 3
This code is properly compiling and running, however , some words are not finded (maybe not indexed i' dont know), i'm thinking about moving to another hashfunction as i mentioned in my first message.
Approximately 85% of the words from every textfile (~ 200 words each) are correctly finded by the program.
The other ones are ramdom words that i think are not indexed correctly or maybe i have an error in my search function...
Here is the current (Fully functional) code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLESIZE 4001
#define LINESIZE 2048
#define DELIMITER " \t"
typedef struct TTable {
char* Word; /* The actual word */
int Documents[5]; /* Documents Index */
int DocumentsCount; /* Number of documents where the word exist */
} TTable;
int Hash (char *Word);
void Index (TTable **HashTable, char* Word, int DocumentIndex);
int Search (TTable **HashTable, char* Word);
int mystrcmp(char *s1, char *s2);
char* Documents[] = {"foo.txt","bar.txt","foo2.txt",NULL};
int main() {
FILE* file;
TTable **HashTable
int DocumentIndex;
char Line[LINESIZE];
char* Word;
char* Tmp;
HashTable = (TTable**) malloc (sizeof(TTable)*TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
HashTable[i] = (TTable*) NULL;
}
for (DocumentIndex = 0; Documents[DocumentIndex] != NULL; DocumentIndex++) {
file = fopen(Documents[DocumentIndex],"r");
if (file == NULL) {
fprintf(stderr, "Error%s\n", Documents[DocumentIndex]);
continue;
}
while (fgets (Line,LINESIZE,file) != NULL) {
Line[LINESIZE-1] = '\0';
Tmp = strtok (Line,DELIMITER);
do {
Word = (char*) malloc (strlen(Tmp)+1);
strcpy(Word,Tmp);
Index(HashTable,Word,DocumentIndex);
Tmp = strtok(NULL,DELIMITER);
} while (Tmp != NULL);
}
fclose(file);
}
printf("Enter the word:");
fgets(Line,100,stdin);
Line[strlen(Line)-1]='\0'; //fgets stores newline as well. so removing newline.
int i = Search(HashTable,Line);
if (i != -1) {
for (int j = 0; j < HashTable[i]->DocumentsCount; j++) {
printf("%s\n", Documents[HashTable[i]->Documents[j]]);
if ( j < HashTable[i]->DocumentsCount-1) {
printf(",");
}
}
}
else {
printf("Cant find word\n");
}
for (i = 0; i < TABLESIZE; i++) {
if (HashTable[i] != NULL) {
free(HashTable[i]->Word);
free(HashTable[i]);
}
}
return 0;
}
/* Theorem: If TableSize is prime and ? < 0.5, quadratic
probing will always find an empty slot
*/
int Search (TTable **HashTable, char* Word) {
int Aux = Hash(Word);
int OldPosition,ActualPosition;
ActualPosition = -1;
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (HashTable[ActualPosition] == NULL) {
return -1;
}
if (strcmp(Word,HashTable[ActualPosition]->Word) == 0) {
return ActualPosition;
}
}
return -1; // Word not found
}
void Index (TTable **HashTable, char* Word, int DocumentIndex) {
int Aux; //Hash value
int OldPosition, ActualPosition;
if ((ActualPosition = Search(HashTable,Word)) != -1) {
for (int j = 0; j < HashTable[ActualPosition]->DocumentsCount;j++) {
if(HashTable[ActualPosition]->Documents[j] == DocumentIndex) {
return;
}
}
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount] = DocumentIndex; HashTable[ActualPosition]->DocumentsCount++;
return;
}
ActualPosition = -1;
Aux = Hash(Word);
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (OldPosition == ActualPosition) {
break;
}
if (HashTable[ActualPosition] == NULL) {
HashTable[ActualPosition] = (TTable*)malloc (sizeof(TTable));
HashTable[ActualPosition]->Word = Word;
HashTable[ActualPosition]->Documents[0] = DocumentIndex;
HashTable[ActualPosition]->DocumentsCount = 1;
return;
}
}
printf("No more free space\n");
}
int Hash (char *Word) {
int HashValue;
for (HashValue = 0; *Word != '\0';) {
HashValue += *Word++;
}
return (HashValue % TABLESIZE);
}
I would suggest you to use TRIE data structure for storing strings present in all three files in memory as Hash would be more space consuming.
As the first step you should read all three files one by one and for each word in file_i, you should do the following:
if the word is already present in TRIE, append the file index to that node or update the word count relative to that particular file. You may need 3 variables for file1, file and file3 at each node to store the values of word count.
if the word is not present, add the word and the file index in TRIE node.
Once you are done with building your TRIE, checking whether the word is present or not would be an O(1) operation.
If you are going with Hash Tables, then:
You should start with how to get hash values for strings.
Then read about open addressing, probing and chaining
Then understand the problems in open addressing and chaining approaches.
How will you delete and element in hash table with open addressing and probing ? here
How will the search be performed in case of chaining ? here
Making a dynamic hash table with open addressing ? Amortized analysis here and here.
Comparing between chaining and open addressing. here.
Think about how these problems can be resolved. May be TRIE ?
Problem in the code of your EDIT 2:
An outstanding progress from your side !!!
After a quick look, i found the following problems:
Don't use gets() method, use fgets() instead So replace:
gets(Line);
with the following:
fgets(Line,100,stdin);
Line[strlen(Line)-1]='\0'; //fgets stores newline as well. so removing newline.
The line:
if ( j < HashTable[j]->DocumentsCount-1){
is causing segmentation fault. I think you want to access HashTable[i]:
if ( j < HashTable[i]->DocumentsCount-1){
In the line:
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount];
You were supposed to assign some value. May be this:
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount] = DocumentIndex;
Malloc returns void pointer. You should cast it to the appropriate
one:
HashTable[ActualPosition] = (TTable*)malloc (sizeof(TTable));
You should also initialize the Documents array with default value while creating a new node in Hash:
for(j=0;j<5;j++)HashTable[ActualPosition]->Documents[j]=-1;
You are removing everything from your HashTable after finding the
first word given by user. May be you wanted to place that code outside
the while loop.
Your while loop while(1) does not have any terminating condition, You
should have one.
All the best !!!
For a school assignment, you probably don't need to worry about hashing. For a first pass, you can just get away with a straight linear search instead:
Create 3 pointers to char arrays (or a char ** if you prefer), one for each dictionary file.
Scan each text/dictionary file to see how many individual words reside within it. Depending on how the file is formatted, this may be spaces, strings, newlines, commas, etc. Basically, count the words in the file.
Allocate an array of char * times the word count in each file and store it in the char ** for that file. (if 100 words found in the file , num_words=100; fooPtr = malloc(sizeof(char *) * num_words);
Go back through the file a second time and allocate an array of chars to the size of each word in the file and store it in the previously created array. You now have a "jagged 2D array" for every word in each dictionary file.
Now, you have 3 arrays for your dictionaries and can use them to scan for words directly.
When given a word, setup a for loop to look through each file's char array. if the entered word matches with the currently scanned dictionary, you have found a match and should print the result. Once you have scanned all dictionaries, you are done.
Things to make it faster:
Sort each dictionary, then you can binary search them for matches (O(log n)).
Create a hash table and add each string to it for O(1) lookup time (This is what most professional solutions would do, which is why you found so many links on this.)
I've offered almost no code here, just a method. Give it a shot.
One final note - even if you decide to use a the hash method or a list or whatever, the code you write with arrays will still be useful.
I'm looking for a way to check if a specific string exists in a large array of strings. The array is multi-dimensional: all_strings[strings][chars];. So essentially, this array is an array of character arrays. Each character array ends in '\0'
Given another array of characters, I need to check to see if those characters are already in all_strings, kind of similar to the python in keyword.
I'm not really sure how to go about this at all, I know that strcmp might help but I'm not sure how I could implement it.
As lurker suggested, the naive method is to simply loop on the array of strings calling strcmp. His string_in function is unfortunately broken due to a misunderstanding regarding sizeof(string_list), and should probably look like this:
#include <string.h>
int string_in(char *needle, char **haystack, size_t haystack_size) {
for (size_t x = 0; x < haystack_size; x++) {
if (strcmp(needle, haystack[x]) == 0) {
return 1;
}
}
return 0;
}
This is fairly inefficient, however. It'll do if you're only going to use it once in a while, particularly on a small collection of strings, but if you're looking for an efficient way to perform the search again and again, changing the search query for each search, the two options I would consider are:
If all_strings is relatively static, you could sort your array like so: qsort(all_strings, strings, chars, strcmp);... Then when you want to determine whether a word is present, you can use bsearch to execute a binary search like so: char *result = bsearch(search_query, all_strings, strings, chars, strcmp);. Note that when all_strings changes, you'll need to sort it again.
If all_strings changes too often, you'll probably benefit from using some other data structure such as a trie or a hash table.
Use a for loop. C doesn't have a built-in like Python's in:
int i;
for ( i = 0; i < NUM_STRINGS; i++ )
if ( strcmp(all_strings[i], my_other_string) == 0 )
break;
// Here, i is the index of the matched string in all_strings.
// If i == NUM_STRINGS, then the string wasn't found
If you want it to act like Python's in, you could make it a function:
// Assumes C99
#include <string.h>
#include <stdbool.h>
bool string_in(char *my_str, char *string_list[], size_t num_strings)
{
for ( int i = 0; i < num_strings; i++ )
if (strcmp(my_str, string_list[i]) == 0 )
return true;
return false;
}
You could simply check if a string exists in an array of strings. A better solution might be to actually return the string:
/*
* haystack: The array of strings to search.
* needle: The string to find.
* max: The number of strings to search in "haystack".
*/
char *
string_find(char **haystack, char *needle, size_t max)
{
char **end = haystack + max;
for (; haystack != end; ++haystack)
if (strcmp(*haystack, needle) == 0)
return *haystack;
return NULL;
}
If you're wanting the behavior of a set, where all strings in the array are unique, then you can use it that way:
typedef struct set_strings {
char **s_arr;
size_t count;
size_t max;
} StringSet;
.
.
.
int
StringSet_add(StringSet *set, const char *str)
{
// If string exists already, the add operation is "successful".
if (string_find(set->s_arr, str, set->count))
return 1;
// Add string to set and return success if possible.
/*
* Code to add string to StringSet would go here.
*/
return 1;
}
If you want to actually do something with the string, you can use it that way too:
/*
* Reverse the characters of a string.
*
* str: The string to reverse.
* n: The number of characters to reverse.
*/
void
reverse_str(char *str, size_t n)
{
char c;
char *end;
for (end = str + n; str < --end; ++str) {
c = *str;
*str = *end;
*end = c;
}
}
.
.
.
char *found = string_find(words, word, word_count);
if (found)
reverse_str(found, strlen(found));
As a general-purpose algorithm, this is reasonably useful and even can be applied to other data types as necessary (some re-working would be required of course). As pointed out by undefined behaviour's answer, it won't be fast on large amounts of strings, but it is good enough for something simple and small.
If you need something faster, the recommendations given in that answer are good. If you're coding something yourself, and you're able to keep things sorted, it's a great idea to do that. This allows you to use a much better search algorithm than a linear search. The standard bsearch is great, but if you want something suitable for fast insertion, you'd probably want a search routine that would provide you with the position to insert a new item to avoid searching for the position after bsearch returns NULL. In other words, why search twice when you can search once and accomplish the same thing?
I'm trying to implement Boyer-Moore Algorithm in C for searching a particular word in .pcap file. I have referenced code from http://ideone.com/FhJok5. I'm using this code as it is.
Just I'm passing packet as string and the keyword I'm searching for to the function search() in it. When I'm running my code it is giving different values every time. Some times its giving correct value too. But most of times its not identifying some values.
I have obtained results from Naive Algo Implementation. Results are always perfect.
I am using Ubuntu 12.0.4 over VMware 10.0.1. lang: C
My question is It has to give the same result every time right? whether right or wrong. This output keeps on changing every time i run the file on same inputs; and during several runs, it gives correct answer too. Mostly the value is varying between 3 or 4 values.
For Debugging I did so far:
passed strings in stead of packet every time, Its working perfect and same and correct value every time.
checking pcap part, I can see all packets are being passed to the function (I checked by printing packet frame no).
same packets I am sending to Naive Algo code, its giving perfect code.
Please give me some idea, what can be the issue. I suspect some thing wrong with memory management. but how to find which one?
Thanks in advance.
# include <limits.h>
# include <string.h>
# include <stdio.h>
# define NO_OF_CHARS 256
// A utility function to get maximum of two integers
int max (int a, int b) { return (a > b)? a: b; }
// The preprocessing function for Boyer Moore's bad character heuristic
void badCharHeuristic( char *str, int size, int badchar[NO_OF_CHARS])
{
int i;
// Initialize all occurrences as -1
for (i = 0; i < NO_OF_CHARS; i++)
badchar[i] = -1;
// Fill the actual value of last occurrence of a character
for (i = 0; i < size; i++)
badchar[(int) str[i]] = i;
}
/* A pattern searching function that uses Bad Character Heuristic of
Boyer Moore Algorithm */
void search( char *txt, char *pat)
{
int m = strlen(pat);
int n = strlen(txt);
int badchar[NO_OF_CHARS];
/* Fill the bad character array by calling the preprocessing
function badCharHeuristic() for given pattern */
badCharHeuristic(pat, m, badchar);
int s = 0; // s is shift of the pattern with respect to text
while(s <= (n - m))
{
int j = m-1;
/* Keep reducing index j of pattern while characters of
pattern and text are matching at this shift s */
while(j >= 0 && pat[j] == txt[s+j])
j--;
/* If the pattern is present at current shift, then index j
will become -1 after the above loop */
if (j < 0)
{
printf("\n pattern occurs at shift = %d", s);
/* Shift the pattern so that the next character in text
aligns with the last occurrence of it in pattern.
The condition s+m < n is necessary for the case when
pattern occurs at the end of text */
s += (s+m < n)? m-badchar[txt[s+m]] : 1;
}
else
/* Shift the pattern so that the bad character in text
aligns with the last occurrence of it in pattern. The
max function is used to make sure that we get a positive
shift. We may get a negative shift if the last occurrence
of bad character in pattern is on the right side of the
current character. */
s += max(1, j - badchar[txt[s+j]]);
}
}
/* Driver program to test above function */
int main()
{
char txt[] = "ABAAAABAACD";
char pat[] = "AA";
search(txt, pat);
return 0;