iterating string in C, word by word - c

I just started learning C. What I am trying to do right now is this: I have two strings in which the words are separated by white space, and I have to return the number of words that appear in both strings. Is there a function in C that lets me take each word and compare it to every other word in another string? If not, any idea how I can do that?

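Break up the first string into words. You can do this in any number of ways, from looping through the character array and inserting \0 at each space to using strtok.
For each word found, search the other string using strstr, which checks whether a string occurs in there. Just check the return value from strstr: if it is != NULL, it found it. Note that strstr matches substrings, so "house" is also found inside "houses"; check the characters before and after the match if you need whole words.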

I'd not use strtok but stick with pointer arithmetic, a length comparison, and memcmp to compare strings of equal length.
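A minimal sketch of that pointer-arithmetic idea, assuming words are separated by whitespace and that a word repeated in the first string is counted each time it occurs. The names next_word and count_matching_words are made up for illustration. Neither input string is modified.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skips whitespace at *p, then scans one word.
   Returns the start of the word and stores its length in *len,
   or returns NULL when no word is left. Advances *p past the word. */
static const char *next_word(const char **p, size_t *len)
{
    const char *s = *p;
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0')
        return NULL;
    const char *start = s;
    while (*s != '\0' && !isspace((unsigned char)*s))
        s++;
    *len = (size_t)(s - start);
    *p = s;
    return start;
}

/* Counts the words of a that also occur as whole words in b. */
size_t count_matching_words(const char *a, const char *b)
{
    size_t count = 0, alen, blen;
    const char *aw, *bw;

    while ((aw = next_word(&a, &alen)) != NULL) {
        const char *scan = b;   /* restart at the beginning of b */
        while ((bw = next_word(&scan, &blen)) != NULL) {
            /* Only words of equal length can match. */
            if (alen == blen && memcmp(aw, bw, alen) == 0) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

Calling count_matching_words("Big house with white walls", "house walls") returns 2. Because whole words are compared, "house" does not match "houses".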

There are two problems here:
1) splitting each string into words
The strtok() function can split a string into words.
It is a meaningful exercise to imagine how you might write your own equivalent to strtok.
The rosetta project shows both a strtok and a custom method approach to precisely this problem.
I would naturally write my own parser, as it's the kind of code that appeals to me. It could be a fun exercise for you.
2) finding those words in one string that are also in another
If you compare each word in one string against each word in the other, it has O(n*n) complexity.
If you index the words in one string first, it takes just O(n) on average, which is substantially quicker (if your input is large enough to make this interesting). It is worth imagining how you might build a hash table of the words in one string so that you can look up the words from the other.
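One way the indexing idea could look, as a rough sketch rather than a tuned implementation. It assumes whitespace-separated words and fewer than TABLE_SIZE distinct words in the indexed string. The fixed-size open-addressing table, the djb2-style hash, and all names here are choices made for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 1024   /* assumed: a power of two, above the word count */

struct span { const char *s; size_t len; };   /* a word inside a string */

/* Returns the next whitespace-separated word, or NULL if none is left. */
static const char *scan_word(const char **p, size_t *len)
{
    const char *s = *p;
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0')
        return NULL;
    const char *start = s;
    while (*s != '\0' && !isspace((unsigned char)*s))
        s++;
    *len = (size_t)(s - start);
    *p = s;
    return start;
}

static size_t hash_span(const char *s, size_t len)
{
    size_t h = 5381;                          /* djb2-style hash */
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h & (TABLE_SIZE - 1);
}

/* Returns the slot holding the word, or the empty slot where it belongs. */
static struct span *lookup(struct span *table, const char *s, size_t len)
{
    size_t i = hash_span(s, len);
    while (table[i].s != NULL &&
           !(table[i].len == len && memcmp(table[i].s, s, len) == 0))
        i = (i + 1) & (TABLE_SIZE - 1);       /* linear probing */
    return &table[i];
}

/* Counts the words of a that occur in b; returns -1 if the table fills up. */
int count_matching_words_hashed(const char *a, const char *b)
{
    struct span table[TABLE_SIZE] = {{NULL, 0}};
    size_t used = 0, len;
    const char *w;
    int count = 0;

    while ((w = scan_word(&b, &len)) != NULL) {   /* index the words of b */
        struct span *slot = lookup(table, w, len);
        if (slot->s == NULL) {
            if (used == TABLE_SIZE - 1)
                return -1;                    /* keep one slot empty */
            slot->s = w;
            slot->len = len;
            used++;
        }
    }
    while ((w = scan_word(&a, &len)) != NULL)     /* one probe per word of a */
        if (lookup(table, w, len)->s != NULL)
            count++;
    return count;
}
```

Each word is hashed once, so the work grows roughly linearly with the input instead of quadratically. For the short strings in the question, the simple nested loop is probably just as good.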

Related

How to compare two strings and return the number of words that are the same?

I am writing a program in C and have not found an efficient way to make this comparison. If someone could help me, I would be very grateful.
EXAMPLE:
W1: Big house with white walls
W2: house walls
return: 2
When thinking about how to solve the problem, it helps to break it down into steps where you can see clearly how to proceed and how each step progresses toward the solution. The usual approach for something like this would be:
make a copy of each string
parse each copy into "words" by putting '\0' at all of the non-word characters
for each string, make an array of char* pointers that point at the words in the copied string
run qsort on each array of pointers
Then, having two sorted arrays of pointers to words, you can write a loop using strcmp for checking equality of the words. I suggested strcmp because (since the arrays are sorted) it is simple to check when a word is missing from one or the other of the two arrays versus the other.
The copy/parse/sort part would naturally be a function, given a string and returning the array of pointers. The caller should free that (and the chopped-up string to which it points).
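The steps above might be sketched like this. The function names are invented for the example, a "word character" is assumed to be anything isalnum accepts, and the comparison is case-sensitive.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int cmp_words(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Copies text, chops the copy into words, and returns a sorted, malloc'd
   array of pointers into the copy. The copy is stored in *buf and the word
   count in *n. The caller frees both the array and *buf. NULL on failure. */
static char **sorted_words(const char *text, char **buf, size_t *n)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    /* Words need a separator between them, so len / 2 + 1 slots suffice. */
    char **words = malloc((len / 2 + 1) * sizeof *words);
    if (copy == NULL || words == NULL) {
        free(copy);
        free(words);
        return NULL;
    }
    memcpy(copy, text, len + 1);

    size_t count = 0;
    int in_word = 0;
    for (char *p = copy; *p != '\0'; p++) {
        if (isalnum((unsigned char)*p)) {
            if (!in_word) {
                words[count++] = p;   /* first character of a new word */
                in_word = 1;
            }
        } else {
            *p = '\0';                /* terminate the word before it */
            in_word = 0;
        }
    }
    qsort(words, count, sizeof *words, cmp_words);
    *buf = copy;
    *n = count;
    return words;
}

/* Counts words common to both strings; returns -1 if allocation fails. */
int count_common_words(const char *s1, const char *s2)
{
    char *buf1 = NULL, *buf2 = NULL;
    size_t n1 = 0, n2 = 0;
    char **w1 = sorted_words(s1, &buf1, &n1);
    char **w2 = sorted_words(s2, &buf2, &n2);
    int count = -1;

    if (w1 != NULL && w2 != NULL) {
        size_t i = 0, j = 0;
        count = 0;
        while (i < n1 && j < n2) {    /* walk both sorted arrays together */
            int c = strcmp(w1[i], w2[j]);
            if (c < 0)
                i++;                  /* w1[i] is missing from w2 */
            else if (c > 0)
                j++;                  /* w2[j] is missing from w1 */
            else {
                count++;
                i++;
                j++;
            }
        }
    }
    free(w1); free(buf1);
    free(w2); free(buf2);
    return count;
}
```

With the strings from the example, count_common_words("Big house with white walls", "house walls") returns 2.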

How to sort this specific string array by using the following set of rules(?):

Yes, I'm a newbie and my uncle has challenged me to use the function:
void sortStrings(char str[], const char* delim){...}
to sort the given char array str in a way that every char from delim that shows up in str will separate a group of chars in str thus making them the words you need to sort by hex value. In the process I also need to replace those word separating chars from delim with ';'.
The rules are: I may only use the library <stdio.h> and I can't use malloc/realloc.
Apparently this should be done with an O notation of n^2 (n being the amount of words in str, not chars)
and here's an example of an input and output:
input:
char str[] = "aaa*test,hello.world*abcd.zzz";
sortStrings(str, ",.*");
output: str is now: "aaa;abcd;hello;test;world;zzz"
Well, I've managed it now eventually, the bubble sort thing kinda helped so tyvm for that :)
note: I'll leave this thread here in case anyone wants to take this challenge on himself? It ain't easy I promise :P If you think I should delete it or add the finished code then just ask(Please don't deduct my rep even more ><)
You are off to a good start. After your second for loop you have
replaced all delimiters with semicolon
you know how many words there are, size
you know how many letters each word has, letters
One observation is that you have allocated letters to be 1000 entries. That seems like enough, but is it really? How do you know how many words there are in str? You don't. And you can't use malloc to allocate dynamically, so perhaps you need to look for an algorithm that doesn't require that lookup table? You need an algorithm that works in place
http://en.wikipedia.org/wiki/In-place_algorithm
What's next? You need to sort the words. There are many sorting algorithms. You want something simple, and are allowed complexity O(n^2). Here's a list of sorting algorithms:
http://en.wikipedia.org/wiki/Sorting_algorithm
Notice that in the table, under 'other notes', it tells you that some algorithms have "Tiny code size". That sounds good. Sort the table first by 'other notes', then by 'Average' complexity (click on the triangle in the column header). You now have two algorithms that use the 'exchanging' method (that means in place), with tiny code size and average complexity O(n^2). These Wikipedia links explain how they work and include pseudocode to get you started:
Bubble sort
Gnome sort
Take your pick and try your hand at it.
Hint: you probably want a subroutine that exchanges (swaps) two consecutive words if the first word compares greater than the second word. This can be done in place.
Suppose you have
abcd;aaa
The first word is greater than the second word, you need to detect that, and swap the words so you end up with
aaa;abcd
Here's a diagram that will give you the general idea.
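For anyone who wants to compare notes afterwards, here is one possible sketch of the whole thing. It is not the asker's code. It uses only <stdio.h> (for size_t), no malloc, and bubble sort over adjacent words. The swap is done in place with three reversals: reverse the whole "first;second" span, then reverse each word back. All helper names are made up.

```c
#include <stdio.h>

static int is_delim(char c, const char *delim)
{
    for (; *delim != '\0'; delim++)
        if (*delim == c)
            return 1;
    return 0;
}

/* Reverses the characters in the half-open range [lo, hi). */
static void reverse(char *lo, char *hi)
{
    while (hi - lo > 1) {
        hi--;
        char t = *lo;
        *lo = *hi;
        *hi = t;
        lo++;
    }
}

/* Returns a pointer to the ';' or '\0' that ends the word starting at p. */
static char *word_end(char *p)
{
    while (*p != '\0' && *p != ';')
        p++;
    return p;
}

/* Like strcmp, except that a word ends at ';' or '\0'. */
static int cmp_words(const char *a, const char *b)
{
    while (*a != '\0' && *a != ';' && *a == *b) {
        a++;
        b++;
    }
    int ca = (*a == ';' || *a == '\0') ? 0 : (unsigned char)*a;
    int cb = (*b == ';' || *b == '\0') ? 0 : (unsigned char)*b;
    return ca - cb;
}

void sortStrings(char str[], const char *delim)
{
    for (char *p = str; *p != '\0'; p++)      /* delimiters become ';' */
        if (is_delim(*p, delim))
            *p = ';';

    int swapped = 1;
    while (swapped) {                         /* bubble sort passes */
        swapped = 0;
        char *a = str;                        /* start of the first word */
        char *b = word_end(a);                /* separator after it */
        while (*b == ';') {
            char *c = word_end(b + 1);        /* end of the second word */
            if (cmp_words(a, b + 1) > 0) {
                size_t len2 = (size_t)(c - b - 1);
                reverse(a, c);                /* "abcd;aaa" -> "aaa;dcba" */
                reverse(a, a + len2);         /* fix the second word */
                reverse(a + len2 + 1, c);     /* fix the first word */
                a = a + len2 + 1;             /* first word moved right */
                swapped = 1;
            } else {
                a = b + 1;
            }
            b = word_end(a);
        }
    }
}
```

With the input from the question, "aaa*test,hello.world*abcd.zzz" and delimiters ",.*", str ends up as "aaa;abcd;hello;test;world;zzz".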

Trimming a char array in C

I am working on an assignment and I have been noticing a problem in my coding assignment. It is not clear to me how to tackle this problem, probably due to a lack of sleep but anyway. I need to trim a char array of its white spaces for this assignment.
The solution I thought of involved a second char array and just simply copying the non white spaces to that array and I'm done. But how can I create a char array without knowing its size, because at that moment I do not yet know the size? I still need to trim it in order to know how many characters need to be copied to the new array, which varies in the assignment.
I know there are a lot of good questions out here on stackoverflow but I think this has more to do with the thought process rather than the correct syntax.
My second problem is how do I perform a fscanf/fgetc on a char array since it needs a stream, is it sufficient to give it a pointer rather than a stream?
If making the change in place, simply shift every character after a space back by one, and repeat until the end of the array. This is very inefficient.
If making a new copy, make a new array of the same length, and then do as you were doing (copy all the non-space characters). If you copy the \0 character as well, then there will be no string termination issue. This is much more efficient.
Going by your comments, it appears you may have the option to input the array in any form you wish. I would then recommend that instead of doing text manipulations later on, just input the string in the form you need.
You can simply use scanf or fscanf repeatedly, to input the separate words into the same array. This will take care of all the whitespaces.
Here is one partial idea: You can make a first pass on the char array and count the blanks, then take the string length minus the blanks for the second array, then perform your copy across skipping the blanks.
You could also create a pass through the array:
Test until end of array:
Is my (Current/Index) position blank? (A space)
If so, grab next available non-blank value and put it there.
then index++
If not, index++
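A small variation on that pass, as a sketch: keep two indexes, one for reading and one for writing, so every character is moved at most once and no second array is needed. The name remove_spaces is made up, and "blank" is assumed to mean anything isspace accepts.

```c
#include <ctype.h>
#include <stddef.h>

/* Removes every whitespace character from s in place.
   Returns the new length of the string. */
size_t remove_spaces(char *s)
{
    size_t write = 0;
    for (size_t read = 0; s[read] != '\0'; read++) {
        if (!isspace((unsigned char)s[read]))
            s[write++] = s[read];     /* keep non-blank characters */
    }
    s[write] = '\0';                  /* terminate the shortened string */
    return write;
}
```

The write index never gets ahead of the read index, so nothing is overwritten before it has been read.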
Not sure on the second, will do some checking and see if I can find a good answer there too.
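On the second question, one option that may fit: sscanf reads from a char array instead of a stream, so no FILE* is needed. The sketch below assumes the goal is to copy the words back to back without whitespace. The function name and the 64-byte chunk buffer are arbitrary choices for the example.

```c
#include <stdio.h>
#include <string.h>

/* Copies the whitespace-separated words of src into dst back to back.
   dst must have room for strlen(src) + 1 bytes. */
void squeeze_with_sscanf(const char *src, char *dst)
{
    char word[64];
    int used = 0;

    *dst = '\0';
    /* %63s skips leading whitespace and reads at most 63 characters;
       %n stores how many characters of src were consumed so far. */
    while (sscanf(src, "%63s%n", word, &used) == 1) {
        size_t n = strlen(word);
        memcpy(dst, word, n + 1);     /* copy including the '\0' */
        dst += n;
        src += used;                  /* continue after this word */
    }
}
```

A word longer than 63 characters is simply read in several chunks, which gives the same result here because the chunks are joined again anyway.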

What happens when a string contains only '\0'? C

In my program I'm reading words from a .txt file and I will be inserting them into both a linked list and a hash table.
If two '\n' characters are read in a row after a word then the second word the program will read will be '\n', however I then overwrite it with '\0', so essentially the string contains only '\0'.
Is it worth me putting an if statement so the next part of my program only executes if the word is a real word (i.e. word[0] != '\n')? Would the string '\0' use up space in the hash table/linked list?
In C a character array with first element being \0 is an empty string, i.e. of length zero. There's not much sense in keeping empty strings in containers, if that's what you are asking.
It depends if you consider an empty string a valid entry. You seem to be storing words so I would guess that an empty string is of no interest, but that is application specific.
For example, an environment variable can be present (getenv returns a valid pointer) but the value can be "unset": an empty string. In that case the fact that the value is an empty string might be significant.
So, if an empty string is not significant is it worth adding an if statement to ignore it? Generally that would be a "yes", since the overhead of storing and maintaining the empty string could be significantly more than one if statement per entry. But of course that is only a guess, I don't know what your overheads are, how many times that if would get executed, and how many empty string entries you would be saving. You might not know that either, so my fallback position would be only to store data that is significant.
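A sketch of what that if statement could look like in a reading loop. It assumes one word per line read with fgets into a 64-byte buffer; the helper names are invented, and the actual insert into the list and hash table is left as a comment.

```c
#include <stdio.h>
#include <string.h>

/* Overwrites the first '\n' in word, if any, with '\0'. */
void strip_newline(char *word)
{
    word[strcspn(word, "\n")] = '\0';
}

/* An empty string has '\0' as its first character. */
int is_real_word(const char *word)
{
    return word[0] != '\0';
}

/* Reads lines from fp and returns how many non-empty words were kept. */
size_t count_real_words(FILE *fp)
{
    char word[64];
    size_t stored = 0;

    while (fgets(word, sizeof word, fp) != NULL) {
        strip_newline(word);
        if (!is_real_word(word))
            continue;                 /* blank line: nothing to store */
        /* insert word into the linked list and hash table here */
        stored++;
    }
    return stored;
}
```

Note that the test is against '\0', not '\n', because the newline has already been overwritten by the time the check runs.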

Help with this ragged array/substring program in C?

so, I have a ragged array (called names) of length 200. each pointer in the array points to a string that is no longer than 50 characters and has no white spaces. I also have a string given through user-input called inname, with length 50, inname will be one of the strings stored in names. I need to find a way to go through the strings in my ragged array and find the string with the largest substring overlap with inname, excluding inname itself as that will be in the file. If no string has an overlap then we print out "no recommendation".
I've been melting trying to puzzle this out for many hours now, helps? O:) SO basically, the program finds the name in the array with the largest substring overlap with inname.
will edit to provide additional info if you need it
You should probably start on the smaller problem of determining the size of the overlap between inname and a single string.
Wikipedia goes over some algorithms for solving the longest common substring problem (including pseudocode!)
This won't lead you to the most efficient way to find overlap (dynamic programming is one way--there are other crazy methods like suffix trees), but it should get you started:
First, think about how you would find the length of the overlap with the beginning of both strings aligned. For instance, find the longest overlap between these two:
programming
ungrammatical
In this case, only a single m overlaps--a length of 1.
Then think about how you would "shift" the strings and look for the overlap when they are aligned differently. (Don't actually change the strings: just change how you loop through them to compare them.) What is the overlap between these two?
programming
 ungrammatical
Think about how to look at all possible alignments. If you keep track of the longest one you find, you have the longest alignment between two particular strings.
After that, move on to checking all the different strings. Keep track of the one with the best match, and again, once you have looked at all of them, you have your answer.
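Here is a sketch of that plan in C, under the assumption that "overlap" means the longest common substring. The function names are made up. It is the simple shifting approach, not the dynamic programming or suffix tree versions.

```c
#include <stddef.h>
#include <string.h>

/* Longest run of equal characters with a and b aligned at their starts. */
size_t longest_run(const char *a, const char *b)
{
    size_t best = 0, run = 0;
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (*a == *b) {
            if (++run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Tries every alignment of the two strings and keeps the longest run. */
size_t longest_common_substring(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), best = 0, r;

    for (size_t i = 0; i < la; i++)           /* skip i characters of a */
        if ((r = longest_run(a + i, b)) > best)
            best = r;
    for (size_t j = 1; j < lb; j++)           /* skip j characters of b */
        if ((r = longest_run(a, b + j)) > best)
            best = r;
    return best;
}

/* Returns the index of the name with the largest overlap with inname,
   skipping inname itself, or -1 if no name shares any character run. */
int best_match(char *const names[], size_t n, const char *inname)
{
    int best_index = -1;
    size_t best_len = 0;

    for (size_t k = 0; k < n; k++) {
        if (strcmp(names[k], inname) == 0)
            continue;                         /* exclude inname itself */
        size_t len = longest_common_substring(names[k], inname);
        if (len > best_len) {
            best_len = len;
            best_index = (int)k;
        }
    }
    return best_index;
}
```

For "programming" and "ungrammatical", longest_run gives 1 (the single m) and longest_common_substring gives 5 ("gramm"). When best_match returns -1, print "no recommendation". On a tie, this version keeps the first name it found.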
