Function that finds whether 2 strings are made from the same words - c

I need to create a function in C that finds out whether 2 strings are made from the same words. As can be seen in the current code, I loaded each string into a separate array. I made it so that each array holds the words in lower case, with just 1 space between words and with all non-alpha characters removed. I thought I could just sort the strings and call strcmp on them, but that doesn't work, because there can be strings such as "dog dog dog cat" and "dog cat". These strings are made from the same words, so the function should return 1, but it wouldn't if I just sorted and used strcmp. So I thought I could merge all duplicated words into 1 and then sort and strcmp, but there is still one problem: with words such as "dog" and "god", which are 2 different words, the function would still treat them as the same after sorting.
"dog dog dog cat" "dog cat" - same words
"HI HeLLO!!'" "hi,,,hello hi" - same words
I would be very thankful for any help. I really don't know how to create it, I sat at it for quite some time and still can't figure it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int sameWords( const char * a, const char * b)
{
char * array1=NULL;
char * array2=NULL;
int length1=0, length2=0, i=0, j=0;
while(a[i])
{
if(i>=length1)
{
length1+=250;
array1=(char*)malloc(length1*sizeof(char));
}
if(isspace(a[i]) && !isspace(a[i-1]))
{
array1[i]=a[i];
}
if(isalpha(a[i]))
{
array1[i]=tolower(a[i]);
}
i++;
}
while(b[j])
{
if(j>=length2)
{
length2+=250;
array2=(char*)malloc(length2*sizeof(char));
}
if(isspace(b[j]) && !isspace(b[j-1]))
{
array2[j]=b[j];
}
if(isalpha(b[j]))
{
array2[j]=tolower(b[j]);
}
j++;
}
}
int main()
{
sameWords("This' is string !!! ", "THIS stRing is !! string ");
return 0;
}

You have already learned two ways to go about your problem. The complicated one is to split each of the strings into words, sort them and then weed out duplicates, which is easy in a sorted array. The easier one is to split the first string into words, search for each word in the second. Then do the same the other way round: split the second and check for words in the first.
Both approaches require that you split the strings. That's also where you seem to have problems in your code. (You've got the basic idea to look at word boundaries, but you don't seem to know how to store the words.)
The basic question is: How are you going to represent the words, i.e. the substrings of a C string? There are various ways. You could use pointers into the string together with a string length or you could copy them into another buffer.
Here is a solution that splits the string a into words and then checks whether each word can be found in b:
/*
 * Return 1 if all words in a can be found in b,
 * return 0 otherwise.
 */
int split_and_check(const char *a, const char *b)
{
    char word[80];  /* temporary buffer for current word */
    int len = 0;    /* current length of word; must start at 0 */
    int i = 0;

    while (1) {
        if (isalpha((unsigned char)a[i])) {
            /* append to the current word, leaving room for '\0';
               overlong words are truncated */
            if (len < 79) word[len++] = a[i];
        } else {
            if (len > 0) {
                word[len] = '\0';   /* manually null-terminate word */
                if (strstr(b, word) == NULL) {
                    /* fail on string mismatch */
                    return 0;
                }
                len = 0;            /* reset word-length counter */
            }
        }
        if (a[i] == '\0') break;    /* check end here to catch last word */
        i++;
    }
    return 1;
}
The current word is stored in the local char buffer word and has the length len. Note how the zero end marker '\0' is added to word manually before searching b for word: The library function strstr looks for a string in another one. Both strings must be zero-terminated.
This is only one half of the solution. You must check the strings the other way round:
int same_words(const char *a, const char *b)
{
if (split_and_check(a, b) == 0) return 0;
if (split_and_check(b, a) == 0) return 0;
return 1;
}
This is not yet the exact solution to your problem, because the string matching is done case-sensitively. I've skipped this part, because it was easier that way: strstr is case sensitive and I don't know of any variants that ignore the case.
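One way to close that gap is to compare word by word instead of searching for substrings. This also avoids a side effect of strstr, which finds "dog" inside "dogma". What follows is only a minimal sketch: the names next_word, word_equal, contains_word, all_words_in and same_words_ci are made up for illustration, and a word is assumed to be a run of isalpha characters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: find the next run of letters at or after *pos.
 * Returns a pointer to its first letter and stores its length in *len,
 * or returns NULL when no word is left. *pos is advanced past the word. */
static const char *next_word(const char **pos, size_t *len)
{
    const char *p = *pos;
    const char *start;

    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;                        /* skip separators */
    if (*p == '\0') {
        *pos = p;
        return NULL;
    }
    start = p;
    while (isalpha((unsigned char)*p))
        p++;                        /* walk to the end of the word */
    *len = (size_t)(p - start);
    *pos = p;
    return start;
}

/* Compare two words of known length, ignoring case. */
static int word_equal(const char *w1, size_t n1, const char *w2, size_t n2)
{
    size_t i;

    if (n1 != n2)
        return 0;
    for (i = 0; i < n1; i++) {
        if (tolower((unsigned char)w1[i]) != tolower((unsigned char)w2[i]))
            return 0;
    }
    return 1;
}

/* Return 1 if the word w (length n) occurs as a whole word in s. */
static int contains_word(const char *s, const char *w, size_t n)
{
    const char *cand;
    size_t len;

    while ((cand = next_word(&s, &len)) != NULL) {
        if (word_equal(cand, len, w, n))
            return 1;
    }
    return 0;
}

/* Return 1 if every word of a occurs in b. */
static int all_words_in(const char *a, const char *b)
{
    const char *w;
    size_t len;

    while ((w = next_word(&a, &len)) != NULL) {
        if (!contains_word(b, w, len))
            return 0;
    }
    return 1;
}

/* Return 1 if a and b are made from the same set of words. */
int same_words_ci(const char *a, const char *b)
{
    return all_words_in(a, b) && all_words_in(b, a);
}
```

With this sketch, same_words_ci("HI HeLLO!!'", "hi,,,hello hi") returns 1, while same_words_ci("dog", "god") returns 0. No word is copied, so there is no length limit, at the cost of rescanning b for every word of a.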

You are returning nothing from your function sameWords whose return type is int.

I'm not aiming for the accepted answer, but I would also take a look at regular expressions for this kind of thing.
Does C or C++ have a standard regex library?
It would take minutes to solve: split the string with a regex, lowercase it, and then iterate to look for common words.

What I would do to solve this problem is create a data structure like a tree into which you can insert words. The insert function would do nothing if the word is already there, otherwise, it would convert it to lowercase and insert it in the tree. Then you could simply convert both strings to these types of trees and compare the trees.
Another way to do this is in bash. While this is probably not allowed for your assignment, if you understand how and why it works, you should be able to code something up that mimics it:
# string1 and string2 are simply strings with spaces separating words
s1="dog dog dog cat"
s2="cat dog"
# Convert to arrays
a1=( $(printf "%s\n" ${s1} | sort | uniq ) )
a2=( $(printf "%s\n" ${s2} | sort | uniq ) )
# Compare the result
if [ "${a1[*]}" == "${a2[*]}" ] ; then
echo "Same"
fi
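The same sort-and-uniq idea can be mimicked in C with qsort. The following is a rough sketch rather than a finished solution: the limits MAX_WORDS and MAX_LEN and the names cmp_words, unique_words and same_word_set are assumptions chosen for illustration. Words beyond the limit are silently ignored and overlong words are truncated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 64   /* assumption: fixed limits keep the sketch short */
#define MAX_LEN   32

/* qsort comparator for the rows of a char[][MAX_LEN] array */
static int cmp_words(const void *x, const void *y)
{
    return strcmp((const char *)x, (const char *)y);
}

/* Split s into lower-cased words, sort them and drop duplicates
 * (the "sort | uniq" step). Returns the number of distinct words. */
static int unique_words(const char *s, char words[][MAX_LEN])
{
    int n = 0, len = 0, i, kept;

    for (;; s++) {
        if (isalpha((unsigned char)*s)) {
            if (n < MAX_WORDS && len < MAX_LEN - 1)
                words[n][len++] = (char)tolower((unsigned char)*s);
        } else {
            if (len > 0) {          /* a word just ended */
                words[n++][len] = '\0';
                len = 0;
            }
            if (*s == '\0')
                break;
        }
    }
    if (n == 0)
        return 0;

    qsort(words, (size_t)n, sizeof words[0], cmp_words);

    kept = 1;                       /* words[0] is always kept */
    for (i = 1; i < n; i++) {
        if (strcmp(words[i], words[kept - 1]) != 0) {
            if (i != kept)          /* strcpy must not copy onto itself */
                strcpy(words[kept], words[i]);
            kept++;
        }
    }
    return kept;
}

/* Return 1 if a and b contain the same set of words, ignoring case. */
int same_word_set(const char *a, const char *b)
{
    char wa[MAX_WORDS][MAX_LEN], wb[MAX_WORDS][MAX_LEN];
    int na = unique_words(a, wa);
    int nb = unique_words(b, wb);
    int i;

    if (na != nb)
        return 0;
    for (i = 0; i < na; i++) {
        if (strcmp(wa[i], wb[i]) != 0)
            return 0;
    }
    return 1;
}
```

Because whole words are sorted rather than the characters inside them, "dog" and "god" stay different, so same_word_set("dog", "god") returns 0.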

Related

Count no. of words in a string in C programming ( I am getting empty character constant error)

My code was to count no of words in the string.
But (a[i]=='') is showing empty character constant error
#include <stdio.h>
int main() {
char a[20];
int i,c1=0,c2=0;
scanf("%[^\n]",a);
for(i=0;a[i]!='\0';i++)
{
c1++;
if(a[i]=='')
c2++;
}
printf("%d\n",c1);
printf("%d",c2+1);
return 0;
}
For input - tom is here
I expect the output to be -11
3
Compilation error- In function 'main':
prog.c:10:15: error: empty character constant
if(a[i]=='')
^
#include <stdio.h>
int main() {
char str[50];
int i, numberOfWords=0;
if (fgets(str, sizeof str, stdin) == NULL) return 1; // gets() is unsafe and was removed in C11
for(i=0; str[i]!='\0'; i++) {
if(str[i] == 32) //ascii code of space is 32
numberOfWords++;
}
printf("number of words = %d\n", numberOfWords + 1);
//adding 1 to numberOfWords because if there are two words, there will be 2-1=1 space between them. eg= "Hello World"
return 0;
}
In contrast to the empty string literal (""), character literals always need to contain a character (exactly one)*. Replace '' with ' ' and your code should at least compile.
However, the code as is will count the number of spaces. What will happen if a string contains more than one subsequent space? Additionally, you might want to consider tabs as well? And how would you want to interpret punctuation marks? Part of words or separator? And what about numbers?
Depending on how you answer all these questions, you might need to vary the condition in code below. In any case, I propose a stateful iteration over your input:
int isSeparator = 1; // so you will count the first word occurring, too, even if it starts
// at the first character of the string
for(char const* s = str; *s; ++s)
{
if(isalnum((unsigned char)*s)) // requires <ctype.h> header; modify condition
// appropriately if you want different
// characters to count as word parts
{
wordCount += isSeparator;
isSeparator = 0;
}
else
{
isSeparator = 1;
}
}
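The fragment above leaves str and wordCount undeclared. Wrapped into a function it might look like the sketch below; the name count_words is an assumption, and the isalnum condition is kept from the fragment.

```c
#include <ctype.h>

/* Count maximal runs of alphanumeric characters in str. */
int count_words(const char *str)
{
    int wordCount = 0;
    int isSeparator = 1;  /* 1 while the previous character was not part of a word */
    const char *s;

    for (s = str; *s != '\0'; ++s) {
        if (isalnum((unsigned char)*s)) {
            wordCount += isSeparator;  /* adds 1 only on the first character of a word */
            isSeparator = 0;
        } else {
            isSeparator = 1;
        }
    }
    return wordCount;
}
```

For the input "tom is here" this returns 3, and it still returns 3 when the words are separated by several spaces or the line starts or ends with one.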
*Actually, the standard does allow multi-byte characters, so to be precise, we'd need to state 'at least one character', but these multi-byte characters have implementation defined meaning and usually aren't useful anyway, so for practical reasons, we might stay with the technically less correct 'exactly one character'...

Writing a program in C with the function isAlphabetic to determine if a string strictly contains alphabetic letters or not

This is what I have so far.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int value;
char c='Z';
char alph[30]="there is a PROF 1 var orada";
char freq[27];
int i;
// The function isAlphabetic will accept a string and test each character to
// verify if it is an alphabetic character ( A through Z , lowercase or uppercase)
// if all characters are alphabetic characters then the function returns 0.
// If a nonalphabetic character is found, it will return the index of the nonalpabetic
// character.
value = isAlphabetic(alph);
if (value == 0)
printf("\n The string is alphabetic");
else
printf("Non alphabetic character is detected at position %d\n",value);
return EXIT_SUCCESS;
}
int isAlphabetic(char *myString) {
}
What I'm confused about is how to have the program scan through a string to detect exactly where a non-alphabetic character is, if any. I'm guessing it will involve counting all the characters in the string first?
Not going to provide the answer via code (as someone else did), but consider:
A string in C is nothing more than an array of characters and a null terminator.
You can iterate through each item in an array using [] (i.e., input[i]) to check its value against an ASCII table for example.
Your function can exit as soon as it finds one value that is not alphabetic.
There are certainly other ways to solve this problem, but my assumption is that at this level, your professor would be a bit suspicious if you started using a bunch of libraries / tools you haven't been taught.
Let's take your questions one at a time:
...how will I have the program scan through a string...
"Scan through a string" means you skin the cat with a loop:
char xx[] = "ABC DEF 123 456";
int ii;
/* for, while, do while; pick your poison */
for (ii = 0; xx[ii] != '\0'; ++ii)
{
/* Houston, we're scanning. */
}
...to detect...
"Detect" means you skin the cat with a comparison of some sort:
char a, b;
a == b; /* equality of two char's */
a >= b; /* greater-than-or-equal-to relationship of two char's */
a < b; /* I'll bet you can guess what this does now */
...exactly where a non alphabetic character is...
Well by virtue of scanning you'll know "exactly where" due to your index.
Scan from the first character to the last. Begin with a counter variable set to 0.
Each time you move to the next character, do counter++; this gives you the index of the current character.
If you find any non-alphabetic character, return counter right there.
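The counter approach can be sketched as follows. The function name first_non_alpha is made up, isalpha from <ctype.h> stands in for a manual range check, and the 1-based return value is an assumption: returning the raw index would make a non-letter at index 0 indistinguishable from "all alphabetic", since both would yield 0.

```c
#include <ctype.h>

/* Returns 0 if every character of s is a letter; otherwise returns the
 * 1-based position of the first non-letter (assumption: 1-based, so that
 * a non-letter in the very first position is not confused with 0). */
int first_non_alpha(const char *s)
{
    int counter;

    for (counter = 0; s[counter] != '\0'; counter++) {
        if (!isalpha((unsigned char)s[counter]))
            return counter + 1;     /* stop at the first offender */
    }
    return 0;
}
```

For the string from the question, "there is a PROF 1 var orada", the first non-letter is the space after "there", so the function returns 6.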
I will give you a hint :
#include <stdio.h>
int main()
{
char c = '1';
printf("%d",c-48); //notice this
return 0;
}
Output : 1
Should be more than enough to solve it on your own now :)

How do I parse a string in C?

I am a beginner learning C; so, please go easy on me. :)
I am trying to write a very simple program that takes each word of a string into a "Hi (input)!" sentence (it assumes you type in names). Also, I am using arrays because I need to practice them.
My problem is that some garbage gets put into the arrays somewhere, and it messes up the program. I tried to figure out the problem but to no avail; so, it is time to ask for expert help. Where have I made mistakes?
p.s.: It also has an infinite loop somewhere, but it is probably the result of the garbage that is put into the array.
#include <stdio.h>
#define MAX 500 //Maximum Array size.
int main(int argc, const char * argv[])
{
int stringArray [MAX];
int wordArray [MAX];
int counter = 0;
int wordCounter = 0;
printf("Please type in a list of names then hit ENTER:\n");
// Fill up the stringArray with user input.
stringArray[counter] = getchar();
while (stringArray[counter] != '\n') {
stringArray[++counter] = getchar();
}
// Main function.
counter = 0;
while (stringArray[wordCounter] != '\n') {
// Puts first word into temporary wordArray.
while ((stringArray[wordCounter] != ' ') && (stringArray[wordCounter] != '\n')) {
wordArray[counter++] = stringArray[wordCounter++];
}
wordArray[counter] = '\0';
//Prints out the content of wordArray.
counter = 0;
printf("Hi ");
while (wordArray[counter] != '\0') {
putchar(wordArray[counter]);
counter++;
}
printf("!\n");
//Clears temporary wordArray for new use.
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
wordCounter++;
counter = 0;
}
return 0;
}
Solved it! I needed to replace the unconditional wordCounter++ at the end of the loop with the following if statement. :)
if (stringArray[wordCounter] != '\n') {
wordCounter++;
}
You are using int arrays to represent strings, probably because getchar() returns an int. However, strings are better represented as char arrays, since that's what they are in C. The fact that getchar() returns an int is certainly confusing; it does so because it needs to be able to return the special value EOF, which doesn't fit in a char. Therefore it uses int, which is a "larger" type (able to represent more different values). So, it can fit all the char values, and EOF.
With char arrays, you can use C's string functions directly:
char stringArray[MAX];
if(fgets(stringArray, sizeof stringArray, stdin) != NULL)
printf("You entered %s", stringArray);
Note that fgets() will leave the end-of-line character in the string, so you might want to strip it out. I suggest implementing an in-place function that trims off leading and trailing whitespace; it's a good exercise as well.
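A possible shape for such a trimming function is sketched here; the name trim_in_place is an assumption, and "whitespace" is taken to mean whatever isspace reports, which includes the trailing '\n'.

```c
#include <ctype.h>
#include <string.h>

/* Remove leading and trailing whitespace from s in place and return s. */
char *trim_in_place(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);

    while (end > 0 && isspace((unsigned char)s[end - 1]))
        end--;                      /* drop trailing whitespace, incl. '\n' */
    while (start < end && isspace((unsigned char)s[start]))
        start++;                    /* skip leading whitespace */

    memmove(s, s + start, end - start);  /* source and target may overlap */
    s[end - start] = '\0';
    return s;
}
```

Calling trim_in_place(stringArray) right after a successful fgets would leave just the typed text, without the newline.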
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
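You never enter this loop: the condition counter == MAX is already false on the first test, because counter starts at 0. It should be counter < MAX.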
user1799795,
For what it's worth (now that you've solved your problem) I took the liberty of showing you how I'd do this given the restriction "use arrays", and explaining a bit about why I'd do it that way... Just beware that while I am an experienced programmer I'm no C guru... I've worked with guys who absolutely blew me into the C-weeds (pun intended).
#include <stdio.h>
#include <string.h>
#define LINE_SIZE 500
#define MAX_WORDS 50
#define WORD_SIZE 20
// Main function.
int main(int argc, const char * argv[])
{
int counter = 0;
// ----------------------------------
// Read a line of input from the user (ie stdin)
// ----------------------------------
char line[LINE_SIZE];
printf("Please type in a list of names then hit ENTER:\n");
if ( fgets(line, LINE_SIZE, stdin) == NULL ) {
fprintf(stderr, "You must enter something. Pretty please!\n");
return 1; // retrying in a loop would spin forever once stdin hits end-of-file
}
// A note on that LINE_SIZE parameter to the fgets function:
// wherever possible it's a good idea to use the version of the standard
// library function that allows you to specify the maximum length of the
// string (or indeed any array) because that dramatically reduces the
// incidence of "string overruns", which are a major source of bugs in C
// programs.
// Also note that fgets includes the end-of-line character/sequence in
// the returned string, so you have to ensure there's room for it in the
// destination string, and remember to handle it in your string processing.
// -------------------------
// split the line into words
// -------------------------
// the current word
char word[WORD_SIZE];
int wordLength = 0;
// the list of words
char words[MAX_WORDS][WORD_SIZE]; // an array of up to 50 words of
// up to 19 characters each, plus the '\0'
int wordCount = 0; // the number of words in the array.
// The below loop syntax is a bit cryptic.
// The "char *c=line;" initialises the char-pointer "c" to the start of "line".
// The " *c;" is ultra-shorthand for: "is the-char-at-c not equal to zero".
// All strings in c end with a "null terminator" character, which has the
// integer value of zero, and is commonly expressed as '\0' or 0 (not to be
// confused with NULL, which is the null pointer constant). In the C
// language any integer may be evaluated as a
// boolean (true|false) expression, where 0 is false, and (pretty obviously)
// everything-else is true. So: If the character at the address-c is not
// zero (the null terminator) then go-round the loop again. Capiche?
// The "++c" moves the char-pointer to the next character in the line. I use
// the pre-increment "++c" in preference to the more common post-increment
// "c++" out of habit; for a plain pointer both compile to the same code.
//
// Note that this syntax is commonly used by "low level programmers" to loop
// through strings. There is an alternative which is less cryptic and is
// therefore preferred by most programmers, even though it's not quite as
// efficient. In this case the loop would be:
// int lineLength = strlen(line);
// for ( int i=0; i<lineLength; ++i)
// and then to get the current character
// char ch = line[i];
// We get the length of the line once, because the strlen function has to
// loop through the characters in the array looking for the null-terminator
// character at its end (guess what its implementation looks like ;-)...
// which is inherently an "expensive" operation (totally dependent on the
// length of the string) so we at least avoid repeating this operation.
//
// I know I might sound like I'm banging on about not-very-much but once you
// start dealing with "real world" magnitude datasets then such habits,
// formed early on, pay huge dividends in the ability to write performant
// code the first time round. Premature optimisation is evil, but my code
// hardly ever NEEDS optimising, because it was "fairly efficient"
// to start with. Yeah?
for ( char *c=line; *c; ++c ) { // foreach char in line.
char ch = *c; // "ch" is the character value-at the-char-pointer "c".
if ( ch==' ' // if this char is a space,
|| ch=='\n' // or we've reached the EOL char
) {
// 1. add the word to the end of the words list.
// note that we copy only wordLength characters, because word
// itself is not null-terminated; strncpy then adds no terminator
// either, so we append one ourselves before printing with %s.
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
// 2. and "clear" the word buffer.
wordLength=0;
} else if (wordLength==WORD_SIZE-1) { // this word is too long
// so split this word into two words.
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
wordLength=0;
word[wordLength++] = ch;
} else {
// otherwise: append this character to the end of the word.
word[wordLength++] = ch;
}
}
// -------------------------
// print out the words
// -------------------------
for ( int w=0; w<wordCount; ++w ) {
printf("Hi %s!\n", words[w]);
}
return 0;
}
In the real world one can't make such restrictive assumptions about the maximum length of words, or how many there will be, and if such restrictions are given they're almost always arbitrary and therefore proven wrong all too soon... so straight off the bat for this problem, I'd be inclined to use a linked list instead of the "words" array... wait till you get to "dynamic data structures"... You'll love 'em ;-)
Cheers. Keith.
PS: You're going pretty well... My advice is "just keep on truckin"... this gets a LOT easier with practice.

Tokenize Strings using Pointers in ANSI C

This is in ANSI C. I am given a string. I am supposed to create a method that returns an array of character pointers that point to the beginning of each word of said string. I am not allowed to use malloc, but was instead told that the maximum length of input will be 80.
Also, before anyone flames me for not searching the forum, I can't use strtok :(
char input[80] = "hello world, please tokenize this string";
and the output of the method should have 6 elements;
output[0] points to the "h",
output[1] points to the "w",
and so on.
How should I write the method?
Also, I need a similar method to handle input from a file with maximum of 110 lines.
Pseudocode:
boolean isInWord = false
while (*ptr != NUL character) {
if (!isInWord and isWordCharacter(*ptr)) {
isInWord = true
save ptr
} else if (isInWord and !isWordCharacter(*ptr)) {
isInWord = false
}
increment ptr
}
isWordCharacter checks whether the character is part of the word or not. Depending on your definition, it can be only alphabet character (recognize part-time as 2 words), or it may include - (recognize part-time as one word).
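Translated to C, the pseudocode could look roughly like this. The names tokenize and is_word_character, the max_words parameter and the MAX_WORDS limit are assumptions for illustration; here only letters count as word characters, so "part-time" would yield two words.

```c
#include <ctype.h>

#define MAX_WORDS 40  /* assumption: an 80-char buffer holds at most 40 words */

/* Hypothetical word test: here only letters belong to a word. */
static int is_word_character(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Store a pointer to the start of each word of input in output and
 * return the number of words found (at most max_words). */
int tokenize(char *input, char *output[], int max_words)
{
    int in_word = 0;
    int count = 0;
    char *ptr;

    for (ptr = input; *ptr != '\0'; ptr++) {
        if (!in_word && is_word_character(*ptr)) {
            if (count == max_words)
                break;              /* no room left in output */
            in_word = 1;
            output[count++] = ptr;  /* "save ptr" from the pseudocode */
        } else if (in_word && !is_word_character(*ptr)) {
            in_word = 0;
        }
    }
    return count;
}
```

The caller supplies the output array, for example char *output[MAX_WORDS], so no malloc is needed. Note that the pointers refer into input and the words are not individually null-terminated: output[1] points at "world, please tokenize this string".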
Because it's homework here's a part of what you might need:
char* readPtr = input;
char* wordPtr = input;
int wordCount = 0;
while (*readPtr++ != ' ');
/* Here we have a word from wordPtr to readPtr-1 */
output[wordCount++] = /* something... :) */
You'll need that in a loop, and must consider how to move onto the next word, and check for end of input.

comparing strings (from other indices rather than 0)

How can one compare a string starting from the middle (or some other point, but not the start) to another string?
For example, I have a string
str1[]="I am genius";
Now if I want to find a word in it, how should I compare it with the word? For example, the word is am.
Here is what I did. It's a bit stupid but works perfectly :D
#include<stdio.h>
#include<string.h>
void print( char string[]);
int main()
{
int i;
char string1[20];
printf("Enter a string:");
gets(string1);
print(string1);
return 0;
getch();
}
void print(char string[])
{
int i,word=1,sum=0,x;
for(i=0; ;i++)
{
sum++;
if(string[i]==' ')
{
printf("Word#%d:%d\n",word,sum-1);
sum=0;
word++;
}/* if ends */
if(string[i]=='\0')
{ // for the program to work correctly, this code has to be pasted here as well
printf("Word#%d:%d\n",word,sum-1);
sum=0;
word++;
break;
}
}/* for ends*/
}
Use strncmp():
strncmp( whereToFind + offsetToStartAt, patternToFind, patternLength );
If you wish to find a substring in a string, use the function strstr():
char *p = strstr(str1, "am");
if (p != NULL)
{
// p now points to start of substring
printf("found substring\n");
}
else
{
printf("substring not found\n");
}
If you want to compare the remainder of string s1 starting at index i1 to the remainder of string s2 starting at i2, it's very easy:
result = strcmp(s1+i1, s2+i2);
If you want to see if the substring of s1 beginning at i1 matches the string s2, try:
result = strcmp(s1+i1, s2);
or:
result = strncmp(s1+i1, s2, strlen(s2));
depending on whether you want the whole remainder of s1 to match or just the portion equal in length to s2 to match (i.e. whether s1 contains s2 as a substring beginning at position i1).
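The strncmp variant can be wrapped in a small helper. This is a sketch under the assumption that the offset may be out of range, which is why the lengths are checked first; the name matches_at is made up.

```c
#include <string.h>

/* Return 1 if pattern occurs in text starting exactly at index offset.
 * The offset is checked against the length first, because text + offset
 * must not point past the terminating '\0'. */
int matches_at(const char *text, size_t offset, const char *pattern)
{
    size_t text_len = strlen(text);
    size_t pattern_len = strlen(pattern);

    if (offset > text_len || pattern_len > text_len - offset)
        return 0;
    return strncmp(text + offset, pattern, pattern_len) == 0;
}
```

With the string from the question, matches_at("I am genius", 2, "am") returns 1 and matches_at("I am genius", 0, "am") returns 0.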
If you want to search for a substring, use strstr.
Since this is homework I am assuming you can't use standard functions, so I can think of two solutions:
1. Split all of the words into a linked list, then just compare each string until you find your word.
2. Just use a for loop, start at the beginning, and you can use [] to help jump through the string, so instr[3] would be the fourth character, as the index is zero-based. Then you just see if you are at your word yet.
There are optimizations you can do with (2), but I am not trying to do your homework for you. :)
One option would be to use
size_t strspn(const char *s1, const char *s2) /* from #include <string.h> */
which returns the length of the longest initial segment of s1 (the part that begins at the start of s1) that consists only of characters found in s2.
If it returns zero, then s1 does not start with any character from s2.
You can parse the string into words and store them in new char arrays/pointers.
Or
Suppose the string you want to find is "am", pointed to by char *str2.
1. Walk through str1 by index until you find a char that matches index 0 of str2.
2. Once you find a match, increment both indices until you reach the end of str2, comparing the entire string.
3. If there is a mismatch, go back to step 1 and continue looking for the char at index 0 of str2, starting just after the place in str1 where you entered step 2.
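Those steps amount to a naive substring search, which might be written as below. The name find_substring is an assumption, and the function behaves like a simplified strstr rather than a whole-word match.

```c
#include <stddef.h>

/* Return a pointer to the first occurrence of str2 in str1, or NULL. */
const char *find_substring(const char *str1, const char *str2)
{
    size_t i, j;

    if (str2[0] == '\0')
        return str1;                /* the empty string matches at the start */

    for (i = 0; str1[i] != '\0'; i++) {
        /* advance both indices while the characters agree */
        for (j = 0; str2[j] != '\0' && str1[i + j] == str2[j]; j++)
            ;
        if (str2[j] == '\0')
            return str1 + i;        /* reached the end of str2: full match */
        /* mismatch: try again from the next start position in str1 */
    }
    return NULL;
}
```

For str1 = "I am genius" and str2 = "am" the result points at index 2 of str1. The inner loop cannot run past the end of str1, because the terminating '\0' of str1 never equals a non-zero character of str2.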
Alternatively
You can use a two-dimensional array.
char str[3][10] = { "i","am","2-darray"};
Here str[1] will contain "am". That's assuming you want to get the individual words of a string.
Edit: Removed the point diverting from OP

Resources