Count the occurrences of every word in C - c

I want to count the occurrences of every word of this small text " A broken heart of a broken mind ."
Every word of this text is in 2d array[100][20] in which 100 is the max_words and 20 is the max_word_length. And I have a pointers array[100] in which every pointer points the word. I can't find a clever way to count the same words,
for example
a: 2 times
broken: 2 times
heart: 1 time
mind: 1 time
. : 1 time
These would be the pointers and the words array:
POINTERS ARRAY WORDS ARRAY
point0(points "a") a
point1(points "broken") broken
point2(points "heart") heart
point3(points "of") of
point4 (points "a") mind
point5(points "broken") .
point6(points "mind") \0\0\0\0\0
point7(points ".") \0\0\0\0\0
NULL ..
NULL
..
NULL \0\0\0\0\0
Side note: Every word is lowercase.
void frequence_word(char *pointers[], int frequence_array[]) {
int word = 0;
int i;
int count = 1;
int check[MAX_WORDS];
for (word = 0; word < MAX_WORDS; word++) {
check[word] = -1;
}
for (word = 0; word < MAX_WORDS; word++) {
count = 1;
for (i = word + 1; i < MAX_WORDS; i++) {
if (pointers[word + 1] != NULL
&& strcmp(pointers[word], pointers[i]) == 0) {
count++;
check[i] = 0;
}
}
if (check[word] != 0) {
check[word] = count;
}
}
}
Any ideas please?

This seems like a use case for strstr. You can call strstr, then iteratively reassign to the original string until NULL is reached.
const char substring[] = "A broken heart of a broken mind";
const char* total = ...;
const char* result;
long count = 0;
while (result = strstr(total, substring)) {
count++;
total += (sizeof(substring) - 1);
}
I think this is mostly self-explanetory, but I will explain this line:
total += (sizeof(substring) - 1);
It takes advantage of the fact that sizeof on arrays returns the array length. Thus, sizeof on a character array returns the number of characters in it. We subtract one to ignore the null terminator.

Related

Trying to delete a specific character from a string in C?

I'm trying to delete a specific character (?) from the end of a string and return a pointer to a string, but it's not removing it at all at the moment. What am I doing wrong? Is there a better way to go about it?
char * word_copy = malloc(strlen(word)+1);
strcpy(word_copy, word);
int length = strlen(word_copy);
int i = 0;
int j = 0;
for (i = 0; word_copy[i] != '\0'; i++) {
if (word_copy[length - 1] == '?' && i == length - 1){
break;
}
}
for (int j = i; word_copy[j] != '\0'; j++) {
word_copy[j] = word_copy[j+1];
}
word = strdup(word_copy);
I'm immediately seeing a couple of problems.
The first for loop does nothing. It doesn't actually depend on i so it could be replaced with a single if statement.
if (word_copy[length - 1] == '?') {
i = length - 1;
} else {
i = length + 1;
}
The second for loop also acts as an if statement since it starts at the end of the string and can only ever run 0 or 1 times.
You could instead do something like this to remove the ?. This code will return a new malloced string with the last character removed if its ?.
char *remove_question_mark(char *word) {
unsigned int length = strlen(word);
if (length == 0) {
return calloc(1, 1);
}
if (word[length - 1] == '?') {
char *word_copy = malloc(length);
// Copy up to '?' and put null terminator
memcpy(word_copy, word, length - 1);
word_copy[length - 1] = 0;
return word_copy;
}
char *word_copy = malloc(length + 1);
memcpy(word_copy, word, length + 1);
return word_copy;
}
Or if you are feeling lazy, you could also just make the last character the new null terminator instead. Its essentially creates a memory leak of 1 byte, but that may be an acceptable loss. It should also be a fair bit faster since it doesn't need to allocate any new memory or copy the previous string.
unsigned int length = strlen(word);
if (length > 0 && word[length - 1] == '?') {
word[length] = 0;
}

function doesn't pass certain test case

I have a problem with one of the test for my solution for challenge in codewars. I have to write a function that returns alphabet position of characters in input string. My solution is below. I pass all my test and also tests from codewars but fail on this one (I did not implement this test code it was pat of the test code implemented by code wars):
Test(number_tests, should_pass) {
srand(time(NULL));
char in[11] = {0};
char *ptr;
for (int i = 0; i < 15; i++) {
for (int j = 0; j < 10; j++) {
char c = rand() % 10;
in[j] = c + '0';
}
ptr = alphabet_position(in);
cr_assert_eq(strcmp(ptr, ""), 0);
free(ptr);
}
}
The error I receive is following: The expression (strcmp(ptr, "")) == (0) is false. Thanks for the help!
p.s Also I noticed that I am leaking memory (I don't know how to solve this so I suppose I would use array to keep track of string and don't use malloc) --> I suppose this is not an issue I would just free(ptr) in main function.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *alphabet_position(char *text);
// test
int main()
{
if (!strcmp("1 2 3", alphabet_position("abc")))
{
printf("success...\n");
}
else
{
printf("fail...\n");
}
if (!strcmp("", alphabet_position("..")))
{
printf("success...\n");
}
else
{
printf("fail...\n");
}
if (!strcmp("20 8 5 19 21 14 19 5 20 19 5 20 19 1 20 20 23 5 12 22 5 15 3 12 15 3 11", alphabet_position("The sunset sets at twelve o' clock.")))
{
printf("success...\n");
}
else
{
printf("fail...\n");
}
}
char *alphabet_position(char *text)
{
// signature: string -> string
// purpose: extact alphabet position of letters in input string and
// return string of alphabet positions
// return "123"; // stub
// track numerical value of each letter according to it's alphabet position
char *alph = "abcdefghijklmnopqrstuvwxyz";
// allocate maximum possible space for return string
// each char maps to two digit number + trailing space after number
char *s = malloc(sizeof(char) * (3 * strlen(text) + 1));
// keep track of the begining of return string
char *head = s;
int index = 0;
int flag = 0;
while(*text != '\0')
{
if ( ((*text > 64) && (*text < 91)) || ((*text > 96) && (*text < 123)))
{
flag = 1;
index = (int)(strchr(alph, tolower(*text)) - alph) + 1;
if (index > 9)
{
int n = index / 10;
int m = index % 10;
*s = n + '0';
s++;
*s = m + '0';
s++;
*s = ' ';
s++;
}
else
{
*s = index + '0';
s++;
*s = ' ';
s++;
}
}
text++;
}
if (flag != 0) // if string contains at least one letter
{
*(s -1) = '\0'; // remove the trailing space and insert string termination
}
return head;
}
Here is what I think is happening:
In the cases where none of the characters in the input string is an alphabet character, s is never used, and therefore the memory allocated by malloc() could be anything. malloc() does not clear / zero-out memory.
The fact that your input case of ".." passes is just coincidence. The codewars test case does many such non-alphabetical tests in a row, each of which causes a malloc(), and if any one of them fails, the whole thing fails.
I tried recreating this situation, but it's (as I say) unpredictable. To test this, add a debugging line to output the value of s when flag is still 0:
if (flag != 0) { // if string contains at least one letter
*(s -1) = '\0'; // remove the trailing space and insert string termination
}
else {
printf("flag is still 0 : %s\n", s);
}
I'll wager that sometimes you get a garbage / random string that is not "".

code accounting for multiple delimiters isn't working

I have a program I wrote to take a string of words and, based on the delimiter that appears, separate each word and add it to an array.
I've adjusted it to account for either a ' ' , '.' or '.'. Now the goal is to adjust for multiple delimiters appearing together (as in "the dog,,,was walking") and still only add the word. While my program works, and it doesn't print out extra delimiters, every time it encounters additional delimiters, it includes a space in the output instead of ignoring them.
int main(int argc, const char * argv[]) {
char *givenString = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
int stringCharCount;
//get length of string to allocate enough memory for array
for (int i = 0; i < 1000; i++) {
if (givenString[i] == '\0') {
break;
}
else {
stringCharCount++;
}
}
// counting # of commas in the original string
int commaCount = 1;
for (int i = 0; i < stringCharCount; i++) {
if (givenString[i] == ',' || givenString[i] == '.' || givenString[i] == ' ') {
commaCount++;
}
}
//declare blank Array that is the length of commas (which is the number of elements in the original string)
//char *finalArray[commaCount];
int z = 0;
char *finalArray[commaCount] ;
char *wordFiller = malloc(stringCharCount);
int j = 0;
char current = ' ';
for (int i = 0; i <= stringCharCount; i++) {
if (((givenString[i] == ',' || givenString[i] == '\0' || givenString[i] == ',' || givenString[i] == ' ') && (current != (' ' | '.' | ',')))) {
finalArray[z] = wordFiller;
wordFiller = malloc(stringCharCount);
j=0;
z++;
current = givenString[i];
}
else {
wordFiller[j++] = givenString[i];
}
}
for (int i = 0; i < commaCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
This program took me hours and hours to get together (with help from more experienced developers) and I can't help but get frustrated. I'm using the debugger to my best ability but definitely need more experience with it.
/////////
I went back to pad and paper and kind of rewrote my code. Now I'm trying to store delimiters in an array and compare the elements of that array to the current string value. If they are equal, then we have come across a new word and we add it to the final string array. I'm struggling to figure out the placement and content of the "for" loop that I would use for this.
char * original = "USA,Canada,Mexico,Bermuda,Grenada,Belize";
//creating two intialized variables to count the number of characters and elements to add to the array (so we can allocate enough mmemory)
int stringCharCount = 0;
//by setting elementCount to 1, we can account for the last word that comes after the last comma
int elementCount = 1;
//calculate value of stringCharCount and elementCount to allocate enough memory for temporary word storage and for final array
for (int i = 0; i < 1000; i++) {
if (original[i] == '\0') {
break;
}
else {
stringCharCount++;
if (original[i] == ',') {
elementCount++;
}
}
}
//account for the final element
elementCount = elementCount;
char *tempWord = malloc(stringCharCount);
char *finalArray[elementCount];
int a = 0;
int b = 0;
//int c = 0;
//char *delimiters[4] = {".", ",", " ", "\0"};
for (int i = 0; i <= stringCharCount; i++) {
if (original[i] == ',' || original[i] == '\0') {
finalArray[a] = tempWord;
tempWord = malloc(stringCharCount);
tempWord[b] = '\0';
b = 0;
a++;
}
else {
tempWord[b++] = original[i];
}
}
for (int i = 0; i < elementCount; i++) {
printf("%s\n", finalArray[i]);
}
return 0;
}
Many issues. Suggest dividing code into small pieces and debug those first.
--
Un-initialize data.
// int stringCharCount;
int stringCharCount = 0;
...
stringCharCount++;
Or
int stringCharCount = strlen(givenString);
Other problems too: finalArray[] is never assigned a terminarting null character yet printf("%s\n", finalArray[i]); used.
Unclear use of char *
char *wordFiller = malloc(stringCharCount);
wordFiller = malloc(stringCharCount);
There are more bugs than lines in your code.
I'd suggest you start with something much simpler.
Work through a basic programming book with excercises.
Edit
Or, if this is about learning to program, try another, simpler programming language:
In C# your task looks rather simple:
string givenString = "USA,Canada Mexico,Bermuda.Grenada,Belize";
string [] words = string.Split(new char[] {' ', ',', '.'});
foreach(word in words)
Console.WriteLine(word);
As you see, there are much issues to worry about:
No memory management (alloc/free) this is handeled by the Garbage Collector
no pointers, so nothing can go wrong with them
powerful builtin string capabilities like Split()
foreach makes loops much simpler

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
param[j] = buffer; k = 0; : you rewrite at the beginning of buffer erasing previous words
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true of false, and you always use the 2 first choices and never the third.
if (line[i] == '\0') : you forget to terminate current word by a '\0'
if there are multiple white spaces, you currently (try to) add empty words in param
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
param[j] = &(param[j-1][k+1]);
k = 0;
if(j == 21) {
param[j] = NULL;
break;
}
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in you code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seems right to me
param[j] = buffer;
because you keep assigning the same value buffer to different param[j] s .
I would suggest you copy all the char s from line[120] to buffer[120], then point param[j] to location of buffer + Next_Word_Postition.
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!

C - Largest String From a Big One

So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...
Uae strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use linked list.
As simple as this.
EDIT
Ok, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace all non-letter characters with NULL - basically that's what strtok() does.
Then you need to go through your modified source string second time, advancing one token at a time, and find the longest one, using strlen().
This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NULL.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character stop counting and throw away the starting point. If you hit a NULL, check to see if the length of the current run is greater then the previous record holder. If so record it, and start looking for the next string.
What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
if P(s[i]): # current char is good
if status == bad: # previous one was bad
current_run_start = current_run_end = i
status = good
else: # previous one was also good
current_run_end = i
else: # current char is bad
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length
status = bad
# if a good run ends with end-of-string:
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length
Why use strlen() at all?
Here's my version which uses no function whatsoever.
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif
While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength + 1] = 0;
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
printf(" longest: \"%s\"\n\n", longest(argv[i]));
}
return 0;
}
This is my solution in simple C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
the criteria for what constitutes a letter could be changed in the isLetter() function easily. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
would count periods, commas and spaces as 'letters' also.
as per your update:
replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
however, this problem would be much more easily solved with the use of the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
curString = strtok(s, tokens);
do {
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
} while( curString = strtok(NULL, tokens) );
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}
First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.
Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}

Resources