Right algorithm but wrong implementation? - c

I am a beginner in C with some experience in python and java. I want to solve a problem with C. The problem goes like this:
Take an input as a sentence with words separated by blank spaces only (assume lower case only), re-write the sentence with the following rules:
1) If a word occurs the first time, keep it the same.
2) If the word occurs twice, replace the second occurrence with the word being copied twice (e.g. two --> twotwo).
3) If the word occurs three times or more, delete all the occurrences after the second one.
Print the output as a sentence. The maximum lengths of the input sentence and each individual word is 500 chars and 50 chars.
Exemplary input: jingle bells jingle bells jingle all the way
Exemplary output: jingle bells jinglejingle bellsbells all the way
The approach I take is:
1) Read the input, separate each word and put them into an array of char pointers.
2) Use nested for loop to go through the array. For each word after the first word:
A - If there is no word before it that is equal to it, nothing happens.
B - If there is already one word before it that is equal to it, change the word as its "doubled form".
C - If there is already a "doubled form" of itself that exists before it, delete the word (set the element to NULL.
3) Print the modified array.
I am fairly confident about the correctness of this approach. However, when I actually write the code:
'''
int main()
{
char input[500];
char *output[500];
// Gets the input
printf("Enter a string: ");
gets(input);
// Gets the first token, put it in the array
char *token = strtok(input, " ");
output[0] = token;
// Keeps getting tokens and filling the array, untill no blank space is found
int i = 1;
while (token != NULL) {
token = strtok(NULL, " ");
output[i] = token;
i++;
}
// Processes the array, starting from the second element
int j, k;
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
for (k = 0; k < j; k++) {
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
if (strcmp(output[k], doubled) == 0) { // Situation C
output[j] = ' ';
}
}
}
// Convert the array to a string
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
}
else { // If reaches the end of the sentence
break;
}
}
// Prints out the result string
printf("%s", result);
return 0;
}
'''
I did a bunch of tests on each individual block. There are several issues:
1) When processing the array, strcmp, strcat, and strcpy in the loop seem to give Segmentation fault error reports.
2) When printing the array, the words did not show the order that they are supposed to do.
I am now frustrated because it seems that the issues are all coming from some internal structural defects of my code and they are very much related to the memory mechanism of C which I am not really too familiar with. How should I fix this?

One issues jumps out at me. This code is wrong:
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
doubled doesn't point to any actual memory. So trying to copy data to where it points is undefined behavior and will almost certainly cause a SIGSEGV - and it will corrupt memory if it doesn't cause a SIGSEGV.
That needs to be fixed - you can't copy a string with strcpy() or strcat() to an pointer that doesn't point to actual memory.
This would be better, but still not ideal as no checking is done to ensure there's no buffer overflow:
char doubled[ 2000 ];
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
This is also a problem with doubled defined like that:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
That just points output[j] at doubled. The next loop iteration will overwrite doubled, and the data that output[j] still points to to will get changed.
This would address that problem:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = strdup( doubled );
}
strdup() is a POSIX function that, unsurprising, duplicates a string. That string will need to be free()'d later, though, as strdup() is the same as:
char *strdup( const char *input )
{
char *duplicate = malloc( 1 + strlen( input ) );
strcpy( duplicate, input );
return( duplicate );
}
As pointed out, strcat(doubled, doubled); is also a problem. One possible solution:
memmove(doubled + strlen( doubled ), doubled, 1 + strlen( doubled ) );
That copies the contents of the doubled string to the memory starting at the original '\0' terminator. Note that since the original '\0' terminator is part of the string, you can't use strcpy( doubled + strlen( doubled ), doubled );. Nor can you use memcpy(), for the same reason.

Your code invokes undefined behavior in several places by writing to memory that is not yet owned, and is likely the cause of the segmentation fault you are seeing. For example, in this segment:
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
By itself...
char *result = output[0]; //creates a pointer, but provides no memory.
...is not sufficient to receive content such as
strcat(result, blank_space);
strcat(result, output[l]);
It needs memory:
char *result = malloc(501);//or more space if needed.
if(result)
{
//now use strcat or strcpy to add content

Related

Optimizing a do while loop/for loop in C

I was doing an exercise from LeetCode in which consisted in deleting any adjacent elements from a string, until there are only unique characters adjacent to each other. With some help I could make a code that can solve most testcases, but the string length can be up to 10^5, and in a testcase it exceeds the time limit, so I'm in need in some tips on how can I optimize it.
My code:
char res[100000]; //up to 10^5
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
ver = 0;
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
//clar the res string
memset(res, '\0', sizeof res);
} while (ver > 0);
return s;
}
The code can't pass a speed test that has a string that has around the limit (10^5) length, I won't put it here because it's a really big text, but if you want to check it, it is the 104 testcase from the LeetCode Daily Problem
If it was me doing something like that, I would basically do it like a simple naive string copy, but keep track of the last character copied and if the next character to copy is the same as the last then skip it.
Perhaps something like this:
char result[1000]; // Assumes no input string will be longer than this
unsigned source_index; // Index into the source string
unsigned dest_index; // Index into the destination (result) string
// Always copy the first character
result[0] = source_string[0];
// Start with 1 for source index, since we already copies the first character
for (source_index = 1, dest_index = 0; source_string[source_index] != '\0'; ++source_index)
{
if (source_string[source_index] != result[dest_index])
{
// Next character is not equal to last character copied
// That means we can copy this character
result[++dest_index] = source_string[source_index];
}
// Else: Current source character was equal to last copied character
}
// Terminate the destination string
result[dest_index + 1] = '\0';

C-Lang: Segmentation Fault when working with string in for-loop

Quite recently, at the university, we began to study strings in the C programming language, and as a homework, I was given the task of writing a program to remove extra words.
While writing a program, I faced an issue with iteration through a string that I could solve in a hacky way. However, I would like to deal with the problem with your help, since I cannot find the error myself.
The problem is that when I use the strlen(buffer) function as a for-loop condition, the code compiles easily and there are no errors at runtime, although when I use the __act_buffer_len variable, which is assigned a value of strlen(buffer) there will be a segmentation fault at runtime.
I tried many more ways to solve this problem, but the only one, which I already described, worked for me.
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
int __act_buffer_len = strlen(buffer);
// for debugging purposes
printf("__actbuff: %d\n", __act_buffer_len);
printf("sizeof: %d\n", sizeof(buffer));
printf("strlen: %d\n", strlen(buffer));
char* _newbuff = malloc(__act_buffer_len + 1); // <- new buffer without words with less than 2 unique words
char* _tempbuff; // <- used to store current word
int beg_point = 0;
int curr_wlen = 0;
for (int i = 0; i < strlen(buffer); i++) // no errors at runtime, app runs well
// for (int i = 0; i < __act_buffer_len; i++) // <- segmentation fault when loop is reaching a space character
// for (int i = 0; buffer[i] != '\0'; i++) // <- also segmentation fault at the same spot
// for (size_t i = 0; i < strlen(buffer); i++) // <- even this gives a segmentation fault which is totally confusing for me
{
printf("strlen in loop %d\n", i);
if (buffer[i] == delim)
{
char* __cpy;
memcpy(__cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t __letters = __get_letters(__cpy, curr_wlen); // <- will return number of unique letters in word
if (__letters > 2) // <- will remove all the words with less than 2 unique letters
{
strcat(_newbuff, __cpy);
strcat(_newbuff, " ");
}
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
}
else curr_wlen++;
}
return _newbuff;
}
In short, the code above just finds delimiter character in string and counts the number of unique letters of the word before this delimiter.
My fault was in not initializing a __cpy variable.
Also, as #n.1.8e9-where's-my-sharem. stated, I shouldn't name vars with two underscores.
The final code:
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
size_t _act_buffer_len = strlen(buffer);
char* _newbuff = malloc(_act_buffer_len); // <- new buffer without words with less than 2 unique words
int beg_point = 0;
int curr_wlen = 0;
for (size_t i = 0; i < _act_buffer_len; i++)
{
if (buffer[i] == delim)
{
char* _cpy = malloc(curr_wlen);
memcpy(_cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t _letters = _get_letters(_cpy, curr_wlen); // <- will return number of unique letters in word
if (_letters > 2) // <- will remove all the words with less than 2 unique letters
strcat(_newbuff, _cpy);
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
free(_cpy);
}
else curr_wlen++;
}
return _newbuff;
}
Thanks for helping me

Shuffle words from a 1D array

I've been given this sentence and I need to shuffle the words of it:
char array[] = "today it is going to be a beautiful day.";
A correct output would be: "going it beautiful day is a be to today"
I've tried many things like turning it into a 2D array and shuffling the rows, but I can't get it to work.
Your instinct of creating a 2D array is solid. However in C that's more involved than you might expect:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
int main()
{
char array[] = "today it is going to be a beautiful day.";
char out_array[sizeof(array)];
char words[sizeof(array)][46];
int word_count = 0;
int letter_count = 0;
int on_word = 0;
int count = 0;
int i = 0;
int j = 0;
srand(time(NULL));
// parse words into 2D array
for (i = 0; i < sizeof(array); i++) {
if (array[i] == ' ') {
if (on_word) {
words[word_count++][letter_count] = '\0';
letter_count = 0;
on_word = 0;
}
} else if (array[i] == '\0' || array[i] == '.') {
break;
} else {
on_word = 1;
words[word_count][letter_count++] = array[i];
}
}
words[word_count++][letter_count] = '\0';
// randomly swap around words
for (i = 0; i < word_count; i++) {
char temp[46];
int idx = rand() % word_count;
if (idx != i) {
strcpy(temp, words[idx]);
strcpy(words[idx], words[i]);
strcpy(words[i], temp);
}
}
// output words into out_array
for (i = 0; i < word_count; i++) {
for (j = 0; words[i][j] != '\0'; j++) {
out_array[count++] = words[i][j];
}
out_array[count++] = ' ';
}
out_array[count - 1] = '\0';
printf("%s", out_array);
return 0;
}
You need two basic algorithms to solve this problem.
Split the input string into a list of words.
Randomly sample your list of words until there are no more.
1. Split the input string into a list of words.
This is much simpler than you may think. You don’t need to actually copy any words, just find where each one begins in your input string.
today it is going to be a beautiful day.
^---- ^- ^- ^---- ^- ^- ^ ^-------- ^--
There are all kinds of ways you can store that information, but the two most useful would be either an array of integer indices or an array of pointers.
For your example sentence, the following would be a list of indices:
0, 6, 9, 12, 18, 21, 24, 26, 36
To do this, just create an array with a reasonable upper limit on words:
int words[100]; // I wanna use a list of index values
int nwords = 0;
 
char * words[100]; // I wanna use a list of pointers
int nwords = 0;
If you do it yourself either structure is just as easy.
If you use strtok life is much easier with a list of pointers.
All you need at this point is a loop over your input to find the words and populate your list. Remember, a words is any alphabetic or numeric value (and maybe hyphens, if you want to go that far). Everything else is not a word. If you #include <ctype.h> you get a very handy function for classifying a character is “word” or “not-word”:
if (isalnum( input[n] )) its_a_word_character;
else its_not_a_word_character_meaning_we_have_found_the_end_of_the_word;
Now that you have a list of words, you can:
2. Randomly sample your list of words until there are no more.
There are, again, a number of ways you could do this. Already suggested above is to randomly shuffle the list of words (array of indices or array of pointers), and then simply rebuild the sentence by taking the words in order.
→ Beware, Etian’s example is not a correct shuffle, though it would probably go unnoticed or ignored by everyone at your level of instruction as it will appear to work just fine. Google around “coding horror fisher yates” for more.
The other way would be to just select and remove a random word from your array until there are no words left.
The random sampling is not difficult, but it does require some precise thinking, making this the actually most difficult part of your project.
To start you first need to get a proper random number. There is a trick to this that people are generally not taught. Here you go:
int random( int N ) // Return an UNBIASED pseudorandom value in [0, N-1].
{
int max_value = (RAND_MAX / N) * N;
int result;
do result = rand(); while (result >= max_value);
return result % N;
}
And in main() the very first thing you should do is initialize the random number generator:
#include <stdlib.h>
#include <time.h>
int main()
{
srand( (unsigned)time( NULL ) );
Now you can sample / shuffle your array properly. You can google "Fisher-Yates Shuffle" (or follow the link in the comment below your question). Or you can just select the next word:
while (nwords)
{
int index = random( nwords );
// do something with word[index] here //
// Remove the word we just printed from our list of words
// • Do you see what trick we use to remove the word?
// • Do you also know why this does not affect our random selection?
words[index] = words[--nwords];
}
Hopefully you can see that both of these methods are essentially the same thing. Whichever you choose is up to you. I personally would use the latter because of the following consideration:
Output
You can create a new string and then print it, or you can just print each word directly. As the homework (as you presented it) does not require generation of a new string, I would just print the output directly. This makes life simpler in the sense that you do not have to mess with another string array.
As you print each word (or append it to a new string), remember how you separated them to begin with. If you use strtok you can just use something like:
printf( "%s", words[index] ); // print word directly to stdout
 
strcat( output, words[index] ); // append word to output string
If you found the beginnings of each word yourself, you will have to again loop until you find the end of the word:
// Print word, character by character, directly to stdout
for (int n = index; isalnum( words[index+n] ); n++)
{
putchar( words[index+n] );
}
 
// Append word, character by character, to output string
for (int n = index; isalnum( words[index+n] ); n++)
{
char * p = strchr( output, '\0' ); // (Find end of output[])
*p++ = words[index+n]; // (Add char)
*p = '\0'; // (Add null terminator)
}
All that’s left is to pay attention to spaces and periods in your output.
Hopefully this should be enough to get you started.

HEAP CORRUPTION DETECTED: after normal block(#87)

I'm trying to do a program that get number of names from the user, then it get the names from the user and save them in array in strings. After it, it sort the names in the array by abc and then print the names ordered. The program work good, but the problem is when I try to free the dynamic memory I defined.
Here is the code:
#include <stdio.h>
#include <string.h>
#define STR_LEN 51
void myFgets(char str[], int n);
void sortString(char** arr, int numberOfStrings);
int main(void)
{
int i = 0, numberOfFriends = 0, sizeOfMemory = 0;
char name[STR_LEN] = { 0 };
char** arrOfNames = (char*)malloc(sizeof(int) * sizeOfMemory);
printf("Enter number of friends: ");
scanf("%d", &numberOfFriends);
getchar();
for (i = 0; i < numberOfFriends; i++) // In this loop we save the names into the array.
{
printf("Enter name of friend %d: ", i + 1);
myFgets(name, STR_LEN); // Get the name from the user.
sizeOfMemory += 1;
arrOfNames = (char*)realloc(arrOfNames, sizeof(int) * sizeOfMemory); // Change the size of the memory to more place to pointer from the last time.
arrOfNames[i] = (char*)malloc(sizeof(char) * strlen(name) + 1); // Set dynamic size to the name.
*(arrOfNames[i]) = '\0'; // We remove the string in the currnet name.
strncat(arrOfNames[i], name, strlen(name) + 1); // Then, we save the name of the user into the string.
}
sortString(arrOfNames, numberOfFriends); // We use this function to sort the array.
for (i = 0; i < numberOfFriends; i++)
{
printf("Friend %d: %s\n", i + 1, arrOfNames[i]);
}
for (i = 0; i < numberOfFriends; i++)
{
free(arrOfNames[i]);
}
free(arrOfNames);
getchar();
return 0;
}
/*
Function will perform the fgets command and also remove the newline
that might be at the end of the string - a known issue with fgets.
input: the buffer to read into, the number of chars to read
*/
void myFgets(char str[], int n)
{
fgets(str, n, stdin);
str[strcspn(str, "\n")] = 0;
}
/*In this function we get array of strings and sort the array by abc.
Input: The array and the long.
Output: None*/
void sortString(char** arr, int numberOfStrings)
{
int i = 0, x = 0;
char tmp[STR_LEN] = { 0 };
for (i = 0; i < numberOfStrings; i++) // In this loop we run on all the indexes of the array. From the first string to the last.
{
for (x = i + 1; x < numberOfStrings; x++) // In this loop we run on the next indexes and check if is there smaller string than the currnet.
{
if (strcmp(arr[i], arr[x]) > 0) // If the original string is bigger than the currnet string.
{
strncat(tmp, arr[i], strlen(arr[i])); // Save the original string to temp string.
// Switch between the orginal to the smaller string.
arr[i][0] = '\0';
strncat(arr[i], arr[x], strlen(arr[x]));
arr[x][0] = '\0';
strncat(arr[x], tmp, strlen(tmp));
tmp[0] = '\0';
}
}
}
}
After the print of the names, when I want to free the names and the array, in the first try to free, I get an error of: "HEAP CORRUPTION DETECTED: after normal block(#87)". By the way, I get this error only when I enter 4 or more players. If I enter 3 or less players, the program work properly.
Why does that happen and what I should do to fix it?
First of all remove the unnecessary (and partly wrong) casts of the return value of malloc and realloc. In other words: replace (char*)malloc(... with malloc(..., and the same for realloc.
Then there is a big problem here: realloc(arrOfNames, sizeof(int) * sizeOfMemory) : you want to allocate an array of pointers not an array of int and the size of a pointer may or may not be the same as the size of an int. You need sizeof(char**) or rather the less error prone sizeof(*arrOfNames) here.
Furthermore this in too convoluted (but not actually wrong):
*(arrOfNames[i]) = '\0';
strncat(arrOfNames[i], name, strlen(name) + 1);
instead you can simply use this:
strcpy(arrOfNames[i], name);
Same thing in the sort function.
Keep your code simple.
But actually there are more problems in your sort function. You naively swap the contents of the strings (which by the way is inefficient), but the real problem is that if you copy a longer string, say "Walter" into a shorter one, say "Joe", you'll write beyond the end of the allocated memory for "Joe".
Instead of swapping the content of the strings just swap the pointers.
I suggest you take a pencil and a piece of paper and draw the pointers and the memory they point to.

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
param[j] = buffer; k = 0; : you rewrite at the beginning of buffer erasing previous words
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true of false, and you always use the 2 first choices and never the third.
if (line[i] == '\0') : you forget to terminate current word by a '\0'
if there are multiple white spaces, you currently (try to) add empty words in param
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
param[j] = &(param[j-1][k+1]);
k = 0;
if(j == 21) {
param[j] = NULL;
break;
}
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in you code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seems right to me
param[j] = buffer;
because you keep assigning the same value buffer to different param[j] s .
I would suggest you copy all the char s from line[120] to buffer[120], then point param[j] to location of buffer + Next_Word_Postition.
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!

Resources