Optimizing a do while loop/for loop in C - c

I was doing an exercise from LeetCode in which consisted in deleting any adjacent elements from a string, until there are only unique characters adjacent to each other. With some help I could make a code that can solve most testcases, but the string length can be up to 10^5, and in a testcase it exceeds the time limit, so I'm in need in some tips on how can I optimize it.
My code:
char res[100000]; //up to 10^5
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
ver = 0;
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
//clar the res string
memset(res, '\0', sizeof res);
} while (ver > 0);
return s;
}
The code can't pass a speed test that has a string that has around the limit (10^5) length, I won't put it here because it's a really big text, but if you want to check it, it is the 104 testcase from the LeetCode Daily Problem

If it was me doing something like that, I would basically do it like a simple naive string copy, but keep track of the last character copied and if the next character to copy is the same as the last then skip it.
Perhaps something like this:
char result[1000]; // Assumes no input string will be longer than this
unsigned source_index; // Index into the source string
unsigned dest_index; // Index into the destination (result) string
// Always copy the first character
result[0] = source_string[0];
// Start with 1 for source index, since we already copies the first character
for (source_index = 1, dest_index = 0; source_string[source_index] != '\0'; ++source_index)
{
if (source_string[source_index] != result[dest_index])
{
// Next character is not equal to last character copied
// That means we can copy this character
result[++dest_index] = source_string[source_index];
}
// Else: Current source character was equal to last copied character
}
// Terminate the destination string
result[dest_index + 1] = '\0';

Related

C Problems with a do while loop in a code that delete duplicate chars

I'm a beginner programmer that is learning C and I'm doing some exercises on LeetCode but I ran with a problem with today's problem, I'm going to put the problem bellow but my difficult is on a do while loop that I did to loop the deleting function if there are adjacent duplicates, my code can delete the duplicates on the first iteration, but it's not looping to do any subsequent tasks, if anyone could help my I would be grateful.
The LeetCode Daily Problem (10/11/2022):
You are given a string s consisting of lowercase English letters. A duplicate removal consists of choosing two adjacent and equal letters and removing them.
We repeatedly make duplicate removals on s until we no longer can.
Return the final string after all such duplicate removals have been made. It can be proven that the answer is unique.
Example 1:
Input: s = "abbaca"
Output: "ca"
Explanation:
For example, in "abbaca" we could remove "bb" since the letters are adjacent and equal, and this is the only possible move. The result of this move is that the string is "aaca", of which only "aa" is possible, so the final string is "ca".
Example 2:
Input: s = "azxxzy"
Output: "ay"
Constraints:
1 <= s.length <= 105
s consists of lowercase English letters.
My code (testcase: "abbaca"):
char res[100]; //awnser
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
int ver = 0;
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
} while (ver > 0);
return res;
}
The fuction returns "aaca".
I did some tweaking with the code and found that after the loop the ver variable always return to 0, but I don't know why.
I see two errors
res isn't terminated so the strcpy may fail
You have two definitions of int ver so the one being incremented is not the one being checked by while (ver > 0); In other words: The do-while only executes once.
Based on your code it can be fixed like:
char res[100]; //awnser
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
ver = 0; // <--------------- Changed
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
res[j] = '\0'; // <---------------- Changed
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
} while (ver > 0);
return res;
}

C-Lang: Segmentation Fault when working with string in for-loop

Quite recently, at the university, we began to study strings in the C programming language, and as a homework, I was given the task of writing a program to remove extra words.
While writing a program, I faced an issue with iteration through a string that I could solve in a hacky way. However, I would like to deal with the problem with your help, since I cannot find the error myself.
The problem is that when I use the strlen(buffer) function as a for-loop condition, the code compiles easily and there are no errors at runtime, although when I use the __act_buffer_len variable, which is assigned a value of strlen(buffer) there will be a segmentation fault at runtime.
I tried many more ways to solve this problem, but the only one, which I already described, worked for me.
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
int __act_buffer_len = strlen(buffer);
// for debugging purposes
printf("__actbuff: %d\n", __act_buffer_len);
printf("sizeof: %d\n", sizeof(buffer));
printf("strlen: %d\n", strlen(buffer));
char* _newbuff = malloc(__act_buffer_len + 1); // <- new buffer without words with less than 2 unique words
char* _tempbuff; // <- used to store current word
int beg_point = 0;
int curr_wlen = 0;
for (int i = 0; i < strlen(buffer); i++) // no errors at runtime, app runs well
// for (int i = 0; i < __act_buffer_len; i++) // <- segmentation fault when loop is reaching a space character
// for (int i = 0; buffer[i] != '\0'; i++) // <- also segmentation fault at the same spot
// for (size_t i = 0; i < strlen(buffer); i++) // <- even this gives a segmentation fault which is totally confusing for me
{
printf("strlen in loop %d\n", i);
if (buffer[i] == delim)
{
char* __cpy;
memcpy(__cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t __letters = __get_letters(__cpy, curr_wlen); // <- will return number of unique letters in word
if (__letters > 2) // <- will remove all the words with less than 2 unique letters
{
strcat(_newbuff, __cpy);
strcat(_newbuff, " ");
}
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
}
else curr_wlen++;
}
return _newbuff;
}
In short, the code above just finds delimiter character in string and counts the number of unique letters of the word before this delimiter.
My fault was in not initializing a __cpy variable.
Also, as #n.1.8e9-where's-my-sharem. stated, I shouldn't name vars with two underscores.
The final code:
// deletes words with <= 2 letters
char* _delete_odd(const char* buffer, char delim)
{
size_t _act_buffer_len = strlen(buffer);
char* _newbuff = malloc(_act_buffer_len); // <- new buffer without words with less than 2 unique words
int beg_point = 0;
int curr_wlen = 0;
for (size_t i = 0; i < _act_buffer_len; i++)
{
if (buffer[i] == delim)
{
char* _cpy = malloc(curr_wlen);
memcpy(_cpy, &buffer[beg_point], curr_wlen); // <- will copy a string starting from the beginning of the word til its end
// this may be commented for testing purposes
__uint32_t _letters = _get_letters(_cpy, curr_wlen); // <- will return number of unique letters in word
if (_letters > 2) // <- will remove all the words with less than 2 unique letters
strcat(_newbuff, _cpy);
beg_point = i + 1; // <- will point on the first letter of the word
curr_wlen = buffer[beg_point] == ' ' ? 0 : 1; // <- if the next symbol after space is another space, than word length should be 0
free(_cpy);
}
else curr_wlen++;
}
return _newbuff;
}
Thanks for helping me

Right algorithm but wrong implementation?

I am a beginner in C with some experience in python and java. I want to solve a problem with C. The problem goes like this:
Take an input as a sentence with words separated by blank spaces only (assume lower case only), re-write the sentence with the following rules:
1) If a word occurs the first time, keep it the same.
2) If the word occurs twice, replace the second occurrence with the word being copied twice (e.g. two --> twotwo).
3) If the word occurs three times or more, delete all the occurrences after the second one.
Print the output as a sentence. The maximum lengths of the input sentence and each individual word is 500 chars and 50 chars.
Exemplary input: jingle bells jingle bells jingle all the way
Exemplary output: jingle bells jinglejingle bellsbells all the way
The approach I take is:
1) Read the input, separate each word and put them into an array of char pointers.
2) Use nested for loop to go through the array. For each word after the first word:
A - If there is no word before it that is equal to it, nothing happens.
B - If there is already one word before it that is equal to it, change the word as its "doubled form".
C - If there is already a "doubled form" of itself that exists before it, delete the word (set the element to NULL.
3) Print the modified array.
I am fairly confident about the correctness of this approach. However, when I actually write the code:
'''
int main()
{
char input[500];
char *output[500];
// Gets the input
printf("Enter a string: ");
gets(input);
// Gets the first token, put it in the array
char *token = strtok(input, " ");
output[0] = token;
// Keeps getting tokens and filling the array, untill no blank space is found
int i = 1;
while (token != NULL) {
token = strtok(NULL, " ");
output[i] = token;
i++;
}
// Processes the array, starting from the second element
int j, k;
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
for (k = 0; k < j; k++) {
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
if (strcmp(output[k], doubled) == 0) { // Situation C
output[j] = ' ';
}
}
}
// Convert the array to a string
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
}
else { // If reaches the end of the sentence
break;
}
}
// Prints out the result string
printf("%s", result);
return 0;
}
'''
I did a bunch of tests on each individual block. There are several issues:
1) When processing the array, strcmp, strcat, and strcpy in the loop seem to give Segmentation fault error reports.
2) When printing the array, the words did not show the order that they are supposed to do.
I am now frustrated because it seems that the issues are all coming from some internal structural defects of my code and they are very much related to the memory mechanism of C which I am not really too familiar with. How should I fix this?
One issues jumps out at me. This code is wrong:
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
doubled doesn't point to any actual memory. So trying to copy data to where it points is undefined behavior and will almost certainly cause a SIGSEGV - and it will corrupt memory if it doesn't cause a SIGSEGV.
That needs to be fixed - you can't copy a string with strcpy() or strcat() to an pointer that doesn't point to actual memory.
This would be better, but still not ideal as no checking is done to ensure there's no buffer overflow:
char doubled[ 2000 ];
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
This is also a problem with doubled defined like that:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
That just points output[j] at doubled. The next loop iteration will overwrite doubled, and the data that output[j] still points to to will get changed.
This would address that problem:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = strdup( doubled );
}
strdup() is a POSIX function that, unsurprising, duplicates a string. That string will need to be free()'d later, though, as strdup() is the same as:
char *strdup( const char *input )
{
char *duplicate = malloc( 1 + strlen( input ) );
strcpy( duplicate, input );
return( duplicate );
}
As pointed out, strcat(doubled, doubled); is also a problem. One possible solution:
memmove(doubled + strlen( doubled ), doubled, 1 + strlen( doubled ) );
That copies the contents of the doubled string to the memory starting at the original '\0' terminator. Note that since the original '\0' terminator is part of the string, you can't use strcpy( doubled + strlen( doubled ), doubled );. Nor can you use memcpy(), for the same reason.
Your code invokes undefined behavior in several places by writing to memory that is not yet owned, and is likely the cause of the segmentation fault you are seeing. For example, in this segment:
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
By itself...
char *result = output[0]; //creates a pointer, but provides no memory.
...is not sufficient to receive content such as
strcat(result, blank_space);
strcat(result, output[l]);
It needs memory:
char *result = malloc(501);//or more space if needed.
if(result)
{
//now use strcat or strcpy to add content

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
param[j] = buffer; k = 0; : you rewrite at the beginning of buffer erasing previous words
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true of false, and you always use the 2 first choices and never the third.
if (line[i] == '\0') : you forget to terminate current word by a '\0'
if there are multiple white spaces, you currently (try to) add empty words in param
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
param[j] = &(param[j-1][k+1]);
k = 0;
if(j == 21) {
param[j] = NULL;
break;
}
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in you code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seems right to me
param[j] = buffer;
because you keep assigning the same value buffer to different param[j] s .
I would suggest you copy all the char s from line[120] to buffer[120], then point param[j] to location of buffer + Next_Word_Postition.
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!

Returning the length of a char array in C

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end i want to return the length of the new char array. I am coming from java so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition]) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
return strlen(*buf);
This would be return strlen(buf). *buf is a character, not a string.
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i])
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
Several mistakes in your code:
You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
int normalize(char* buf)
{
char c;
int i = 0;
int j = 0;
while (buf[i] != 0)
{
c = buf[i++];
if (isspace(c))
{
j++;
while (isspace(c))
c = buf[i++];
}
if (isupper(c))
buf[j] = tolower(c);
j++;
}
buf[j] = 0;
return j;
}
you should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf will de-reference the pointer, resulting on a char - not what strlen expects.
Not much different then others but assumes this is an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
#OP:
Per the posted code it implies leading and trailing spaces should either be shrunk to 1 char or eliminate all leading and trailing spaces.
The above answer simple shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...

Resources