Removing a substring from a string in C - c

I already have code that removes a substring from a string (word) in C, but I don't understand it. Can someone explain it to me? It doesn't use functions from the standard library. I tried to analyze it myself, but there are certain parts I still don't understand; I marked them in the comments. I just need to figure out how this all works.
Thanks!
#include <stdio.h>
#include <stdlib.h>
void remove(char *s1, char *s2);
int main()
{
char s1[101], s2[101];
printf("First word: ");
scanf("%s", s1);
printf("Second word: ");
scanf("%s", s2);
remove(s1, s2);
printf("The first word after removing is '%s'.", s1);
return 0;
}
void remove(char *s1, char *s2)
{
int i = 0, j, k;
while (s1[i]) // ITERATES THROUGH THE FIRST STRING s1?
{
for (j = 0; s2[j] && s2[j] == s1[i + j]; j++); // WHAT DOES THIS LINE DO?
if (!s2[j]) // IF WE'RE AT THE END OF STRING s2?
{
for (k = i; s1[k + j]; k++) //WHAT DOES THIS ENTIRE BLOCK DO?
s1[k] = s1[k + j];
s1[k] = 0;
}
else
i++; // ???
}
}

The function basically works like this:
-Skip the part of s1 that matches s2 and overwrite it with the rest of s1.
while (s1[i]) // Yes It ITERATES THROUGH THE FIRST STRING s1
{
for (j = 0; s2[j] && s2[j] == s1[i + j]; j++); // counts how many characters of s2 match s1 starting at position i
This loop has an empty body: it only advances j over the matching characters and stores nothing in s1.
if (!s2[j]) // IF WE'RE AT THE END OF STRING s2
{
for (k = i; s1[k + j]; k++) // shifts the rest of s1 left over the matched part
s1[k] = s1[k + j];
s1[k] = 0;
}
else
i++; // no match at position i, so move on to the next character of s1
}

The first while (s1[i]) iterates through s1. Yes, you are right.
for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);
The above for loop checks whether the substring s2 is present in s1 starting at s1[i]. If it matches, s2 is iterated completely and s2[j] is the null character when the loop ends. If not, s2[j] will not be the null character at the end of the for loop. Example: if s1 = ITERATE and s2 = RAT, then the loop will run to completion only when i=3.
So if (!s2[j]) holds, it means we have found the substring, and i is its starting point in s1.
for (k = i; s1[k + j]; k++) //WHAT DOES THIS ENTIRE BLOCK DO?
s1[k] = s1[k + j];
s1[k] = 0;
The above block removes the substring. For the ITERATE and RAT example, this is done by copying E and the null character to the positions where R and A were, which leaves ITEE. The for loop achieves this. If s2[j] is not null after the first for loop, i is incremented to check for the substring from the next position of s1.

Here is the function again, with each step explained in the comments:
void remove(char *s1, char *s2)
{
int i = 0, j, k;
while (s1[i]) // Iterates through s1 (until it finds a zero)
{
for (j = 0; s2[j] && s2[j] == s1[i + j]; j++); // Advances j while s2 has not ended and s2[j] matches s1[i + j] (on a full match, j ends up at the terminating zero of s2)
if (!s2[j]) // If j points to the end of s2 => we've found a match
{
for (k = i; s1[k + j]; k++) //Remove the coincident substring
s1[k] = s1[k + j];
s1[k] = 0;
}
else
i++; // There is no coincidence so we continue to the next character of s1
}
}
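Note: at first glance s1[i + j] looks like it could read past the end of s1, but the matching loop stops at the terminator of s1, because a non-zero s2[j] can never equal '\0'. The real edge case is an empty s2: it "matches" at every position without shortening s1, so i never advances and the while loop never ends.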

Let's break it down. You have
while (s1[i])
{
// Code
}
This iterates through s1. Once you get to the end of the string, you have \0, which is the null terminator. When evaluated in a condition, it will evaluate to 0. It may have been better to use a for here.
You then have
for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);
This loop has an empty body (note the semicolon right after the closing parenthesis), so all it does is increment j; the if/else that follows is not part of it. It keeps going while s2 has not ended and s2[j] == s1[i + j], that is, while the character at offset j in s2 matches the character at offset i + j in s1. When it stops, j is the number of leading characters of s2 that match s1 starting at position i. This part could likely be improved to avoid unnecessary iterations.
Then there's
if (!s2[j])
{
}
else
{
}
This checks whether j has reached the end of s2, which means all of s2 matched at position i. If so, it removes that occurrence from s1; otherwise it increments i. It could be improved by returning in the else once s2 can no longer fit in the remainder of s1.
for (k = i; s1[k + j]; k++)
s1[k] = s1[k + j];
s1[k] = 0;
This loop can look strange at first: because there are no braces, only s1[k] = s1[k + j] is the loop body, and s1[k] = 0 runs once after the loop. What happens here is that the string is compacted by removing s2: each character at k+j is shifted down to k. After the loop, s1[k] = 0 writes the null terminator so the shortened string ends properly.
If you want a deeper understanding, it may be worth trying to write your own code to do the same thing and then comparing afterwards. I have found that this generally helps more than reading a bunch of texts.
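For comparison, here is a minimal sketch of the same shift-left algorithm with the steps commented. The name remove_substring and the early return for an empty s2 are my own assumptions, not part of the original code. The rename is deliberate: <stdio.h> already declares int remove(const char *), so a user function called remove can conflict with it.

```c
#include <stddef.h>

/* Hypothetical name. Deletes every occurrence of needle from s, in place. */
void remove_substring(char *s, const char *needle)
{
    size_t i = 0, j, k;

    if (needle[0] == '\0')      /* an empty needle would "match" forever */
        return;

    while (s[i]) {
        /* count how many leading characters of needle match s at offset i */
        for (j = 0; needle[j] && needle[j] == s[i + j]; j++)
            ;
        if (!needle[j]) {
            /* full match: shift the tail of s left by j characters */
            for (k = i; s[k + j]; k++)
                s[k] = s[k + j];
            s[k] = '\0';
            /* i is not advanced, so a new match at the same spot is found too */
        } else {
            i++;                /* no match here, try the next position */
        }
    }
}
```

Called with a writable buffer holding "ITERATE" and the needle "RAT", this leaves "ITEE" in the buffer, matching the walkthrough in the answer above.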

Related

anything wrong with this trim() method in C

This method is for trimming a string in C by deleting spaces from the beginning and end.
BTW I am not that good with C and am actually facing some issues dealing with strings.
char* trim(char s[])
{
int i, j;
int size = strlen(s);
//index i will point to first char, while j will point to the last char
for(i = 0; s[i] == ' '; i++);
for(j = size - 1; s[j] == ' '; j--);
if(i > 0)
s[i - 1] = '\0';
if(j < size - 1)
s[j + 1] = '\0';
s = &s[i];
return s;
}
This loop
for(j = size - 1; s[j] == ' '; j--);
will access an out-of-bounds index when:
the input string consists entirely of spaces (e.g., " "), in which case there is nothing stopping j from reaching -1 and beyond, or
the input string is the empty string (""), where j starts at -1.
You need to guard against this in some way, for example:
for (j = size - 1; j >= 0 && s[j] == ' '; j--);
The other thing to consider is that you are both: modifying the contents of the original string buffer, and potentially returning an address that does not match the beginning of the buffer.
This is somewhat awkward, as the user must take care to keep the original value of s, as its memory contains the trimmed string (and might be the base pointer for allocated memory), but can only safely access the trimmed string from the returned pointer.
Some things to consider:
moving the trimmed string to the start of the original buffer.
returning a new, dynamically allocated string.
Some people have warned that you cannot pass a string literal to this function, which is true, but passing a string literal to a non-const argument is a terrible idea, in general.
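As a sketch of the first suggestion (moving the trimmed string to the start of the original buffer), something like the following could work. The name trim_in_place is hypothetical, and like the original it only treats ' ' as whitespace.

```c
#include <string.h>

/* Hypothetical helper: trims leading and trailing spaces in place and
   returns s itself, so the caller's original pointer stays usable. */
char *trim_in_place(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);             /* one past the last character */

    while (s[start] == ' ')             /* stops at '\0' at the latest */
        start++;
    while (end > start && s[end - 1] == ' ')
        end--;                          /* cannot go below start, so never below 0 */

    memmove(s, s + start, end - start); /* memmove, because the regions overlap */
    s[end - start] = '\0';
    return s;
}
```

Because the result always begins at s, a buffer obtained from malloc can still be passed to free afterwards, and the empty and all-spaces inputs simply become the empty string.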

Two null termination signs at the end of a string in C

I'm learning C via "The C Programming Language" book. During one of the exercises, where it's needed to concatenate two strings, I found that there are two null terminating signs (\0) at the end of a resulting string. Is this normal?
Code for the function:
void
copy_str_to_end(char *target, char *destination)
{
while (*destination != '\0')
++destination;
while ((*destination++ = *target++) != '\0')
;
}
Output:
This is a destination. This is a target.
Everything seems to be OK here, but if I run this function to test:
void
prcesc(char *arr)
{
int i;
for (i = 0; i <= strlen(arr) + 1; i++)
if (arr[i] != '\0')
printf("%c", arr[i]);
else
printf("E");
printf("\n");
}
The problem becomes visible: This is a destination. This is a target.EE (E means \0)
So, should I worry about this or not? And if yes, what's the reason for this thing to happen?
The problem is basically caused by the use of the <= operator instead of the < operator inside of the for loop condition:
i <= strlen(arr) + 1
strlen(arr) + 1 is the number of characters in the string including its terminator, which is the minimum size of the array that arr points to in the caller (the one that actually contains the string).
When you use i <= strlen(arr) + 1, the loop iterates one more time than expected, and in the last iteration you attempt to access an element past the terminator, which may be beyond the bounds of the array, with
if (arr[i] != '\0')
since index counting starts at 0, not 1.
Accessing memory beyond the bounds of the array invokes undefined behavior.
The extra E in the output appears because the for loop in the prcesc function also runs for i = strlen(arr) + 1. strlen returns the length of the string, say n, so arr[n-1] is the last character of the string and arr[n] is the terminating '\0'. Only arr[n] is guaranteed to be '\0'; arr[n+1] lies past the end of the string and just happened to hold a zero here.
Since you are printing both arr[n] and arr[n+1], you get two E characters.
The following function is what you need:
void
prcesc(char *arr)
{
int i;
for (i = 0; i <= strlen(arr); i++)
if (arr[i] != '\0')
printf("%c", arr[i]);
else
printf("E");
printf("\n");
}
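To convince yourself that the concatenation itself writes only one terminator, one option is to fill the buffer with a sentinel character first and inspect the bytes around the end of the string. The sketch below uses the same algorithm as copy_str_to_end; the names append_str and writes_single_terminator, the '#' sentinel, and the buffer size are assumptions made for this illustration.

```c
#include <string.h>

/* Same algorithm as copy_str_to_end, with hypothetical parameter names
   chosen so that src is appended to dest. */
void append_str(char *dest, const char *src)
{
    while (*dest != '\0')               /* find the end of dest */
        ++dest;
    while ((*dest++ = *src++) != '\0')  /* copy src, including its '\0' */
        ;
}

/* Returns 1 if exactly one terminator follows the joined text. */
int writes_single_terminator(void)
{
    char buf[16];

    memset(buf, '#', sizeof buf);       /* sentinel, so stale zeros cannot hide */
    strcpy(buf, "ab");
    append_str(buf, "cd");

    /* "abcd" is in buf[0..3], the terminator is buf[4], buf[5] is untouched */
    return strcmp(buf, "abcd") == 0 && buf[4] == '\0' && buf[5] == '#';
}
```

The byte after the terminator still holds the sentinel, which shows that the second E in the question came from the test loop reading one element too far, not from the copy.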

2D string array is storing '\0' when it encounters a word with more than one space or digit

I am pretty new to C programming. My program is supposed to take a string and move it into a 2D array, with the words separated by either a white-space or a digit. This works perfectly fine if there is one space or digit between words. However, as soon as there is more than one, it starts adding '\0' to my array.
//Move the string into a 2D array
for(i = 0; i < total + 1; i++)
{
if(isalpha( *(tempString + i) ))
{
sortingArray[n][j++] = tempString[i];
input++;
}
else
{
sortingArray[n][j++] = '\0';
n++;
j = 0;
}
if(tempString[i] == '\0')
break;
}
This is a sample of what happens (n = number of rows placed)
./a.out "one more way"
5 inputs
before
one
more
way
After
one
more
way
You need to skip consecutive delimiters:
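for(i = 0; i < total; i++)
{
if(isalpha(tempString[i]))
{
sortingArray[n][j] = tempString[i];
++j;
++input;
}
else
{
// skip consecutive delimiters, stopping on the last one so that
// the loop's own i++ lands on the first letter of the next word
while (i + 1 < total && !isalpha(tempString[i + 1]))
++i;
if (j > 0) // only close a row that actually holds a word
{
sortingArray[n][j] = '\0';
++n;
j = 0;
}
}
}
// the loop stops before the final '\0', so close the last word here
if (j > 0)
{
sortingArray[n][j] = '\0';
++n;
}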
Disclaimer: not verified by a compiler. Use caution!
I also took the liberty of making some improvements to your original code:
there is no point in checking for \0 if you have the length of the string.
changed *(tempString + i) to the clearer tempString[i].
moved the increments out of the larger expressions into their own full expressions. It is clearer this way.
It's a simple logic failure, which a debugger is ideal for identifying.
Imagine you have the string "hello  world", with two spaces between the words.
It stores "hello" into sortingArray[0] easily enough. When it gets to the first space it increments n and starts looking for the next word. But the next character it finds is another space so it increments n again.
A slight change is required to your logic
if(isalpha( *(tempString + i) ))
{
sortingArray[n][j++] = tempString[i];
input++;
}
else if(j>0)
{
sortingArray[n][j++] = '\0';
n++;
j = 0;
}
Now the code will only increment n if the previous character was a letter (by virtue of j being more than 0). Otherwise it does nothing and keeps going.
You should also check to see if j is non-zero after the loop as that means there is a new entry in sortingArray that needs a NUL added.
One thing also to note is that the way you're doing the for loop is a little odd. You have this
for(i = 0; i < total + 1; i++)
but also this inside the loop
if(tempString[i] == '\0')
break;
Typically, the way to terminate the for loop would be to write it like this
for(i = 0; tempString[i]!='\0'; i++)
as that way you don't need to know the length of the string; the loop finishes when it hits the NUL character.
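Putting these suggestions together (the j > 0 check, the check after the loop, and looping until the NUL character), a self-contained sketch might look like this. The function name split_words and the MAX_WORDS and MAX_LEN limits are assumptions made for the example; the original program's array sizes are not shown in the question.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORDS 16    /* assumed limits for this sketch */
#define MAX_LEN   32

/* Hypothetical helper: stores each run of letters in s as one row of
   words and returns the number of rows filled. Any run of non-letters,
   however long, counts as a single separator. */
int split_words(const char *s, char words[MAX_WORDS][MAX_LEN])
{
    int n = 0, j = 0;
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        if (isalpha((unsigned char)s[i])) {
            /* extra words and over-long words are silently dropped or truncated */
            if (n < MAX_WORDS && j < MAX_LEN - 1)
                words[n][j++] = s[i];
        } else if (j > 0) {
            words[n++][j] = '\0';   /* close the word we were building */
            j = 0;
        }
    }
    if (j > 0)                      /* last word had no trailing separator */
        words[n++][j] = '\0';
    return n;
}
```

With the input "one  more way" this fills three rows ("one", "more", "way") and returns 3, regardless of how many spaces or digits separate the words.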

Rearranging string letters

I was writing a program that copies all the words of a string except its first 2 words and puts an x at the end.
However, I can't put the x at its end. Please help!
Below is my code.
#include<stdio.h>
#include<string.h>
int main()
{
char a[25], b[25];
int i, j, count = 0, l, k;
scanf("%[^\n]s", a);
i = strlen(a);
if (i > 20)
printf("Given Sentence is too long.");
else
{/* checking for first 2 words and counting 2 spaces*/
for (j = 0; j < i; j++)
{
if (a[j] == ' ')
count = count + 1;
if (count == 2)
{
k = j;
break;
}
}
/* copying remaining string into new one*/
for (j = 0; j < i - k; j++)
{
b[j] = a[j + k];
}
b[j + 1] = 'x';
printf("%s", b);
}
}
The loop sets k = j at the position of the second space, so k is not a fixed offset. If what you actually want is to drop the first two characters rather than the first two words, the offset should simply be 2. In either case the x belongs at b[j], followed by a terminator:
/* copying remaining string into new one*/
for (j = 0; j < i - 2; j++)
{
b[j] = a[j + 2];
}
b[j] = 'x';
b[j + 1] = '\0';
printf("%s", b);
Your index is off by one. After your second loop, the condition j < i-k was false, so j now is i-k. Therefore, the character after the end of what you copied is b[j], not b[j+1]. The correct line would therefore be b[j] = 'x';.
Just changing this would leave you with something that is not a string. A string is defined as a sequence of char, ending with a '\0' char. So you have to add b[j+1] = 0; as well.
After these changes, your code does what you intended, but still has undefined behavior.
One problem is that your scanf() will happily overflow your buffer -- use a field width here: scanf("%24[^\n]", a);. And by the way, the s at the end doesn't make any sense: you use either the s conversion or the [] conversion.
A somewhat sensible implementation would use functions suited for the job, like e.g. this:
#include<stdio.h>
#include<string.h>
int main(void)
{
// memory is *cheap* nowadays, these buffers are still somewhat tiny:
char a[256];
char b[256];
// read a line
if (!fgets(a, 256, stdin)) return 1;
// and strip the newline character if present
a[strcspn(a, "\n")] = 0;
// find first space
char *space = strchr(a, ' ');
// find second space
if (space) space = strchr(space+1, ' ');
if (space)
{
// have two spaces, copy the rest
strcpy(b, space+1);
// and append 'x':
strcat(b, "x");
}
else
{
// empty string:
b[0] = 0;
}
printf("%s",b);
return 0;
}
For functions you don't know, google for man <function>.
In C, strings are arrays of chars, as you know, and the '\0' character is how C knows where a string ends. In your example, the problem is in the last few lines:
/* copying remaining string into new one*/
for(j=0;j<i-k;j++)
{
b[j]=a[j+k];
}
b[j+1]='x';
printf("%s",b);
After the loop ends, j has already been incremented once more before the loop exits, so it is the index just past the last copied character.
So if the copied text is "test", b holds 't', 'e', 's', 't' in b[0] to b[3] and j is 4. Writing to b[j+1] skips b[4], which leaves an uninitialized character before the x and no terminating '\0' after it. Simply change it to
b[j]='x';
b[j+1]='\0';
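For reference, here is a loop-based sketch that combines the index fix and the terminator, and also handles input with fewer than two spaces (where the original code would use k uninitialized). The name drop_two_words is hypothetical, b is assumed to be at least as large as a, and unlike the original loop this version starts copying after the second space rather than at it.

```c
#include <stddef.h>

/* Hypothetical helper: copies everything after the second space of a
   into b and appends 'x'. If a has fewer than two spaces, b becomes
   the empty string. b is assumed to be at least as large as a. */
void drop_two_words(const char *a, char *b)
{
    size_t i = 0, j = 0;
    int spaces = 0;

    /* advance i to just past the second space, if there is one */
    while (a[i] != '\0' && spaces < 2) {
        if (a[i] == ' ')
            spaces++;
        i++;
    }
    if (spaces < 2) {           /* fewer than three words: nothing to copy */
        b[0] = '\0';
        return;
    }
    while (a[i] != '\0')
        b[j++] = a[i++];
    b[j] = 'x';                 /* j is already one past the last copied char */
    b[j + 1] = '\0';            /* a string must end with '\0' */
}
```

For the input "one two three" this writes "threex" into b.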

Why do I keep getting extra characters at the end of my string?

I have the string, "helLo, wORld!" and I want my program to change it to "Hello, World!". My program works, the characters are changed correctly, but I keep getting extra characters after the exclamation mark. What could I be doing wrong?
void normalize_case(char str[], char result[])
{
if (islower(str[0]) == 1)
{
result[0] = toupper(str[0]);
}
for (int i = 1; str[i] != '\0'; i++)
{
if (isupper(str[i]) == 1)
{
result[i] = tolower(str[i]);
}
else if (islower(str[i]) == 1)
{
result[i] = str[i];
}
if (islower(str[i]) == 0 && isupper(str[i]) == 0)
{
result[i] = str[i];
}
if (str[i] == ' ')
{
result[i] = str[i];
}
if (str[i - 1] == ' ' && islower(str[i]) == 1)
{
result[i] = toupper(str[i]);
}
}
}
You are not null terminating result so when you print it out it will keep going until a null is found. If you move the declaration of i to before the for loop:
int i ;
for ( i = 1; str[i] != '\0'; i++)
you can add:
result[i] = '\0' ;
after the for loop, this is assuming result is large enough.
Extra random-ish characters at the end of a string usually means you've forgotten to null-terminate ('\0') your string. Your loop copies everything up to, but not including, the terminal null into the result.
Add result[i] = '\0'; after the loop before you return.
Normally, you treat the isxxxx() functions (macros) as returning a boolean condition, and you'd ensure that you only have one of the chain of conditions executed. You'd do that with more careful use of else clauses. Your code actually copies str[i] multiple times if it is a blank. In fact, I think you can compress your loop to:
int i;
for (i = 1; str[i] != '\0'; i++)
{
if (isupper(str[i]))
result[i] = tolower(str[i]);
else if (str[i - 1] == ' ' && islower(str[i]))
result[i] = toupper(str[i]);
else
result[i] = str[i];
}
result[i] = '\0';
If I put result[i] outside of the for loop, won't the compiler complain about i?
Yes, it will. In this context, you need i defined outside the loop control, because you need the value after the loop. See the amended code above.
You might also note that your pre-loop code quietly skips the first character of the string if it is not lower-case, leaving garbage as the first character of the result. You should really write:
result[0] = toupper(str[0]);
so that result[0] is always set.
You should add the statement result[i] = '\0' after the loop, because in C a string must end with the special character '\0', which tells functions such as printf where the string ends.
I took the liberty of simplifying your code as a lot of the checks you do are unnecessary. The others have already explained some basic points to keep in mind:
#include <stdio.h> /* for printf */
#include <ctype.h> /* for islower and the like */
void normalise_case(char str[], char result[])
{
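if (str[0] == '\0') /* empty input: the loop below must not read str[1] */
{
result[0] = '\0';
return;
}
result[0] = toupper((unsigned char)str[0]); /* capitalise at the start; result[0] is always set */
int i; /* declared outside the for loop because its value is needed after the loop ends */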
for (i = 1; str[i] != '\0'; i++)
{
result[i] = str[i]; /* I've noticed that you copy the string in each if case, so I've put it here at the top */
if (isupper(result[i]))
{
result[i] = tolower(result[i]);
}
if (result[i - 1] == ' ' && islower(result[i])) /* at the start of a word, capitalise! */
{
result[i] = toupper(result[i]);
}
}
result[i] = '\0'; /* this has already been explained */
}
int main()
{
char in[20] = "tESt tHIs StrinG";
char out[20] = ""; /* space to store the output */
normalise_case(in, out);
printf("%s\n", out); /* Prints 'Test This String' */
return 0;
}

Resources