Debug string replace function in C - c

I tried to write a function that replaces every occurrence of the string s1 with s2 in a given string s.
However, I don't know why my program stops at the line *p=0 in that replace function without reporting any error.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void replace(char * s, char * s1, char * s2) {
char * p; int l=strlen(s2);
while ((p=strstr(s,s1))) {
*p=0;
p+=l;
strcat(s,s2);
strcat(s,p);
}
}
int main(void) {
char *s=(char *)"cmd=ls+-la&abc=xyz";
replace (s, "+", " ");
printf("%s", s);
return EXIT_SUCCESS;
}

There are some problems with the replace function but, first of all, there is a big difference between a pointer to a constant char array vs a character array:
char *str = "some string";
This assigns str the address of a string literal, which must be treated as read-only. The string is not copied; only a pointer is involved. Any attempt to modify that string results in undefined behavior.
char str[] = "some string";
In this case str is an array (of size big enough to hold the string + \0) that is initialized to that string, allowing the modification of individual characters within the array.
Back to your replace function.
The first thing I noticed is that your use of strstr and strcat inside the loop is highly inefficient. Every call to strstr starts from the beginning of the string and searches for the first occurrence of s1 all over again, and strcat has the same problem because it must find the null terminator every time. The loop is also incorrect: p is advanced by the length of s2 rather than s1, and strcat(s,p) is called with a source that lies inside the destination, which is undefined behavior.
Another issue is that if the replacement string (s2) is longer than the original string (s1), you must shift the rest of the string to make room for the additional characters. A similar shift, in the other direction, is needed if the replacement string is shorter.
A basic method to replace a single char might look like this:
while (*s)
{
if (*s == c1)
*s = c2;
++s;
}
A slightly more complex method that replaces a string of the same length would be:
/* PRECONDITION: strlen(s1) == strlen(s2) */
int l = strlen(s2);
while (*s)
{
if (!strncmp(s, s1, l))
{
memcpy(s, s2, l);
s += l;
}
else
++s;
}
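When s1 and s2 differ in length, one way to avoid shifting characters in place is to build the result in a separate buffer. The following is only a sketch under that assumption: replace_all and its parameters are hypothetical names, not part of the original code, and the destination must not overlap the source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, replacing every occurrence of
 * `from` with `to`. dst must not overlap src. Returns 0 on success, or -1
 * if `from` is empty or the result (plus '\0') does not fit in dst_size.
 * On failure the contents of dst are unspecified. */
int replace_all(char *dst, size_t dst_size,
                const char *src, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t used = 0;                    /* bytes written to dst so far */

    if (from_len == 0 || dst_size == 0)
        return -1;

    while (*src != '\0') {
        const char *match = strstr(src, from);
        /* Unchanged text before the next match, or up to the end. */
        size_t keep = match ? (size_t)(match - src) : strlen(src);
        size_t add = match ? to_len : 0;

        if (keep + add >= dst_size - used)  /* keep room for '\0' */
            return -1;

        memcpy(dst + used, src, keep);
        used += keep;
        memcpy(dst + used, to, add);
        used += add;

        /* Continue after the match, so each scan starts where the last ended. */
        src += keep + (match ? from_len : 0);
    }
    dst[used] = '\0';
    return 0;
}
```

A caller would pass a writable array, for example char out[64]; followed by replace_all(out, sizeof out, "cmd=ls+-la&abc=xyz", "+", " ");. Because the search resumes after each match, the string is scanned only once.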

Your compiler is allowed to place string literals into read-only memory, which is probably what it did with s.
Try:
char s[] = "cmd=ls+-la&abc=xyz";
This changes s from a pointer to a string literal into an array initialized with your string.

Related

Can you explain what while(*++str1) and return (str1 - str2) does?

In this context, does the while loop work like a for loop? Also, what does the str1-str2 string subtraction result in?
#include <stdio.h>
int fun(char *str1) {
char *str2 = str1;
while (*++str1);
return (str1 - str2);
}
int main() {
char *str = "GeeksQuiz";
printf("%d", fun(str));
return 0;
}
Notice that you are working with pointers here, not strings, so, starting from the end, str1-str2 is pointer arithmetic.
A string must end with a null character, so in memory "GeeksQuiz" is actually an array of chars holding the following values: GeeksQuiz\0. Thus while(*++str1); runs through the values of this array until it reaches \0.
To conclude, this function returns the number of chars in the string.
The purpose of the function is to calculate the length of a string. That is, this while loop
while(*++str1);
iterates until the terminating zero character '\0' is encountered. After the while loop, the pointer str1 is expected to point to the terminating zero character '\0', while the pointer str2 points to the beginning of the string because of the initial assignment
char *str2 = str1;
So the difference str1-str2 yields the length of the string. The length of a string is defined as the number of characters in the string before the terminating zero character '\0'.
However, the function has a bug. If the caller passes an empty string "", which is internally represented as a character array whose single element is the terminating zero character { '\0' }, the function invokes undefined behavior. The first character of an empty string is already the terminating zero character '\0', but in the while loop the pointer str1 is incremented first, and only then is the next character checked against '\0', so the loop reads past the end of the array.
That is, this while loop
while(*++str1);
may be rewritten in the following way
while ( ( ++str1, *str1 != '\0' ) );
As can be seen, the pointer str1 is incremented first.
Apart from this defect, the function parameter should have the qualifier const, because the passed string is not changed within the function. Also, the return type of the function should be an unsigned integer type such as size_t (the return type of the standard C function strlen, which performs the same task).
The function can be declared and defined the following way
size_t fun( const char *s )
{
const char *t = s;
while( *t ) ++t;
return t - s;
}
Here is a demonstrative program.
#include <stdio.h>
size_t fun( const char *s )
{
const char *t = s;
while( *t ) ++t;
return t - s;
}
int main(void)
{
const char *s = "";
printf( "The length of the string \"%s\" is equal to %zu\n", s, fun( s ) );
s = "1";
printf( "The length of the string \"%s\" is equal to %zu\n", s, fun( s ) );
s = "12";
printf( "The length of the string \"%s\" is equal to %zu\n", s, fun( s ) );
s = "123";
printf( "The length of the string \"%s\" is equal to %zu\n", s, fun( s ) );
return 0;
}
The program output is
The length of the string "" is equal to 0
The length of the string "1" is equal to 1
The length of the string "12" is equal to 2
The length of the string "123" is equal to 3
The loop while (*++str1); increments the pointer, reads the byte pointed to by the updated str1, and tests whether this byte is null: if it is, the loop stops; otherwise it does nothing and repeats. This loop would be more readable with an explicit statement instead of an empty statement ;:
while (*++str1 != '\0')
continue;
return (str1 - str2); computes the difference of the pointers str1 and str2 and returns this value as an int. The difference of two pointers is defined only if they point into the same array (or one past its end), and it evaluates to the number of elements between them.
The function attempts to compute the length of the string argument but fails for the empty string, because str1 is always incremented before the test and therefore skips the null terminator at offset 0. The behavior is then undefined, as the code reads beyond the end of the string. For non-empty strings, it returns the number of non-null characters, i.e. the length of the string: fun("GeeksQuiz") returns 9.
Here is a modified version:
#include <stdio.h>
int fun(const char *str) {
const char *start = str;
while (*str != '\0')
str++;
return str - start;
}
int main() {
const char *str = "GeeksQuiz";
printf("length of \"%s\" is %d\n", str, fun(str));
return 0;
}
I ran your code through clang-format [see Note 1] to confirm what I suspected about the weird while loop you've got going on there, and I came up with this as correct formatting for your code:
#include <stdio.h>
int fun(char *str1)
{
char *str2 = str1;
while (*++str1)
;
return (str1 - str2);
}
int main()
{
char *str = "GeeksQuiz";
printf("%d", fun(str));
return 0;
}
Personally, I would have written it like this though, to make the while loop super obvious. You can run this code here: https://onlinegdb.com/BkxlKb75GO.
#include <stdio.h>
int fun(char *str1)
{
char *str2 = str1;
while (*++str1)
{
// do nothing
}
return (str1 - str2);
}
int main()
{
char *str = "GeeksQuiz";
printf("%d", fun(str));
return 0;
}
Even more-readable, however, is this for the fun() function, which I've also renamed to count_num_chars_in_str():
int count_num_chars_in_str(char *str1)
{
char *str2 = str1;
while (*str1 != '\0')
{
str1++;
}
return str1 - str2;
}
A shorter name might be num_chars_in_str(), str_length(), or strlen(). strlen() already exists, and this is precisely what it does. It is declared in the header string.h and is part of the C and C++ standard libraries. Here's its description on cplusplus.com:
size_t strlen ( const char * str );
Get string length
Returns the length of the C string str.
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
This should not be confused with the size of the array that holds the string.
So, running any of these programs above, the output is 9. All the program does is count the number of non-null chars (where a null char is 0, or '\0'--same thing) in the string passed in, which is GeeksQuiz in this case. GeeksQuiz contains 9 non-null chars.
Where did you get your code by the way? Please post links and references. You should always reference your sources.
while (*++str) simply keeps incrementing the str pointer one char at a time until a null terminator (0) is found, which occurs right at the end of the string, after the last char in it. Once that happens, the difference between the two char pointers is taken, resulting in the difference between the address location of the null terminator right after the z, and the address location of the first char in the string, which is G. The difference in memory address between these 2 chars is 9 chars.
Not only is the original version less-readable, it also has a bug in it. For this test case, it should print 0, but it prints 1 instead:
char *str = "\0";
printf("%d", fun(str));
My more-readable version in count_num_chars_in_str() corrects this bug too.
Lesson: don't write unreadable or obfuscated code.
[Note 1] The way I ran it through clang-format is I just copy-pasted your original code into a main.c file, then copied that into my eRCaGuy_CodeFormatter repo here, then ran ./run_clang-format.sh.
In this context, does the while loop work like a for loop?
I would say that all while loops act like for loops, and vice versa.
Any time you have a loop
while(condition)
{ /* do something */; }
you can replace it by an equivalent for loop:
for(; condition; )
{ /* do something */; }
Going the other way, any time you have a for loop
for(initial_expression; test_expression; increment_expression)
{ /* do something */; }
you can (almost) replace it with an equivalent while loop:
initial_expression;
while(test_expression) {
/* do something */;
increment_expression;
}
(There's one small difference between the two, but it only shows up if you use a continue statement in the loop.)
If you were stranded on a desert island with a broken C compiler (or if you were stranded in the classroom of an instructor who likes to pose "trick" questions), and you had to write a C program without using the for keyword, you could: you could write all your loops using while instead, without loss of functionality.
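The continue difference mentioned above can be illustrated with a small sketch. The two functions below are hypothetical examples, not taken from the question: both sum the odd numbers below n, once with for and once with while.

```c
/* for version: `continue` jumps to the increment expression,
 * so i still advances on every iteration. */
int sum_odd_for(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

/* while version: `continue` jumps straight back to the test, so an
 * increment left only at the bottom of the body would be skipped and
 * the loop would never end. It has to be repeated before `continue`. */
int sum_odd_while(int n)
{
    int sum = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0) {
            i++;
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```

With the extra increment in place the two functions return the same value for any n; without it, the while version would loop forever as soon as i is even.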

Usage of pointers as parameters in the strcpy function. Trying to understand code from book

From my book:
void strcpy (char *s, char *t)
{
int i=0;
while ((s[i] = t[i]) != '\0')
++i;
}
I'm trying to understand this snippet of code from my textbook. They give no main function so I'm trying to wrap my head around how the parameters would be used in a call to the function. As I understand it, the "i-number" of characters of string t[ ] are being copied to the string s[ ] until there are no longer characters to read, from the \0 escape sequence. I don't really understand how the parameters would be defined outside of the function. Any help is greatly appreciated. Thank you.
Two things to remember here:
Strings in C are arrays of chars
Arrays are passed to functions as pointers
So you would call this like so:
char destination[16];
char source[] = "Hello world!";
strcpy(destination, source);
printf("%s", destination);
i is just an internal variable, it has no meaning outside the strcpy function (it's not a parameter or anything). This function copies the entire string t to s, and stops when it sees a \0 character (which marks the end of a string by C convention).
EDIT: Also, strcpy is a standard library function, so weird things might happen if you try to redefine it. Give your copy a new name and all will be well.
Here's a main for you:
int main()
{
char buf[30];
strcpy(buf, "Hi!");
puts(buf);
strcpy(buf, "Hello there.");
puts(buf);
}
The point of s and t is to accept character arrays that exist elsewhere in the program. They are defined elsewhere, usually by the immediate caller or one caller above it. At runtime they refer to whatever arrays the caller passes in.
You get compile problems because your book's declaration does not match the standard one. It should read
char *strcpy (char *s, const char *t)
{
...
return s;
}
Here const means the function will not modify the string that t points to. Because strcpy is a standard function, your declaration really does need to match it.
Here is how you might use the function (note that you should change the function name, as it will conflict with the standard library):
void my_strcpy (char *s, char *t)
{
int i=0;
while ((s[i] = t[i]) != '\0')
++i;
}
int main()
{
char *dataToCopy = "This is the data to copy";
char buffer[81]; // This buffer should be at least big enough to hold the data from the
// source string (dataToCopy) plus 1 for the null terminator
// call your strcpy function
my_strcpy(buffer, dataToCopy);
printf("%s", buffer);
}
In the code, the variable i is the index of the current character in the character arrays. So when i is 0 you are referring to the first character of s and t. s[i] = t[i] copies the i'th character of t to the i'th character of s. In C this assignment is itself an expression whose value is the character that was copied, which allows you to compare it to the null terminator: (s[i] = t[i]) != '\0'. If the copied character is not the null terminator, the loop continues; otherwise it ends, after the terminator itself has been copied.
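Since the destination buffer has to be large enough for the source plus the terminator, a bounded variant of the same loop may be useful. This is only a sketch: copy_bounded and its cap parameter are hypothetical and do not come from the book.

```c
#include <stddef.h>

/* Hypothetical bounded variant of the book's loop: copies t into s,
 * writing at most cap bytes including the '\0'. Returns 0 if all of t
 * fit, or -1 if the copy was truncated (or cap is 0). */
int copy_bounded(char *s, size_t cap, const char *t)
{
    size_t i = 0;

    if (cap == 0)
        return -1;

    /* Same assignment-as-expression idiom, plus a bounds check that
     * always leaves one byte free for the terminator. */
    while (i + 1 < cap && (s[i] = t[i]) != '\0')
        ++i;

    s[i] = '\0';                    /* terminate even when truncated */
    return t[i] == '\0' ? 0 : -1;
}
```

A call such as copy_bounded(buffer, sizeof buffer, dataToCopy) then cannot write past the end of buffer, whatever the length of the source string.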

Using strcat() with a char from a string?

I'm trying to append characters from a string into a new string. For the code below:
int main(void)
{
char s1[] = "cat";
char s2[] = "hsklsdksdhkjadsfhjkld";
strcat(s1, &s2[1]);
printf("s1 is now %s\n", s1);
}
Why is the output catsklsdksdhkjadsfhjkld and not cats? Why is the whole string added, instead of just the 's' located at s2[1]?
Thanks.
A char * is only a pointer to the start of a string; C assumes the string ends at the first \0 character. So all characters are appended until the \0 character is reached.
You assume &s2[1] points to "s" (which is true), but since it is a char pointer, strcat treats it as pointing to the rest of the char array, up to the \0 character at the end. Try this, for example:
printf("%s\n", &s2[1]);
which will yield:
sklsdksdhkjadsfhjkld
From the strcat reference:
Concatenate strings
Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
EDIT
If you want to add only one (or a few) characters, use strncat:
strncat(s1, &s2[1], 1 /*number of chars to append*/);
EDIT2
Make sure your char arrays are large enough, as suggested by @PaulR:
char s1[32] = "cat";
Both arguments to strcat are pointers to char objects, and both are assumed to point to (the initial character of) a string.
For example, this:
strcat(s1, &s2[0]);
is equivalent to this:
strcat(s1, s2);
In both cases, the second argument is a pointer to the initial character of a string, which strcat uses to access the entire string up to and including the terminating '\0' null character.
But your program has undefined behavior. By declaring
char s1[] = "cat";
you let the compiler determine the size of s1 based on the initializer. In this case, s1 is an array of 4 chars (including the terminating '\0'). There is no room to append anything to it. Apparently when you ran it, it copied characters into the memory space immediately following s1, which is why it seemed to "work".
A working version of your program is:
#include <stdio.h>
#include <string.h>
int main(void)
{
char s1[50] = "cat";
char s2[] = "hsklsdksdhkjadsfhjkld";
strcat(s1, &s2[1]);
printf("s1 is now %s\n", s1);
}
Note the #include directives. These are not optional, even though you might get away with omitting them.
You can use strncat instead of strcat:
#include <stdio.h>
#include <string.h>
int main(void)
{
char s1[32] = "cat"; // NB: s1 needs to be large enough to hold additional chars !
char s2[32] = "hsklsdksdhkjadsfhjkld";
strncat(s1, &s2[1], 1);
printf("s1 is now %s\n", s1);
return 0;
}
Because strcat() takes two string arguments, and &s2[1] isn't a character: it's the string "sklsdksdhkjadsfhjkld".
So strcat(s1, &s2[1]); concatenates
"cat" + "sklsdksdhkjadsfhjkld"
Giving the result
"catsklsdksdhkjadsfhjkld"
If you want to append a single character you could do this
size_t len = strlen (s1);
s1[len] = s2[1];
s1[len+1] = '\0'; // new string terminator
but you would have to ensure there is enough array (string) space available.
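The space check can be folded into a small helper. The sketch below assumes the caller knows the total size of the array; append_char and cap are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends the single character c to the string in
 * buf, where cap is the total size of the array holding it.
 * Returns 0 on success, or -1 if there is no room (buf is unchanged). */
int append_char(char *buf, size_t cap, char c)
{
    size_t len = strlen(buf);

    if (len + 2 > cap)          /* need room for c and the new '\0' */
        return -1;

    buf[len] = c;
    buf[len + 1] = '\0';        /* new string terminator */
    return 0;
}
```

With char s1[32] = "cat";, the call append_char(s1, sizeof s1, s2[1]) appends just the 's'. With char s1[] = "cat"; the array has only 4 bytes, so the call is refused instead of overflowing.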

remstr function in C

I want to create a function called remstr(). This function removes a given string from another string without using string.h. Example:
str1[30]= "go over stackover"
str2[20]= "ver"
strrem[20]= "go o stacko"
Please help me
C gives you lots of useful building blocks for doing this. In particular, you can build this function using three standard library functions: strstr (to find the string you want to remove), strlen to compute the length of the rest of the string, and memcpy to copy the parts you don't want to delete into the destination (you'll need to use memmove instead of memcpy if you want the function to operate in place). All three functions are declared in <string.h>.
Take a crack at writing the function, and ask specific questions if and when you run into trouble.
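For comparison, the library-based approach described above might look like the following sketch. It does use string.h, which the question rules out, so it only shows the shape of the algorithm; remstr_inplace is a hypothetical name.

```c
#include <string.h>

/* Removes every occurrence of sub from s, in place.
 * Does nothing if sub is the empty string. */
void remstr_inplace(char *s, const char *sub)
{
    size_t sub_len = strlen(sub);
    char *p = s;

    if (sub_len == 0)
        return;

    while ((p = strstr(p, sub)) != NULL) {
        /* Shift the tail, including its '\0', left over the match.
         * memmove is required because the regions overlap. */
        memmove(p, p + sub_len, strlen(p + sub_len) + 1);
    }
}
```

The search resumes at the position of the removed match, so a new occurrence that begins at that position is removed as well.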
The pseudocode is pretty straightforward for what you want to do, and if you can't use string.h functions then you just have to recreate them. Here str1 is the string to search and str2 is the string to remove:
char * remstr(char *str1, char * str2)
{
get length of str1
get length of str2
for(count from 0 to length of str1 - 1) {
if (the characters of str1 starting at count match all of str2)
skip ahead in str1 by the length of str2
else
store str1[count] in your new string
}
terminate your new string with the sentinel character
return your new string
}
You need to figure out the details, does remstr() allocate memory for the new string? Does it take an existing string and update it? What is the sentinel character of your strings?
You'll need a strlen() for this to work, since you can't use it you need to make something like:
int mystrlen(const char *str) {
int count = 0;
while (str[count] != '\0') /* '\0' is the sentinel character */
count++;
return count;
}
#include <stdio.h>
#include <stdlib.h>
void remstr(char *str1, char *str2, char *strrem)
{
char *p1, *p2;
if (!*str2) return;
do {
p2 = str2;
p1 = str1;
while (*p1 && *p2 && *p1==*p2) {
p1++;
p2++;
}
if (!(*p2)) str1 = p1-1;
else *strrem++ = *str1;
} while(*str1 && *(++str1));
*strrem = '\0';
}
int main() {
char str1[30]= "go over stackover";
char str2[20]= "ver";
char strrem[30];
remstr(str1, str2, strrem);
printf("%s\n",strrem);
}
With this function you can even put the result in the same string buffer str1:
remstr(str1, str2, str1);
printf("%s\n",str1);

In C - check if a char exists in a char array

I'm trying to check if a character belongs to a list/array of invalid characters.
Coming from a Python background, I used to be able to just say :
for c in string:
if c in invalid_characters:
#do stuff, etc
How can I do this with regular C char arrays?
The less well-known but extremely useful (and standard since C89 — meaning 'forever') functions in the C library provide the information in a single call. Actually, there are multiple functions — an embarrassment of riches. The relevant ones for this are:
7.21.5.3 The strcspn function
Synopsis
#include <string.h>
size_t strcspn(const char *s1, const char *s2);
Description
The strcspn function computes the length of the maximum initial segment of the string
pointed to by s1 which consists entirely of characters not from the string pointed to by
s2.
Returns
The strcspn function returns the length of the segment.
7.21.5.4 The strpbrk function
Synopsis
#include <string.h>
char *strpbrk(const char *s1, const char *s2);
Description
The strpbrk function locates the first occurrence in the string pointed to by s1 of any
character from the string pointed to by s2.
Returns
The strpbrk function returns a pointer to the character, or a null pointer if no character
from s2 occurs in s1.
The question asks about 'for each char in string ... if it is in list of invalid chars'.
With these functions, you can write:
size_t len = strlen(test);
size_t spn = strcspn(test, "invald");
if (spn != len) { ...there's a problem... }
Or:
if (strpbrk(test, "invald") != 0) { ...there's a problem... }
Which is better depends on what else you want to do. There is also the related strspn() function which is sometimes useful (whitelist instead of blacklist).
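As a sketch of how these two calls might be wrapped, the helpers below use only strpbrk and strcspn from the standard library. The function names has_invalid_char and first_invalid_index are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if s contains at least one character from invalid. */
bool has_invalid_char(const char *s, const char *invalid)
{
    return strpbrk(s, invalid) != NULL;
}

/* Returns the index of the first character of s that appears in
 * invalid, or strlen(s) if there is none. */
size_t first_invalid_index(const char *s, const char *invalid)
{
    return strcspn(s, invalid);
}
```

strpbrk is enough for a yes/no answer, while strcspn is the better fit when the position of the offending character is needed, for example to report it in an error message.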
The equivalent C code looks like this:
#include <stdio.h>
#include <string.h>
// This code outputs: h is in "This is my test string"
int main(int argc, char* argv[])
{
const char *invalid_characters = "hz";
char *mystring = "This is my test string";
char *c = mystring;
while (*c)
{
if (strchr(invalid_characters, *c))
{
printf("%c is in \"%s\"\n", *c, mystring);
}
c++;
}
return 0;
}
Note that invalid_characters is a C string, ie. a null-terminated char array.
Assuming your input is a standard null-terminated C string, you want to use strchr:
#include <string.h>
char* foo = "abcdefghijkl";
if (strchr(foo, 'a') != NULL)
{
// do stuff
}
If on the other hand your array is not null-terminated (i.e. just raw data), you'll need to use memchr and provide a size:
#include <string.h>
char foo[] = { 'a', 'b', 'c', 'd', 'e' }; // note last element isn't '\0'
if (memchr(foo, 'a', sizeof(foo)))
{
// do stuff
}
Use the strchr function when dealing with C strings.
char * strchr ( const char * str, int character );
Here is an example of what you want to do.
/* strchr example */
#include <stdio.h>
#include <string.h>
int main ()
{
char invalids[] = ".#<>#";
char * pch;
pch=strchr(invalids,'s');//is s an invalid character?
if (pch!=NULL)
{
printf ("Invalid character");
}
else
{
printf("Valid character");
}
return 0;
}
Use memchr when dealing with memory blocks (such as arrays that are not null-terminated).
void * memchr ( const void * ptr, int value, size_t num );
/* memchr example */
#include <stdio.h>
#include <string.h>
int main ()
{
char * pch;
char invalids[] = "#<>#";
pch = (char*) memchr (invalids, 'p', strlen(invalids));
if (pch!=NULL)
printf ("p is an invalid character.\n");
else
printf ("p valid character.\n");
return 0;
}
http://www.cplusplus.com/reference/clibrary/cstring/memchr/
http://www.cplusplus.com/reference/clibrary/cstring/strchr/
You want
strchr (const char *s, int c)
If the character c is in the string s it returns a pointer to the location in s. Otherwise it returns NULL. So just use your list of invalid characters as the string.
strchr for searching a char from start (strrchr from the end):
char str[] = "This is a sample string";
if (strchr(str, 'h') != NULL) {
/* h is in str */
}
I believe the original question said:
a character belongs to a list/array of
invalid characters
and not:
belongs to a null-terminated string
which, if it did, then strchr would indeed be the most suitable answer. If, however, there is no null termination to an array of chars or if the chars are in a list structure, then you will need to either create a null-terminated string and use strchr or manually iterate over the elements in the collection, checking each in turn. If the collection is small, then a linear search will be fine. A large collection may need a more suitable structure to improve the search times - a sorted array or a balanced binary tree for example.
Pick whatever works best for your situation.
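For the sorted-array option, a minimal sketch using the standard qsort and bsearch functions could look like this. The helper names are hypothetical, and the array does not need a null terminator because its length is passed explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort/bsearch over an array of char. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Sorts the set once so that later lookups can use binary search. */
void sort_char_set(char *set, size_t n)
{
    qsort(set, n, sizeof set[0], cmp_char);
}

/* Returns true if c occurs in the sorted array set[0..n-1]. */
bool in_sorted_set(const char *set, size_t n, char c)
{
    return bsearch(&c, set, n, sizeof set[0], cmp_char) != NULL;
}
```

For a handful of invalid characters a plain linear scan is likely just as fast; the sorted array only starts to pay off once the collection is large.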

Resources