How does strchr implementation work - c

I tried to write my own implementation of the strchr() method.
It now looks like this:
char *mystrchr(const char *s, int c) {
while (*s != (char) c) {
if (!*s++) {
return NULL;
}
}
return (char *)s;
}
The last line originally was
return s;
But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?

I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)
Here's what the C standard says:
char *strchr(const char *s, int c);
The strchr function locates the first occurrence of c
(converted to a char) in the string pointed to by s. The
terminating null character is considered to be part of the string.
Which means that this program:
#include <stdio.h>
#include <string.h>
int main(void) {
const char *s = "hello";
char *p = strchr(s, 'l');
*p = 'L';
return 0;
}
even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.
The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.
Here's another example; it doesn't have undefined behavior, but it quietly modifies a const qualified object without any casts (which, on further thought, I believe has undefined behavior):
#include <stdio.h>
#include <string.h>
int main(void) {
const char s[] = "hello";
char *p = strchr(s, 'l');
*p = 'L';
printf("s = \"%s\"\n", s);
return 0;
}
Which means, I think, (to answer your question) that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.
This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:
const char * strchr ( const char * str, int character );
char * strchr ( char * str, int character );
Of course C can't do this.
An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
(Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)
strchr() is not the only C standard library function that has this problem. The list of affected function (I think this list is complete but I don't guarantee it) is:
void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);
(all declared in <string.h>) and:
void *bsearch(const void *key, const void *base,
size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
(declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.

The practice of returning non-const pointers to const data from non-modifying functions is actually an idiom rather widely used in C language. It is not always pretty, but it is rather well established.
The reationale here is simple: strchr by itself is a non-modifying operation. Yet we need strchr functionality for both constant strings and non-constant strings, which would also propagate the constness of the input to the constness of the output. Neither C not C++ provide any elegant support for this concept, meaning that in both languages you will have to write two virtually identical functions in order to avoid taking any risks with const-correctness.
In C++ you wild be able to use function overloading by declaring two functions with the same name
const char *strchr(const char *s, int c);
char *strchr(char *s, int c);
In C you have no function overloading, so in order to fully enforce const-correctness in this case you would have to provide two functions with different names, something like
const char *strchr_c(const char *s, int c);
char *strchr(char *s, int c);
Although in some cases this might be the right thing to do, it is typically (and rightfully) considered too cumbersome and involving by C standards. You can resolve this situation in a more compact (albeit more risky) way by implementing only one function
char *strchr(const char *s, int c);
which returns non-const pointer into the input string (by using a cast at the exit, exactly as you did it). Note, that this approach does not violate any rules of the language, although it provides the caller with the means to violate them. By casting away the constness of the data this approach simply delegates the responsibility to observe const-correctness from the function itself to the caller. As long as the caller is aware of what's going on and remembers to "play nice", i.e. uses a const-qualified pointer to point to const data, any temporary breaches in the wall of const-correctness created by such function are repaired instantly.
I see this trick as a perfectly acceptable approach to reducing unnecessary code duplication (especially in absence of function overloading). The standard library uses it. You have no reason to avoid it either, assuming you understand what you are doing.
Now, as for your implementation of strchr, it looks weird to me from the stylistic point of view. I would use the cycle header to iterate over the full range we are operating on (the full string), and use the inner if to catch the early termination condition
for (; *s != '\0'; ++s)
if (*s == c)
return (char *) s;
return NULL;
But things like that are always a matter of personal preference. Someone might prefer to just
for (; *s != '\0' && *s != c; ++s)
;
return *s == c ? (char *) s : NULL;
Some might say that modifying function parameter (s) inside the function is a bad practice.

The const keyword means that the parameter cannot be modified.
You couldn't return s directly because s is declared as const char *s and the return type of the function is char *. If the compiler allowed you to do that, it would be possible to override the const restriction.
Adding a explicit cast to char* tells the compiler that you know what you're doing (though as Eric explained, it would be better if you didn't do it).
UPDATE: For the sake of context I'm quoting Eric's answer, since he seems to have deleted it:
You should not be modifying s since it is a const char *.
Instead, define a local variable that represents the result of type char * and use that in place of s in the method body.

The Function Return Value should be a Constant Pointer to a Character:
strchr accepts a const char* and should return const char* also. You are returning a non constant which is potentially dangerous since the return value points into the input character array (the caller might be expecting the constant argument to remain constant, but it is modifiable if any part of it is returned as as a char * pointer).
The Function return Value should be NULL if No matching Character is Found:
Also strchr is supposed to return NULL if the sought character is not found. If it returns non-NULL when the character is not found, or s in this case, the caller (if he thinks the behavior is the same as strchr)
might assume that the first character in the result actually matches (without the NULL return value
there is no way to tell whether there was a match or not).
(I'm not sure if that is what you intended to do.)
Here is an Example of a Function that Does This:
I wrote and ran several tests on this function; I added a few really obvious sanity checks to avoid potential crashes:
const char *mystrchr1(const char *s, int c) {
if (s == NULL) {
return NULL;
}
if ((c > 255) || (c < 0)) {
return NULL;
}
int s_len;
int i;
s_len = strlen(s);
for (i = 0; i < s_len; i++) {
if ((char) c == s[i]) {
return (const char*) &s[i];
}
}
return NULL;
}

You're no doubt seeing compiler errors anytime you write code that tries to use the char* result of mystrchr to modify the string literal being passed to mystrchr.
Modifying string literals is a security no-no, because it can lead to abnormal program termination and possibly denial-of-service attacks. Compilers may warn you when you pass a string literal to a function taking char*, but they aren't required to.
How do you use strchr correctly? Let's look at an example.
This is an example of what not to do:
#include <stdio.h>
#include <string.h>
/** Truncate a null-terminated string $str starting at the first occurence
* of a character $c. Return the string after truncating it.
*/
const char* trunc(const char* str, char c){
char* pc = strchr(str, c);
if(pc && *pc && *(pc+1)) *(pc+1)=0;
return str;
}
See how it modifies the string literal str via the pointer pc? That's no bueno.
Here's the right way to do it:
#include <stdio.h>
#include <string.h>
/** Truncate a null-terminated string $str of $sz bytes starting at the first
* occurrence of a character $c. Write the truncated string to the output buffer
* $out.
*/
char* trunc(size_t sz, const char* str, char c, char* out){
char* c_pos = strchr(str, c);
if(c_pos){
ptrdiff_t c_idx = c_pos - str;
if((size_t)n < sz){
memcpy(out, str, c_idx); // copy out all chars before c
out[c_idx]=0; // terminate with null byte
}
}
return 0; // strchr couldn't find c, or had serious problems
}
See how the pointer returned by strchr is used to compute the index of the matching character in the string? The index (also equal to the length up to that point, minus one) is then used to copy the desired part of the string to the output buffer.
You might think "Aw, that's dumb! I don't want to use strchr if it's just going to make me memcpy." If that's how you feel, I've never run into a use case of strchr, strrchr, etc. that I couldn't get away with using a while loop and isspace, isalnum, etc. Sometimes it's actually cleaner than using strchr correctly.

Related

is it possible to use one string function inside another string function in c

Is it possible to use one string function inside another string function. See below....
strcmp(string1, strupr(string2));
Here I have used strupr() function to make the all characters of string2 in uppercase then It will use to compare with string1 by strcmp() function.
Below is my program...
#include <stdio.h>
#include <string.h>
int main()
{
char str[41], name[43];
gets(str);
gets(name);
if (strcmp(strupr(str), name) == 0)
printf("\nwow it works.");
return 0;
}
Following is the error shown by compilor..
use of undeclared identifier 'strupr'; did you mean 'strdup'?
To condense the comments into an answer:
As long as the return type of the inner function matches (or is compatible with) the argument type of the outer function, this will work.
A simple example, which will print the number 3.
int add1(int a) {
return a + 1;
}
int main(void) {
printf("%d\n", add1(add1(add1(0))));
}
So as long as your strupr function returns a char * (or const char *, etc.) it will work as the function prototype for strcmp is:
int strcmp(const char *s1, const char *s2);
To create a basic function to uppercase an entire string, you simply loop through the string character by character and use the toupper function on each character. There are several ways to implement this loop.
Here is one such way:
char *upcase(char *s) {
for (char *p = s; p && *p; p++)
*p = toupper(*p);
return s;
}
Note that this is a destructive function, which alters the contents of the string passed to it. It will not work with static strings (e.g., const char *foo = "abcdefg";).
Since your aim seems to compare the two strings in a case insensitive fashion, you should use stricmp:
if (stricmp(str, name) == 0)
printf("\nwow it works.");
It has the additional benefits of not either changing your original string or allocating a new string you will have to free().

Usage of pointers as parameters in the strcpy function. Trying to understand code from book

From my book:
void strcpy (char *s, char *t)
{
int i=0;
while ((s[i] = t[i]) != ’\0’)
++i;
}
I'm trying to understand this snippet of code from my textbook. They give no main function so I'm trying to wrap my head around how the parameters would be used in a call to the function. As I understand it, the "i-number" of characters of string t[ ] are being copied to the string s[ ] until there are no longer characters to read, from the \0 escape sequence. I don't really understand how the parameters would be defined outside of the function. Any help is greatly appreciated. Thank you.
Two things to remember here:
Strings in C are arrays of chars
Arrays are passed to functions as pointers
So you would call this like so:
char destination[16];
char source[] = "Hello world!";
strcpy(destination, source);
printf("%s", destination);
i is just an internal variable, it has no meaning outside the strcpy function (it's not a parameter or anything). This function copies the entire string t to s, and stops when it sees a \0 character (which marks the end of a string by C convention).
EDIT: Also, strcpy is a standard library function, so weird things might happen if you try to redefine it. Give your copy a new name and all will be well.
Here's a main for you:
int main()
{
char buf[30];
strcpy(buf, "Hi!");
puts(buf);
strcpy(buf, "Hello there.");
puts(buf);
}
The point of s and t are to accept character arrays that exist elsewhere in the program. They are defined elsewhere, at this level usually by the immediate caller or one more caller above. Their meanings are replaced at runtime.
Your get compile problems because your book is wrong. Should read
const strcpy (char *s, const char *t)
{
...
return s;
}
Where const means will not modify. Because strcpy is a standard function you really do need it to be correct.
Here is how you might use the function (note you should change the function name as it will conflict with the standard library)
void my_strcpy (char *s, char *t)
{
int i=0;
while ((s[i] = t[i]) != ’\0’)
++i;
}
int main()
{
char *dataToCopy = "This is the data to copy";
char buffer[81]; // This buffer should be at least big enough to hold the data from the
// source string (dataToCopy) plus 1 for the null terminator
// call your strcpy function
my_strcpy(buffer, dataToCopy);
printf("%s", buffer);
}
In the code, the i variable is pointing to the character in the character array. So when i is 0 you are pointing to the first character of s and t. s[i] = t[i]copies the i'th character from t to the i'th character of s. This assignment in C is self an expression and returns the character that was copied, which allows you to compare that to the null terminator 0 ie. (s[i] = t[i]) != ’\0’ which indicates the end of the string, if the copied character is not a null terminator the loop continues otherwise it will end.

c4133 warning c code

I have a problem with warning c4133, it say that the problem is incopatible types from char* to int*, I try to cast the pointer ((char*)x) but with no luck, maybe someone knows whats the problem/
this is my program, the problem in the function.
void replaceSubstring(char *str, char *substr)//function that gets string and substring and make the substring in string big letters
{
int i;
int *x;
x = (strstr(str, substr));//The problem line
while (strstr(str,substr) != NULL)
{
for (i=0;i<strlen(substr);i++)
{
*x = *x - 32;
x++;//move to the next char
}
x = (strstr(str, substr)); //first apear of substr int str
}
}
in your code, x is defined as int * but the return type of strstr() is char *.. Check the man page here.
Worthy to mention, casting is usually considered a bad practice in c. With properly written code, most of the cases, casting can be avoided. Most of the cases, casting may introduce lot of bugs. Double check the data types and stay away from casting.
Sidenote: Just a suggestion, maybe before directly subtracting 32 from *x, maybe you want to perform a range check on *x to be within 97-122 to ensure that is a lower-case letter.
Also, better to perform strlen(substr) outside the loop, store into a variable and use that value in looping. Will save you the overhead of redundant calls to strlen().
Prototype of strstr is
char *strstr(const char *haystack, const char *needle);
Return value is character pointer. So make the x as a character pointer.

Why is the param of strlen a "const"?

I'm learning the C language.
My question is:
Why is the param of strlen a "const" ?
size_t strlen(const char * string);
I'm thinking it's because string is an address so it doesn't change after initialization. If this is right, does that mean every time you build a function using a pointer as a param, it should be set to a constant ?
Like if I decide to build a function that sets an int variable to its double, should it be defined as:
void timesTwo(const int *num)
{
*num *= 2;
}
or
void timesTwo(int *num)
{
*num *= 2;
}
Or does it make no difference at all ?
C string is a pointer to a zero-terminated sequence of characters. const in front of char * indicates to the compiler and to the programmer calling the function that strlen is not going to modify the data pointed to by the string pointer.
This point is easier to understand when you look at strcpy:
char * strcpy ( char * destination, const char * source );
its second argument is const, but its first argument is not. This tells the programmer that the data pointed to by the first pointer may be modified by the function, while the data pointed to by the second pointer will remain constant upon return from strcpy.
The parameter to strlen function is a pointer to a const because the function is not expected to change what the pointer points to - it is only expected to do something with the string without altering it.
In your function 'timesTwo', if you intend to change the value that 'num' points to, you should not pass it in as a pointer-to-const. So, use the second example.
Basically, the function is promising that it will not modify the contents of of the input string through that pointer; it says that the expression *string may not be written to.
Here are the various permutations of the const qualifier:
const char *s; // s may be modified, *s may not be modified
char const *s; // same as above
char * const s; // s may not be modified, *s may be modified
const char * const s; // neither s nor *s may be modified
char const * const s; // same as above

Copying a part of a string (substring) in C

I have a string:
char * someString;
If I want the first five letters of this string and want to set it to otherString, how would I do it?
#include <string.h>
...
char otherString[6]; // note 6, not 5, there's one there for the null terminator
...
strncpy(otherString, someString, 5);
otherString[5] = '\0'; // place the null terminator
Generalized:
char* subString (const char* input, int offset, int len, char* dest)
{
int input_len = strlen (input);
if (offset + len > input_len)
{
return NULL;
}
strncpy (dest, input + offset, len);
return dest;
}
char dest[80];
const char* source = "hello world";
if (subString (source, 0, 5, dest))
{
printf ("%s\n", dest);
}
char* someString = "abcdedgh";
char* otherString = 0;
otherString = (char*)malloc(5+1);
memcpy(otherString,someString,5);
otherString[5] = 0;
UPDATE:
Tip: A good way to understand definitions is called the right-left rule (some links at the end):
Start reading from identifier and say aloud => "someString is..."
Now go to right of someString (statement has ended with a semicolon, nothing to say).
Now go left of identifier (* is encountered) => so say "...a pointer to...".
Now go to left of "*" (the keyword char is found) => say "..char".
Done!
So char* someString; => "someString is a pointer to char".
Since a pointer simply points to a certain memory address, it can also be used as the "starting point" for an "array" of characters.
That works with anything .. give it a go:
char* s[2]; //=> s is an array of two pointers to char
char** someThing; //=> someThing is a pointer to a pointer to char.
//Note: We look in the brackets first, and then move outward
char (* s)[2]; //=> s is a pointer to an array of two char
Some links:
How to interpret complex C/C++ declarations and
How To Read C Declarations
You'll need to allocate memory for the new string otherString. In general for a substring of length n, something like this may work for you (don't forget to do bounds checking...)
char *subString(char *someString, int n)
{
char *new = malloc(sizeof(char)*n+1);
strncpy(new, someString, n);
new[n] = '\0';
return new;
}
This will return a substring of the first n characters of someString. Make sure you free the memory when you are done with it using free().
You can use snprintf to get a substring of a char array with precision:
#include <stdio.h>
int main()
{
const char source[] = "This is a string array";
char dest[17];
// get first 16 characters using precision
snprintf(dest, sizeof(dest), "%.16s", source);
// print substring
puts(dest);
} // end main
Output:
This is a string
Note:
For further information see printf man page.
You can treat C strings like pointers. So when you declare:
char str[10];
str can be used as a pointer. So if you want to copy just a portion of the string you can use:
char str1[24] = "This is a simple string.";
char str2[6];
strncpy(str1 + 10, str2,6);
This will copy 6 characters from the str1 array into str2 starting at the 11th element.
I had not seen this post until now, the present collection of answers form an orgy of bad advise and compiler errors, only a few recommending memcpy are correct. Basically the answer to the question is:
someString = allocated_memory; // statically or dynamically
memcpy(someString, otherString, 5);
someString[5] = '\0';
This assuming that we know that otherString is at least 5 characters long, then this is the correct answer, period. memcpy is faster and safer than strncpy and there is no confusion about whether memcpy null terminates the string or not - it doesn't, so we definitely have to append the null termination manually.
The main problem here is that strncpy is a very dangerous function that should not be used for any purpose. The function was never intended to be used for null terminated strings and it's presence in the C standard is a mistake. See Is strcpy dangerous and what should be used instead?, I will quote some relevant parts from that post for convenience:
Somewhere at the time when Microsoft flagged strcpy as obsolete and dangerous, some other misguided rumour started. This nasty rumour said that strncpy should be used as a safer version of strcpy. Since it takes the size as parameter and it's already part of the C standard lib, so it's portable. This seemed very convenient - spread the word, forget about non-standard strcpy_s, lets use strncpy! No, this is not a good idea...
Looking at the history of strncpy, it goes back to the very earliest days of Unix, where several string formats co-existed. Something called "fixed width strings" existed - they were not null terminated but came with a fixed size stored together with the string. One of the things Dennis Ritchie (the inventor of the C language) wished to avoid when creating C, was to store the size together with arrays [The Development of the C Language, Dennis M. Ritchie]. Likely in the same spirit as this, the "fixed width strings" were getting phased out over time, in favour for null terminated ones.
The function used to copy these old fixed width strings was named strncpy. This is the sole purpose that it was created for. It has no relation to strcpy. In particular it was never intended to be some more secure version - computer program security wasn't even invented when these functions were made.
Somehow strncpy still made it into the first C standard in 1989. A whole lot of highly questionable functions did - the reason was always backwards compatibility. We can also read the story about strncpy in the C99 rationale 7.21.2.4:
The strncpy function
strncpy was initially introduced into the C library to deal with fixed-length name fields in
structures such as directory entries. Such fields are not used in the same way as strings: the
trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter
5 names to null assures efficient field-wise comparisons. strncpy is not by origin a “bounded
strcpy,” and the Committee preferred to recognize existing practice rather than alter the function
to better suit it to such use.
The Codidact link also contains some examples showing how strncpy will fail to terminate a copied string.
I think it's easy way... but I don't know how I can pass the result variable directly then I create a local char array as temp and return it.
char* substr(char *buff, uint8_t start,uint8_t len, char* substr)
{
strncpy(substr, buff+start, len);
substr[len] = 0;
return substr;
}
strncpy(otherString, someString, 5);
Don't forget to allocate memory for otherString.
#include <stdio.h>
#include <string.h>
int main ()
{
char someString[]="abcdedgh";
char otherString[]="00000";
memcpy (otherString, someString, 5);
printf ("someString: %s\notherString: %s\n", someString, otherString);
return 0;
}
You will not need stdio.h if you don't use the printf statement and putting constants in all but the smallest programs is bad form and should be avoided.
Doing it all in two fell swoops:
char *otherString = strncpy((char*)malloc(6), someString);
otherString[5] = 0;
char largeSrt[] = "123456789-123"; // original string
char * substr;
substr = strchr(largeSrt, '-'); // we save the new string "-123"
int substringLength = strlen(largeSrt) - strlen(substr); // 13-4=9 (bigger string size) - (new string size)
char *newStr = malloc(sizeof(char) * substringLength + 1);// keep memory free to new string
strncpy(newStr, largeSrt, substringLength); // copy only 9 characters
newStr[substringLength] = '\0'; // close the new string with final character
printf("newStr=%s\n", newStr);
free(newStr); // you free the memory
Try this code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char* substr(const char *src, unsigned int start, unsigned int end);
int main(void)
{
char *text = "The test string is here";
char *subtext = substr(text,9,14);
printf("The original string is: %s\n",text);
printf("Substring is: %s",subtext);
return 0;
}
char* substr(const char *src, unsigned int start, unsigned int end)
{
unsigned int subtext_len = end-start+2;
char *subtext = malloc(sizeof(char)*subtext_len);
strncpy(subtext,&src[start],subtext_len-1);
subtext[subtext_len-1] = '\0';
return subtext;
}

Resources