Replicating the strcmp() function from string.h library - c

I am trying to replicate the strcmp() function from the string.h library. Here is my code:
/**
* string_compare - this function compares two strings pointed
* by s1 and s2. Is a replica of the strcmp from the string.h library
* @s1: The first string to be compared
* @s2: The second string to be compared
*
* Return: On success, it returns:
* 0 if s1 is equal to s2
* negative value if s1 is less than s2
* positive value if s1 is greater than s2
*/
int string_compare(char *s1, char *s2)
{
int sum = 0, i;
for (i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
sum += (s1[i] - s2[i]);
for ( ; s1[i] != '\0'; i++)
sum += (s1[i] - 0);
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
return (sum);
}
I tried my function using this sample code:
#include <stdio.h>
int main(void)
{
char s1[] = "Hello";
char s2[] = "World!";
printf("%d\n", string_compare(s1, s2));
printf("%d\n", string_compare(s2, s1));
printf("%d\n", string_compare(s1, s1));
return (0);
}
And I get the following output,
-53
-500
0
But I should be getting:
-15
15
0
Why am I getting such a result??

This approach is incorrect.
Let's assume that the first string is "B" and the second string is "AB".
It is evident that the first string is greater than the second string in the lexicographical order.
But the result will be negative due to this for loop
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
although the function should return a positive value.
Moreover, the variable sum of the type int can overflow for long strings.
Also, the function should be declared at least like
int string_compare( const char *s1, const char *s2);
because the passed strings are not changed within the function.
The function can be defined in the following way
int string_compare( const char *s1, const char *s2 )
{
while ( *s1 && *s1 == *s2 )
{
++s1;
++s2;
}
return ( unsigned char )*s1 - ( unsigned char )*s2;
}
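To make the "B" versus "AB" example concrete, here is a minimal sketch that puts the summing approach next to the first-difference approach. The names sum_compare and first_diff_compare are made up for this illustration, and sum_compare is a tidied variant of the question's loops, not the original code. Note also that in the question's version the third loop starts at the index where the second loop stopped, so when s1 is the longer string it reads s2 past its terminator; the sketch avoids that with an if/else.

```c
#include <stddef.h>

/* Hypothetical name: sums every per-position difference, as the question does. */
int sum_compare(const char *s1, const char *s2)
{
    int sum = 0;
    size_t i = 0;

    /* Common part of both strings. */
    while (s1[i] != '\0' && s2[i] != '\0') {
        sum += s1[i] - s2[i];
        i++;
    }
    if (s1[i] != '\0') {
        /* s1 is longer: add its remaining characters. */
        for (; s1[i] != '\0'; i++)
            sum += s1[i];
    } else {
        /* s2 is longer (or both ended): subtract its remaining characters. */
        for (; s2[i] != '\0'; i++)
            sum -= s2[i];
    }
    return sum;
}

/* Hypothetical name: stops at the first differing character, like strcmp. */
int first_diff_compare(const char *s1, const char *s2)
{
    while (*s1 != '\0' && *s1 == *s2) {
        ++s1;
        ++s2;
    }
    return (unsigned char)*s1 - (unsigned char)*s2;
}
```

Assuming an ASCII character set, sum_compare("B", "AB") is negative because the trailing 'B' of the second string is subtracted, while first_diff_compare("B", "AB") is positive because 'B' is greater than 'A' at the first position.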

You are overcomplicating a very simple function.
#define UC unsigned char
int mystrcmp(const char *s1, const char *s2)
{
int result;
while(!(result = (UC)*s1 - (UC)*s2++) && *s1++);
return result;
}

Strings in C are arrays of characters terminated with a null character (\0).
When you pass a string to a function, you are passing a pointer to its first element. That pointer is passed by value. You can modify that pointer within the function without any side-effects on the string it points to, as long as you don't dereference and assign to the address it points to.
That's why the pointer math from 0___________'s answer works.
int mystrcmp1(const char *s1, const char *s2) {
int result = 0;
while(!(result = *s1 - *s2++) && *s1++);
return result;
}
*s1++ could be rewritten as *(s1++) to disambiguate. s1++ yields the current value of the pointer (initially the start of the first string) and then increments the pointer so that it points to the next character. The yielded pointer is then dereferenced to give us the character. The same happens with the s2 pointer.
Then we're comparing them by subtraction. If they're the same, we get 0, which in C is false in a boolean context. This result is assigned to result.
We can now see that the loop continues while corresponding characters in the two strings are equal and while dereferencing s1 does not give us the null terminator.
When the loop ends, it means there was either a difference or we reached the end of the first string.
The difference will be stored in result, which the function returns.
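For readers who find the one-line loop hard to follow, here is a sketch of the same logic unrolled into separate steps. The name mystrcmp_expanded is made up for this illustration, and the casts to unsigned char follow the earlier answer rather than the variant quoted just above.

```c
/* Hypothetical name: step-by-step form of the compact while loop. */
int mystrcmp_expanded(const char *s1, const char *s2)
{
    for (;;) {
        /* Compare the current pair of characters as unsigned values. */
        int result = (unsigned char)*s1 - (unsigned char)*s2;

        if (result != 0)
            return result;   /* first difference decides the outcome */
        if (*s1 == '\0')
            return 0;        /* both strings ended together: equal */

        /* Characters matched and were not the terminator: move on. */
        s1++;
        s2++;
    }
}
```

Each pass does what one evaluation of the compact loop condition does: compute the difference, stop if it is non-zero, stop if the terminator was reached, otherwise advance both pointers.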

Related

How exactly does pointer value incrementing work?

So I was working on writing my own function for concatenating strings in C. One solution that was provided to me was:
char *_strcat(char *dest, char *src)
{
int c, c2;
c = 0;
while (dest[c])
c++;
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
return (dest);
}
The part that confuses me is while (dest[c]), and other similar parts. I've already gone through pointers through various resources but I can't seem to understand this part. A good explanation will be much appreciated.
For starters the function is incorrect. It does not build a concatenated string because it does not append the terminating zero character '\0' to the result (dest) string in this for loop
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
Also the function should be declared like
char * _strcat( char *dest, const char *src );
because the appended string (src) is not changed.
This while loop
while (dest[c])
c++;
is equivalent to
while (dest[c] != '\0' )
c++;
and this for loop
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
is equivalent to
for (c2 = 0; src[c2] != '\0' ; c2++)
dest[c++] = src[c2];
That is, the loops continue iterating until the terminating zero character '\0' is encountered: the while loop looks for the end of the string dest, and the for loop looks for the end of the string src.
A non-zero scalar expression evaluates to logical true in a condition.
And the variables c and c2 should have the unsigned type size_t instead of the type int, because an object of the type int may not be large enough to store a string length.
Also, you should not define names that begin with an underscore, because such identifiers are reserved at file scope.
As for your question
How exactly does pointer value incrementing work?
the pointers themselves are not incremented in that function. Instead, expressions with the subscript operator, such as dest[c], dest[c++], or src[c2], are used to access the elements of the strings.
The function can be defined the following way
char * my_strcat( char *dest, const char *src )
{
char *p = dest;
while ( *p != '\0' ) ++p;
while ( ( *p++ = *src++ ) != '\0' );
return dest;
}
In the function shown above, the pointers p and src are indeed incremented, and no expression with the subscript operator is used.
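The corrections listed in this answer (appending the terminator, const for src, size_t indices, a name without a leading underscore) can also be applied while keeping the subscript style of the original function. The following is only a sketch; the name strcat_indexed is made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical name: index-based concatenation with the fixes applied. */
char *strcat_indexed(char *dest, const char *src)
{
    size_t c = 0;
    size_t c2;

    /* Find the end of dest. */
    while (dest[c] != '\0')
        c++;

    /* Copy the characters of src after it. */
    for (c2 = 0; src[c2] != '\0'; c2++)
        dest[c++] = src[c2];

    /* Terminate the result; the original function omitted this step. */
    dest[c] = '\0';
    return dest;
}
```

As with the standard strcat, the caller has to make sure that the array dest points to has room for both strings plus the terminator.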
char *dest is a pointer to char that points to the first character of the string. The following loop then advances the index c to the end of the string.
c = 0;
while (dest[c])
c++;

Recursive program to print all string combinations of 'a' and 'b' of given length n in C

The task is:
Write a full program that takes an int n > 0 and recursively prints all combinations of characters 'a' and 'b' on the screen.
Example for n=3: aaa, baa, bba, aba, bab, aab, abb, bbb.
I assume I have to use something similar to Backtracking.
This is what I have, but I'm not able to think of the rest.
void rep(int n, char str, int pos) { //n would be the length and str would be the pointer
char c[n + 1];
char d[3];
d[0] = 'a';
d[1] = 'b';
for (int j = 0; j < 2; j++) {
if (strlen(c) == n) { // if c is n long recursion ends
printf("%s", c);
} else {
c[pos] = d[j]; // put 'a' or 'b' in c[pos]
rep(n, c, pos + 1); // update pos to next position
}
}
}
The variable length array c is not initialized
char c[n+1]
Thus the call of strlen in this if statement
if(strlen(c) == n){
invokes undefined behavior.
Moreover the parameter str is not used within the function.
I can suggest the following solution, shown in the demonstration program below
#include <stdio.h>
#include <string.h>
void rep( char *s )
{
puts( s );
char *p = strchr( s, 'a' );
if (p != NULL)
{
memset( s, 'a', p - s );
*p = 'b';
rep( s );
}
}
int main()
{
char s[] = "aaa";
rep( s );
}
The program output is
aaa
baa
aba
bba
aab
bab
abb
bbb
That is, the function rep is initially called with an array that contains a string of the required length n (in the demonstration program n is equal to 3) consisting entirely of the character 'a'. It then recursively outputs every combination until the string consists entirely of the character 'b'.
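One way to read the recursive rep above is as a binary counter in which 'a' stands for 0, 'b' stands for 1, and the leftmost character is the lowest digit. The sketch below isolates that single increment step; next_combination is a made-up name, and the loop that drives it replaces the recursion only for the sake of the illustration.

```c
#include <string.h>

/*
 * Hypothetical helper: advances s to the next combination in place.
 * Returns 1 if s was changed, 0 if s already consisted only of 'b'.
 */
int next_combination(char *s)
{
    char *p = strchr(s, 'a');            /* lowest digit that is still "0" */

    if (p == NULL)
        return 0;                        /* all 'b': nothing left to count */

    memset(s, 'a', (size_t)(p - s));     /* reset the digits below it */
    *p = 'b';                            /* carry into this digit */
    return 1;
}
```

Starting from a string of n 'a' characters and calling next_combination until it returns 0 visits all 2^n combinations, which is the same sequence the recursive version prints.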
There are some issues in your code:
the str argument should have type char *
you do not need new arrays in the recursive function; use the one the str argument points to.
you do not set a null terminator at the end of your char arrays.
instead of strlen(), use pos to determine if the recursion should stop.
Here is a modified version
#include <stdio.h>
// n is the length and str points to an array of length n+1
void rep(int n, char *str, int pos) {
if (pos >= n) {
str[n] = '\0'; // set the null terminator
printf("%s\n", str);
} else {
str[pos] = 'a';
rep(n, str, pos + 1);
str[pos] = 'b';
rep(n, str, pos + 1);
}
}
#define LEN 3
int main() {
char array[LEN + 1];
rep(LEN, array, 0);
return 0;
}
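The task asks for a length n that is not known at compile time, whereas the version above fixes it with LEN. Here is a hedged sketch of a wrapper that allocates the buffer for a given n; the names rep_fill and print_combinations, and the choice of -1 as the error value, are assumptions made for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same recursion as rep above, written with size_t indices. */
static void rep_fill(size_t n, char *str, size_t pos)
{
    if (pos >= n) {
        str[n] = '\0';               /* set the null terminator */
        puts(str);
        return;
    }
    str[pos] = 'a';
    rep_fill(n, str, pos + 1);
    str[pos] = 'b';
    rep_fill(n, str, pos + 1);
}

/* Hypothetical wrapper: returns 0 on success, -1 if n is 0 or allocation fails. */
int print_combinations(size_t n)
{
    char *buf;

    if (n == 0)
        return -1;
    buf = malloc(n + 1);             /* n characters plus the terminator */
    if (buf == NULL)
        return -1;
    rep_fill(n, buf, 0);
    free(buf);
    return 0;
}
```

A main function could read n (for example with scanf or from argv) and pass it to print_combinations.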

Allocate char array and strings

I have a problem understanding the code below.
What value index=strlen(strs[0]) gets?
char *a= malloc (sizeof(char)*(index+1)) Is this the standard way to allocate array for char array?
What does strs[i][j] represent?
This is the code I found on leetcode. Just trying to understand the code. (code from sanghi user on leetcode)
#include<string.h>
char* longestCommonPrefix(char** strs, int strsSize)
{
int i=0; int j=0;int index;int tempindex=0;
if(strsSize<1)
return "";
index=strlen(strs[0]);
char *a;
a= malloc(sizeof(char)*(index+1));
strcpy(a,strs[0]);
for(i=1;i<strsSize;i++)
{ tempindex=0;
for(j=0;j<index;j++)
{
if(a[j]==strs[i][j])
tempindex++;
else
{a[j]='\0';
break;
}
}
if (tempindex==0)return ("");
if(tempindex<index)index=tempindex;
}
return a;
}
Expected results can be found on https://leetcode.com/problems/longest-common-prefix/
strs is an array of strings. strsSize is the number of strings in the array.
index = strlen(strs[0]);
This simply gets the length of strs[0], the first string in the array.
a = malloc(sizeof(char)*(index+1));
This will allocate enough memory to store a string of the same size. I say enough memory because each string actually occupies length + 1 characters. The last character is \0, a null terminator. You always have to make sure to terminate your strings, or else functions that read them will run past the end of the buffer.
strs[i][j]
This accesses the jth character in the ith string in the array.
For starters, the program is bad and invalid. :)
For example, the parameter strsSize, which holds the number of elements in the array whose first element strs points to, should have the type size_t instead of int.
And all other variables that deal with indices should also have the type size_t, as for example
size_t index = strlen( strs[0] );
because the standard C function strlen has the return type size_t.
The source array is not changed in the function so the first parameter shall be declared with the qualifier const.
That is the function declaration shall look like
char * longestCommonPrefix( const char** strs, size_t strsSize);
Further, the elements (strings) of the array can have different lengths, so this loop
for(j=0;j<index;j++)
can invoke undefined behavior because some element (string) of the array can be shorter than the value of the variable index.
In fact there is no need to calculate lengths of the elements of the array. The loop can use the condition
for( j=0; j < index && strs[i][j] != '\0'; j++)
And moreover the function has a memory leak due to this return sub-statement in the if statement
a= malloc(sizeof(char)*(index+1));
//...
if (tempindex==0)return ("");
That is, the allocated memory pointed to by the pointer a will not be released.
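One possible way to avoid both the out-of-bounds read and the leak while still returning a string is to always return freshly allocated memory, so that the caller can free the result unconditionally. This is only a sketch under that assumption; the name longest_common_prefix_dup and the NULL-on-failure convention are made up for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical variant: returns a malloc'd string (possibly empty) that
 * the caller must free, or NULL if the allocation fails.
 */
char *longest_common_prefix_dup(const char *const *strs, size_t count)
{
    size_t n = 0;
    char *out;

    if (count != 0) {
        n = strlen(strs[0]);
        for (size_t i = 1; i < count && n != 0; i++) {
            size_t j = 0;
            /* A shorter strs[i] stops the loop at its terminator,
               because '\0' cannot match strs[0][j] while j < n. */
            while (j < n && strs[i][j] == strs[0][j])
                j++;
            n = j;
        }
    }

    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    if (n != 0)
        memcpy(out, strs[0], n);
    out[n] = '\0';
    return out;
}
```

Because the empty prefix is also returned as allocated memory, there is no path on which the function returns a string literal that must not be freed.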
What value index=strlen(strs[0]) gets?
index gets the length of the string stored in the first element of the array of strings.
For example if you have an array
char *strs[] = { "Hello", "Bye", "Good Morning" };
then index is set to the length of the string "Hello".
char *a= malloc (sizeof(char)*(index+1)) Is this the standard way to allocate array for char array?
Yes. This statement allocates memory large enough to store the string (including its terminating zero) held in the first element of the array pointed to by strs.
What does strs[i][j] represent?
strs[i][j] accesses the j-th character of the i-th element of the array pointed to by strs.
For example for the declaration above strs[0][0] is equal to 'H', strs[0][1] is equal to 'e', strs[1][0] is equal to 'B' and so on.
P.S. A better way to define the function is shown in the demonstration program below.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
size_t longestCommonPrefix( const char **strs, size_t strsSize )
{
size_t n = 0;
if ( strsSize != 0 )
{
n = strlen( *strs );
for ( size_t i = 1; n != 0 && i < strsSize; i++ )
{
size_t j = 0;
while ( j < n && strs[i][j] == strs[i-1][j] ) j++;
if ( j < n ) n = j;
}
}
return n;
}
int main(void)
{
char * strs[] = { "0123456789", "012345", "0123" };
size_t n = longestCommonPrefix( ( const char ** )strs, sizeof( strs ) / sizeof( *strs ) );
char *p = NULL;
if ( n != 0 )
{
p = malloc( n + 1 );
memcpy( p, strs[0], n );
p[n] = '\0';
printf( "The longest common prefix is \"%s\"\n", p );
}
free( p );
return 0;
}
The program output is
The longest common prefix is "0123"

Parse string to number in C

I tried to write a function to convert a string to an int:
int convert(char *str, int *n){
int i;
if (str == NULL) return 0;
for (i = 0; i < strlen(str); i++)
if ((isdigit(*(str+i))) == 0) return 0;
*n = *str;
return 1;
}
So what's wrong with my code?
*n = *str means:
Set the 4 bytes of memory that n points to, to the 1 byte of memory that str points to. This is perfectly fine but it's probably not your intention.
Why are you trying to convert a char* to an int* in the first place? If you literally just need to do a conversion and make the compiler happy, you can just do int *foo = (int*)bar where bar is the char*.
Sorry, I don't have the reputation to make this a comment.
The function definitely does not perform as intended.
Here are some issues:
you should include <ctype.h> for isdigit() to be properly defined.
isdigit(*(str+i)) has undefined behavior if str contains negative char values. You should cast the argument:
isdigit((unsigned char)str[i])
the function returns 0 if there is any non digit character in the string. What about "-1" and "+2"? atoi and strtol are more lenient with non digit characters, they skip initial white space, process an optional sign and subsequent digits, stopping at the first non digit.
the test for (i = 0; i < strlen(str); i++) is very inefficient: strlen may be invoked for each character in the string, with O(N^2) time complexity. Use this instead:
for (i = 0; str[i] != '\0'; i++)
*n = *str does not convert the number represented by the digits in str, it merely stores the value of the first character into n, for example '0' will convert to 48 on ASCII systems. You should instead process every digit in the string, multiplying the value converted so far by 10 and adding the value represented by the digit with str[i] - '0'.
Here is a corrected version with your restrictive semantics:
#include <ctype.h>
int convert(const char *str, int *n) {
int value = 0;
if (str == NULL)
return 0;
while (*str) {
if (isdigit((unsigned char)*str)) {
value = value * 10 + *str++ - '0';
} else {
return 0;
}
}
*n = value;
return 1;
}
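The answer above mentions strtol as the more lenient alternative. As a hedged sketch, a conversion built on strtol could look like the following; the name convert_strtol is made up, and the decision to reject trailing characters and out-of-range values is an assumption about the desired semantics, not something the question specifies.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical variant: returns 1 and stores the value in *n on success, 0 otherwise. */
int convert_strtol(const char *str, int *n)
{
    char *end;
    long value;

    if (str == NULL || n == NULL)
        return 0;

    errno = 0;
    value = strtol(str, &end, 10);   /* skips leading white space, accepts a sign */

    if (end == str)
        return 0;                    /* no digits were consumed */
    if (*end != '\0')
        return 0;                    /* trailing non-digit characters */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                    /* does not fit in an int */

    *n = (int)value;
    return 1;
}
```

Unlike the hand-written loop, this version accepts "-1" and "+2", and it reports overflow instead of silently producing a wrong value.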
conversion of char* pointer to int*
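#include <stdio.h>
int main(void)
{
char c, *cc;
int i, *ii;
float f, *ff;
c = 'A'; /* ASCII value of A gets stored in c */
i = 25;
f = 3.14f;
cc = &c;
ii = &i;
ff = &f;
/* %p with a cast to void * is the portable way to print a pointer */
printf("\n Address contained in cc = %p", (void *)cc);
printf("\n Address contained in ii = %p", (void *)ii);
printf("\n Address contained in ff = %p", (void *)ff);
/* a single * dereferences each pointer */
printf("\n value of c = %c", *cc);
printf("\n value of i = %d", *ii);
printf("\n value of f = %f\n", *ff);
return 0;
}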

How does strcmp() work?

I've been looking around a fair bit for an answer. I'm going to make a series of my own string functions like my_strcmp(), my_strcat(), etc.
Does strcmp() work through each index of two arrays of characters, and if the ASCII value is smaller at an identical index of the two strings, is that string then alphabetically greater, so that a 0 or 1 or 2 is returned? I guess what I'm asking is, does it use the ASCII values of characters to return these results?
Any help would be greatly appreciated.
[REVISED]
OK, so I have come up with this... it works for all cases except when the second string is greater than the first.
Any tips?
int my_strcmp(char s1[], char s2[])
{
int i = 0;
while ( s1[i] != '\0' )
{
if( s2[i] == '\0' ) { return 1; }
else if( s1[i] < s2[i] ) { return -1; }
else if( s1[i] > s2[i] ) { return 1; }
i++;
}
return 0;
}
int main (int argc, char *argv[])
{
int result = my_strcmp(argv[1], argv[2]);
printf("Value: %d \n", result);
return 0;
}
The pseudo-code "implementation" of strcmp would go something like:
define strcmp (s1, s2):
    p1 = address of first character of s1
    p2 = address of first character of s2
    while contents of p1 not equal to null:
        if contents of p2 equal to null:
            return 1
        if contents of p2 greater than contents of p1:
            return -1
        if contents of p1 greater than contents of p2:
            return 1
        advance p1
        advance p2
    if contents of p2 not equal to null:
        return -1
    return 0
That's basically it. Each character is compared in turn and a decision is made as to whether the first or second string is greater, based on that character.
Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.
Note that you may not necessarily get 1 and -1, the specs say that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.
Turning that into real C would be relatively simple:
int myStrCmp (const char *s1, const char *s2) {
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;
while (*p1 != '\0') {
if (*p2 == '\0') return 1;
if (*p2 > *p1) return -1;
if (*p1 > *p2) return 1;
p1++;
p2++;
}
if (*p2 != '\0') return -1;
return 0;
}
Also keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions.
C has a concept called 'locales' which specify (among other things) collation, or ordering of the underlying character set and you may find, for example, that the characters a, á, à and ä are all considered identical. This will happen for functions like strcoll.
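As a small illustration of the difference, strcoll has the same shape as strcmp but follows the LC_COLLATE category of the current locale. The sketch below reduces its result to a sign; collate_sign and use_environment_collation are made-up names, and which characters compare as equal or adjacent depends entirely on the locales installed on the system.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch collation to the locale named by the environment. */
int use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL;
}

/* Hypothetical helper: -1, 0 or 1 according to the current collation order. */
int collate_sign(const char *a, const char *b)
{
    int r = strcoll(a, b);

    return (r > 0) - (r < 0);
}
```

A program starts in the "C" locale, where strcoll is expected to order strings the same way strcmp does; only after a call such as use_environment_collation() can the two orderings differ.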
Here is the BSD implementation:
int
strcmp(s1, s2)
register const char *s1, *s2;
{
while (*s1 == *s2++)
if (*s1++ == 0)
return (0);
return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
Once there is a mismatch between two characters, it just returns the difference between those two characters.
It uses the byte values of the characters, returning a negative value if the first string appears before the second (ordered by byte values), zero if they are equal, and a positive value if the first appears after the second. Since it operates on bytes, it is not encoding-aware.
For example:
strcmp("abc", "def") < 0
strcmp("abc", "abcd") < 0 // null character is less than 'd'
strcmp("abc", "ABC") > 0 // 'a' > 'A' in ASCII
strcmp("abc", "abc") == 0
More precisely, as described in the strcmp Open Group specification:
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.
Note that the return value may not be equal to this difference, but it will carry the same sign.
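Since only the sign of the result is specified, code that uses strcmp should depend on nothing else. A typical case is a qsort comparator, sketched below; compare_strings and sort_strings are made-up names for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, i.e. pointers to char pointers. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* qsort only looks at the sign of the result, which is all strcmp promises. */
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts an array of string pointers in byte order. */
void sort_strings(const char **items, size_t count)
{
    qsort(items, count, sizeof *items, compare_strings);
}
```

Nothing here checks for 1 or -1 specifically, so the code works with any conforming strcmp implementation.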
This, from the masters themselves (K&R, 2nd ed., pg. 106):
// strcmp: return < 0 if s < t, 0 if s == t, > 0 if s > t
int strcmp(char *s, char *t)
{
int i;
for (i = 0; s[i] == t[i]; i++)
if (s[i] == '\0')
return 0;
return s[i] - t[i];
}
Here is my version, written for small microcontroller applications, MISRA-C compliant.
The main aim with this code was to write readable code, instead of the one-line goo found in most compiler libs.
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while ( (*s1 != '\0') && (*s1 == *s2) )
{
s1++;
s2++;
}
return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}
Note: the code assumes 16 bit int type.
This code is equivalent, shorter, and more readable:
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while( (*s1!='\0') && (*s1==*s2) ){
s1++;
s2++;
}
return (int8_t)*s1 - (int8_t)*s2;
}
We only need to test for end of s1, because if we reach the end of s2 before end of s1, the loop will terminate (since *s2 != *s1).
The return expression calculates the correct value in every case, provided we are only using 7-bit (pure ASCII) characters. Careful thought is needed to produce correct code for 8-bit characters, because of the risk of integer overflow.
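To illustrate the 8-bit concern: if *s1 is 200 and *s2 is 0, the true difference is 200, which does not fit in int8_t, and on typical two's-complement implementations the conversion yields -56, so the sign comes out wrong. One way around it, sketched here under the assumption that only the sign of the result matters, is to compare instead of subtract. The name strcmp_u8 is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical variant: returns -1, 0 or 1, so no narrowing conversion is needed. */
int8_t strcmp_u8(const uint8_t *s1, const uint8_t *s2)
{
    while ((*s1 != 0u) && (*s1 == *s2))
    {
        s1++;
        s2++;
    }

    if (*s1 < *s2)
    {
        return -1;
    }
    if (*s1 > *s2)
    {
        return 1;
    }
    return 0;
}
```

This gives up the exact difference but stays correct for the full 0 to 255 range of uint8_t.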
I found this on the web.
http://www.opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c
int strcmp(const char *s1, const char *s2)
{
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}
This is how I implemented my strcmp:
It works like this:
It compares the first letters of the two strings. If they are identical, it continues to the next letter. If not, it returns the corresponding value. It is very simple and easy to understand:
#include <stdio.h>
#include <conio.h> //non-standard header that declares getch()
//function declaration:
int strcmp(char string1[], char string2[]);
int main()
{
char string1[]=" The San Antonio spurs";
char string2[]=" will be champins again!";
//calling the function- strcmp
printf("\n number returned by the strcmp function: %d", strcmp(string1, string2));
getch();
return(0);
}
/**This function calculates the dictionary value of the string and compares it to another string.
it returns a number bigger than 0 if the first string is bigger than the second
it returns a number smaller than 0 if the second string is bigger than the first
input: string1, string2
output: value- can be 1, 0 or -1 according to the case*/
int strcmp(char string1[], char string2[])
{
int i=0;
int value=2; //this initialization value could be any number but the numbers that can be returned by the function
while(value==2)
{
if (string1[i]>string2[i])
{
value=1;
}
else if (string1[i]<string2[i])
{
value=-1;
}
else if (string1[i]=='\0') //both strings ended at the same index, so they are equal
{
value=0;
}
else
{
i++;
}
}
return(value);
}
Is just this:
int strcmp(char *str1, char *str2){
while( (*str1 == *str2) && (*str1 != 0) ){
++str1;
++str2;
}
return (*str1-*str2);
}
If you want it to be faster, you can add "register" before the type, like this:
register char
The function then looks like this:
int strcmp(register char *str1, register char *str2){
while( (*str1 == *str2) && (*str1 != 0) ){
++str1;
++str2;
}
return (*str1-*str2);
}
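This way, if possible, the pointers are kept in CPU registers. Note that register is only a hint, and modern optimizing compilers usually make this decision on their own.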

Resources