How does strcmp() work? - c

I've been looking around a fair bit for an answer. I'm going to make a series of my own string functions like my_strcmp(), my_strcat(), etc.
Does strcmp() work through each index of two arrays of characters, and if the ASCII value is smaller at an identical index of the two strings, is that string therefore alphabetically greater, so that a 0 or 1 or 2 is returned? I guess what I'm asking is, does it use the ASCII values of characters to return these results?
Any help would be greatly appreciated.
[REVISED]
OK, so I have come up with this... it works for all cases except when the second string is greater than the first.
Any tips?
int my_strcmp(char s1[], char s2[])
{
int i = 0;
while ( s1[i] != '\0' )
{
if( s2[i] == '\0' ) { return 1; }
else if( s1[i] < s2[i] ) { return -1; }
else if( s1[i] > s2[i] ) { return 1; }
i++;
}
return 0;
}
int main (int argc, char *argv[])
{
int result = my_strcmp(argv[1], argv[2]);
printf("Value: %d \n", result);
return 0;
}

The pseudo-code "implementation" of strcmp would go something like:
define strcmp (s1, s2):
    p1 = address of first character of s1
    p2 = address of first character of s2
    while contents of p1 not equal to null:
        if contents of p2 equal to null:
            return 1
        if contents of p2 greater than contents of p1:
            return -1
        if contents of p1 greater than contents of p2:
            return 1
        advance p1
        advance p2
    if contents of p2 not equal to null:
        return -1
    return 0
That's basically it. Each character is compared in turn and a decision is made as to whether the first or second string is greater, based on that character.
Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.
Note that you may not necessarily get 1 and -1, the specs say that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.
Turning that into real C would be relatively simple:
int myStrCmp (const char *s1, const char *s2) {
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;
while (*p1 != '\0') {
if (*p2 == '\0') return 1;
if (*p2 > *p1) return -1;
if (*p1 > *p2) return 1;
p1++;
p2++;
}
if (*p2 != '\0') return -1;
return 0;
}
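Since only the sign of the result is specified, calling code sometimes normalizes it before use. The following is a minimal sketch of that idea; `cmp_sign` is a hypothetical helper name, not part of the standard library.

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp() result to -1, 0 or 1,
   so callers never depend on the exact magnitude returned. */
int cmp_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

With this, `cmp_sign("abc", "abd")` gives -1 whether the underlying strcmp() returns -1 or some other negative number.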
Also keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions.
C has a concept called 'locales' which specify (among other things) collation, or ordering of the underlying character set and you may find, for example, that the characters a, á, à and ä are all considered identical. This will happen for functions like strcoll.
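As a rough illustration of locale-dependent comparison, the sketch below wraps strcoll(). The helper name `collate_sign` is hypothetical, and which locale names are available (other than "C" and "") depends on the platform. In the "C" locale, strcoll() orders strings the same way strcmp() does.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare two strings using the collation rules of
   the locale named by `locale_name` ("C", or "" for the user's
   environment). Returns -1, 0 or 1. If the requested locale is not
   available, setlocale() returns NULL and the current locale is kept.
   Note that this changes the program-wide LC_COLLATE setting. */
int collate_sign(const char *locale_name, const char *a, const char *b)
{
    setlocale(LC_COLLATE, locale_name);
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

Results for locales other than "C" are deliberately not shown here, because they vary between systems.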

Here is the BSD implementation:
int
strcmp(s1, s2)
register const char *s1, *s2;
{
while (*s1 == *s2++)
if (*s1++ == 0)
return (0);
return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
Once there is a mismatch between two characters, it just returns the difference between those two characters.

It uses the byte values of the characters, returning a negative value if the first string appears before the second (ordered by byte values), zero if they are equal, and a positive value if the first appears after the second. Since it operates on bytes, it is not encoding-aware.
For example:
strcmp("abc", "def") < 0
strcmp("abc", "abcd") < 0 // null character is less than 'd'
strcmp("abc", "ABC") > 0 // 'a' > 'A' in ASCII
strcmp("abc", "abc") == 0
More precisely, as described in the strcmp Open Group specification:
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.
Note that the return value may not be equal to this difference, but it will carry the same sign.
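The "interpreted as unsigned char" wording matters for bytes above 127. The sketch below (with a hypothetical function name) checks that a conforming strcmp() orders the byte 0x80 after 0x7F, even on platforms where plain char is signed and 0x80 would otherwise look negative.

```c
#include <string.h>

/* Returns 1 if strcmp() orders the byte 0x80 after the byte 0x7F, as the
   unsigned-char rule requires (128 > 127), regardless of whether plain
   char is signed on this platform. */
int high_byte_sorts_last(void)
{
    return strcmp("\x80", "\x7f") > 0;
}
```

A hand-written comparison that subtracts plain char values without casting can get this case wrong.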

This, from the masters themselves (K&R, 2nd ed., pg. 106):
// strcmp: return < 0 if s < t, 0 if s == t, > 0 if s > t
int strcmp(char *s, char *t)
{
int i;
for (i = 0; s[i] == t[i]; i++)
if (s[i] == '\0')
return 0;
return s[i] - t[i];
}

Here is my version, written for small microcontroller applications, MISRA-C compliant.
The main aim with this code was to write readable code, instead of the one-line goo found in most compiler libs.
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while ( (*s1 != '\0') && (*s1 == *s2) )
{
s1++;
s2++;
}
return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}
Note: the code assumes 16 bit int type.

This code is equivalent, shorter, and more readable:
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while( (*s1!='\0') && (*s1==*s2) ){
s1++;
s2++;
}
return (int8_t)*s1 - (int8_t)*s2;
}
We only need to test for end of s1, because if we reach the end of s2 before end of s1, the loop will terminate (since *s2 != *s1).
The return expression calculates the correct value in every case, provided we are only using 7-bit (pure ASCII) characters. Careful thought is needed to produce correct code for 8-bit characters, because of the risk of integer overflow.
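One way to sidestep the overflow concern for 8-bit characters is to compute the result by comparison instead of subtraction. For example, comparing the bytes 200 and 10 with the cast-then-subtract expression above yields -56 - 10 = -66 on a typical two's-complement target, a negative result even though 200 > 10. The following sketch (the name `strcmp_8bit_safe` is hypothetical) always returns -1, 0 or 1.

```c
#include <stdint.h>

/* Sketch: same loop as above, but the result is computed by comparing
   the two differing bytes rather than subtracting them, so it always
   fits in int8_t. */
int8_t strcmp_8bit_safe(const uint8_t *s1, const uint8_t *s2)
{
    while ((*s1 != '\0') && (*s1 == *s2)) {
        s1++;
        s2++;
    }
    return (int8_t)((*s1 > *s2) - (*s1 < *s2));
}
```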

I found this on the web.
http://www.opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c
int strcmp(const char *s1, const char *s2)
{
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

This is how I implemented my strcmp:
It works like this: it compares the first letters of the two strings; if they are identical, it continues to the next letter. If not, it returns the corresponding value. It is very simple and easy to understand:
#include <stdio.h>
#include <conio.h> /* getch() is non-standard (DOS/Windows compilers) */
//function declaration:
int strcmp(char string1[], char string2[]);
int main()
{
char string1[]=" The San Antonio spurs";
char string2[]=" will be champins again!";
//calling the function- strcmp
printf("\n number returned by the strcmp function: %d", strcmp(string1, string2));
getch();
return(0);
}
/**This function calculates the dictionary value of the string and compares it to another string.
it returns a number bigger than 0 if the first string is bigger than the second
it returns a number smaller than 0 if the second string is bigger than the first
input: string1, string2
output: value- can be 1, 0 or -1 according to the case*/
int strcmp(char string1[], char string2[])
{
int i=0;
int value=2; //this initialization value could be any number but the numbers that can be returned by the function
while(value==2)
{
if (string1[i]>string2[i])
{
value=1;
}
else if (string1[i]<string2[i])
{
value=-1;
}
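else if (string1[i]=='\0')
{
value=0; //both strings ended without a difference, so they are equal
}
else
{
i++;
}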
}
return(value);
}

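It is just this:
int strcmp(char *str1, char *str2){
    while( (*str1 == *str2) && (*str1 != 0) ){
        ++str1; /* advance the pointers, not the characters they point to */
        ++str2;
    }
    return (*(unsigned char *)str1 - *(unsigned char *)str2);
}
If you want, you can add "register" before the type, like this:
register char
so that the function becomes:
int strcmp(register char *str1, register char *str2){
    while( (*str1 == *str2) && (*str1 != 0) ){
        ++str1;
        ++str2;
    }
    return (*(unsigned char *)str1 - *(unsigned char *)str2);
}
This hints that the pointers should be kept in CPU registers if possible, although modern compilers make that decision themselves and generally ignore the keyword.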

Related

Replicating the strcmp() function from string.h library

I am trying to replicate the strcmp() function from the string.h library and here is my code
/**
* string_compare - this function compares two strings pointed
* by s1 and s2. Is a replica of the strcmp from the string.h library
* #s1: The first string to be compared
* #s2: The second string to be compared
*
* Return: On success, it returns:
* 0 if s1 is equal to s2
* negative value if s1 is less than s2
* positive value if s1 is greater than s2
*/
int string_compare(char *s1, char *s2)
{
int sum = 0, i;
for (i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
sum += (s1[i] - s2[i]);
for ( ; s1[i] != '\0'; i++)
sum += (s1[i] - 0);
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
return (sum);
}
I tried my function using this sample code:
#include <stdio.h>
int main(void)
{
char s1[] = "Hello";
char s2[] = "World!";
printf("%d\n", string_compare(s1, s2));
printf("%d\n", string_compare(s2, s1));
printf("%d\n", string_compare(s1, s1));
return (0);
}
And I get the following output,
-53
-500
0
But I should be getting:
-15
15
0
Why am I getting such a result??
This approach is incorrect.
Let's assume that the first string is "B" and the second string is "AB".
It is evident that the first string is greater than the second string in the lexicographical order.
But the result will be negative due to this for loop
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
though the function shall return a positive value.
Moreover there can occur an overflow for the variable sum of the type int.
Also the function should be declared at least like
int string_compare( const char *s1, const char *s2);
because passed strings are not changed within the function.
The function can be defined the following way
int string_compare( const char *s1, const char *s2 )
{
while ( *s1 && *s1 == *s2 )
{
++s1;
++s2;
}
return ( unsigned char )*s1 - ( unsigned char )*s2;
}
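The "B" versus "AB" counterexample can be checked directly. The sketch below contains a bounds-safe rendering of the summing idea (the original continues indexing the second string past its terminator when the first string is longer, which is most likely where the -500 came from) next to a first-difference comparison. Both function names are hypothetical.

```c
/* The summing idea from the question, written so that each string is
   only read up to its own terminator. */
int sum_compare(const char *s1, const char *s2)
{
    int sum = 0;
    int i, j;

    for (i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
        sum += s1[i] - s2[i];
    j = i;                      /* remember where the common part ended */
    for (; s1[i] != '\0'; i++)
        sum += s1[i];
    for (; s2[j] != '\0'; j++)
        sum -= s2[j];
    return sum;
}

/* First-difference comparison: only the first mismatching pair counts. */
int first_diff_compare(const char *s1, const char *s2)
{
    while (*s1 && *s1 == *s2) {
        ++s1;
        ++s2;
    }
    return (unsigned char)*s1 - (unsigned char)*s2;
}
```

For "B" and "AB", `sum_compare` gives a negative value because the trailing 'B' of the second string is subtracted, while `first_diff_compare` gives a positive value because 'B' > 'A' at the first position.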
You are overcomplicating a very simple function.
#define UC unsigned char
int mystrcmp(const char *s1, const char *s2)
{
int result;
while(!(result = (UC)*s1 - (UC)*s2++) && *s1++);
return result;
}
Strings in C are arrays of characters terminated with a null character (\0).
When you pass a string to a function, you are passing a pointer to its first element. That pointer is passed by value. You can modify that pointer within the function without any side-effects on the string it points to, as long as you don't dereference and assign to the address it points to.
That's why the pointer math from 0___________'s answer works.
int mystrcmp1(const char *s1, const char *s2) {
int result = 0;
while(!(result = *s1 - *s2++) && *s1++);
return result;
}
*s1++ could be rewritten as *(s1++) to disambiguate. s1++ returns the current pointer to the beginning of the first string, and then increments the pointer so it points to the next character. That pointer is then dereferenced to give us the character. The same happens with the s2 pointer.
Then we're comparing them by subtraction. If they're the same, we get 0, which in C is false in a boolean context. This result is assigned to result.
We can now see that the loop continues while corresponding characters in the two strings are equal and while dereferencing s1 does not give us the null terminator.
When the loop ends, it means there was either a difference or we reached the end of the first string.
The difference will be stored in result, which the function returns.

My program which creates a abbreviation of an char array, does not print anything. Where is my mistake?

I am supposed to create a program which fills an array with the abbreviation of a constant char array. While my program does not report any errors, it also does not print any characters at my printf calls. Because of that I assume that my program does not work properly and isn't filling my array with any characters.
void abbrev(const char s[], char a[], size_t size) {
int i = 0;
while (*s != '\0') {
printf('%c', *s);
if (*s != ' ' && *s - 1 == ' ') {
a[i] = *s;
i++;
printf('%c', a[i]);
}
s++;
}
}
void main() {
char jordan1[60] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
system("PAUSE");
}
The actual result is nothing. At least I assume so, since my console isn't showing anything. The result should be "EFF", and the size_t size is supposed to limit my char array a in case the abbreviation is too long. So it should only store letters until my array is full and then the '\0', but I did not implement that yet, since my program is apparently not filling the array at all.
#include <stdio.h>
#include <ctype.h>
/* in: the string to abbreviate
out: output abbreviation. Function assumes there's enough room */
void abbrev(const char in[], char out[])
{
const char *p;
int zbPosOut = 0; /* current zero-based position within the `out` array */
for (p = in; *p; ++p) { /* iterate through `in` until we see a zero terminator */
/* if the letter is uppercase
OR if (the letter is alphabetic AND we are not at the zero
position AND the previous char. is a space character) OR if the
letter is lowercase and it is the first char. of the array... */
if (isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)) {
out[zbPosOut++] = *p; /* ... then the letter is the start letter
of a word, so add it to our `out` array, and
increment the current `zbPosOut` */
}
}
out[zbPosOut] = 0; /* null-terminate the out array */
}
This code says a lot in few lines. Let's take a look:
isupper(*p) || (isalpha(*p) && (p - in) > 0 && isspace(p[-1]))
|| (islower(*p) && p == in)
If the current character (*p) is an uppercase character OR if it is alphabetic (isalpha(*p)) and the previous character p[-1] is a space, then we may consider *p to be the first character of a word, and it should be added to our out array. We include the test (p - in) > 0 because if p == in, then we are at the zero position of the array and therefore p[-1] is undefined.
The order in this expression matters a lot. If we were to put (p - in) > 0 after the isspace(p[-1]) test, then we would not be taking advantage of the laziness of the && operator: as soon as it encounters a false operand, the following operand is not evaluated. This is important because if p - in == 0, then we do not want to evaluate the isspace(p[-1]) expression. The order in which we have written the tests makes sure that isspace(p[-1]) is evaluated after making sure we are not at the zero position.
The final expression (islower(*p) && p == in) handles the case where the first letter is lowercase.
out[zbPosOut++] = *p;
We append the character *p to the out array. The current position of out is kept track of by the zbPosOut variable, which is incremented afterwards (which is why we use postfix ++ rather than prefix).
Code to test the operation of abbrev:
int main()
{
char jordan1[] = " electronic frontier foundation ";
char out[16];
abbrev(jordan1, out);
puts(out);
return 0;
}
It gives eff as the output. For it to look like an acronym, we can change the code to append the letter *p to out to:
out[zbPosOut++] = toupper(*p);
which capitalizes each letter added to the out array (if *p is already uppercase, toupper just returns *p).
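The original question also asked for a size limit on the output array. A minimal sketch of a bounded variant follows; the name `abbrev_bounded` is hypothetical, and unlike the function above it only looks at word boundaries (it does not pick up uppercase letters in the middle of a word).

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch: write the upper-cased first letter of each whitespace-separated
   word of `in` to `out`, never writing more than `size` bytes including
   the terminating '\0'. */
void abbrev_bounded(const char *in, char *out, size_t size)
{
    size_t n = 0;
    int prev_space = 1;         /* treat start of string as a word boundary */

    if (size == 0)
        return;                 /* no room even for the terminator */
    for (const char *p = in; *p != '\0'; ++p) {
        unsigned char ch = (unsigned char)*p;
        if (prev_space && isalpha(ch) && n + 1 < size)
            out[n++] = (char)toupper(ch);
        prev_space = isspace(ch) != 0;
    }
    out[n] = '\0';
}
```

Called with " Electronic Frontier Foundation " and a 5-byte buffer, this stores "EFF"; with a 3-byte buffer it stops after "EF" so the terminator still fits.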
void print_without_duplicate_leading_trailing_spaces(const char *str)
{
while(*str == ' ' && *str) str++;
while(*str)
{
if(*str != ' ' || (*str == ' ' && *(str + 1) != ' ' && *str))
{
putchar(*str);
}
str++;
}
}
What you want to do could be simplified with a for() loop.
#include <stdio.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
int pos = 0;
// Loop for every character in 's'.
for (int i = 0; i < strlen(s); i++)
// If the character just before was a space, and this character is not a
// space, and we are still in the size bounds (we subtract 1 for the
// terminator), then copy and append.
if (s[i] != ' ' && s[i - 1] == ' ' && pos < size - 1)
a[pos++] = s[i];
a[pos] = '\0'; // Terminate the abbreviation before printing it.
printf("%s\n", a); // Print.
}
void main() {
char jordan1[] = " Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
}
However, I don't think this is the best way to achieve what you are trying to do. Firstly, the character s[0] can never be selected, because the check requires a preceding space. Which brings me to the second reason: at the first index you will be reading s[-1], which is out of bounds and therefore undefined behavior. If I were implementing this function I would do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void abbrev(const char s[], char a[], size_t size) {
char *str = strdup(s); // Make local copy.
size_t i = 0;
// Break it up into words, then grab the first character of each word.
for (char *w = strtok(str, " "); w != NULL; w = strtok(NULL, " "))
if (i < size - 1)
a[i++] = w[0];
a[i] = '\0'; // Null-terminate the abbreviation.
free(str); // Release our local copy.
printf("%s\n", a);
}
int main() {
char jordan1[] = "Electronic Frontier Foundation ";
char a[5];
size_t size = 5;
abbrev(jordan1, a, size);
return 0;
}

Parse string to number in C

I tried to write a function to convert a string to an int:
int convert(char *str, int *n){
int i;
if (str == NULL) return 0;
for (i = 0; i < strlen(str); i++)
if ((isdigit(*(str+i))) == 0) return 0;
*n = *str;
return 1;
}
So what's wrong with my code?
*n = *str means:
Set the 4 bytes of memory that n points to (assuming a 4-byte int) to the value of the single byte that str points to. This is perfectly legal, but it's probably not your intention.
Why are you trying to convert a char* to an int* in the first place? If you literally just need a pointer conversion to make the compiler happy, you can write int *foo = (int*)bar where bar is the char*. Note that such a cast only reinterprets the pointer; it does not parse the digits in the string.
Sorry, I don't have the reputation to make this a comment.
The function definitely does not perform as intended.
Here are some issues:
you should include <ctype.h> for isdigit() to be properly defined.
isdigit(*(str+i)) has undefined behavior if str contains negative char values. You should cast the argument:
isdigit((unsigned char)str[i])
the function returns 0 if there is any non digit character in the string. What about "-1" and "+2"? atoi and strtol are more lenient with non digit characters, they skip initial white space, process an optional sign and subsequent digits, stopping at the first non digit.
the test for (i = 0; i < strlen(str); i++) is very inefficient: strlen may be invoked for each character in the string, with O(N^2) time complexity. Use this instead:
for (i = 0; str[i] != '\0'; i++)
*n = *str does not convert the number represented by the digits in str, it merely stores the value of the first character into n, for example '0' will convert to 48 on ASCII systems. You should instead process every digit in the string, multiplying the value converted so far by 10 and adding the value represented by the digit with str[i] - '0'.
Here is a corrected version with your restrictive semantics:
int convert(const char *str, int *n) {
int value = 0;
if (str == NULL)
return 0;
while (*str) {
if (isdigit((unsigned char)*str)) {
value = value * 10 + *str++ - '0';
} else {
return 0;
}
}
*n = value;
return 1;
}
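For the more lenient semantics mentioned above (leading white space, an optional sign), strtol() can do the parsing. The sketch below is one possible wrapper, not the only way to use it; the name `convert_strtol` is hypothetical. It also rejects values that do not fit in an int, which the hand-written loop does not check.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Sketch: parse a base-10 integer with strtol(). Returns 1 and stores the
   value in *n on success; returns 0 if there are no digits, if trailing
   characters remain, or if the value does not fit in an int. */
int convert_strtol(const char *str, int *n)
{
    char *end;
    long value;

    if (str == NULL || n == NULL)
        return 0;
    errno = 0;
    value = strtol(str, &end, 10);
    if (end == str || *end != '\0')
        return 0;               /* nothing parsed, or trailing junk */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;               /* out of range for int */
    *n = (int)value;
    return 1;
}
```

With this wrapper, "-42" and " +7" are accepted, while "12a", "" and a 23-digit number are rejected.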
Conversion of char* pointer to int*:
#include <stdio.h>
int main(void)
{
    char c, *cc;
    int i, *ii;
    float f, *ff;
    c = 'A'; /* ASCII value of A gets stored in c */
    i = 25;
    f = 3.14f;
    cc = &c;
    ii = &i;
    ff = &f;
    /* %p with a void* argument is the portable way to print an address */
    printf("\n Address contained in cc = %p", (void *)cc);
    printf("\n Address contained in ii = %p", (void *)ii);
    printf("\n Address contained in ff = %p", (void *)ff);
    printf("\n value of c = %c", *cc);
    printf("\n value of i = %d", *ii);
    printf("\n value of f = %f", *ff);
    return 0;
}

C - Passing an int value from a char string

I have the following function, and I want to test if the two strings are anagrams. One way I thought about doing it would be to sum the values of each of the characters in the strings and then compare their values.
However, I am getting a segmentation fault in both the for loops when I try to run my program. I am not understanding this correctly, is there anything I am doing incorrectly in my code?
int anagram(char *a, char *b)
{
int sum1 = 0;
int sum2 = 0;
char *p, *q;
for (p=a; p != '\0'; p++) {
sum1 += *p - 'a';
}
for (q=b; q != '\0'; q++) {
sum2 += *q - 'a';
}
if ( sum1 == sum2 )
return 1;
else
return 0;
}
In your for loops you must check
*p != '\0'
*q != '\0'
This is the cause of the seg-fault.
Furthermore, even fixed, that code will give you false positives:
"bc" anagram of "ad"
I suggest a different approach:
Make two arrays of 256 ints, zero-initialized.
Let each element keep the count of the corresponding character (char) in one of the strings.
Finally, compare whether the two arrays are the same.
I leave the task of writing the code to you.
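For reference, the counting approach described above could be sketched as follows. The function name `is_anagram_counts` is hypothetical, and the table size of 256 assumes 8-bit bytes.

```c
#include <string.h>

/* Sketch: return 1 if `a` and `b` contain exactly the same bytes with the
   same multiplicities, 0 otherwise. Assumes 8-bit bytes (256 values). */
int is_anagram_counts(const char *a, const char *b)
{
    int count_a[256] = {0};
    int count_b[256] = {0};

    for (; *a != '\0'; ++a)
        count_a[(unsigned char)*a]++;
    for (; *b != '\0'; ++b)
        count_b[(unsigned char)*b]++;
    return memcmp(count_a, count_b, sizeof count_a) == 0;
}
```

Unlike the character-sum test, this rejects "bc" versus "ad", because the individual counts differ even though the sums match.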
"p !=0" should be "*p != 0", as it is now you are waiting for the pointer to become null.
Since we're already giving answers about better approaches, here's mine:
Get a list of (preferably small) prime numbers. You need one for every possible character of your input strings, thus when you want to check strings containing only digits 0 to 9 you need 10 prime numbers. Let's take these:
static unsigned const primes[10] = {
2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
Now, since each number has exactly one prime factorisation, and because multiplication is commutative, you can simply build the product of the prime numbers for each character of your string. If the products are identical, then each character occurs the same number of times in both strings. Thus, both strings are anagrams of each other.
unsigned long prime_product(char const * str) {
assert(str != NULL);
unsigned long product = 1;
for (; *str != '\0'; ++str) {
assert(*str >= '0');
assert(*str <= '9');
product *= primes[*str - '0'];
}
return product;
}
char is_anagram(char const * one, char const * two) {
return prime_product(one) == prime_product(two);
}
This should even work to some extent when the product overflows, though then false positives are possible (though their likelihood can be greatly reduced when also comparing the length of the two strings).
As can be seen this version has O(n) time and constant space complexity.
Here is a complete solution for your problem:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
int cmp(const void *str1, const void *str2) {
return (*((char*)str1) - *((char*)str2));
}
bool areAnagram(char *str1, char *str2) {
int n1 = strlen(str1);
int n2 = strlen(str2);
if (n1 != n2)
return false;
qsort(str1, n1, 1, &cmp);
qsort(str2, n2, 1, &cmp);
for (int i = 0; i < n1; i++)
if (str1[i] != str2[i])
return false;
return true;
}
int main()
{
char str1[] = "test";
char str2[] = "tset";
if (areAnagram(str1, str2))
printf("The two strings are anagram of each other");
else
printf("The two strings are not anagram of each other");
return 0;
}

Initialize string array in C and count number of letters inside

I found this example
int SizeofCharArray(char *phrase)
{
int size = 0;
int value = phrase[size];
while(value != 0)
{
value = phrase[size];
size++;
};
//printf("%i%s", size, "\n");
return size;
}
here
But how can I count the number of letters in a string array using pure C? I also do not understand how to initialize a string array.
Thank you!
The posted code is of rather poor quality. The name of the function, SizeofCharArray, does not match the description, count number of letters in string array.
If you want to return the number of characters in the array, use:
int SizeofCharArray(char *phrase)
{
int size = 0;
char* cp = phrase;
while( *cp != '\0')
{
size++;
cp++;
};
return size;
}
If you want to return the number of letters in the array, use:
int isLetter(char c)
{
return (( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ));
}
int GetNumberOfLetters(char *phrase)
{
int num = 0;
char* cp = phrase;
while( *cp != '\0')
{
if ( isLetter(*cp) )
{
num++;
}
cp++;
};
return num;
}
This will count the number of alphabetic characters in a c-string:
#include <ctype.h>
int numberOfLetters(char *s)
{
int n = 0;
while (*s)
if (isalpha(*s++))
n++;
return n;
}
If you want the actual number of characters, counting characters like spaces and numbers, just use strlen(s) located in string.h.
To find the length of C string, you can use strlen() function
#include<string.h>
char str[]="GJHKL";
const char *str1="hhkjj";
size_t len1=strlen(str);
size_t len2=strlen(str1);
It's not particularly good C. It doesn't give you the size of a char array -- that's impossible to determine if you've lost that information. What it does give you is the size of a null-terminated char array (AKA a C string), and it does so by counting the characters until it finds the null terminator (a 0 byte, written '\0'). As a matter of fact, what you've got up top is a not particularly good strlen implementation (strlen is a standard library function that does the same thing -- determine the size of a null-terminated char array).
I believe this below should be a little more C-ish implementation of the same thing:
size_t strlen(const char *s){
const char* ptr = s;
for(; *ptr; ++ptr); //move the pointer until you get '\0'
return ptr-s; //return the difference from the original position (=string length;)
}
It returns size_t (typically a 64-bit unsigned integer on 64-bit machines and a 32-bit one on 32-bit machines, so it will work on arbitrarily long strings as long as they fit into memory) and it also declares that it won't modify the array it measures (const char *s means a pointer you promise not to use to change what it points to).
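On the question of how to initialize a string array, and how an array's size differs from the string's length, here is a small sketch. All variable and function names are hypothetical.

```c
#include <string.h>

/* Three common ways to set up a string in C. */
static char fixed[16] = "hello";       /* 16-byte array, unused bytes are zero */
static char fitted[] = "hello";        /* size deduced: 5 letters + '\0' = 6 bytes */
static const char *literal = "hello";  /* pointer to a string literal (do not modify) */

/* sizeof reports an array's storage; strlen counts up to the terminator. */
size_t fixed_capacity(void)  { return sizeof fixed; }
size_t fitted_capacity(void) { return sizeof fitted; }
size_t fixed_length(void)    { return strlen(fixed); }
size_t literal_length(void)  { return strlen(literal); }
```

Here `sizeof fixed` is 16 and `sizeof fitted` is 6, while strlen() reports 5 for all three. Once an array has been passed to a function as a char *, only the strlen()-style count is recoverable.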

Resources