using the integer difference in strcmp in c programming - c

I have the program below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char text1[30],text2[30];
int diff;
puts("Enter text1:");
fgets(text1,30,stdin);
puts("Enter text2:");
fgets(text2,30,stdin);
diff=strcmp(text1,text2);
printf("Difference between %s and %s is %d",text1,text2,diff);
}
If I give text1 as inputtext and text2 as differencetext, then the difference should be 5, but I am getting 1 for different inputs. I am not sure where I am going wrong.

The specification for strcmp in the C standard says only that it “returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2” (C 2011 N1570 7.24.4.2 3, C 2018 ibid).
You may not rely on more specific behavior, such as returning a specific value, unless you have an additional guarantee from your C implementation.

All that the specifications say is that strcmp will return a number "less than", "greater than" or "equal to" zero depending on the result of the comparison.
I'm not sure why you believe that the difference should be 5.

I think you misunderstood what strcmp does:
int strcmp(const char *s1, const char *s2);
Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

From cplusplus.com:
About strcmp return value
Returns an integral value indicating the relationship between the strings:
A zero value indicates that both strings are equal.
A value greater than zero indicates that the first character that does not match has a greater value in str1 than in str2; And a value less than zero indicates the opposite.

That's because strcmp returns an int: negative if the first string is less than the second, positive if the second is less than the first, and 0 if they are equal.

Related

Why does memcmp return a negative value when there is a positive difference?

#include <stdio.h>
#include <string.h>
int main()
{
int test1 = 8410092; // 0x8053EC
int test2 = 8404974; // 0x803FEE
char *t1 = ( char*) &test1;
char *t2 = (char*) &test2;
int ret2 = memcmp(t1,t2,4);
printf("%d",ret2);
}
Here's a very basic function that prints -2 when run. Maybe I am totally misunderstanding memcmp, but I thought it returns the difference between the first differing bytes. Since test1 is a larger number than test2, shouldn't the printed value be positive?
I am using the standard gcc 7 compiler for Ubuntu.
As pointed out in the comments, memcmp() compares byte by byte. Here is a quote from the man page:
int memcmp(const void *s1, const void *s2, size_t n);
RETURN VALUE:
The memcmp() function returns an integer less than, equal to, or
greater than zero if the first n bytes of s1 is found, respectively,
to be less than, to match, or be greater than the first n bytes of
s2
For a nonzero return value, the sign is determined by the sign of the
difference between the first pair of bytes (interpreted as unsigned
char) that differ in s1 and s2.
If n is zero, the return value is zero.
http://man7.org/linux/man-pages/man3/memcmp.3.html
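If the bytes are not the same, the sign of the result depends on which pair of bytes differs first. For a multi-byte object such as an int, that depends on the target's endianness: on a little-endian machine the least significant byte is compared first, so here 0xEC is compared with 0xEE and the result is negative.
One application of memcmp() is testing whether two large arrays are the same, which can be faster than writing a loop that compares element by element. For more details, refer to this Stack Overflow question: Why is memcmp so much faster than a for loop check?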
memcmp compares memory. That is, it compares the bytes used to represent objects. The bytes used to represent objects may vary from one C implementation to another. Per C 2018 6.2.6 2:
Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number,
order, and encoding of which are either explicitly specified or implementation-defined.
To compare the values represented by objects, use the ordinary operators <, <=, >, >=, ==, and !=. Comparing the memory of objects with memcmp should be used for limited purposes, such as inserting objects into a tree that only needs to be able to store and retrieve items without caring about what their values mean.

Weird return value in strcmp [duplicate]

While checking the return value of strcmp function, I found some strange behavior in gcc. Here's my code:
#include <stdio.h>
#include <string.h>
char str0[] = "hello world!";
char str1[] = "Hello world!";
int main() {
printf("%d\n", strcmp("hello world!", "Hello world!"));
printf("%d\n", strcmp(str0, str1));
}
When I compile this with clang, both calls to strcmp return 32. However, when compiling with gcc, the first call returns 1, and the second call returns 32. I don't understand why the first and second calls to strcmp return different values when compiled using gcc.
Below is my test environment.
Ubuntu 18.04 64bit
gcc 7.3.0
clang 6.0.0
It looks like you didn't enable optimizations (e.g. -O2).
From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.
That's where the difference comes from: The code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.
Live example: https://godbolt.org/z/8Hg-gI
As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.
The standard defines the result of strcmp to be negative, if lhs appears before rhs in lexical order, zero if they are equal, or a positive value if lhs appears lexically after rhs.
It's up to the implementation how to implement that and what exactly to return. You must not depend on a specific value in your programs, or they won't be portable. Simply check with comparisons (<, >, ==).
See https://en.cppreference.com/w/c/string/byte/strcmp
Background
One simple implementation might just calculate the difference of each pair of characters, c1 - c2, until the result is nonzero or one of the strings ends. The result is then the numeric difference between the first pair of characters at which the two strings differ.
For example, this GLibC implementation: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;hb=HEAD
The strcmp function is only specified to return a value larger than zero, zero, or less than zero. There's nothing specified what those positive and negative values have to be.
The exact values returned by strcmp in the case of the strings not being equal are not specified. From the man page:
#include <string.h>
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);
The strcmp() and strncmp() functions return an integer less than,
equal to, or greater than zero if s1 (or the first n bytes thereof) is
found, respectively, to be less than, to match, or be greater than s2.
Since the first string compares greater than the second, the value must be positive, which it is in both cases.
As for the difference between the two compilers, it appears that clang is returning the difference between the ASCII values for the corresponding characters that mismatched, while gcc is opting for a simple -1, 0, or 1. Both are valid, so your code should only need to check if the value is 0, greater than 0, or less than 0.

What exactly is strcmp(String comparison) doing?

My code for testing strcmp is as follows:
char s1[10] = "racecar";
char *s2 = "raceCar"; //yes, a capital 'C'
int diff;
diff = strcmp(s1,s2);
printf(" %d\n", diff);
So I am confused on why the output is 32. What exactly is it comparing to get that result? I appreciate your time and help.
Whatever it wants. In this case, it looks like the value you're getting is 'c' - 'C' (the difference between the two characters at the first point where the strings differ), which is equal to 32 on many systems, but you shouldn't by any means count on that. The only thing that you can count on is that the return will be 0 if the two strings are equal, negative if s1 comes before s2, and positive if s1 comes after s2.
The man page states that the output will be greater than 0 or less than 0 if the strings are not the same. It doesn't say anything else regarding the exact value (if not 0).
That being said, the ASCII codes for c and C differ by 32. That's probably where the result is coming from. You can't depend on this behavior being identical in any two given implementations however.
It is not specified. According to the standard:
7.24.4.2 The strcmp function
#include <string.h>
int strcmp(const char *s1, const char *s2);
Description
The strcmp function compares the string pointed to by s1 to the string pointed to by
s2.
Returns
The strcmp function returns an integer greater than, equal to, or less than zero,
accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
According to the C standard (N1570 7.24.4.2):
The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by s1 is
greater than, equal to, or less than the string pointed to by
s2.
It says nothing about which positive or negative value it will return if the strings are unequal, and portable code should only check whether the result is less than, equal to, or greater than zero.
Having said that, a straightforward implementation of strcmp would likely return the numeric difference in the values of the first characters that don't match. In your case, the first non-matching characters are 'c' and 'C', which happen to differ by 32 in ASCII.
Don't count on this.
"strcmp" compares the strings and, when it reaches a pair of characters that differ, a typical implementation returns the difference between them (the standard only guarantees the sign).
In your case, it reaches 'c' in your first string, and 'C' in your second string. 'c' in hex is 0x63 while 'C' is 0x43. Subtract and you get 0x20, which is 32 in decimal.
We use strcmp to check whether strings are equal: they are equal if the function returns 0.
strcmp compares the strings character by character until it reaches characters that don't match or the terminating null character.
So strcmp sees that c (99 in ASCII) is greater than C (67 in ASCII) and returns a positive integer. Which positive integer it returns depends on the C library implementation you are using.

When will strcmp not return -1, 0 or 1?

From the man page:
The strcmp() and strncmp() functions return an integer less than, equal
to, or greater than zero if s1 (or the first n bytes thereof) is found,
respectively, to be less than, to match, or be greater than s2.
Example code in C (prints -15 on my machine, swapping test1 and test2 inverts the value):
#include <stdio.h>
#include <string.h>
int main() {
char* test1 = "hello";
char* test2 = "world";
printf("%d\n", strcmp(test1, test2));
}
I found this code (taken from this question) that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort). To me, this is terrible style and depends on undocumented features.
I guess I have two, related questions:
Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero? If not, what does the standard implementation do?
Is the return value consistent across the Linux, Windows and the BSDs?
Edit:
After leaving my computer for 5 minutes, I realized that there is in fact no error with the code in question. I struck out the parts that I figured out before reading the comments/answers, but I left them there to keep the comments relevant. I think this is still an interesting question and may cause hiccups for programmers used to other languages that always return -1, 0 or 1 (e.g. Python seems to do this, but it's not documented that way).
FWIW, I think that relying on something other than the documented behavior is bad style.
Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero?
No. The tightest constraint is that it should be zero, less than zero or more than zero, as specified in the documentation of this particular function.
If not, what does the standard implementation do?
There's no such thing as "the standard implementation". Even if there was, it would probably just
return zero, less than zero or more than zero;
:-)
Is the return value consistent across the Linux, Windows and the BSDs?
I can confirm that it's consistent across Linux and OS X as of 10.7.4 (specifically, it's -1, 0 or +1). I have no idea about Windows, but I bet Microsoft guys use -2 and +3 just to break code :P
Also, let me also point out that you have completely misunderstood what the code does.
I found this code (taken from this question) that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort). To me, this is terrible style and depends on undocumented features.
No, it actually doesn't. The C standard library is designed with consistency and ease of use in mind. That is, what qsort() requires is that its comparator function returns a negative or a positive number or zero - exactly what strcmp() is guaranteed to do. So this is not "terrible style", it's perfectly standards-conformant code which does not depend upon undocumented features.
In the C99 standard, §7.21.4.2 The strcmp function:
The strcmp function returns an integer greater than, equal to, or less than zero,
accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
Emphasis added.
This means the standard doesn't guarantee -1, 0 or 1; the value may vary between implementations.
The value you are getting is the difference between 'h' and 'w', which has magnitude 15.
In your case the strings are hello and world, so 'h' - 'w' = -15 < 0, and that's why strcmp returns -15.
• Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero? If not, what does the standard implementation do?
No, as you mentioned yourself the man page says less than, equal to, or greater than zero and that's what the standard says as well.
• Is the return value consistent across the Linux, Windows and the BSDs?
No.
On Linux (OpenSuSE 12.1, kernel 3.1) with gcc, I get -15/15 depending on if test1 or test2 is first. On Windows 7 (VS 2010) I get -1/1.
Based on the loose definition of strcmp(), both are fine.
...that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort).
An interesting side note for you... if you take a look at the qsort() man page, the example there is pretty much the same as the Bell code you posted using strcmp(). The reason being the comparator function that qsort() requires is actually a great fit for the return from strcmp():
The comparison function must return an integer less than, equal to, or
greater than zero if the first argument is considered to be
respectively less than, equal to, or greater than the second.
In reality, the return value of strcmp is likely to be the difference between the values of the bytes at the first position that differed, simply because returning this difference is a lot more efficient than doing an additional conditional branch to convert it to -1 or 1. Unfortunately, some broken software has been known to assume the result fits in 8 bits, leading to serious vulnerabilities. In short, you should never use anything but the sign of the result.
For details on the issues, read this article:
https://communities.coverity.com/blogs/security/2012/07/19/more-defects-like-the-mysql-memcmp-vulnerability
In this page:
The strcmp() function compares the string pointed to by s1 to the string pointed to by s2.
The sign of a non-zero return value is determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.
Here is an implementation of strcmp in FreeBSD.
#include <string.h>
/*
* Compare strings.
*/
int
strcmp(s1, s2)
register const char *s1, *s2;
{
while (*s1 == *s2++)
if (*s1++ == 0)
return (0);
return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
From the manual page:
RETURN VALUE
The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes
thereof) is found, respectively, to
be less than, to match, or be greater than s2.
It only specifies that the value is greater or less than 0 and doesn't say anything about specific values; those are implementation-specific, I suppose.
CONFORMING TO
SVr4, 4.3BSD, C89, C99.
This says in which standards it is included. The function must exist and behave as specified, but the specification doesn't say anything about the actual returned values, so you can't rely on them.
There's nothing in the C standard that talks about the value returned by strcmp() (that is, other than the sign of that value):
7.21.4.2 The strcmp function
Synopsis
#include <string.h>
int strcmp(const char *s1, const char *s2);
Description
The strcmp function compares the string pointed to by s1
to the string pointed to by s2.
Returns
The strcmp function returns an integer greater than, equal
to, or less than zero, accordingly as the string pointed to by s1 is
greater than, equal to, or less than the string pointed to by s2.
It is therefore pretty clear that using anything other than the sign of the returned value is a poor practice.

strncmp C Exercise

I'm trying to do exercise 5-4 in the K&R C book. I have written the methods for strncpy and strncat, but I'm having some trouble understanding exactly what to return for the strncmp part of the exercise.
The definition of strncmp (from Appendix B in K&R book) is:
compare at most n characters of string s to string t; return <0 if s<t, 0 if s==t, or >0 if s>t
Let's say I have 3 strings:
char s[128] = "abc";
char t[128] = "abcdefghijk";
char u[128] = "hello";
And I want to compare them using the strncmp function I have to write. I know that
strncmp(s, t, 3)
will return 0 ,because abc == abc. Where I'm confused is the other comparisons. For example
strncmp(s, t, 5) and
strncmp(s, u, 4)
The first matches up to the 3rd position, after which they no longer match, and the second example doesn't match at all.
I really just want to know what those 2 other comparisons return and why, so that I can write my version of strncmp and finish the exercise.
Both return a negative number (it just compares using character order). I just did a quick test and on my machine it's returning the difference of the last-compared characters. So:
strncmp(s, t, 5) = -100 // '\0' - 'd'
strncmp(s, u, 4) = -7 // 'a' - 'h'
Is that what you're looking for?
The characters in the first non-matching positions are cast to unsigned char and then compared numerically - if that character in s1 is less than the corresponding character in s2, then a negative number is returned; if it's greater, a positive number is returned.
The contract for strncmp is to return an integral value whose sign indicates the result of the comparison:
a negative value indicates that the 1st operand compares as being "less than" the 2nd operand,
a positive, non-zero value indicates that the 1st operand compares as being "greater than" the 2nd operand, and
0 indicates that the two operands compare as being "equal to" each other.
The reason it's defined that way, rather than, say, "return -1 for less than, 0 for equal to and +1 for greater than", is to not constrain the implementation.
The value returned for a particular C runtime library is dependent upon how the function is implemented. The Posix specification (IEEE 1003.1) for strncmp() (which tracks the C Standard) says:
The strncmp() function shall compare not more than n bytes (bytes that follow a null
byte are not compared) from the array pointed to by s1 to the array pointed to by s2.
The sign of a non-zero return value is determined by the sign of the difference
between the values of the first pair of bytes (both interpreted as type unsigned
char) that differ in the strings being compared.
That should be about all you need to know to implement it. You should note, though that:
strncmp() is not "safe", in the sense that it can read past the end of a buffer. A proper implementation will merrily compare characters until it finds a mismatch, encounters a NUL, hits the maximum length, or tries to access protected memory.
The specification says that the sign of the return value is based on the delta between the 1st pair of characters that differ; no particular return value is mandated.
Good luck.
It is lexicographic order: strings are compared in alphabetical order from left to right.
So abc < abcdefghijk < hello
strncmp(s, t, 5) = -1
strncmp(s, u, 4) = -1
