C miscalculating distances [duplicate] - c

I have a struct here with something like:
char *sname;
........
players[i].sname
equalling "James".
I need to check for equality between values like so:
if (players[i].sname == 'Lee')
but am not having much luck. Is there a str* function I should be using, or is there any way to fix up my if statement?

The short answer: strcmp().
The long answer: So you've got this:
if(players[i].sname == 'Lee')
This is wrong in several respects. First, single quotes mean "character literal", not "string literal", in C.
Secondly, and more importantly, "string1" == "string2" doesn't compare strings; it compares char *s, that is, pointers to characters. It tells you whether two strings are stored at the same memory location. A true result would mean they're equal, but a false result wouldn't mean they're unequal.
strcmp() goes through the strings character by character, stopping at the first character that differs (or at the terminating NUL), and returns a negative, zero, or positive value depending on how those two characters compare (which is why you have to say strcmp() == 0 or !strcmp() for equality).
Note also the functions strncmp() and memcmp(), which are similar to strcmp() but take an explicit length limit, which helps when a buffer might not be NUL terminated.
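To make the pointer-versus-contents distinction concrete, here is a minimal sketch. The helper names same_pointer and same_text are made up for illustration and are not from the question.

```c
#include <string.h>

/* Hypothetical helper: compares addresses only, which is what == does
   when both operands are char pointers. */
int same_pointer(const char *a, const char *b)
{
    return a == b;
}

/* Hypothetical helper: compares the characters the pointers refer to. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With char stored[] = "Lee"; the call same_text(stored, "Lee") is true, while same_pointer(stored, "Lee") is false, because the array has its own storage separate from the literal.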

You should be using strcmp():
if (strcmp(players[i].sname, "Lee") == 0) { ...
Also note that strings in C are surrounded by double quotes: "". Single characters are surrounded by single quotes: ''. I'm not sure exactly what your compiler might be doing with 'Lee', but it's almost certainly not what you want.

You'd be looking for strcmp() from the header <string.h>.
Note that you need a string — 'Lee' is not a string but a multi-character constant, which is allowed but seldom useful, not least because the representation is defined by the compiler, not the C standard.
If you are looking to compare two strings — call the pointers to them first and second, then you write:
if (strcmp(first, second) == 0) // first equal to second
if (strcmp(first, second) <= 0) // first less than or equal to second
if (strcmp(first, second) < 0) // first less than second
if (strcmp(first, second) >= 0) // first greater than or equal to second
if (strcmp(first, second) > 0) // first greater than second
if (strcmp(first, second) != 0) // first unequal to second
This, in my view, makes it clear what the comparison is and so the notation should be used. Note that strcmp() may return any negative value to indicate 'less than' or any positive value to indicate 'greater than'.
You will find people who prefer:
if (strcmp(first, second)) // first unequal to second
if (!strcmp(first, second)) // first equal to second
IMO, they have the advantage of brevity but the disadvantage of being less clear than the explicit comparisons with zero. YMMV.
Be cautious about using strncmp() instead of strcmp(), which was suggested in one answer. If you have:
if (strncmp(first, "command", 7) == 0)
then if first contains "commander", the match will be valid. If that's not what you want but you want to use strncmp() anyway, you would write:
if (strncmp(first, "command", sizeof("command")) == 0)
This will correctly reject "commander".
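The two strncmp() calls above can be wrapped in small functions to see the difference side by side. This is only a sketch; the function names are invented for the example.

```c
#include <string.h>

/* Accepts anything that begins with "command", including "commander". */
int has_command_prefix(const char *first)
{
    return strncmp(first, "command", 7) == 0;
}

/* sizeof("command") is 8: seven letters plus the terminating NUL,
   so the NUL has to match as well. */
int is_exactly_command(const char *first)
{
    return strncmp(first, "command", sizeof("command")) == 0;
}
```

Both functions assume first is NUL terminated; strncmp() stops at a NUL in either argument, so a shorter input such as "comma" is rejected without reading past its end.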

strcmp works fine, provided one of the strings is null terminated. If both are max length (and so not null terminated) and identical, the comparison will walk off the end, which is undefined behavior and will most likely give a false negative result for the equality test. If one string is a literal inside "" marks in the strcmp call itself, that's not an issue, because we know it's null terminated.
If we are comparing two string variables, and we don't know if they are max length or not (maximum length ones are not null terminated), then we need to use strncmp, and use sizeof on one of the arrays to get the third parameter. This solves the problem, because sizeof knows the maximum length. (sizeof must be applied to the array itself; applied to a pointer it gives the size of the pointer.) strcmp is safe if one of the strings is a literal in double quotes and the other buffer is at least as large as the literal including its terminator.
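As a rough illustration of the fixed-width case, assume a record whose name field is 8 bytes wide and is not terminated when full. The struct, the FIELD_LEN constant, and the function name are all assumptions made for this sketch.

```c
#include <string.h>

#define FIELD_LEN 8 /* hypothetical fixed field width */

/* Hypothetical record layout: a name that fills all 8 bytes has no NUL. */
struct record {
    char name[FIELD_LEN];
};

/* Compares two fixed-width fields without reading past either array.
   sizeof a->name is the size of the array member, not of a pointer. */
int same_name(const struct record *a, const struct record *b)
{
    return strncmp(a->name, b->name, sizeof a->name) == 0;
}
```

strncmp() still stops early at a NUL, so shorter names padded with zero bytes compare as expected.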

Related

Is there an idiomatic use of strncmp()?

The strncmp() function really only has one use case (for lexicographical ordering):
One of the strings has a known length,† the other string is known to be NUL terminated. (As a bonus, the string with known length need not be NUL terminated at all.)
The reasons I believe there is just one use case (prefix match detection is not lexicographical ordering):‡ (1) If both strings are NUL terminated, strcmp() should be used, as it will do the job correctly; and (2) If both strings have known length, memcmp() should be used, as it will avoid the unnecessary check against NUL on a byte per byte basis.
I am seeking an idiomatic (and readable) way to use the function to lexicographically compare two such arguments correctly (one of them is NUL terminated, one of them is not necessarily NUL terminated, with known length).
Does an idiom exist? If so, what is it? If not, what should it be, or what should be used instead?
Simply using the result of strncmp() won't work, because it will result in a false equality result in the case that the argument with known length is shorter than the NUL terminated one, and it happens to be a prefix. Therefore, extra code is required to test for that case.
As a standalone function I don't see much wrong with this construction, and it appears idiomatic:
/* s1 is NUL terminated */
int variation_as_function (const char *s1, const char *s2, size_t s2len) {
int result = strncmp(s1, s2, s2len);
if (result == 0) {
result = (s1[s2len] != '\0');
}
return result;
}
However, when inlining this construction into code, it results in a double test for 0 when equality needs special action:
int result = strncmp(key, input, inputlen);
if (result == 0) {
result = (key[inputlen] != '\0');
}
if (result == 0) {
do_something();
} else {
do_something_else();
}
The motivation for inlining the call is because the standalone function is esoteric: It matters which string argument is NUL terminated and which one is not.
Please note, the question is not about performance, but about writing idiomatic code and adopting best practices for coding style. I see there is some DRY violation with the comparison. Is there a straightforward way to avoid the duplication?
† By known length, I mean the length is correct (there is no embedded NUL that would truncate the length). In other words, the input was validated at some earlier point in the program, and its length was recorded, but the input is not explicitly NUL terminated. As a hypothetical example, a scanner on a stream of text could have this property.
‡ As has been pointed out by addy2012, strncmp() could be used for prefix matching. I was focused on lexicographical ordering. However, (1) If the length of the prefix string is used as the length argument, both arguments need to be NUL terminated to guard against reading past an input string shorter than the prefix string. (2) If the minimum length is known between the prefix string and the input string, then memcmp() would be a better choice in terms of providing equivalent functionality at less CPU cost and no loss in readability.
The strncmp() function really only has one use case:
One of the strings has a known length, the other string is known to be
NUL terminated.
No, you can use it to compare the beginnings of two strings, no matter if the length of any string is known or not. For example, if you have an array / a list with last names, and you want to find all which begin with "Mac".
In fact, strncmp should generally be used in preference to strcmp unless you absolutely know that both strings are well-formed and nul-terminated.
Why? Because otherwise you have a vulnerability to buffer overflows.
This rule is unfortunately not followed often.
There are a lot of buffer overflow errors.
Update
I think the core error here is in "one of the strings has a known length". No C string has a known length a priori. They're not like Pascal or Java strings, which are essentially a pair of (length, buffer). A C string is by definition a char[] identifying a chunk of memory, with the distinguished symbol \0 to identify the end. strncmp, strncpy etc exist to protect against attempts to use a chunk of memory as a string that is not well-formed.
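For the equality-only case raised in the question, one possible way to avoid the double test against zero is to fold both checks into a single boolean expression. This is a sketch, not an established idiom; the name key_equals_input is invented, and it answers only "equal or not", not ordering.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. Assumes key is NUL terminated and input holds
 * inputlen bytes with no embedded NUL, as the question states.
 */
int key_equals_input(const char *key, const char *input, size_t inputlen)
{
    /* && short-circuits: key[inputlen] is read only after the first
       inputlen characters matched, so key is at least that long. */
    return strncmp(key, input, inputlen) == 0 && key[inputlen] == '\0';
}
```

The inlined form then becomes a single if (key_equals_input(key, input, inputlen)) with do_something() and do_something_else() in its two branches.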

Does strlen() in a strncmp() expression defeat the purpose of using strncmp() over strcmp()?

By my understanding, strcmp() (no 'n'), upon seeing a null character in either argument, immediately stops processing and returns a result.
Therefore, if one of the arguments is known with 100% certainty to be null-terminated (e.g. it is a string literal), there is no security benefit whatsoever in using strncmp() (with 'n') with a call to strlen() as part of the third argument to limit the comparison to the known string length, because strcmp() will already never read more characters than are in that known-terminating string.
In fact, it seems to me that a call to strncmp() whose length argument is a strlen() on one of the first two arguments is only different from the strcmp() case in that it wastes time linear in the size of the known-terminating string by evaluating the strlen() expression.
Consider:
Sample code A:
if (strcmp(user_input, "status") == 0)
reply_with_status();
Sample code B:
if (strncmp(user_input, "status", strlen("status")+1) == 0)
reply_with_status();
Is there any benefit to the latter over the former? Because I see it in other people's code a lot.
Do I have a flawed understanding of how these functions work?
In your particular example, I would say it's detrimental to use strncmp because of:
Using strlen does the scan anyway
Repetition of the string literal "status"
Addition of 1 to make sure that strings are indeed equal
All these add to clutter, and in neither case would you be protected from overflowing user_input if it indeed was shorter than 6 characters and contained the same characters as the test string.
That would be exceptional. If you know that your input string always has more memory than the number of characters in your test strings, then don't worry. Otherwise, you might need to worry, or at least consider it. strncmp is useful for testing stuff inside large buffers.
My preference is for code readability.
Yes it does. If you use strlen in a strncmp, it will traverse the pointer until it sees a null in the string. That makes it functionally equivalent to strcmp.
In the particular case you've given, it's indeed useless. However, a slight alteration is more common:
if (strncmp(user_input, "status", strlen("status")) == 0)
reply_with_status();
This version just checks if user_input starts with "status", so it has different semantics.
Beyond tricks to check if the beginning of a string matches an input, strncmp is only going to be useful when you are not 100% sure that a string is null terminated before the end of its allocated space.
So, if you had a fixed size buffer that you took your user input in with you could use:
strncmp(user_input, "status", sizeof(user_input))
Therefore ensuring that your comparison does not overflow.
However in this case you do have to be careful, since if your user_input wasn't null terminated, then it will really be checking if user_input matches the beginning of status.
A better way might be to say:
if (user_input[sizeof(user_input) - 1] != '\0') {
// handle it, since it is _not_ equal to your string
// unless filling the buffer is valid
}
else if (strcmp(user_input, "status") == 0) { ... }
Now, I agree that this isn't particularly useful use of strncmp(), and I can't see any benefit in this over strcmp().
However, if we change the code by removing the +1 after strlen, then it starts to be useful.
strncmp(user_input, "status", strlen("status"))
since that compares the first 6 characters of user_input with "status", which at least sometimes is meaningful.
So, if the +1 is there, it becomes a regular strcmp - and it's just a waste of time calculating the length. But without the +1, it's quite a useful comparison (under the right circumstances).
strncmp() has limited use. The normal strcmp() stops when it encounters a NUL in either string (and the strings differ unless both end at that point). strncmp() also stops after N characters, and returns zero if no difference was found by then ("the strings are equal in the first N characters").
One possible use of strncmp() is parsing options up to the non-significant part, e.g.
if (!strncmp("-st", argv[xx], 3)) {}
, which would return zero for "-string" or "-state" or "-st0", but not for "-sadistic".
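The prefix test that several answers describe can be written once as a small helper. The name starts_with is an assumption for this sketch, and both arguments are assumed to be NUL terminated.

```c
#include <string.h>

/* Hypothetical helper: true when s begins with prefix.
   Both arguments must be NUL-terminated strings. */
int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

Because strncmp() stops at a NUL in either argument, an s shorter than prefix is simply rejected.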

What is the difference between memcmp, strcmp and strncmp in C?

I wrote this small piece of code in C to test memcmp() strncmp() strcmp() functions in C.
Here is the code that I wrote:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *word1="apple",*word2="atoms";
if (strncmp(word1,word2,5)==0)
printf("strncmp result.\n");
if (memcmp(word1,word2,5)==0)
printf("memcmp result.\n");
if (strcmp(word1,word2)==0)
printf("strcmp result.\n");
}
Can somebody explain me the differences because I am confused with these three functions?
My main problem is that I have a file in which I tokenize its line of it,the problem is that when I tokenize the word "atoms" in the file I have to stop the process of tokenizing.
I first tried strcmp() but unfortunately when it reached to the point where the word "atoms" were placed in the file it didn't stop and it continued,but when I used either the memcmp() or the strncmp() it stopped and I was happy.
But then I thought,what if there will be a case in which there is one string in which the first 5 letters are a,t,o,m,s and these are being followed by other letters.
Unfortunately,my thoughts were right as I tested it using the above code by initializing word1 to "atomsaaaaa" and word2 to atoms and memcmp() and strncmp() in the if statements returned 0.On the other hand strcmp() it didn't. It seems that I must use strcmp().
In short:
strcmp compares null-terminated C strings
strncmp compares at most N characters of null-terminated C strings
memcmp compares binary byte buffers of N bytes
So, if you have these strings:
const char s1[] = "atoms\0\0\0\0"; // extra null bytes at end
const char s2[] = "atoms\0abc"; // embedded null byte
const char s3[] = "atomsaaa";
Then these results hold true:
strcmp(s1, s2) == 0 // strcmp stops at null terminator
strcmp(s1, s3) != 0 // Strings are different
strncmp(s1, s3, 5) == 0 // First 5 characters of strings are the same
memcmp(s1, s3, 5) == 0 // First 5 bytes are the same
strncmp(s1, s2, 8) == 0 // Strings are the same up through the null terminator
memcmp(s1, s2, 8) != 0 // First 8 bytes are different
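The six results can be checked mechanically. The sketch below reuses the three arrays from this answer; the counting function is an addition made for illustration.

```c
#include <string.h>

static const char s1[] = "atoms\0\0\0\0"; /* extra null bytes at end */
static const char s2[] = "atoms\0abc";    /* embedded null byte */
static const char s3[] = "atomsaaa";

/* Returns how many of the six results listed above hold. */
int count_holding_results(void)
{
    int n = 0;

    n += strcmp(s1, s2) == 0;     /* stops at the first NUL in both */
    n += strcmp(s1, s3) != 0;     /* NUL versus 'a' at index 5 */
    n += strncmp(s1, s3, 5) == 0; /* first 5 characters match */
    n += memcmp(s1, s3, 5) == 0;  /* first 5 bytes match */
    n += strncmp(s1, s2, 8) == 0; /* comparison ends at the NUL */
    n += memcmp(s1, s2, 8) != 0;  /* bytes after the NUL differ */
    return n;
}
```

All three arrays are at least 9 bytes long, so reading 8 bytes with memcmp() stays inside them.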
memcmp compares a number of bytes.
strcmp and the like compare strings.
You kind of cheat in your example because you know that both strings are 5 characters long (plus the null terminator). However, what if you don't know the length of the strings, which is often the case? Well, you use strcmp because it knows how to deal with strings, memcmp does not.
memcmp is all about comparing byte sequences. If you know how long each string is then yeah, you could use memcmp to compare them, but how often is that the case? Rarely. You often need string comparison functions because, well... they know what a string is and how to compare them.
As for any other issues you are experiencing it is unclear from your question and code. Rest assured though that strcmp is better equipped in the general case for string comparisons than memcmp is.
strcmp():
It compares two NUL-terminated strings character by character, so its cost grows with the length of the part the two strings have in common.
strncmp():
It is very similar to the previous one, but it compares at most the first n characters.
memcmp():
This function compares two blocks of memory of a given size and does not look for a terminator. Because the length is known up front, implementations can often compare several bytes at a time, so it may be faster; use it when you already know both lengths.
To summarize:
strncmp() and strcmp() treat a 0 byte as the end of a string, and don't compare beyond it
to memcmp(), a 0 byte has no special meaning
strncmp and memcmp are the same, except that the former stops at the NUL terminator of a string.
With strcmp you should only compare data you know to be strings. That is not always the case, for example when reading binary files; there you would want to use memcmp, which can compare input that contains NUL characters and keep checking past them.

Checking contents of char variable - C Programming

This might seem like a very simple question, but I am struggling with it. I have been writing iPhone apps with Objective C for a few months now, but decided to learn C Programming to give myself a better grounding.
In Objective-C if I had a UILabel called 'label1' which contained some text, and I wanted to run some instructions based on that text then it might be something like;
if (label1.text == @"Hello, World!")
{
NSLog(@"This statement is true");
}
else {
NSLog(@"Uh Oh, an error has occurred");
}
I have written a VERY simple C program which uses printf() to ask for some input, then uses scanf() to accept some input from the user, so something like this;
int main()
{
char decision[3];
printf("Hi, welcome to the introduction program. Are you ready to answer some questions? (Answer yes or no)");
scanf("%s", &decision);
}
What I wanted to do is apply an if statement to say if the user entered yes then continue with more questions, else print out a line of text saying thanks.
After using the scanf() function I am capturing the users input and assigning it to the variable 'decision' so that should now equal yes or no. So I assumed I could do something like this;
if (decision == yes)
{
printf("Ok, let's continue with the questions");
}
else
{
printf("Ok, thank you for your time. Have a nice day.");
}
That brings up an error of "use of undeclared identifier yes". I have also tried;
if (decision == "yes")
Which brings up "result of comparison against a string literal is unspecified"
I have tried seeing if it works by counting the number of characters so have put;
if (decision > 3)
But get "Ordered comparison between pointer and integer 'Char and int'"
And I have also tried this to check the size of the variable, if it is greater than 2 characters it must be a yes;
if (sizeof (decision > 2))
I appreciate this is probably something simple or trivial I am overlooking but any help would be great, thanks.
Daniel Haviv's answer told you what you should do. I wanted to explain why the things you tried didn't work:
if (decision == yes)
There is no identifier 'yes', so this isn't legal.
if (decision == "yes")
Here, "yes" is a string literal which evaluates to a pointer to its first character. This compares 'decision' to a pointer for equality. It compiles, but it is true only if they both point to the same place, which is not what you want. In fact, if you do this:
if ("yes" == "yes")
The result is unspecified. They will both point to the same place if the implementation collapses identical string literals to the same memory location, which it may or may not do. So that's definitely not what you want.
if (sizeof (decision > 2))
I assume you meant:
if( sizeof(decision) > 2 )
The 'sizeof' operator evaluates at compile time, not run time. And it's independent of what's stored. The sizeof decision is 3 because you defined it to hold three characters. So this doesn't test anything useful.
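A short sketch of the difference between the size of the array and the length of the string stored in it. The function name is invented, and the array is given 4 bytes here rather than the question's 3 so that the example itself stays valid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: reports the array's capacity and the length of
   the string currently stored in it. */
void measure(size_t *capacity, size_t *length)
{
    char decision[4] = "no";     /* 4 bytes reserved, 2 characters stored */

    *capacity = sizeof decision; /* fixed at compile time */
    *length = strlen(decision);  /* counted at run time, up to the NUL */
}
```

Whatever is stored in decision, the capacity stays the same; only the value from strlen() follows the contents.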
As mentioned in the other answer, C has the 'strcmp' function to compare two strings. You could also write your own code to compare them character by character if you wanted to. C++ has much better ways to do this, including string classes.
Here's an example of how you might do that:
int StringCompare(const char *s1, const char *s2)
{ // returns 0 if the strings are equivalent, 1 if they're not
while( (*s1!=0) && (*s2!=0) )
{ // loop until either string runs out
if(*s1!=*s2) return 1; // check if they match
s1++; // skip to next character
s2++;
}
if( (*s1==0) && (*s2==0) ) // did both strings run out at the same length?
return 0;
return 1; // one is longer than the other
}
You should use strcmp:
if(strcmp(decision, "yes") == 0)
{
/* ... */
}
You should be especially careful with null-terminated strings in C programming. A string is not an object; what you hold is a pointer to a memory address. So you can't compare the content of decision directly with a constant string "yes", which is at another address. Use strcmp() instead.
And be aware that "yes" is actually "yes\0", which takes 4 bytes, and the "\0" is very important to strcmp(), which recognizes it as the terminator during the comparison loop.
Ok a few things:
decision needs to be an array of 4 chars in order to fit the string "yes" in it. That's because in C, the end of a string is indicated by the NUL char ('\0'). So your char array will look like: { 'y', 'e', 's', '\0' }.
Strings are compared using functions such as strcmp, which compare the contents of the string (char array), and not the location/pointer. A return value of 0 indicates that the two strings match.
With: scanf("%s", &decision);, you don't need to use the address-of operator; the name of an array evaluates to the address of the start of the array.
You use strlen to get the length of a string, which will just increment a counter until it reaches the NUL char, '\0'. You don't use sizeof to check the length of strings, it's a compile-time operation which will return the value 3 * sizeof(char) for a char[3].
scanf is unsafe to use with strings, you should alternatively use fgets(..., stdin), or include a width specifier in the format string (such as "%3s" for a 4-char buffer) in order to prevent overflowing your buffer. Note that if you use fgets, take into account it'll store the newline char '\n' if it reads a whole line of text.
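The fgets() route from the last point might look like the sketch below. The helper names and the 16-byte buffer are assumptions made for the example, not part of the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\n', if any. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/*
 * Hypothetical helper: reads one line from in and reports whether it
 * is "yes". Returns 1 for "yes", 0 for anything else, and -1 if
 * nothing could be read.
 */
int read_yes(FILE *in)
{
    char decision[16]; /* room for the answer, '\n' and '\0' */

    if (fgets(decision, sizeof decision, in) == NULL)
        return -1;
    strip_newline(decision);
    return strcmp(decision, "yes") == 0;
}
```

fgets() never writes more than sizeof decision bytes, including the terminator, so an over-long answer is truncated instead of overflowing the buffer. Calling read_yes(stdin) would replace the scanf() call in the question.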
To compare you could use strcmp like this:
if(strcmp(decision, "yes") == 0) {
// decision is equal to 'yes'
}
Also you should change char decision[3] into char decision[4] so that the buffer has
room for a terminating null character.
char decision[4] = {0}; // initialize to 0
There's several issues here:
You haven't allocated enough storage for the answer:
char decision[3];
C strings are bytes in the string followed by an ASCII NUL byte: 0x00, \0. You have only allocated enough space for ye\0 at this point. (Well, scanf(3) will give you yes\0 and place that NUL in unrelated memory. C can be cruel.) Amend that to include space for the terminating \0 and amend your scanf(3) call to prevent the buffer overflow:
char decision[4];
/* ... */
scanf("%3s", decision);
(I've left off the &, because simply giving the name of the array is the same as giving the address of its first element. It doesn't matter, but I believe this is more idiomatic.)
C strings cannot be compared with ==. Use strcmp(3) or strncmp(3) or strcasecmp(3) or strncasecmp(3) to compare your strings:
if(strcasecmp(decision, "yes") == 0) {
/* yes */
}
C has lots of lib functions to handle this but it pays to know what you are declaring.
Declaring
char decision[3];
is actually declaring a char array of length 3. Therefore attempting a comparison of
if(decision == "yes")
is comparing the address of a literal against the address of an array, and therefore will not work. Since there is no built-in string type in C you have to use pointers, though not necessarily directly. In C, strings are in fact arrays of char, so you can declare them both ways, e.g.:
char decision[3];
char *decision;
Both will in point of fact work, but in the first instance the compiler will allocate the memory for you, and it will ONLY allocate 3 bytes. Now since strings in C are null terminated you need to actually allocate 4 bytes, since you need room for "yes" and the null. Declaring it the second way simply declares a pointer to someplace in memory, but you have no idea really where. You would then have to allocate memory to contain whatever you are going to put there, since to do otherwise will more than likely cause a SEGFAULT.
To compare what you get from input you have two options: either use the strcmp() function, or do it yourself by iterating through decision and comparing each individual byte against 'y', 'e' and 's' until you hit the null, aka \0.
There are variations on strcmp() that ignore the difference between uppercase and lowercase, such as the POSIX strcasecmp() declared in <strings.h>; they are not part of the standard string.h library.
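Where a case-insensitive library function is not available, the "do it yourself" loop can ignore case as well. This is a sketch using only standard C; the name equals_ignore_case is made up for it.

```c
#include <ctype.h>

/* Hypothetical helper: 1 if a and b are equal when letter case is
   ignored (as tolower() sees it in the current locale), else 0. */
int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* tolower() needs a value representable as unsigned char */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0'; /* both must end together */
}
```

With this, equals_ignore_case(decision, "yes") accepts "yes", "Yes" and "YES" alike.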
