I wrote this small piece of code in C to test the memcmp(), strncmp() and strcmp() functions.
Here is the code that I wrote:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *word1="apple",*word2="atoms";
if (strncmp(word1,word2,5)==0)
printf("strncmp result.\n");
if (memcmp(word1,word2,5)==0)
printf("memcmp result.\n");
if (strcmp(word1,word2)==0)
printf("strcmp result.\n");
}
Can somebody explain the differences to me? I am confused by these three functions.
My main problem is that I have a file whose lines I tokenize, and when I tokenize the word "atoms" I have to stop the tokenizing process.
I first tried strcmp(), but unfortunately, when it reached the point in the file where the word "atoms" was placed, it didn't stop and continued. When I used either memcmp() or strncmp(), it stopped and I was happy.
But then I thought: what if there is a string whose first 5 letters are a, t, o, m, s, followed by other letters?
Unfortunately, my suspicion was right. I tested it with the code above, initializing word1 to "atomsaaaaa" and word2 to "atoms", and memcmp() and strncmp() in the if statements returned 0. On the other hand, strcmp() didn't. It seems that I must use strcmp().
In short:
strcmp compares null-terminated C strings
strncmp compares at most N characters of null-terminated C strings
memcmp compares binary byte buffers of N bytes
So, if you have these strings:
const char s1[] = "atoms\0\0\0\0"; // extra null bytes at end
const char s2[] = "atoms\0abc"; // embedded null byte
const char s3[] = "atomsaaa";
Then these results hold true:
strcmp(s1, s2) == 0 // strcmp stops at null terminator
strcmp(s1, s3) != 0 // Strings are different
strncmp(s1, s3, 5) == 0 // First 5 characters of strings are the same
memcmp(s1, s3, 5) == 0 // First 5 bytes are the same
strncmp(s1, s2, 8) == 0 // Strings are the same up through the null terminator
memcmp(s1, s2, 8) != 0 // First 8 bytes are different
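The relations above can be checked mechanically. A minimal sketch: the hypothetical function comparison_table_holds() evaluates all six comparisons on the same three arrays and returns 1 only if every one of them holds as stated.

```c
#include <string.h>

/* Returns 1 if all six relations listed above hold, 0 otherwise. */
int comparison_table_holds(void)
{
    const char s1[] = "atoms\0\0\0\0"; /* extra null bytes at end */
    const char s2[] = "atoms\0abc";    /* embedded null byte */
    const char s3[] = "atomsaaa";

    return strcmp(s1, s2) == 0         /* strcmp stops at the first '\0' */
        && strcmp(s1, s3) != 0         /* "atoms" vs "atomsaaa" differ */
        && strncmp(s1, s3, 5) == 0     /* first 5 characters match */
        && memcmp(s1, s3, 5) == 0      /* first 5 bytes match */
        && strncmp(s1, s2, 8) == 0     /* strncmp also stops at '\0' */
        && memcmp(s1, s2, 8) != 0;     /* byte 6 differs: '\0' vs 'a' */
}
```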
memcmp compares a number of bytes.
strcmp and the like compare strings.
You kind of cheat in your example because you know that both strings are 5 characters long (plus the null terminator). However, what if you don't know the length of the strings, which is often the case? Well, you use strcmp because it knows how to deal with strings, memcmp does not.
memcmp is all about comparing byte sequences. If you know how long each string is then yeah, you could use memcmp to compare them, but how often is that the case? Rarely. You often need string comparison functions because, well... they know what a string is and how to compare them.
As for any other issues you are experiencing, they are unclear from your question and code. Rest assured, though, that in the general case strcmp is better equipped for string comparison than memcmp is.
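Applied to the question's tokenizing problem, the idea is to compare each whole token with strcmp(). A minimal sketch, assuming the input is one writable line whose tokens are separated by spaces, tabs or newlines; tokens_before_atoms() is a hypothetical helper name, not part of the question's code:

```c
#include <string.h>

/* Hypothetical helper: counts the tokens in `line` that come before the
 * exact token "atoms". strtok() modifies `line`, so it must be writable. */
size_t tokens_before_atoms(char *line)
{
    size_t count = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        /* strcmp() matches the whole token only: "atomsaaaaa" is no match. */
        if (strcmp(tok, "atoms") == 0)
            break;
        count++;
    }
    return count;
}
```

Including '\n' in the delimiter set matters if the lines come from fgets(), which keeps the trailing newline: a token "atoms\n" does not compare equal to "atoms" under strcmp().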
strcmp():
It compares two null-terminated strings character by character until it finds a difference or reaches a terminator. Because it has to check every character for '\0', it can be slower than memcmp().
strncmp():
It is very similar to strcmp(), but it compares at most the first n characters. It has the same per-character '\0' check, so it can also be slower.
memcmp():
This function compares exactly n bytes of memory and gives '\0' no special meaning. Since the length is known in advance, library implementations are free to compare several bytes at a time (for example, a machine word at once) instead of one by one. If your program is very concerned about speed and you know the lengths, I recommend using memcmp().
To summarize:
strncmp() and strcmp() treat a 0 byte as the end of a string, and don't compare beyond it
to memcmp(), a 0 byte has no special meaning
strncmp and memcmp are the same except that the former stops at a null terminator (a NUL byte, not the NULL pointer).
Use strcmp only when you know that both operands are strings. Sometimes this is not the case, for example when reading lines from binary files; then you would want to use memcmp to compare input that may contain NUL characters, and you may want to continue checking past those NUL characters.
I have a global definition as follows:
#define globalstring "example1"
typedef struct
{
char key[100];
char trail[10][100];
bson_value_t value;
} ObjectInfo;
typedef struct
{
ObjectInfo CurrentOrderInfoSet[5];
} DataPackage;
DataPackage GlobalDataPackage[10];
And I would like to use the strcpy() function in some of my functions as follows:
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, "example2");
char string[100] = "example3";
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string);
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, globalstring);
First question: Are the globally defined strings all initialized with 100 times '\0'?
Second question: I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is, or does it fully delete any content of the destination string prior to that?
Third question: All my strings are fixed length of 100. If I use the 3 examples of strcpy() above, with my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it? Meaning do I run into problems when using functions like strlen(), printf() later?
Fourth question: What happens when I strcpy() empty strings?
I plan to overwrite these strings in loops various times and would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Thx.
Are the globally defined strings all initialized with 100 times '\0'?
Yes. Global char arrays (like all objects with static storage duration and no initializer) are initialized to all zeros.
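A quick way to convince yourself, as a sketch: global_key below is a hypothetical global standing in for the key[100] member from the question. Because it has static storage duration and no initializer, every byte should start out as '\0'.

```c
#include <stddef.h>

/* Hypothetical global mirroring the key[100] member from the question. */
char global_key[100];

/* Returns 1 if every byte of global_key is '\0', 0 otherwise. */
int global_key_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof global_key; i++)
        if (global_key[i] != '\0')
            return 0;
    return 1;
}
```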
I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is
Exactly. It copies the characters up until and including '\0' and does not care about the rest.
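The following sketch (byte_after_two_copies() is a hypothetical name) copies a long string and then a shorter one into the same buffer, so you can inspect what strcpy() leaves behind past the new terminator:

```c
#include <string.h>

/* Copies "example3" into a zeroed buffer, then copies the shorter "ab"
 * over it, and returns the byte at index i afterwards. */
char byte_after_two_copies(size_t i)
{
    char buf[100] = {0};
    strcpy(buf, "example3");
    strcpy(buf, "ab");   /* writes only 'a', 'b', '\0' into buf[0..2] */
    return buf[i];       /* buf[3..7] still hold "mple3" */
}
```

strlen() and printf("%s") stop at buf[2], so the leftover "mple3" is harmless for string functions; it only shows up if you inspect the raw bytes.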
If I use ... my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it?
Yes. Note, though, that NULL is a null pointer constant; a string is terminated with a zero byte, sometimes called NUL. You might want to see What is the difference between NUL and NULL?.
Meaning do I run into problems when using functions like strlen(), printf() later?
Not if your string lengths are less than or equal to 99.
What happens when I strcpy() empty strings?
It just copies one zero byte.
would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Safety is a broad concept. As far as safety in the sense of the program executing correctly goes, there is no point in caring about anything after the zero byte, so just strcpy it.
But you should check that your strings are at most 99 characters long and decide what to do if they are longer. You might be interested in strnlen, but the interface is confusing; I recommend using memcpy and then explicitly setting the zero byte yourself.
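A minimal sketch of the memcpy-plus-terminator approach, assuming the source is a properly NUL-terminated string; copy_truncated() is a hypothetical helper, and truncating (rather than, say, reporting an error and copying nothing) is just one possible policy:

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dstsize bytes),
 * truncating if necessary, and always NUL-terminates when dstsize > 0.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_truncated(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;                 /* no room even for the terminator */
    size_t len = strlen(src);
    int fits = len < dstsize;
    if (!fits)
        len = dstsize - 1;        /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';              /* explicitly set the zero byte */
    return fits;
}
```

For example, copy_truncated(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, sizeof GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string) copies at most 99 characters and always terminates the key. Note that strlen(src) still reads the whole source, so src must be terminated.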
My question is how strcmp() will handle the following case:
strcmp("goodpassT", "goodpass");
I read that the comparison continues until a different character is found or a null character (\0) is found in either of the strings. In the above case, when it encounters \0 in the second argument, will it just stop the comparison, or will it still compare it to the T character? The return value is 1, but I'm not sure about the stopping condition.
The comparison is done using unsigned char. Thus, when one string is a prefix of the other, the shorter string is smaller, because its terminating 0 is smaller than the corresponding nonzero unsigned char in the longer string.
See http://port70.net/~nsz/c/c11/n1570.html#7.24.4p1
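Since the standard specifies only the sign of strcmp()'s result, portable code should test it with < 0, == 0 or > 0 rather than comparing against 1. A small sketch with a hypothetical helper that normalizes the result to -1, 0 or 1:

```c
#include <string.h>

/* Returns -1, 0 or 1 according to the sign of strcmp(a, b).
 * Only the sign of strcmp()'s result is specified, not its magnitude. */
int strcmp_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```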
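The result of strcmp("goodpassT", "goodpass"); will be greater than zero. Many implementations return 1 here, but the C standard only guarantees a positive value. The characters are compared by value (their ASCII codes on ASCII systems) up to the end of the shorter string; here the shorter string's terminating '\0' is compared with 'T', so "goodpassT" compares greater.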
I've been coding in C++ and I'm completely new to C.
Why doesn't this work? I want to end the program by typing exit.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char command[4];
do{
printf( " -> " ) ;
scanf("%c", &command);
}while(&command != "exit");
return 0;
}
Because in C you have to use strcmp for string comparison.
In C a string is a sequence of characters that ends with the '\0'-terminating byte, whose value is 0.
The string "exit" looks like this in memory:
+-----+-----+-----+-----+------+
| 'e' | 'x' | 'i' | 't' | '\0' |
+-----+-----+-----+-----+------+
where 'e' == 101, 'x' == 120, etc.
On ASCII-based systems, the values of the characters are given by the ASCII table.
&command != "exit"
is just comparing pointers.
while(strcmp(command, "exit") != 0);
would be correct. strcmp returns 0 when both strings are equal, a non-zero
value otherwise. See
man strcmp
#include <string.h>
int strcmp(const char *s1, const char *s2);
DESCRIPTION
The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is
found, respectively, to be less than, to match, or be greater than s2.
But you've made another error:
scanf("%c", &command);
Here you are reading only 1 character; that is not a string.
scanf("%s", command);
would be correct.
The next error would be
char command[4];
This can hold strings with a maximal length of 3 characters, so "exit" doesn't
fit in the buffer.
Make it
char command[1024];
Then you can store a string with max. length of 1023 bytes.
In general, if you want to store a string of length n, you need a char array of
at least n+1 elements.
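Putting these fixes together, a sketch of the loop: the logic is moved into a hypothetical function command_loop() that takes a FILE * so it can be tested, and you would call it as command_loop(stdin) from main(). The %1023s width is an addition not in the answer above; it keeps the input from overflowing the 1024-byte buffer.

```c
#include <stdio.h>
#include <string.h>

/* Reads whitespace-separated words from `in`, printing the " -> " prompt
 * before each read, until the word "exit" or end of input.
 * Returns the number of words read before "exit" (or before EOF). */
int command_loop(FILE *in)
{
    char command[1024];   /* room for 1023 characters plus '\0' */
    int count = 0;

    for (;;) {
        printf(" -> ");
        /* %s reads a word; the width 1023 prevents overflow. No & needed:
         * the array name already converts to a pointer. */
        if (fscanf(in, "%1023s", command) != 1)
            break;                          /* EOF or read error */
        if (strcmp(command, "exit") == 0)   /* compare contents, not addresses */
            break;
        count++;
    }
    return count;
}
```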
You use strcmp, obviously:
while (strcmp(c, "exit"))
What your code does is compare the address of the input buffer with the address of the static string "exit", which of course will never match. You must compare the characters at the pointers.
The other problem is that you have a four-byte buffer for a five-byte string; the terminator character needs to fit too. C is extremely tricky this way: you need to allocate a "big enough" buffer for whatever people might type in, or the program will write past the end of the array, which is undefined behavior and will often crash. Use 1024 or something reasonably big for test programs.
Now I say "obviously" because when writing C code you should have a C standard library reference open at all times to be sure you're using the correct functions and arguments, plus to know what tools you have available.
Multiple issues with the code
You need five chars not four to include the ending null char \0.
You should use %s for inputting string. %c is for characters
You are comparing memory locations (or pointers) in the while loop. You need strcmp for comparing strings.
The strncmp() function really only has one use case (for lexicographical ordering):
One of the strings has a known length,† the other string is known to be NUL terminated. (As a bonus, the string with known length need not be NUL terminated at all.)
The reasons I believe there is just one use case (prefix match detection is not lexicographical ordering):‡ (1) If both strings are NUL terminated, strcmp() should be used, as it will do the job correctly; and (2) If both strings have known length, memcmp() should be used, as it will avoid the unnecessary check against NUL on a byte per byte basis.
I am seeking an idiomatic (and readable) way to use the function to lexicographically compare two such arguments correctly (one of them is NUL terminated, one of them is not necessarily NUL terminated, with known length).
Does an idiom exist? If so, what is it? If not, what should it be, or what should be used instead?
Simply using the result of strncmp() won't work, because it will result in a false equality result in the case that the argument with known length is shorter than the NUL terminated one, and it happens to be a prefix. Therefore, extra code is required to test for that case.
As a standalone function I don't see much wrong with this construction, and it appears idiomatic:
/* s1 is NUL terminated */
int variation_as_function (const char *s1, const char *s2, size_t s2len) {
int result = strncmp(s1, s2, s2len);
if (result == 0) {
result = (s1[s2len] != '\0');
}
return result;
}
However, when inlining this construction into code, it results in a double test for 0 when equality needs special action:
int result = strncmp(key, input, inputlen);
if (result == 0) {
result = (key[inputlen] != '\0');
}
if (result == 0) {
do_something();
} else {
do_something_else();
}
The motivation for inlining the call is because the standalone function is esoteric: It matters which string argument is NUL terminated and which one is not.
Please note, the question is not about performance, but about writing idiomatic code and adopting best practices for coding style. I see there is some DRY violation with the comparison. Is there a straightforward way to avoid the duplication?
† By known length, I mean the length is correct (there is no embedded NUL that would truncate the length). In other words, the input was validated at some earlier point in the program, and its length was recorded, but the input is not explicitly NUL terminated. As a hypothetical example, a scanner on a stream of text could have this property.
‡ As has been pointed out by addy2012, strncmp() could be used for prefix matching. I was focused on lexicographical ordering. However, (1) if the length of the prefix string is used as the length argument, both arguments need to be NUL terminated to guard against reading past an input string shorter than the prefix string. (2) If the minimum length is known between the prefix string and the input string, then memcmp() would be a better choice in terms of providing equivalent functionality at less CPU cost and no loss in readability.
The strncmp() function really only has one use case:
One of the strings has a known length, the other string is known to be
NUL terminated.
No, you can use it to compare the beginnings of two strings, no matter if the length of any string is known or not. For example, if you have an array / a list with last names, and you want to find all which begin with "Mac".
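A minimal sketch of that prefix test, assuming both arguments are NUL-terminated; has_prefix() is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` begins with `prefix`.
 * strncmp() stops early at a '\0' in `name`, so a name shorter than
 * the prefix is handled without reading past its end. */
int has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}
```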
In fact, strncmp should generally be used in preference to strcmp unless you absolutely know that both strings are well-formed and NUL-terminated.
Why? Because otherwise you are vulnerable to reading past the end of a buffer (a buffer overflow).
Unfortunately, this rule is often not followed.
There are a lot of buffer overflow errors.
Update
I think the core error here is in "one of the strings has a known length". No C string has a known length a priori. They're not like Pascal or Java strings, which are essentially a pair of (length, buffer). A C string is by definition a char[] identifying a chunk of memory, with the distinguished symbol \0 to identify the end. strncmp, strncpy etc exist to protect against attempts to use a chunk of memory as a string that is not well-formed.
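For the equality-only case in the question (the inlined code that tests result == 0 twice), one possible single-test form combines both checks in one condition. A sketch under the question's assumption that input has no embedded NUL in its first inputlen bytes; key_equals() is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the NUL-terminated `key` equals the
 * `inputlen` bytes at `input` (which need not be NUL-terminated).
 * Assumes `input` has no embedded NUL in its first inputlen bytes. */
bool key_equals(const char *key, const char *input, size_t inputlen)
{
    /* strncmp() == 0: key's first inputlen chars match input.
     * key[inputlen] == '\0': key is not longer (rules out the prefix case). */
    return strncmp(key, input, inputlen) == 0 && key[inputlen] == '\0';
}
```

The short-circuit && ensures key[inputlen] is read only when the first inputlen characters matched, so key is known to extend at least that far. Usage: if (key_equals(key, input, inputlen)) do_something(); else do_something_else();. It gives no ordering, so it does not replace variation_as_function() where lexicographical order is needed.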
In one of my university assignments I am restricted in the libraries I can use. I am new to C and pointers and want to check whether two strings (or should I say chars) are equal.
Part of me wants to loop through every char of the 'char string' and test equivalence, but then it comes back how to test equivalence (lol).
Any help is appreciated.
edit: I am seeing this:
warning: result of comparison against a string literal is
unspecified (use strncmp instead) [-Wstring-compare]
which leads to a segmentation fault. I know it has to do with this piece of code because all I added was:
if (example.name == "testName"){
printf("here!\n");
}
Part of me wants to loop through every char of the 'char string' and test equivalence
That's exactly what you need to do. Make a function mystrcmp with the signature identical to regular strcmp,
int mystrcmp ( const char * str1, const char * str2 );
and write your own implementation.
but then it comes back how to test equivalence.
When you loop character-by-character, you test equivalence of individual characters, not strings. Characters in C can be treated like numbers: you can compare them for equality using ==, check what character code is less than or greater than using < and >, and so on.
The only thing left to do now is deciding when to stop. You do that by comparing the current character of each string to zero, which is the null terminator.
Don't forget to forward-declare your mystrcmp function before using it.
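A minimal sketch of such an implementation, using no library functions at all (so it should fit the assignment's restrictions). It compares characters as unsigned char, as the standard strcmp() does:

```c
/* Hand-written strcmp() replacement. Returns a negative value, 0, or a
 * positive value if str1 is less than, equal to, or greater than str2. */
int mystrcmp(const char *str1, const char *str2)
{
    /* Advance while the characters match and the terminator is not reached.
     * If *str1 == *str2 and *str1 != '\0', then *str2 != '\0' as well. */
    while (*str1 != '\0' && *str1 == *str2) {
        str1++;
        str2++;
    }
    /* The first differing pair (or the terminators) decides the result. */
    return (unsigned char)*str1 - (unsigned char)*str2;
}
```

With it, the check from the question becomes if (mystrcmp(example.name, "testName") == 0).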
A string in C is terminated with a null character (0x00 or '\0'). You should compare both strings in a loop, character by character, until the null character of either string is reached.
The loop should stop as soon as two characters are not equal.
EDIT:
To answer your edit in question:
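You should take two character pointers pointing to both strings and then compare them like this:
const char *ptr1 = example.name; // first string
const char *ptr2 = "testName";   // second string
// loop until the null character of either string is found
while (*ptr1 != 0x00 && *ptr2 != 0x00)
{
    if (*ptr1 != *ptr2)
    {
        break; // characters differ, stop comparing
    }
    ptr1++; ptr2++;
}
if ((*ptr1 == *ptr2) && (*ptr1 == 0x00))
{
    // both strings ended at the same point: they are equal
}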
Given that this is a university assignment, you should pay heed to the fact that chars are just small integers. You should also note that C strings are contiguous memory buffers terminated by a binary zero (0x00).
You should also learn about pointer arithmetic. You will learn ways to shorten the code you have to write, while learning something really interesting about the C language and how computers work. It will certainly help you if you choose a career in lower-level programming.