C programming language: Behavior of strcmp(str1, str2)

In C, I have a character array:
d[20]
Which is assigned the value 'if' with a null termination character:
d[0]='i'
d[1]='f'
d[2]='\0'
Should the value of strcmp(d,"if") be 0? Why?
I expect strcmp to return a value of 0. When I run it, I get the value of -1

If you mean d[0] = 'i'; d[1] = 'f'; d[2] = '\0';, then yes, that should return 0. d[2] = '/0' will assign something entirely different and your string won't be null terminated. At least not where you expect it to be - strcmp will probably head off into the weeds and start sucking mud.

@everyone: The '/0' typo was introduced by @Mark when he edited the original question. The original had the proper '\0'. Most of your answers (and assumptions about the OP) are misdirected.
@mekasperasky The following code correctly produces the value of 0. If you compare it to your own code and find the difference, you may have solved your own problem.
#include <cstring>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
char d[20] = {0};
d[0] = 'i';
d[1] = 'f';
d[2] = '\0';
int value = strcmp(d,"if"); // 0: both strings are "if"
cout << value << endl;
return 0;
}

d[2] should be '\0', not '/0'.

The null value is indicated by:
d[2]='\0'
and not /0 as you wrote.

As other answers have mentioned, '/0' is not a null termination character, '\0' is.
You might expect that putting more than one character in a character literal would generate an error; unfortunately (at least in this case), C allows 'multi-character' character constants, but their exact behavior is implementation-defined (6.4.4.4/10 "Character constants"):
An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
So your '/0' 'character' ends up being some implementation defined int value that gets truncated when stored in d[2]. Your compiler might generate a warning for 'multi-character' literals, but that would probably also depend on the exact options you give the compiler.
For example, I get the following warning from GCC (I happen to have -Wall set):
C:\temp\test.cpp:6:14: warning: multi-character character constant
In my tests with MSVC and MinGW, the value of '/0' is 0x00002f30, so d[2] = '/0' ends up being equivalent to d[2] = '0'.

This works on every C compiler I have. I did not expect differently. It also works on Codepad.
Does this work on your C compiler?
#include <stdio.h>
#include <string.h>
char d[20];
int main(void) {
d[0]='i';
d[1]='f';
d[2]='\0';
printf("Value of strcmp=%i\n\n",strcmp(d,"if")); /* will print 0 */
return 0;
}

Related

What's the length of a string in C when I use the "\x00" to interrupt a string?

char buf1[1024] = "771675175\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
char buf2[1024] = "771675175\x00";
char buf3[1024] = "771675175\0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
char buf4[1024] = "771675175\0";
char buf5[1024] = "771675175";
buf5[9] = 0;
char buf6[1024] = "771675175";
buf6[9] = 0;
buf6[10] = 'A';
printf("%d\n", strlen(buf1));
printf("%d\n", strlen(buf2));
printf("%d\n", strlen(buf3));
printf("%d\n", strlen(buf4));
printf("%d\n", strlen(buf5));
printf("%d\n", strlen(buf6));
if("\0" == "\x00"){
printf("YES!");
}
Output:
10
9
9
9
9
9
YES!
As shown above, I use the "\x00" to interrupt a string.
As far as I know, when strlen() meets the "\x00", it returns the number of characters before the terminator, not including the "\x00".
But here, why is the length of the buf1 equal to 10?
As pointed out in the comments section, hexadecimal escape sequences have no length limit and end at the first character that is not a valid hexadecimal digit. All of the subsequent A characters are valid hexadecimal digits, so they are part of the escape sequence. The resulting value therefore does not fit in a char, so you cannot rely on what the compiler stores there (GCC, for example, warns that the hex escape sequence is out of range).
You should change
char buf1[1024] = "771675175\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
to:
char buf1[1024] = "771675175\x00" "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
Also, strlen returns a value of type size_t. The correct printf format specifier for size_t is %zu, not %d. Even if %d works on your platform, it may fail on other platforms.
The following program will print the desired result of 9:
#include <stdio.h>
#include <string.h>
int main( void )
{
char buf1[1024] = "771675175\x00" "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
printf( "%zu\n", strlen(buf1) );
}
Also, it is worth noting that the following line does not make sense:
if("\0" == "\x00")
In that if condition, you are comparing the addresses of two pointers, which point to string literals. It depends on the compiler whether it is storing both string literals in the same memory location. Some compilers may merge identical string literals into the same memory location, some may not. Normally, this is irrelevant to the programmer. Therefore, it does not make much sense to compare these memory addresses.
You probably wanted to write the following instead, which will compare the actual character values:
if( '\0' == '\x00' )
There is a big difference between a string literal and a character constant.

Array character as parameters in c [duplicate]

When should I use single quotes and double quotes in C or C++ programming?
In C and in C++ single quotes identify a single character, while double quotes create a string literal. 'a' is a single a character literal, while "a" is a string literal containing an 'a' and a null terminator (that is a 2 char array).
In C++ the type of a character literal is char, but note that in C, the type of a character literal is int, that is sizeof 'a' is 4 in an architecture where ints are 32bit (and CHAR_BIT is 8), while sizeof(char) is 1 everywhere.
The C standard also allows multi-character constants, but leaves their value up to the implementation. The C99 standard says:
6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
This could look like this, for instance:
const uint32_t png_ihdr = 'IHDR';
The resulting constant (in GCC, which implements this) has the value you get by taking each character and shifting it up, so that 'I' ends up in the most significant bits of the 32-bit value. Obviously, you shouldn't rely on this if you are writing platform independent code.
Single quotes are characters (char), double quotes are null-terminated strings (char *).
char c = 'x';
char *s = "Hello World";
'x' is an integer, representing the numerical value of the letter x in the machine's character set
"x" is an array of characters, two characters long, consisting of 'x' followed by '\0'
I was poking around stuff like: int cc = 'cc'; It turns out that (on my compiler) it's basically a byte-wise copy to an integer. So the way to look at it is that the two c's of 'cc' are copied to the lower 2 bytes of the integer cc. If you are looking for trivia, then
printf("%d %d", 'c', 'cc'); would give:
99 25443
that's because 25443 = 99 + 256*99
So 'cc' is a multi-character constant and not a string.
Cheers
Single quotes are for a single character. Double quotes are for a string (array of characters). You can use single quotes to build up a string one character at a time, if you like.
char myChar = 'A';
char myString[] = "Hello Mum";
char myOtherString[] = { 'H','e','l','l','o','\0' };
single quote is for character;
double quote is for string.
In C, single-quotes such as 'a' indicate character constants whereas "a" is an array of characters, always terminated with the \0 character
Double quotes are for string literals, e.g.:
char str[] = "Hello world";
Single quotes are for single character literals, e.g.:
char c = 'x';
EDIT As David stated in another answer, the type of a character literal is int.
A single quote is used for character, while double quotes are used for strings.
For example...
printf("%c \n",'a');
printf("%s","Hello World");
Output
a
Hello World
If you swap them and use single quotes for the string and double quotes for the character, this is the result:
printf("%c \n","a");
printf("%s",'Hello World');
Output:
Both calls pass an argument of the wrong type, so the behavior is undefined. For the first line, you may get a garbage or unexpected value like this:
�
For the second statement, you will probably see nothing, and any statements after it may also give you no result.
Note: PHP language gives you the flexibility to use single and double-quotes easily.
Use single quote with single char as:
char ch = 'a';
here 'a' is a char constant and is equal to the ASCII value of char a.
Use double quote with strings as:
char str[] = "foo";
here "foo" is a string literal.
It's okay to use "a", but it's not okay to use 'foo'.
Single quotes are denoting a char, double denote a string.
In Java, it is also the same.
While I'm sure this doesn't answer what the original asker asked, in case you end up here looking for single quote in literal integers like I have...
C++14 added the ability to add single quotes (') in the middle of number literals to add some visual grouping to the numbers.
constexpr int oneBillion = 1'000'000'000;
constexpr int binary = 0b1010'0101;
constexpr int hex = 0x12'34'5678;
constexpr double pi = 3.1415926535'8979323846'2643383279'5028841971'6939937510;
In C and C++, single quotes denote a character ('a') whereas double quotes denote a string ("Hello"). The difference is that a character can hold any value but only one letter, digit, etc., while a string can hold any number of them.
But also remember that there is a difference between '1' and 1.
If you type
cout<<'1'<<endl<<1;
The output would be the same, but not in this case:
cout<<int('1')<<endl<<int(1);
This time the first line would be 49. When you convert a character to an int you get its character code, and the ASCII code for '1' is 49.
Similarly, if you do:
string s="Hi";
s+=49; // This will add '1' to the string
s+="1"; // This will also add "1" to the string
Different ways to declare a char or a string:
char char_simple = 'a'; // 1 byte: typically -128 to 127 or 0 to 255, depending on the implementation
signed char char_signed = 'a'; // 1 byte: typically -128 to 127
unsigned char char_u = 'a'; // 1 byte: typically 0 to 255
// double quotes are for strings
char string_simple[] = "myString"; // 9 bytes, including the terminator
char string_simple_2[] = {'m', 'S', 't', 'r', 'i', 'n', 'g'}; // 7 bytes, no terminator, so not a string
char string_fixed_size[8] = "myString"; // exactly 8 bytes, no room for the terminator (valid C, an error in C++)
char *string_pointer = "myString"; // pointer to a string literal
char string_poionter_2 = *"myString"; // a single char: 'm'
// sizeof yields a size_t, so print it with %zu
printf("char = %zu\n", sizeof(char_simple));
printf("char_signed = %zu\n", sizeof(char_signed));
printf("char_u = %zu\n", sizeof(char_u));
printf("string_simple[] = %zu\n", sizeof(string_simple));
printf("string_simple_2[] = %zu\n", sizeof(string_simple_2));
printf("string_fixed_size[8] = %zu\n", sizeof(string_fixed_size));
printf("*string_pointer = %zu\n", sizeof(string_pointer));
printf("string_poionter_2 = %zu\n", sizeof(string_poionter_2));

no code execution after while- or for-loop in C [duplicate]

I’m learning C using Xcode 8 and the compiler doesn’t run any code after a while- or for-loop executes. Is this a bug? How can I fix it?
In the example provided below, printf("code executed after while-loop"); never executes.
#include <stdio.h>
int getTheLine(char string[]);
int getTheLine(char string[]) {
char character;
int index;
index = 0;
while ((character = getchar()) >= EOF) {
string[index] = character;
++index;
}
printf("code executed after while-loop");
return index;
}
int main(int argc, const char * argv[]) {
char string[100];
int length = getTheLine(string);
printf("length %d\n", length);
return 0;
}
getchar returns an int not a char, and comparison with EOF should be done with the != operator instead of the >= operator.
...
int character; // int instead of char
int index;
index = 0;
while ((character = getchar()) != EOF) { // != instead of >=
...
It's the >= EOF that makes the condition always true. The reason is that a "valid" result of getchar() is a non-negative integer, and a "non-valid" result like end-of-file is EOF, which is negative (cf. getchar()):
EOF ... integer constant expression of type int and negative value
Hence, any valid result from getchar will be >EOF, while the end-of-file-result will be ==EOF, such that >= EOF will always match.
Write != EOF instead.
Note further that you do not terminate your string by the string-terminating-character '\0', such that using string like a string (e.g. in a printf("%s",string)) will yield undefined behaviour (crash or something else probably unwanted).
So write at least:
while ((character = getchar()) != EOF) {
string[index] = character;
++index;
}
string[index]='\0';
Then there is still the issue that you may write out of bounds, e.g. if one enters more than 99 characters in your example. But checking this is beyond the actual question, which was about the infinite loop.
The symbolic constant EOF is an integer constant, of type int. It's (usually) defined as a macro as -1.
The problem is that the value -1 as a (32-bit) int is 0xffffffff, while as an (8-bit) char the same value would be 0xff. Those two values are not equal, which in turn means that your loop condition will never be false, leading to an infinite loop.
The solution to this problem is that all standard functions that read characters return them as an int, which means your variable character needs to be of that type too.
Important note: Whether plain char is a signed or an unsigned type is a compiler implementation detail. If it is signed, then a comparison with an int leads to sign extension when the char value is promoted. That means a signed char with the value 0xff would be extended to the int value 0xffffffff, so if char is signed the comparison would work.
This suggests that your compiler treats char as unsigned char, so the unsigned char value 0xff becomes 0x000000ff after promotion to int.
The value -1 becomes 0xffffffff because of how negative numbers are usually represented on computers, with something called two's complement.
You also have a couple of other flaws in your code.
The first is that since the loop is infinite, you will go way out of bounds of the string array, leading to undefined behavior (and a possible crash sooner or later). The solution is to add a condition that makes sure index never reaches 100 (in the specific case of your array; the size should really be passed as an argument).
The second problem is that if you intend to use the string array as an actual string, you need to terminate it. Strings in C are null-terminated: the terminator is the character '\0' (equal to integer 0), and it needs to be put at the end of every string you want to pass to a standard function handling such strings. Having this terminator means that an array of 100 characters can hold only 99 characters, leaving room for the terminator. This has implications for the solution to the first problem. To add the terminator, simply do string[index] = '\0'; after the loop (if index is within bounds, of course).

sizeof() showing different output

Here is a snippet of C99 code:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[] = "\0";
printf("%zu %zu\n", sizeof(c), strlen(c)); /* both values are size_t, hence %zu */
return 0;
}
The program is outputting 2 0. I do not understand why sizeof(c) implies 2 seeing as I defined c to be a string literal that is immediately NULL terminated. Can someone explain why this is the case? Can you also provide a (some) resource(s) where I can investigate this phenomenon further on my own time.
didn't understand why size of is showing 2.
A string literal has an implicit terminating null character, so c[] is actually \0\0 and its size is two. From section 6.4.5 String literals of the C99 standard (draft n1124), clause 5:
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals
As for strlen(), it stops counting when it encounters the first null terminating character. The value returned is unrelated to the size of the array that contains the string. In the case of c[], zero is returned because the first character in the array is a null terminator.
In C, "" means: give me a string and null terminate it for me.
For example arr[] = "A" is completely equivalent to arr[] = {'A', '\0'};
Thus "\0" means: give me a string containing a null termination, then null terminate it for me.
arr[] = "\0" is equivalent to arr[] = {'\0', '\0'};
"\0" is not the same as "". String literals are nul-terminated, so the first is the same as the compound literal (char[]){ 0, 0 } whereas the second is just (char[]){ 0 }. strlen finds the first character to be zero, so it assumes the string ends. That doesn't mean the data ends.
When you initialize an array from a string literal like this:
char c[]="\0";
the literal already gets a '\0' character appended at the end, so sizeof(c) gives 2 because your string literal is actually: \0\0.
strlen(c) still gives 0 because it stops at the first \0.
strlen measures to the first \0 and gives the count of characters before the \0, so the answer is zero
sizeof on a char x[] gives the amount of storage used in bytes, which is two: the explicit \0 plus the implicit terminator at the end of the string
Great question. Consider this ...
ubuntu@amrith:/tmp$ more x.c
#include <stdio.h>
#include <string.h>
int main() {
char c[16] = ""; /* initialized, so strlen(c) is well defined */
printf("%zu %zu\n",sizeof(c),strlen(c));
return 0;
}
ubuntu@amrith:/tmp$ ./x
16 0
ubuntu@amrith:/tmp$
Consider also this:
ubuntu@amrith:/tmp$ more x.c
#include <stdio.h>
#include <string.h>
int main() {
int c[16];
printf("%zu\n",sizeof(c));
return 0;
}
ubuntu@amrith:/tmp$ ./x
64
ubuntu@amrith:/tmp$
When you initialize a variable as an array (which is effectively what c[] is), sizeof(c) will give you the allocated size of the array.
The string "\0" is the literal string \NUL\NUL which takes two bytes.
On the other hand, strlen() computes the string length which is the offset into the string of the first termination character and that turns out to be zero and hence you get 2, 0.

Warning: comparison is always true due to limited range of data type

I'm testing this function that's supposed to read input from the user but it throws me a segmentation fault
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAX_STRING_LENGTH 10
int
readinput(char *input)
{
int c;
int i=0;
while((c=getchar()) != EOF && c != '\n') //Here is where the warning occurs.
{
input[i]=c;
i++;
}
input[i]=0;
if(strlen(input)>0)
{
if(isalpha(input[0]) && input[1]=='-' && isalpha(input[2]) && strlen(input)==3)
return 0;
else if(!strcmp(input, "quit"))
return 1;
else if(!strncmp(input, "save ", 5))
return 2;
else if(!strcmp(input, "undo"))
return 3;
}
return -1;
}
int main()
{
char *str;
printf("write a string\n");
int nr=readinput(str);
printf("%d\n", nr);
printf("%s\n", str);
return 0;
}
I did notice the stupid error I made, but still, segmentation fault, why?
This is because EOF is defined (in my compiler) as -1 and char is an unsigned byte, so the comparison is always !=.
c != '/n' is wrong
change it to
c != '\n'
c != '/n' should be c != '\n'
\ is an escape character which indicates, in the case where it is followed by n, a newline. /n will be treated as two distinct characters, which cannot properly be compared to a single char variable.
As for your segmentation fault, you'll need to allocate some space for str in your main function:
char* str = malloc(sizeof(char)*MAX_STRING_LENGTH);
or
char str[MAX_STRING_LENGTH];
but you'll also have to ensure you don't try to read a string that has more characters than your str array can hold.
It faults because you never allocated space for str and it points to a random location which causes readinput to try to store data in a place that doesn't exist.
The segmentation fault arises because you've passed an uninitialized pointer to the function readinput(). You need to do something like:
char str[4096];
int nr = readinput(str);
You should pass in a length of the array so that the called code can verify that it does not overflow its boundaries. Or you can live dangerously and decide that 4096 is big enough, which it probably will be until someone is trying to break your program deliberately.
The original compiler warning was because the multi-character constant '/n' has a value (of type int) which is outside the range of values that can be stored in a char, so when c is promoted to int, the != comparison with the (implementation-defined) value of '/n' is bound to be true. Hence the warning:
Warning: comparison is always true due to limited range of data type
All multi-character character constants have implementation-defined values. There are no portable multi-character character constants.
ISO/IEC 9899:2011 §6.4.4.4 Character constants
¶10 ... The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. ...

Resources