Why one string writes two char in C? [duplicate] - c

I'm currently learning C and I'm confused with differences between char array and string, as well as how they work.
Question 1:
Why is there a difference in the outcomes of source code 1 and source code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[2]="Hi";
printf("%d\n", strlen(c)); //returns 3 (not 2!?)
return 0;
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
char c[3]="Hi";
printf("%d\n", strlen(c)); //returns 2 (not 3!?)
return 0;
}
Question 2:
How is a string variable different from a char array? How to declare them with the minimum required index numbers allowing \0 to be stored if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
#define name "Mick" //what is the size? Is there a \0?
Question 3:
Does the terminating NUL ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?
It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).
Detailed answers are appreciated. But keeping your answer brief helps my understanding :)
Thank you in advance!

Answer 1: In code 1 you have a char array that is not a string; in code 2 you have a char array that is also a string.
Answer 2: A string is a char array in which (at least) one element has the value 0; if you leave the size part empty, the compiler will automatically fill it with the minimum possible value.
char astring[] = "foobar"; /* compiler automagically uses 7 for size */
printf("%d\n", (int)sizeof astring);
Answer 3: a char array in which one of the elements is NUL is a string; a char array where no elements are NUL is not a string.
Answer 4: an array defined to hold two elements (char c[2];) cannot hold three elements. If it is going to be a string it can only be the empty string or a string with 1 character.

Question 1:
Why is there a difference in the outcomes of source code 1 and source
code 2?
Source code 1:
#include <stdio.h>
#include <string.h>
int main()
{
char c[2]="Hi";
printf("%d", strlen(c)); //returns 3 (not 2!?)
getchar();
}
Source code 2:
#include <stdio.h>
#include <string.h>
int main()
{
char c[3]="Hi";
printf("%d", strlen(c)); //returns 2 (not 3!?)
getchar();
}
answer:
Because in the first case, c[] is only holding "Hi". strlen looks for a zero at the end, and, depending on exactly what is behind c[] finds one sooner or later, or crashes. We can't say without knowing exactly what is in the memory behind the c[] array.
Question 2:
How is a string variable different from a char array? How to declare
them with the minimum required index numbers allowing \0 to be stored
if any (please read the codes below)?
char name[index] = "Mick"; //should index be 4 or 5?
char name[index] = {'M', 'i', 'c', 'k'}; //should index be 4 or 5?
answer
Really depends on what you want to do. Probably 5 if you want to actually use the content as a string. But there's nothing saying you can't store "Mick" in a 4 character array - you just can't use strlen to find out how long it is, because strlen will continue to 5 and quite possibly (much) further to find the length, and if there is no zero in the next several memory locations, it could lead to a crash, because eventually, there won't be valid memory addresses to read.
#define name "Mick" //what is the size? Is there a \0?
This has absolutely no size at all, until you use name somewhere. #defines are not part of what the compiler sees - the pre-processor will replace name with "Mick" if you use name anywhere - and hopefully, that's in a place the compiler can make sense of. And then the same rules apply as in the previous answer - it depends on how you want to use the array of characters. For correct operation with strlen, strcpy, and nearly all other str... functions, you need a zero at the end.
Question 3:
Does the terminating null ONLY follow strings but not char arrays? So
the actual value of the string "Hi" is [H][i][\0] and the actual value
of the char array "Hi" is [H][i]?
Yes, no, maybe. It all depends on how you USE the "Hi" string literal (that's the technical name for 'something within double quotes'). If the compiler is "allowed", it will put a zero at the end. But if you initialize an array to a given size, it will stuff the bytes in there, and if there isn't room for a zero, that's your problem, not the compiler's.
Question 4:
Suppose c[2] is going to store "Hi" followed by a \0 (not sure how
this is done, using gets(c) maybe?). So where is the \0 stored? Is it
stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be
appended with a \0 to become c[3] which is [H][i][\0]?
In c[2], beyond the 'H', 'i', there is no telling what is stored [technically, it could well be "the end of the earth" - in computer terms, that's "memory that can't be read" - in which case strlen on that WILL crash your program, because strlen reads beyond the end of the earth]. But it could also be a zero, a one, the letter 'a', the number 42, or any other 8-bit [1] value.
It is quite confusing that sometimes there is a \0 following the
string/char array and causes trouble when I compare two variables by
if (c1==c2) as it most likely returns FALSE (0).
If c1 and c2 are char arrays, that will ALWAYS be false since c1 and c2 are never going to have the same address, and when using an array in C in that way, it becomes "the address in memory of the first element in the array". So no matter what the contents of c1 and c2 are, their addresses can never be the same [because they are two different variables, and two variables cannot have the same location in memory - that's like trying to park two cars in a parking space large enough only for one car - and no, crushing either car is not allowed in our thought experiment].
[1] Char isn't guaranteed to be 8 bits. But let's ignore that for now.

Running source code one is undefined behavior because strlen() requires a NUL-terminated string, which c[2] = "Hi"; /* = { 'H', 'i' } */ is not. A string differs from a char array in that a string is a char array with at least one NUL byte somewhere in the array.
The remaining answers should follow easily from this fact.
To autosize a char array to match the size of a string literal at initialization, simply specify no array size:
char c[] = "This will automatically size the c array (including the NUL).";
Note that you cannot compare char arrays with the == operator. You have to use
if (strcmp(c1, c2) == 0) {
/* Equal. */
} else {
/* Not equal. */
}

strlen() relies on the terminating \0, and in C every string must be \0-terminated. With char c[2] = "Hi"; you have reserved room for only the two characters H and i, so there is no room for the \0. Calling strlen() on it is therefore undefined behavior.
In the case of char c[3] = "Hi"; there is a \0 in the third position, so strlen() computes the actual length.
How to declare them with the minimum required index numbers allowing \0 to be stored if any?
When you are not sure about the size of the char array, do it like this:
char c1[] = "Mike"; // strlen = 4
char c2[] = "Omkant"; // strlen = 6
NOTE:
EDIT: In the above case, where no size is given explicitly, do not confuse sizeof with strlen().
strlen() returns only the number of characters.
sizeof gives the number of characters plus one more (for the \0 character).
So for an array sized this way, sizeof gives exactly 1 more than the number returned by strlen().

Related

Inner workings of c language as it pertains to character arrays

I'm wondering why this code prints a zero.
int x = 10;
char newChar[x];
printf("%d\n",strlen(newChar));
strlen() computes the number of characters in a string. It counts characters up to the null character \0, which terminates a string. It returns the number of characters read up to but not including that null character.
In your provided example the char array newChar is uninitialized; it has no string in it.
The output of 0 only means that the first element happened to contain a zero byte on this run; nothing guarantees it.
Per the standard, the behavior is undefined when using uninitialized objects. In production code, always initialize objects or, in the case of VLAs like newChar, assign "placeholder" values to their elements or use strcpy (with VLAs of char) to copy an empty string ("") into them before using them in any manner.
Technically, it is possible that strlen() will return a value different from 0 on a different machine, or even on a different execution of the program, just because the behavior is undefined: one or more elements could have a non-zero value followed by an element with a 0 value, which would look exactly like a string.
Maybe you wanted to calculate the number of elements in newChar.
Therefore, sizeof instead of strlen() would be appropriate:
int x = 10;
char newChar[x];
printf("%zu\n", sizeof(newChar) / sizeof(*newChar));
Online example
#include <stdio.h>
int main(void)
{
int x = 10;
char newChar[x];
printf("The number of elements in newChar is: %zu\n", sizeof(newChar) / sizeof(*newChar));
}
Output:
The number of elements in newChar is: 10

CS50 IDE: printf returns extra characters

I am having problems with the printf function in the CS50 IDE. When I am using printf to print out a string (salt in this code), extra characters are being output that were not present in the original argument (argv).
Posted below is my code. Any help would be appreciated. Thank you.
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
int main(int argc, string argv[])
{
// ensuring that only 1 command-line argument is inputted
if (argc != 2)
{
return 1;
}
char salt[2];
for (int i = 0; i < 2; i++)
{
char c = argv[1][i];
salt[i] = c;
}
printf("the first 2 characters of the argument is %s\n", salt);
}
You are missing a string terminator in salt.
Somehow the computer needs to know where your string ends in memory. It does so by reading until it encounters a NUL byte, which is a byte with value zero.
Your array salt has exactly 2 bytes of space, and after them, random garbage exists which just happens to be next in memory after your array. Since you don't have a string terminator, the computer will read this garbage as well until it encounters a NUL byte.
All you need to do is include such a byte in your array, like so:
char salt[3] = {0};
This will make salt one byte longer, and the {0} is a shorthand for {0, 0, 0} which will initialize the contents of the array with all zeros. (Alternatively, you could use char salt[3]; and later manually set the last byte to zero using salt[2] = 0;.)
In your case, salt is at least one element short of being a string: unless argv[1] holds only one character, salt does not contain a null-terminator.
You need to allocate space to hold the null-terminator and actually put one there to be able to use salt as string, as expected for the argument to %s conversion specifier in case of printf().
Otherwise, the string related functions and operations, which essentially rely on the fact that there will be a null terminator to mark the end of the char array (i.e., mark the end of valid memory that can be accessed), will try to access past the valid memory which causes undefined behavior. Once you hit UB, nothing is guaranteed.
So, considering the fact that you want to use
"....the first 2 characters of the argument....."
you need to make salt a 3-element char array, and make sure that salt[2] contains a null-terminator, like '\0'.

Why is the entirety of this first array being added onto the second, on top of the two values (from the first) that I assign it?

I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTC
salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
You are adding just two characters to the salt array and no '\0' terminator.
Passing a non-nul-terminated array as a parameter to printf() with a "%s" specifier causes undefined behavior. In your case it prints hash as well; in my case
HA#
was printed.
Strings in C use a special convention to know where they end: a non-printable special character '\0' is appended at the end of a sequence of non-'\0' bytes, and that's how a C string is built.
For example, if you were to compute the length of a string you would do something like
size_t stringlength(const char *string)
{
size_t length;
for (length = 0 ; string[length] != '\0' ; ++length);
return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string, for it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice that then, when you allocate space to store n characters, you need to allocate n + 1 bytes, to make room for the '\0'.
In the case of printf(), it will try to consume all the bytes that the passed pointer points at until one of them is '\0'; there it stops iterating through the bytes.
That also explains the Undefined Behavior thing: printf() would clearly be reading out of bounds, and anything could happen, depending on what is actually there at the memory address that does not belong to the passed data but is out of bounds.
There are many functions in the standard library that expect strings, i.e. sequences of non-nul bytes followed by a nul byte.

Strcpy: behaving more like 'strcut'

char a[3], b[3];
strcpy(a,"abc");
printf("a1 = %s\n", a);
strcpy(b,a);
printf("a2 = %s\n", a);
printf("b = %s\n", b);
From how I understand strcpy to work the output would be:
a1 = abc
a2 = abc
b = abc
Instead I obtain
a1 = abc
a2 =
b = abc
Why when I call strcpy the second time does it (apparently) erase the contents of a?
Thanks
This is a buffer overflow problem - your a and b are too short - they don't have room for the null terminator. What is happening is that a sits just after b in memory, so when strcpy(b,a) executes, the null terminator written just past the end of b lands in the same memory location as the first character of a. This suddenly makes a an empty string.
For starters, make the lengths of the arrays 4 instead of 3. This is okay in sandbox/play/learning mode, but in production code consider:
Use bounded string functions (e.g. snprintf, or strncpy followed by explicit null termination) to avoid buffer overflows.
Use character arrays/buffers that support variable size or pre-calculation of the size required to fit your data.
Since your arrays are too small and do not have room for the null terminator, you are most likely overwriting a when you try to copy a to b, since strcpy does not know when to stop copying. This declaration would fix the problem for this particular program:
char a[4], b[4];
In the general case you need to ensure that your destination has enough space to accommodate the source as well as the null terminator.
This example gives you a better idea of what is going on. It is just for demonstration purposes, and you should not use code like this for anything other than learning. It works for me on ideone, but it may not work properly with other compilers since we are invoking undefined behavior:
#include <stdio.h>
#include <string.h>
int main()
{
char a[3], b[4];
// on this implementation a happened to have a lower address than b; the standard does not guarantee it
printf("%p %p\n", (void *)a, (void *)b);
// "abc" is a null terminated literal; a size of 4 forces the null to be copied too (one byte past the end of a)
strncpy(a,"abc",4);
// printf stops at the null we just wrote past the end of a
printf("a1 = %s\n", a);
// explicitly only copy 3 bytes
strncpy(b,a,3);
// manually null terminate b
b[3] = '\0' ;
// So we can prove we are seeing b's contents
b[0] = 'z' ;
// This will overrun into b now since b[0] is no longer null
printf("a2 = %s\n", a);
printf("b = %s\n", b);
}
The first strcpy(a,"abc") is already wrong. Don't get confused with char array versus C-String... a C-String is always a char array, but a char array is NOT always a C-String.
A C-String must have a '\0' char at the end. So when you do strcpy "abc" -> a[3] you are actually moving the following 4 bytes to your array { 'a', 'b', 'c', '\0' }
Because a and b were created together, b is right next to a. When you print out a it goes fine IN THIS CASE because printf() can still find a '\0' to identify as the end of the string, even though it is wrong... because your '\0' char is in the area reserved for b.
The following problems are all related to the same thing...
The solution is: the buffer for your C-String must be the maximum size of your string + 1, so you can guarantee you will have room for the '\0' char. If you need more details, google for "C-String" or "null-terminated string".
You've made a very common beginner's mistake. In C, there is no string primitive; when we talk about strings, we're really talking about null-terminated character arrays (or buffers, I don't care what nomenclature you like). So your char[3] will hold a string of 2 letters, plus the null terminator. Another subtle issue is that in memory, they will be laid out on the stack as a[0]a[1]a[2]b[0]b[1]b[2], and this is the reason you didn't crash when you deserved to. See, "abc" is REALLY "abc\0", so a[2] == 'c' and b[0] == '\0', and since the behavior is undefined when strings overlap (as these do), I suspect that your implementation just copied chars until it copied a \0. That being the case, strcpy(a, b) will result in a being an empty string.
On the other hand, your program works as it was written to. What you wrote isn't what you meant :)
