How is an empty string stored in a char array? - c

If I have an array declared as
char arr[1] = "";
What is actually stored in memory? What will arr[0] be?

Strings are null-terminated. An empty string contains one element, the null terminator itself, i.e., '\0'.
char arr[1] = "";
is equivalent to:
char arr[1] = {'\0'};
From this you can see how it is stored in memory: a single byte with the value 0.

C-strings are zero-terminated. Thus, "abc" is represented as { 'a', 'b', 'c', 0 }.
Empty strings thus just have the zero.
This is also the reason why a string must always be allocated to be one char larger than the maximum possible length.

arr[0] = 0x00;
however, if you did not initialize the array, as in
char arr[1];
then arr[0] holds an indeterminate (garbage) value, assuming arr is a local array with automatic storage duration.

arr[0] is the null character, which can be referred to as '\0' or 0.
A string is, by definition, "a contiguous sequence of characters terminated by and including the first null character". For an empty string, the terminating null character is the first one (at index 0).

It gets more interesting if the array is declared as char arr[] = "";
In this case sizeof(arr) is 1 and strlen(arr) is 0.
You can still check this yourself by adding a print such as printf("%d", arr[0]);.
A string is a sequence of characters; in your case no character is present inside "", so only the '\0' character is stored, in arr[0].

A C string ends with a null character ('\0', not to be confused with the NULL pointer), so the empty string "" is actually stored as "\0"; the compiler adds the terminator for you.
So strlen("") equals 0, but sizeof("") equals 1.

Related

The null character at the end of string literal

I wrote two versions of code to practice making a character array, and I expected the results to be the same.
version 1:
#include <stdio.h>
#include <string.h>
int main(void)
{
char a[7] = "and";
printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
version 2:
#include <stdio.h>
#include <string.h>
int main(void)
{
char a[7];
a[1] = 'a';
a[2] = 'n';
a[3] = 'd';
printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
However, here is the result I got:
version 1:
size: 7 length: 3
version 2:
size: 7 length: 4
As far as I know, a string ends with a null character, and the null character in a string literal is implicit. So why did it disappear? Why wasn't it counted as the last element, given that the length in version 1 shows 3?
In fact strlen is supposed to exclude the null character, so the output 3 is correct. The problem with the second version is that the array a is not initialized, so its values may be arbitrary. It just so happens in this case that a[4] is 0 and a[0] is not, hence the output 4. Note also that the second version uses the wrong indices: array indexing starts at 0, not 1.
If you want to make this work in the second version, re-write it like so:
#include <stdio.h>
#include <string.h>
int main(void)
{
char a[7] = {0};
a[0] = 'a';
a[1] = 'n';
a[2] = 'd';
printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
This initializes the first value in a to 0 explicitly and implicitly makes all other values zero too.
In the second case
int main(void)
{
char a[7];
a[1] = 'a';
a[2] = 'n';
a[3] = 'd';
printf("size: %zu length: %zu", sizeof(a), strlen(a));
}
you are lucky that
a[0] did not contain a 0 (i.e., '\0' or null): otherwise you would have seen a length of 0.
a[4] happened to hold a null value: otherwise strlen would have kept reading past the end of the array, with unpredictable results.
In other words, for a local variable with automatic storage, the values are indeterminate if left uninitialized. There's no guarantee of zero-filling (which would act as a null terminator), so using the array as a string (e.g. as an argument to strlen()) may read out-of-bounds memory and invokes undefined behavior.
sizeof measures the memory size of something. strlen calculates the length of a C string (the number of characters, excluding the terminating nul). Here your C string is shorter than the memory used to store it.
sizeof is a C operator evaluated at compile time (except when applied to a variable-length array).
strlen is a library function, called at runtime.
strlen(3)
DESCRIPTION
The strlen() function computes the length of the string s.
RETURN VALUES
The strlen() function returns the number of characters that precede the
terminating NUL character.
Beware that your second example has undefined behavior: the character array is not initialized, so strlen may read past its end. You have no guarantee that uninitialized chars are set to 0.
As per strlen description:
size_t strlen(const char *str);
Returns the length of the given null-terminated byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character.
The behavior is undefined if str is not a pointer to a null-terminated byte string.
Since in the second version, a is not a null-terminated byte string, the behavior is undefined.
Note that assigning individual character literals to a char array does not make it a string. To create a properly null-terminated char array that way, you need to add the terminator yourself, starting at index 0:
a[0] = 'a';
//...
a[3] = '\0';

Assignment after initialization to specific index in an array

After assigning to index 26, printing the array still outputs "Computer", even though I assigned a character at index 26. I expected something like "Computer K".
What is the reason?
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
printf("%s\n", m1); /*prints out "Computer"*/
printf("%c", m1[26]); /*prints "K"*/
}
At index 8 of that string the \0 character is found, and %s prints only until it finds a \0 (the end of the string). At index 26 the character 'K' is there, but it is not printed because a \0 is found before it.
char s[100] = "Computer";
is basically the same as
char s[100] = { 'C', 'o', 'm', 'p', 'u','t','e','r', '\0'};
Since printf stops at the 0 terminator, it won't print the character at index 26.
Whenever you partially initialize an array, the remaining elements are filled with zeroes. (This is a rule in the C standard, C17 6.7.9 §19.)
Therefore char m1[40] = "Computer"; ends up in memory like this:
[0] = 'C'
[1] = 'o'
...
[7] = 'r'
[8] = '\0' // the null terminator you automatically get by using the " " syntax
[9] = 0 // everything to zero from here on
...
[39] = 0
Now of course \0 and 0 mean the same thing, the value 0. Either will be interpreted as a null terminator.
If you go ahead and overwrite index 26 and then print the array as a string, it will still only print until it encounters the first null terminator at index 8.
If you do like this however:
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); // prints out "Computer"
m1[8] = 'K';
printf("%s\n", m1); // prints out "ComputerK"
}
You overwrite the null terminator, and the next zero that happened to be in the array is treated as null terminator instead. This code only works because we partially initialized the array, so we know there are more zeroes trailing.
Had you instead written
#include <string.h>
int main()
{
char m1[40];
strcpy(m1, "Computer");
m1[8] = 'K'; /* overwrites the only terminator that strcpy wrote */
}
This is not initialization but run-time assignment. strcpy only sets indices 0 to 8 ("Computer" with the null terminator at index 8). The remaining elements are left uninitialized, holding garbage values, so writing m1[8] = 'K' destroys the string: it is no longer reliably null-terminated. You would get undefined behavior when trying to print it: something like garbage output or a program crash.
In C strings are 0-terminated.
Your initialization fills all array elements after the 'r' with 0.
If you place a non-0 character in any random field of the array, this does not change anything in the fields before or after that element.
This means your string is still 0-terminated right after the 'r'.
How should any function know that after that string some other string might follow?
That's because after "Computer" there's a null terminator (\0) in your array. If you add a character after this \0, it won't be printed because printf() stops printing when it encounters a null terminator.
Just as an addition to the other users answers - you should try to answer your question by being more proactive in your learning. It is enough to write a simple program to understand what is happening.
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
for(size_t index = 0; index < 40; index++)
{
printf("m1[%zu] = 0x%hhx ('%c')\n", index, (unsigned char)m1[index], (m1[index] >=32) ? m1[index] : ' ');
}
}

Why is the entirety of this first array being added onto the second, on top of the two values (from the first) that I assign it?

I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTc
salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
This happens because you are adding just two characters to the salt array and no '\0' terminator.
Passing an array that is not nul-terminated to printf() with a "%s" specifier causes undefined behavior. In your case it printed the contents of hash; in my case
HA#
was printed.
Strings in C use a special convention to mark where they end: a non-printable special character '\0' is appended after a sequence of non-'\0' bytes, and that's how a C string is built.
For example, if you were to compute the length of a string you would do something like
#include <stddef.h>
size_t stringlength(const char *string)
{
size_t length;
for (length = 0 ; string[length] != '\0' ; ++length);
return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string, for it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice that then, when you allocate space to store n characters, you need to allocate n + 1 bytes, to make room for the '\0'.
In the case of printf(), it keeps reading the bytes that the passed pointer points at until one of them is '\0'; there it stops.
That also explains the undefined behavior: printf() would be reading out of bounds, and anything could happen, depending on what is actually stored at the memory addresses past the end of the passed data.
There are many functions in the standard library that expect strings, i.e. sequences of non-nul bytes followed by a nul byte.

Sizeof(char[]) in C

Consider this code:
char name[]="123";
char name1[]="1234";
And this result
The size of name (char[]):4
The size of name1 (char[]):5
Why is the size of the char[] always one more?
Note the difference between sizeof and strlen. The first is an operator that gives the size of the whole data item. The second is a function that returns the length of the string, which will be less than its sizeof (unless you've managed to overflow the string), depending on how much of its allocated space is actually used.
In your example
char name[]="123";
sizeof(name) is 4, because of the terminating '\0', and strlen(name) is 3.
But in this example:
char str[20] = "abc";
sizeof(str) is 20, and strlen(str) is 3.
As Michael pointed out in the comments, strings are terminated by a zero. So in memory the first string will look like this:
"123\0"
where \0 is a single char and has the ASCII value 0. Then the above string has size 4.
Without this terminating character, how would anyone know where the string (or char[], for that matter) ends? Well, one other way is to store the length somewhere. Some languages do that. C doesn't.
In C, strings are stored as arrays of chars. With a recognised terminating character ('\0' or just 0) you can pass a pointer to the string, with no need for any further meta-data. When processing a string, you read chars from the memory pointed at by the pointer until you hit the terminating value.
As your array initialisation is using a string literal:
char name[]="123";
is equivalent to:
char name[]={'1','2','3',0};
If you want your array to be of size 3 (without the terminating character, because you are not storing a string), use:
char name[]={'1','2','3'};
or
char name[3]="123";
(thanks alk)
which will do as you were expecting.
Because there is a null character that is attached to the end of string in C.
Like here in your case
name[0] = '1'
name[1] = '2'
name[2] = '3'
name[3] = '\0'
name1[0] = '1'
name1[1] = '2'
name1[2] = '3'
name1[3] = '4'
name1[4] = '\0'
A string in C is an array of characters terminated by \0, which has the ASCII value 0.
When you write char arr[] = "1234";, you initialize the array from a string literal, which is null-terminated by default (\0 is also called null).
To avoid the null (assuming you want just an array of chars and not a string), declare it as char arr[] = {'1', '2', '3', '4'}; and the program will behave as you wish (sizeof(arr) would be 4).
char name[] = {'1','2','3','\0'};
char name1[] = {'1','2','3','4','\0'};
So
sizeof(name) == 4
sizeof(name1) == 5
sizeof returns the size of the object and in this case the object is an array and it is defined that your array is 4 bytes long in first case and 5 bytes in second case.
In C, string literals have a null terminating character added to them.
Your strings,
char name[]="123";
char name1[]="1234";
look more like this in memory:
char name[] = {'1','2','3','\0'};
char name1[] = {'1','2','3','4','\0'};
Hence the size is always one more. Keep in mind when reading strings from files or any other source that the variable where you store your string should always have extra space for the null terminating character.
For example, if you expect to read a string whose maximum length is 100 characters, your buffer should have a size of 101.
Every string is terminated with the null byte '\0', which adds 1 to the size of the array (but not to strlen).

Printing three strings with identical content gives different results

#include "stdio.h"
void main()
{
char firstName[1] = "1";
char middleName[1] = "1";
char lastName[1] = "1";
printf("%p\t%s\n",firstName,firstName);
printf("%p\t%s\n",middleName,middleName);
printf("%p\t%s\n",lastName,lastName);
}
I compiled this code with gcc 4.8.2. What confuses me is why it prints:
root#ubuntu:~# ./main
0x7fff7124273d 111
0x7fff7124273e 11
0x7fff7124273f 1
I think it should print:
0x7fff7124273d 1
0x7fff7124273e 1
0x7fff7124273f 1
Can you help me?
char firstName[1] = "1";
It's legal to initialize a char array like this, but it's not a string, because it's not null-terminated.
"%s" in printf expects a string, so what you are doing is undefined behavior.
My guess is that the compiler places the variables next to each other, and the byte after them happens to be 0, which would explain what happened. But again, it's undefined behavior; anything could happen.
'1' '1' '1'  0
 ^   ^   ^
 |   |   |
 |   |   lastName
 |   middleName
 firstName
Because the array has size 1 but you initialize it with a string literal of length 2 (a string literal also has a null character \0 at the end), the terminator is not stored, so the array is not a null-terminated string. You need an array of size 2.
char firstName[2] = "1";
char middleName[2] = "1";
char lastName[2] = "1";
or
char firstName[] = "1";
char middleName[] = "1";
char lastName[] = "1";
Also, do not use void main in C. Use int main.
"1" is actually two bytes in size ('1' and '\0'); you're forgetting that C strings are null-terminated. The null bytes are silently dropped by the initialization: when the array has room only for the characters, C stores no terminator. Your arrays need to be big enough to contain all the data in the initializer to avoid this.
Remember, C-style strings are null-terminated arrays, meaning that there should be '\0' after the string. Notes that might help you:
char firstName[2] = "1"; - Adds '\0' by itself, note the 2 instead of 1.
char firstName[] = {'1'}; - Does not add '\0'.
char firstName[2] = {'1'}; - Adds '\0' (the remaining element is zero-filled).
You're getting this output probably because the chars are placed next to each other in memory; this is undefined behavior.
in your code
printf("%p\t%s\n",firstName,firstName);
the first thing getting printed is the base address of the array firstName,
the second thing is actually undefined behaviour. For any character array to be a string, it must have the null character \0 at the end. Your array is only 1 character long and contains '1', so you cannot use %s to print it.
Instead of %s, use %c to print the character, like this:
printf("%p\t%c\n", (void *)firstName, firstName[0]);
In C, strings are NUL-terminated, i.e. the string "1" is the characters {'1', 0}. You have not allowed enough room for the terminator, so your strings are truncated and printf doesn't know where they end.
It would be better to define them as
char firstName[2] = "1";
and best to do
char firstName[] = "1";
so the compiler calculates the right amount of memory for you, should you ever deal with someone whose first name is longer than 1 character.
