This question already has answers here:
String initialization with and without explicit trailing terminator
(4 answers)
Closed 8 years ago.
I have a few questions regarding string initialization and declaration in C.
Suppose I declare a string 's' of size 10 using
char s[10];
Q 1. Is it necessary that all the elements of 's' will be initialized to '\0' or is it just pure luck that I will find other elements to be '\0'?
Q 2. If I instead use malloc to setup a string like this
char *s = malloc(10 * sizeof(char));
Again is it necessary that all the elements will be initialized to '\0'?
Q 3. Further do I need to add an '\0' while declaring the string or not?
char s[10] = "abc";
or does it have to be
char s[10] = "abc\0";
NOTE: If possible, please take a look at the second answer by Kevin here.
Q1. In general, no, though in some contexts yes. Specifically, if the variable is a local variable and not static, then it is not initialized at all. If the variable is local and static, or if it is declared at file scope (whether static or global), then it will be initialized to all bytes zero.
Q2. No. malloc() is not guaranteed to return zeroed memory. If you need it zeroed, use calloc() instead.
These comments apply to any type.
char s0[10]; // Initialized all bytes zero
static char s1[10]; // Initialized all bytes zero
void somefunc(void)
{
static char s2[10]; // Initialized all bytes zero
char s3[10]; // Not initialized to all bytes zero
char *s4 = malloc(10); // Not initialized to all bytes zero
char *s5 = calloc(10, 1); // Initialized all bytes zero
…code using s0..s5…
}
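The guaranteed cases above can be checked with a small sketch. The names file_scope_buf, all_zero, static_local_is_zero and calloc_is_zero are hypothetical and only serve the illustration. The automatic array and the malloc() block are deliberately not inspected, because reading indeterminate values proves nothing either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

char file_scope_buf[10];    /* file scope: static storage duration */

/* Returns 1 if all n bytes starting at p are zero, otherwise 0. */
int all_zero(const char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if a block-scope static array starts out zeroed. */
int static_local_is_zero(void)
{
    static char buf[10];    /* static storage duration, so zero-initialized */
    return all_zero(buf, sizeof buf);
}

/* Returns 1 if calloc() zeroed the block, or -1 if allocation failed. */
int calloc_is_zero(void)
{
    char *p = calloc(10, 1);
    if (p == NULL) {
        return -1;
    }
    int ok = all_zero(p, 10);
    free(p);
    return ok;
}
```

Calling all_zero(file_scope_buf, sizeof file_scope_buf), static_local_is_zero() and calloc_is_zero() should each yield 1 on a conforming implementation.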
Q3. It is sufficient to use:
char s6[10] = "abc"; // 3 bytes non-zero plus 7 bytes zero
Writing this would achieve the same result because the size of the array is specified:
char s7[10] = "abc\0"; // 3 bytes non-zero plus 7 bytes zero
Writing these gives two arrays of different sizes:
char s8[] = "abc"; // sizeof(s8) == 4: three characters plus 1 null byte
char s9[] = "abc\0"; // sizeof(s9) == 5: three characters plus 2 null bytes
C automatically adds a trailing null byte.
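A minimal sketch of the size difference, using hypothetical function names. It also shows that strlen() reports 3 for both arrays, because it stops at the first null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of an array whose length is inferred from "abc". */
size_t inferred_size_plain(void)
{
    char s8[] = "abc";      /* 'a' 'b' 'c' '\0' */
    return sizeof s8;
}

/* Size of an array whose length is inferred from "abc\0". */
size_t inferred_size_extra_nul(void)
{
    char s9[] = "abc\0";    /* 'a' 'b' 'c' '\0' '\0' */
    return sizeof s9;
}

/* Checks the sizes and shows that strlen() sees both as length 3. */
void check_inferred_sizes(void)
{
    char s8[] = "abc";
    char s9[] = "abc\0";

    assert(inferred_size_plain() == 4);
    assert(inferred_size_extra_nul() == 5);
    assert(strlen(s8) == 3);
    assert(strlen(s9) == 3);    /* strlen stops at the first null byte */
}
```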
First and foremost, your s is not a "string". Your s is a character array. The term string refers to the content of a character array. In order to qualify as a string, that content must satisfy some requirements. A string is defined as a contiguous sequence of characters terminated by, and including, the first zero character.
Q1. If the array is declared with static storage duration it will begin its life with all zeros in it. In all other cases it will contain unpredictable garbage.
Q2. malloc does not initialize allocated memory. The memory contains unpredictable garbage. calloc allocates character array initialized with zeros.
Q3. What you have on the right-hand side of initialization is called string literal. String literal already includes a terminating zero character implicitly. There's no need to add it explicitly.
However, the C language follows an all-or-nothing approach to initialization. If you initialize just a small portion of some aggregate object, the rest of that object is implicitly initialized with zeros. In your case that means that the rest of array s will be filled with zeros anyway, all the way to the end. Consequently there's no difference between the end results of your two initialization examples. Still, there's no point in specifying that zero character explicitly.
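The all-or-nothing rule applies to any aggregate, not just character arrays. The following is a sketch under that reading. struct record and the function names are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct record {             /* hypothetical type, for illustration only */
    int    id;
    char   name[8];
    double score;
};

/* Asserts that members without an explicit initializer end up zero. */
void check_partial_init(void)
{
    int v[5] = { 7 };               /* v[1]..v[4] are set to 0 */
    struct record r = { 42 };       /* name and score are zeroed */
    char s[10] = "abc";             /* s[3]..s[9] are set to '\0' */

    assert(v[0] == 7);
    for (size_t i = 1; i < 5; i++) {
        assert(v[i] == 0);
    }

    assert(r.id == 42);
    assert(r.score == 0.0);
    for (size_t i = 0; i < sizeof r.name; i++) {
        assert(r.name[i] == '\0');
    }

    assert(strcmp(s, "abc") == 0);
    for (size_t i = 3; i < sizeof s; i++) {
        assert(s[i] == '\0');
    }
}

/* Returns 1 if "abc" and "abc\0" produce byte-identical 10-element arrays. */
int abc_forms_match(void)
{
    char a[10] = "abc";
    char b[10] = "abc\0";
    return memcmp(a, b, sizeof a) == 0;
}
```

Some compilers warn about the missing initializers in struct record r = { 42 }; when extra warnings are enabled. The code is still valid C.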
If you declare the string using char s[10]; inside a function, or allocate it with malloc, the contents will not be initialized to \0 or anything else. They will be indeterminate (garbage) values. So if you need \0 in your string, you need to store it explicitly.
Further, if you do something like
char s[10] = "abc";
then you don't need to add \0.
A note: If you use calloc instead of malloc to allocate memory, the contents will be initialized to 0.
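Storing the terminator explicitly after malloc could look like the sketch below. copy_prefix is a hypothetical helper, not a standard library function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copies at most the first n characters of src into freshly malloc'ed
 * memory and terminates the result explicitly.
 * Returns NULL if allocation fails. The caller must free() the result.
 */
char *copy_prefix(const char *src, size_t n)
{
    assert(src != NULL);

    size_t len = strlen(src);
    if (n > len) {
        n = len;                /* never read past the end of src */
    }

    char *out = malloc(n + 1);  /* contents are indeterminate at this point */
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, src, n);
    out[n] = '\0';              /* malloc() did not do this for us */
    return out;
}
```

With calloc(n + 1, 1) instead of malloc(n + 1), the explicit terminator would be redundant, though writing it anyway does no harm.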
Q1. If you don't explicitly initialize a local variable then it can contain any values. Often the bytes will just happen to contain zeroes.
But static variables (declared outside any function or prefixed with the static keyword) are guaranteed to be initialized to zeroes.
Q2. Again, malloc does not clear the memory, but it will often happen to be filled with zeroes. To explicitly get zero-filled memory use calloc().
Q3. You don't need to add \0 inside the double-quotes. The string "abc" means 4 bytes are created somewhere containing the 3 characters then a string-terminator (byte with value zero).
In the code char * str = "hello";, I understand that "hello" places the word hello somewhere else in memory and then stores the address of its first character in the variable str.
But when I use the code char str[10] = "hello";, I understand that the letters of hello are stored in the elements of the array itself.
If so, then in the first case "hello" returns the address of the memory,
and in the second case "hello" returns the characters h e l l o \n.
I want to know why they are different and, if I'm wrong, what the double quotes actually return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal, that is, a character sequence enclosed in double quotes. When the compiler is compiling this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used by the compiler to "initialize an array of static storage duration and length just sufficient to contain the sequence" (n1570 ISO C draft, 6.4.5/6). That length is 6: the 5 characters h, e, l, l and o as well as the appended zero byte. The elements of that array have type char, but attempting to modify them is undefined behavior.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic local storage duration, e.g. local variables, and those with dynamic storage duration, which are created via malloc() or calloc()).
You can memorize the address of that array, as in char *str = "hello";. This address will point to valid memory during the lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and a deviation from the fact that you cannot normally initialize arrays with arrays (see footnote 1).
This time you don't define a pointer, you define a proper array of 10 chars. You then use the string literal to initialize it. You can always use the generic method to initialize a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method also works for the other kind of aggregate type, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the number of characters in the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Objects with static storage duration are initialized with zero, which means that the 4 remaining elements of str are zero as well: the array ends with the terminator followed by 4 more zero characters.
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
1 For example, static char src[] = "123"; char dest[] = src; doesn't work. You have to use strcpy().
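Both points, the equivalence of the two initializer forms and the need for strcpy() when copying one array into another, can be sketched as follows. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if brace-list and string-literal initialization match byte for byte. */
int brace_and_literal_match(void)
{
    char a[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
    char b[10] = "hello";
    return memcmp(a, b, sizeof a) == 0;
}

/* Copies src into dest with strcpy(), since `char dest[] = src;` is not valid C. */
void copy_array(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);
    assert(strlen(src) < dest_size);    /* room for the terminator */
    strcpy(dest, src);
}
```

The size check in copy_array is an addition for safety. strcpy() itself performs no bounds checking.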
The initialization:
char * str = "hello";
in most C implementations makes sure that the string hello is placed in a constant data section of the executable memory. Exactly six bytes are written, the last one being the string terminator '\0'.
The char pointer str contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
looks very similar: str is an array rather than a pointer, but when used in an expression it converts to a pointer to its first character, and the following characters are stored in the following memory locations (including the string terminator).
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes
Data is not constant and can be changed, while in the previous example it wasn't possible because such initialization, in most C implementations, instructs the compiler to use a constant data section
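The two differences listed above can be sketched like this, with hypothetical function names. The pointer is declared const here so that the compiler rejects accidental writes to the literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes the array object occupies, regardless of the string in it. */
size_t array_bytes(void)
{
    char str[10] = "hello";
    return sizeof str;
}

/* Modifies the array copy and returns 1 if the literal is unaffected. */
int array_copy_is_independent(void)
{
    const char *lit = "hello";  /* points at the literal; treat as read-only */
    char str[10] = "hello";     /* private, writable copy */

    str[0] = 'j';
    return strcmp(str, "jello") == 0 && strcmp(lit, "hello") == 0;
}

void check_pointer_vs_array(void)
{
    assert(array_bytes() == 10);            /* not 6: the declared size wins */
    assert(array_copy_is_independent());
}
```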
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the zero character constant)
str[6] : 0
str[7] : 0
str[8] : 0
str[9] : 0
(The elements after the terminator are not arbitrary: elements of an array that are not covered by the initializer are set to zero.)
As a result, the code will not return hello\n (with an end-of-line character), just hello\0 (the zero character).
The double quotes just mention the beginning and the ending of a string constant and return nothing.
char txt[20] = "Hello World!\0";
How many bytes are allocated by the above definition?
Considering one char occupies 1 byte, one int 2 byte.
Note that there is only one ", and \0 at the end.
How do I calculate how many bytes the above definition occupies?
The statement char txt[20]="Hello World!\0" actually comprises two parts, a definition part and an initialization part. char txt[20], the definition part, tells the compiler to reserve 20 elements of the size of a character (in this case 20 bytes), regardless of the content with which you initialize the array. The initialization part ="Hello World!\0" then "prefills" the reserved memory with the characters of the literal Hello World!\0. Note that it is not necessary to write \0 explicitly in the string, since string literals are themselves terminated by the \0 character. So you should write char txt[20]="Hello World!". It is OK if the string literal is shorter than the memory allocated; if the string literal used for initializing is longer than the array, you get at least a compiler warning.
Note, however, that if you write char txt[]="Hello World!", the length of the memory reserved will be exactly the length of the initial string literal.
Concerning array initialization, you might refer to cppreference.com. Concerning the discussion on "variable definition" versus "variable declaration", I find this SO answer very helpful.
Anything which goes inside the double quotes in C is considered as string with null termination in the end. You don't have to add \0 in the end.
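You can use strlen(arr) + 1 to get the number of bytes the string itself occupies; the + 1 is there because strlen doesn't count the null terminator. Note that this is not the size of the array: sizeof txt is 20 for char txt[20], whatever the string inside it.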
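The different "sizes" in play for this definition can be told apart with a short sketch. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes reserved by the definition, independent of the initializer. */
size_t reserved_bytes(void)
{
    char txt[20] = "Hello World!\0";
    return sizeof txt;
}

/* Asserts the three different sizes involved in the definition. */
void check_txt_sizes(void)
{
    char txt[20] = "Hello World!\0";

    assert(sizeof txt == 20);                   /* bytes reserved for the array */
    assert(sizeof "Hello World!\0" == 14);      /* 12 chars + explicit \0 + implicit \0 */
    assert(strlen(txt) == 12);                  /* characters before the first \0 */
    assert(strlen(txt) + 1 == 13);              /* bytes the string itself occupies */
}
```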
It might be something basic, but I still can't figure out what will happen,
for eg
if I write
char temp[3]="";
or
char temp[3]={0};
or
char temp[3]={};
or
char temp;
What will the initialization be in all four cases?
And if 0 is stored, is it stored as an ASCII value?
And if NULL is stored, is the ASCII value stored as well?
If some elements are not initialized, what value do they have:
a garbage value or something specified?
1)
char temp[3]="";
and
char temp[3]={0};
are equivalent. The array temp will be filled with 3 zeros. It's as if you had: char temp[3] = {0, 0, 0};.
2)
char temp[3]={};
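is not valid in C before C23, because empty initializers were not allowed. (C23 permits = {}, which zero-initializes the array, and some compilers accept it as an extension.)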
3)
char temp;
This depends on where temp is declared.
If it's in a block scope then temp will be uninitialized and its value is indeterminate.
If it's at file scope then temp will be initialized to 0, provided there are no other definitions for it (see note 1). It's as if you had: char temp = 0;
1 This may sound odd. But C has a concept called "tentative definitions". See: About Tentative definition.
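The first two are equivalent and the array will be initialized to zero. The third does the same where it is accepted (C23, C++, or as a compiler extension), but it is not valid in earlier C standards.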
The last case is different, because you don't initialize the single character. How it's initialized depends on where you define the variable. If it's a global variable it will be zero-initialized. If it's a local variable then it will not be initialized at all and have an indeterminate value.
And zero is zero, i.e. 0 and not '0'.
Lastly, NULL is for pointers, not for non-pointer values. There is some confusion since the string terminator character '\0' (which is equal to 0) is also called the null character. The null character and a null pointer are two different things semantically, even if they can have the same actual value.
char temp[3]={};
isn't correct C (before C23).
char temp[3]={0};
initializes temp[0] to 0 and the rest is initialized as if they were default-initialized global variables, which for chars means that the rest will be 0 also.
char temp[3]="";
is initialization from an (empty) string, which behaves the same as if you broke down the string into character constants and used those as the initializers.
For an empty string, the broken down version would be { '\0' }, which is the same as {0}, which makes it equivalent to the case above it.
char temp; will be default initialized (for chars, that means zeroed) if it's a global that isn't followed by a nontentative definition, or it will have an indeterminate value if it's an automatic variable.
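The cases that are valid in every C standard can be checked with a sketch like the one below. temp_at_file_scope and the function names are hypothetical. The = {} form is left out because it only compiles under C23 or as an extension, and the automatic char temp; is left out because its value is indeterminate.

```c
#include <assert.h>
#include <string.h>

char temp_at_file_scope;    /* tentative definition: ends up initialized to 0 */

/* Returns 1 if `= ""` and `= {0}` give identical, all-zero arrays. */
int empty_string_equals_zero_brace(void)
{
    char a[3] = "";
    char b[3] = { 0 };
    char zeros[3] = { 0, 0, 0 };

    return memcmp(a, b, sizeof a) == 0 && memcmp(a, zeros, sizeof a) == 0;
}

void check_temp_cases(void)
{
    assert(empty_string_equals_zero_brace());
    assert(temp_at_file_scope == 0);
    assert('\0' == 0);      /* the null character is the value 0 ...  */
    assert('0' != 0);       /* ... not the digit character '0'        */
}
```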
Say I do initialize an array like this:
char a[]="test";
What's the purpose of this? We know that the content might immediately get changed, as it is not allocated, and thus why would someone initialize the array like this?
To clarify, this code is wrong for the reasons stated by the OP:
char* a;
strcpy(a, "test");
As noted by other responses, the syntax char a[] = "test"; does not actually do this. The actual effect is more like this:
char a[5];
strcpy(a, "test");
The first statement allocates a fixed-size character array on the local stack (assuming the declaration is inside a function), and the second initializes the data in it. The size is determined from the length of the string literal. Like all stack variables, the array is automatically deallocated on exiting the function scope.
The purpose of this is to allocate five bytes on the stack or the static data segment (depending on where this snippet occurs), then set those bytes to the array {'t','e','s','t','\0'}.
This syntax allocates an array of five characters on the stack, equivalent to this:
char a[5] = "test";
The elements of the array are initialized to the characters in the string given as an initializer. The size of the array is determined to fit the size of the initializer.
It is allocated. That code is equivalent to
char a[5]="test";
When you leave the number out, the compiler simply calculates the length of the character-array for you by counting the characters in the literal string. It then adds 1 to the length in order to include the necessary terminating nul '\0'. Hence, the length of the array is 5 while the length of the string is 4.
The array is allocated; its size is inferred from the string literal being used to initialize it (5 chars total).
Had you written
char *a = "test";
then all that would get allocated would be a pointer variable, not an array (the string literal "test" lives in memory such that it's allocated at program startup and held until the program exits).
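A brief sketch of what the answers above describe: the inferred array size is 5, and the array is the program's own writable storage. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size the compiler infers for `char a[] = "test";`. */
size_t inferred_size(void)
{
    char a[] = "test";
    return sizeof a;
}

/* Returns 1 if the array is a writable copy that can be changed in place. */
int array_is_writable_copy(void)
{
    char a[] = "test";
    a[0] = 'b';                         /* fine: a is our own storage */
    return strcmp(a, "best") == 0;
}

void check_inferred_array(void)
{
    assert(inferred_size() == 5);       /* 4 characters + terminating '\0' */
    assert(array_is_writable_copy());
}
```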
Possible Duplicate:
What is the difference between char s[] and char *s in C?
Why is:
char *ptr = "Hello!";
different than:
char ptr[] = "Hello!";
Specifically, I don't see why you can use (*ptr)++ to change the value of 'H' in the array, but not the pointer.
Thanks!
You can (in general) use the expression (*ptr)++ to change the value that ptr points to when ptr is a pointer and not an array (ie., if ptr is declared as char* ptr).
However, in your first example:
char *ptr = "Hello!";
ptr is pointing to a literal string, and literal strings are not permitted to be modified (they may actually be stored in memory areas that are not writable, such as ROM or memory pages marked as read-only).
In your second example,
char ptr[] = "Hello!";
The array is declared and the initialization actually copies the data in the string literal into the allocated array memory. That array memory is modifiable, so (*ptr)++ works.
Note: for your second declaration, the ptr identifier itself is an array identifier, not a pointer, and it is not a modifiable lvalue, so it can't be changed itself (even though it converts readily to a pointer in most situations). For example, the expression ++ptr would be invalid. I think this is the point that some other answers are trying to make.
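The working case can be sketched as follows. bump_first and bump_works_on_array are hypothetical names. The pointer-to-literal case is intentionally not executed, since modifying a string literal is undefined behavior.

```c
#include <assert.h>
#include <string.h>

/* Increments the first character of a writable string in place. */
void bump_first(char *s)
{
    assert(s != NULL && s[0] != '\0');
    (*s)++;                     /* same as (*ptr)++ in the question */
}

/* Returns 1 if (*ptr)++ on an array copy turns "Hello!" into "Iello!". */
int bump_works_on_array(void)
{
    char ptr[] = "Hello!";      /* writable copy of the literal */
    bump_first(ptr);            /* the array converts to char * here */
    return strcmp(ptr, "Iello!") == 0;
}
```

Passing a string literal to bump_first would compile, because the literal's element type is plain char, but it must not be done.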
When pointing to a string literal, you should not declare the chars to be modifiable, and some compilers will warn you for this:
char *ptr = "Hello!"; /* WRONG, missing const! */
The reason is as noted by others that string literals may be stored in an immutable part of the program's memory.
The correct "annotation" for you is to make sure you have a pointer to constant char:
const char *ptr = "Hello!";
And now you see directly that you can't modify the text stored at the pointer.
Arrays automatically allocate space and they can't be relocated or resized while pointers are explicitly assigned to point to allocated space and can be relocated.
Array names are read only!
If you use a string literal "Hello!", the literal itself becomes an array of 7 characters and gets stored somewhere in data memory. That memory may be read-only.
The statement
char *ptr = "Hello!";
defines a pointer to char and initializes it by storing in it the address of the beginning of the literal (that array of 7 characters mentioned earlier). Changing the contents of the memory pointed to by ptr is undefined behavior.
The statement
char ptr[] = "Hello!";
defines a char array (char ptr[7]) and initializes it, by copying characters from the literal to the array. The array can be modified.
In C, strings are arrays of characters.
A pointer is a variable that contains the memory location of another variable.
An array is a set of ordered data items.
When you use (*ptr)++ with the pointer version, you get a segmentation fault.
In both versions (*ptr)++ adds 1 to the first character. With the pointer, that character belongs to the string literal itself, which is typically stored in read-only memory, hence the crash. With the array, it belongs to your own writable copy.