Strings can be initialized with a string literal
char word1[] = "abc";
or as a char array with a null terminator.
char word2[] = {'a', 'b', 'c', '\0'};
Instead of writing word1[], word1 can also be written with a pointer notation
char *word1 = "abc";
However, when trying to write word2 with a pointer notation
char *word2 = {'a', 'b', 'c', '\0'};
it shows me a bunch of warnings, such as
warning: excess elements in scalar initializer
char *word2 = {'a', 'b', 'c', '\0'};
and when I run the program, I get Segmentation fault (core dumped).
Why is that? Why can you write char *word = "abc" but not char *word = {'a', 'b', 'c', '\0'} ?
Why can you initialize a string pointer as a string literal, but not as an array?
Because {'a', 'b', 'c', '\0'} is not an array; it is a list of values to put in the thing being initialized.
The syntax {'a', 'b', 'c', '\0'} does not stand for an array in C. People see it being used to initialize arrays, but, when used in that way, it is just a list of values. It could also be used to initialize a structure, because it is just listing values to put into the thing being initialized. It is not, by itself, an array.
In char *word2 = {'a', 'b', 'c', '\0'};, it does not make sense to initialize word2 with the values 'a', 'b', 'c', and '\0'. It is just one pointer and should be initialized with one value. Giving a list of four values to initialize one thing does not make sense.
In char *word2 = "abc";, "abc" is not a list of values. It is a string literal. A string literal defines a static array that is filled with the characters of the string. And then the string literal is automatically converted to a pointer to its first element, and it is this pointer that is used to initialize word2.
So char *word2 = "abc"; does two things: The string literal defines an array, and the initialization sets word2 to point to the first element of that array. In contrast, in char *word2 = {'a', 'b', 'c', '\0'};, there is nothing to define an array; the list of values is just a list of values.
Comparing this to array initializations, in char word2[] = {'a', 'b', 'c', '\0'};, the array is initialized with a list of values, which is fine. However, in char word1[] = "abc";, something special happens. C 2018 6.7.9 14 says we can initialize an array of character type with a string literal, and the characters of the string will be used to initialize the elements of the array.
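One way to see the difference between these forms is to compare what sizeof reports for each. This is only a minimal sketch; the helper function names are made up for illustration and are not from the question.

```c
#include <stddef.h>

/* Array initialized from a string literal: 'a', 'b', 'c' plus '\0'. */
size_t size_of_array_from_literal(void)
{
    char word1[] = "abc";
    return sizeof word1;
}

/* Array initialized from a list of values: same four elements. */
size_t size_of_array_from_list(void)
{
    char word2[] = {'a', 'b', 'c', '\0'};
    return sizeof word2;
}

/* Pointer initialized from a string literal: sizeof sees only the pointer. */
size_t size_of_pointer_to_literal(void)
{
    const char *word3 = "abc";
    return sizeof word3;
}
```

The first two helpers return 4 on any conforming implementation, while the third returns sizeof(char *), whatever that is on the platform.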
There's no fundamental reason for this -- it's just the way the language was originally defined.
The basic syntax for array initialization is
type array[] = {value, value, value};
The basic syntax for pointer initialization is
type *pointer = value;
But then we have string literals. And it turns out that, deep down inside, the compiler does two almost completely different things with string literals.
If you say
char array[] = "string";
the compiler treats it just about exactly as if you had said
char array[] = { 's', 't', 'r', 'i', 'n', 'g', '\0' };
But if you say
char *p = "string";
the compiler does something quite different. It quietly creates an array for you, containing the string, more or less as if you had written
char __hidden_unnamed_array[] = "string";
char *p = __hidden_unnamed_array;
But the point -- the answer to your question -- is that the compiler does this special thing only for string literals. In the original definition of C, at least, there was no way to use the {value, value, value} syntax to create a hidden, unnamed array that you could do something else with. The {value, value, value} syntax was only defined as working as the direct initializer for an explicitly-declared array.
As @pmg mentions in a comment, however, newer versions of C (C99 and later) have a new syntax, the compound literal, which does let you, basically, "use the {value, value, value} syntax to create a hidden, unnamed array to do something else with". So you can in fact write
char *word2 = (char[]){'a', 'b', 'c', '\0'};
and this works just fine. It works in other contexts, too: for example, you can say things like
printf("%s\n", (char[]){'d', 'e', 'f', '\0'});
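Here is a small sketch of the compound literal in use. The helper name build_with_compound_literal and its out parameter are hypothetical, and the caller is assumed to pass a buffer of at least 4 bytes.

```c
#include <string.h>

/* Hypothetical helper: builds "abc" in an unnamed array via a compound
   literal, modifies it, and copies the result into out (at least 4 bytes). */
void build_with_compound_literal(char *out)
{
    char *word2 = (char[]){'a', 'b', 'c', '\0'};

    word2[0] = 'x';      /* allowed: the unnamed array is not a string literal */
    strcpy(out, word2);  /* copy it out before the enclosing block ends */
}
```

Inside a function, the unnamed array only lives until the end of the enclosing block, which is why the sketch copies the result instead of returning word2.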
Going back to a side question you asked: when you wrote
char *word2 = {'a', 'b', 'c', '\0'};
the compiler said to itself, "Wait a minute, word2 is one thing, but the initializer has four things. So I'll throw away three, and warn the programmer that I'm doing so." It then did the equivalent of
char *word2 = {'a'};
and if you later tried something like
printf("%s", word2);
you got a crash when printf tried to access address 0x00000061.
In general, the type of the initializer must match the type of what is being initialized.
This works:
char *word1 = "abc";
Because a string constant has type array of char and such an array decays to type char * when used in an expression or initialization, so this matches the declared type.
This works:
char word2[] = {'a', 'b', 'c', '\0'};
Because an array of char is being initialized with an initializer list of characters (technically they have type int but are converted to char).
This gives a warning:
char *word2 = {'a', 'b', 'c', '\0'};
Because an initializer list with more than one value is being used to initialize a scalar (a single pointer), which is not an array or struct.
And this is OK:
char word1[] = "abc";
Because the C standard specifically allows initializing a char array with a string literal, as specified in section 6.7.9p14:
An array of character type may be initialized by a character string
literal or UTF-8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
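As a rough check of the quoted paragraph, the sketch below compares the three spellings byte for byte. The function name initializers_agree is invented for this example.

```c
#include <string.h>

/* Returns 1 if the three initializer spellings produce identical arrays. */
int initializers_agree(void)
{
    char from_literal[] = "abc";
    char from_braced_literal[] = {"abc"};      /* the braces are optional */
    char from_list[] = {'a', 'b', 'c', '\0'};

    return sizeof from_literal == sizeof from_list
        && sizeof from_braced_literal == sizeof from_list
        && memcmp(from_literal, from_list, sizeof from_list) == 0
        && memcmp(from_braced_literal, from_list, sizeof from_list) == 0;
}
```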
I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>

int main()
{
    char *ptr_str;
    ptr_str = "Hello World";
    printf(ptr_str);
    return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>

int main()
{
    char *ptr_str;
    char var_str[] = "Hello World";
    ptr_str = var_str;
    printf(ptr_str);
    return 0;
}
So in the first example how was I pointing directly to the text?
Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
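A tiny sketch of that sizeof point, next to strlen for comparison (the two function names are just illustrative):

```c
#include <stddef.h>
#include <string.h>

/* sizeof sees the whole array, including the terminating '\0'. */
size_t literal_array_size(void)
{
    return sizeof "Hello";
}

/* strlen counts the characters before the '\0'. */
size_t literal_length(void)
{
    return strlen("Hello");
}
```

The first returns 6 and the second returns 5.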
In the context of assignment to a char pointer, the 'value' of a string literal is the address of its first character. So
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'.
Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
In the first case, the literal's array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf then treats ptr_str as its format string; because "Hello World" contains no conversion specifiers, it prints every character until it reaches the '\0'. (The safer call is printf("%s", ptr_str), which passes the text as a char * argument for %s.) That is all that happened, and it is how you ended up pointing directly to the text.
Note that the second case differs from the first: a copy is made, and that copy can be modified (trying to modify the literal in the first case would be undefined behavior). We are initializing a char array with the contents of the string literal.
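To illustrate the "copy can be modified" part, here is a minimal sketch. The helper greet_modified and its parameters are hypothetical; it uses an explicit "%s" format rather than passing the text as the format string.

```c
#include <stdio.h>

/* Hypothetical helper: copies the literal into a local array, changes the
   copy, and formats it into out using an explicit "%s". */
void greet_modified(char *out, size_t out_size)
{
    char var_str[] = "Hello World";   /* the array holds its own copy */

    var_str[0] = 'J';                 /* fine: this storage belongs to us */
    snprintf(out, out_size, "%s", var_str);
}
```

Doing the same write through a char * that points at the literal itself would be undefined behavior.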
I noticed that in many books and online tutorials when C strings are initialized using brace enclosed lists, it is done like this:
char string[100] = {'t', 'e', 's', 't', '\0' };
Shouldn't all unspecified values automatically be set to 0 (aka '\0')?
I tested this on multiple versions of GCC and MSVC, all values are indeed set to zero, but I'm wondering if there's a specific reason for explicitly writing the null terminator when initializing.
You're right: elements without an explicit initializer are set to zero (see "Initialization from brace-enclosed lists" in a C reference). But it's a good practice because this would also compile without complaints:
char string[4] = {'t', 'e', 's', 't'};
but wouldn't null-terminate anything, which would lead to errors whenever you would use that as a string. If you say
char string[4] = {'t', 'e', 's', 't', '\0'};
the compiler will know it's an error because the '\0' won't fit.
Note that writing the '\0' explicitly is superior even to
char string[100] = "test";
for the same reason:
char string[4] = "test";
does the same as the first example (no terminator and no complaint from the compiler), not the second!
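The two situations can be checked with a short sketch. The helper names are invented here, and the exact-fit initialization is assumed to be compiled as C (it is valid C but not valid C++; some compilers may warn about it).

```c
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0'. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Exact-fit literal: valid C, but there is no room for the '\0'. */
int exact_fit_literal_is_terminated(void)
{
    char string[4] = "test";
    return has_terminator(string, sizeof string);
}

/* Oversized array: the elements not listed are zero-initialized. */
int oversized_list_is_terminated(void)
{
    char string[100] = {'t', 'e', 's', 't'};
    return has_terminator(string, sizeof string) && string[99] == '\0';
}
```

The exact-fit version reports 0 (no terminator anywhere in the array), the oversized version reports 1.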
It's to prevent silly mistakes. The first null character is required if the characters are to be interpreted as a string.
char string[4] = {'t', 'e', 's', 't'};
will happily compile, but will not be null-terminated.
char string[4] = {'t', 'e', 's', 't', '\0'};
will be diagnosed by the compiler (an error with some compilers, an "excess elements" warning with others such as GCC), which is the point.
By explicitly specifying the null character, the compiler will verify that the array is large enough for its intended purpose.
Adding a null terminator is absolutely unnecessary when you specify the size of your character array explicitly and allocate more chars than are needed for the payload portion of your string. In this situation the standard requires that the remaining chars be initialized with zeros.
The only time when it is necessary is when you want the size of a null-terminated array to be determined automatically by the compiler, i.e.
char string[] = {'t', 'e', 's', 't', '\0' };
//         ^^
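A quick sketch of why the terminator matters when the size is left to the compiler (function names are illustrative only):

```c
#include <stddef.h>

/* With the explicit '\0' the compiler sizes the array at 5 elements. */
size_t size_with_terminator(void)
{
    char string[] = {'t', 'e', 's', 't', '\0'};
    return sizeof string;
}

/* Without it the array has exactly 4 elements and is not a C string. */
size_t size_without_terminator(void)
{
    char string[] = {'t', 'e', 's', 't'};
    return sizeof string;
}
```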
Why does the first version make the program crash, while the second one doesn't? Aren't they the same thing?
Pointer Notation
char *shift = "mondo";
shift[3] = shift[2];
Array Notation
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
MWE
int main( void )
{
    {
        /* Pointer notation: writing through the pointer crashes. */
        char *shift = "mondo";
        shift[3] = shift[2];
    }
    {
        /* Array notation: works. */
        char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
        shift[3] = shift[2];
    }
    return 0;
}
No! This is one of the important issues in C. In the first, you create a pointer to what is usually a read-only part of memory, i.e. you cannot change it, only read it. The second makes an array of characters, i.e. a contiguous block of memory with both read and write access, so you can both read and change the values of the array.
The first one points to a string literal (usually stored in a read-only section; the pointer should really be const char *, but you can get away with char * for historical reasons).
The second one creates an array and then populates that array.
Therefore they are not the same.
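The first typically places the string in a read-only section of the program image (such as .rodata, or .text on some toolchains), while the second creates a writable array (on the stack for a local variable, or in .data for a static one). Memory in a read-only section is, effectively, const: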
char *string = "AAAA";
This creates what is effectively a const char *, since the string literal will usually be stored in a read-only section. An attempt to write to it is undefined behavior and will typically generate an access violation or segmentation fault.
You want to do this:
char string[] = "AAAA";
This will work as expected: it allocates a writable array of five chars (four capital As plus the terminating '\0'), and string is the name of that array.
This creates a pointer to an existing string:
char *shift = "mondo";
This creates a new array of characters:
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
In the second case, you are allowed to modify the characters because they are the ones that you just created.
In the first case, you are just pointing to an existing string, which should never be modified. The details of where the string is stored are up to the particular compiler. For example, it can store the string in unmodifiable memory. The compiler is also allowed to do tricks to save space. For example:
char *s1 = "hello there";
char *s2 = "there";
s2 might actually point to the same letter 't' that is at the seventh position of the string that s1 points to.
To avoid confusion, prefer to use const pointers with string literals:
const char *shift = "mondo";
This way, the compiler will let you know if you accidentally try to modify it.
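If you do need to change the text, one option is to keep the const pointer and work on a copy. The sketch below is only an illustration; the helper shifted_copy and its parameters are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (at most dst_size - 1 chars),
   then applies the question's shift[3] = shift[2] to the copy only. */
void shifted_copy(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return;

    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';    /* strncpy may not terminate on truncation */

    if (strlen(dst) > 3)
        dst[3] = dst[2];         /* "mondo" becomes "monno" */
}
```

The literal that src points to is only ever read, so it does not matter where the compiler chose to store it.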
Whenever you define a string using
char * str = "hello";
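the compiler usually places the literal in read-only storage, so you should treat the declaration as if it were
const char * str= "hello";
(In C the literal itself has type char[], not const char[], but modifying it is undefined behavior.)
In the case of an array, the array name behaves roughly like a constant pointer to its first element:
char * const array;
That's why the compiler complains when the user tries to change the base address of the array, while the elements themselves stay writable.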
I'm trying to terminate a character pointer in C at a specific location by setting a null terminator there.
For example, if I have a char pointer
char *hi="hello";
I want it to be "hell" by setting the o to null.
I have tried doing this with strcpy with something like
strcpy(hi+4, "\0");
But it is not working.
"hello" is a string literal, so it cannot be modified, and in your code, hi points to the first element of that literal. Any attempt to modify the thing it points to is undefined behaviour.
However, if you create your own char array, you can insert a null terminator at will. For example,
char hi[] = "hello"; // hi is array with {'h', 'e', 'l', 'l', 'o', '\0'}
hi[4] = '\0';
Here, hi is a length 6 array of char which you own and whose contents you can modify. After setting the 5th element, it contains {'h', 'e', 'l', 'l', '\0', '\0'}, and printing it would yield hell.
Point 1:
In your code
char *hi="hello";
hi is a pointer to a string literal, which may not be modifiable. You have to use a char array instead and initialize it with the same string literal. Then you can modify the contents of that array as you want.
Point 2:
You don't need strcpy() to copy a single char. You can simply assign the value using the assignment operator =.
Note: You don't terminate a pointer; you terminate a char array with a null terminator to make it a string.
If the string is a literal you can't modify it. Otherwise:
To terminate a C string after 4 characters you could use:
*(he+4) = 0;
or
he[4] = 0;
he[4] = '\0';
or, since strcpy() copies all the characters specified and then appends a '\0' character:
strcpy(he+4, "");
but this is rather obfuscated.
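Put together, a truncation helper might look like the sketch below. The name truncate_at is made up for this example, and it assumes s points to writable storage (an array, not a string literal).

```c
#include <string.h>

/* Hypothetical helper: shortens s in place to at most n characters.
   s must be writable; passing a string literal is undefined behavior. */
void truncate_at(char *s, size_t n)
{
    if (strlen(s) > n)
        s[n] = '\0';
}
```

For example, with char hi[] = "hello"; a call to truncate_at(hi, 4) leaves "hell" in the array.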