C char array v C char* initialization - c

The following is accepted as valid c code by gcc version 6.3:
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
However the following fails:
char *green = { 'a', 'b', 'c' }; // gcc error
I am sure there is a perfectly rational reason for this to be the case, but I am wondering what it is. This question is motivated by the case when having to initialize an array of bytes (so unsigned char rather than char), it is very tempting to write something like { '\x43', '\xde', '\xa0' } rather than "\x43\xde\xa0", and as soon as you forget to write my_array[] instead of *my_array, you get caught by the compiler.

The following will produce an error
char *green = { 'a', 'b', 'c' };
Because the initializer for green isn't an array of characters as you believe. It doesn't have a type; it's just a brace-enclosed initializer list. The thing that it initializes in the previous samples (i.e. white) determines how it's interpreted. The same initializer can be used to initialize any aggregate that is capable of holding 3 characters.
But green is a pointer, and not an aggregate, so you can't use a brace-enclosed initializer list with several elements as its initial value.
Now, the following two work but with very different semantics:
char blue[] = "abc";
char *red = "abc";
blue is an array. It will hold the same contents as the literal "abc". red is a pointer that points at the literal "abc".
You can use a compound literal expression:
char *green = (char[]){ 'a', 'b', 'c' };
It tells the compiler to create an unnamed object (whose lifetime depends on the scope in which the compound literal appears) that is of character array type and is initialized with those three characters. The pointer is then initialized with the address of that object's first element.
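The differences described in this answer can be checked with a small sketch. The function names (white_size, blue_size, green_is_writable) are hypothetical helpers introduced only for illustration; they are not part of the original question.

```c
#include <stddef.h>

/* Brace list of three chars: the array gets exactly 3 elements, no '\0'. */
size_t white_size(void)
{
    char white[] = { 'a', 'b', 'c' };
    return sizeof white;
}

/* String literal: 3 characters plus the terminating '\0'. */
size_t blue_size(void)
{
    char blue[] = "abc";
    return sizeof blue;
}

/* Compound literal: an unnamed, modifiable char[3] that green points into. */
int green_is_writable(void)
{
    char *green = (char[]){ 'a', 'b', 'c' };
    green[0] = 'x';   /* allowed: the compound literal is not a string literal */
    return green[0] == 'x' && green[1] == 'b' && green[2] == 'c';
}
```

Note that the object created by the compound literal holds three characters and no terminator, so green should not be passed to functions that expect a string.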

These three declarations
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
are different.
The first one declares a character array that contains exactly three characters corresponding to the number of the initializers.
The second one declares a character array of four characters because it is initialized by a string literal that has four characters including the terminating zero. So this character array contains a string.
The third one defines a string literal that is a character array and declares a pointer of type char * that is initialized by the address of the first character of the character array corresponding to the string literal.
You can imagine this declaration like
char unnamed[] = { 'a', 'b', 'c', '\0' };
char *red = unnamed;
This declaration
char *green = { 'a', 'b', 'c' };
is invalid because the left object is a scalar and may not be initialized by a list that contains more than one initializer.
Take into account that you could use a compound literal to initialize the pointer. For example
char *green = ( char[] ){ 'a', 'b', 'c' };

Related

Why can you have a pointer to array of strings in C

why does
char *names [] = {"hello", "Jordan"};
work fine
but this does not
char names [] = {"hello", "Jordan"};
would appreciate if someone could explain this to me, thank you :).
Here
char *names [] = {"hello", "Jordan"};
names is an array of char pointers, i.e. each element holds a pointer to the first character of a string. But here
char names [] = {"hello", "Jordan"};
names is just a char array, i.e. it can hold only a single string such as "hello", not several.
In second case like
int main(void) {
char names[] = {"hello", "Jordan"};
return 0;
}
when you compile (I suggest compiling with the -Wall -pedantic -Wstrict-prototypes -Werror flags), the compiler clearly says
error: excess elements in char array initializer
which means you can't initialize a single char array from more than one string literal. A correct version is
char names[] = {'h','e','l','l','o','\0'}; /* here names is an array of characters */
Edit: there is one more possibility. If the initializer looks like this
char names[] = { "hello" "Jordan" }; /* this is valid */
then hello and Jordan are joined, because adjacent string literals are concatenated, and names becomes the single char array helloJordan, as if you had written
char names[] = { "helloJordan" };
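A minimal sketch of the concatenation case, assuming a C99 or later compiler. The helper names joined_size and joined_matches are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are merged into one literal before initialization. */
size_t joined_size(void)
{
    char names[] = { "hello" "Jordan" };
    return sizeof names;   /* 11 characters plus the terminating '\0' */
}

/* The resulting array holds the same string as "helloJordan". */
int joined_matches(void)
{
    char names[] = { "hello" "Jordan" };
    return strcmp(names, "helloJordan") == 0;
}
```

The missing comma is easy to overlook, so this form is usually an accident rather than a deliberate choice.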
The first is an array of pointers to char. The second is an array of char and would have to look like char names[] = {'a', 'b', 'c'}
A string literal, such as "hello", is stored in static memory as an array of chars. In fact, a string literal has type char [N], where N is the number of characters in the array (including the \0 terminator). In most cases, an array identifier decays to a pointer to the first element of the array, so in most expressions a string literal such as "hello" will decay to a pointer to the char element 'h'.
char *names[] = { "hello", "Jordan" };
Here the two string literals decay to pointers to char which point to 'h' and 'J', respectively. That is, here the string literals have type char * after the conversion. These types agree with the declaration on the left, and the array names[] (which is not an array of character type, but an array of char *) is initialized using these two pointer values.
char names[] = "hello";
or similarly:
char names[] = { "hello" };
Here we encounter a special case. Expressions of array type are not converted to pointers to their first elements when they are operands of the sizeof operator or the unary & operator, or when they are string literals used to initialize an array of character type. So in this case, the string literal "hello" does not decay to a pointer; instead the characters contained in the string literal are used to initialize the array names[].
char names[] = {"hello", "Jordan"};
Again, the string literals would be used to initialize the array names[], but there are excess initializers in the initializer list. This is a constraint violation according to the Standard. From §6.7.9 ¶2 of the C11 Draft Standard:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
A conforming implementation must issue a diagnostic in the event of a constraint violation, which may take the form of a warning or an error. On the version of gcc that I am using at the moment (gcc 6.3.0) this diagnostic is an error:
error: excess elements in char array initializer
Yet, for arrays of char that are initialized by an initializer list of char values rather than by string literals, the same diagnostic is a warning instead of an error.
In order to initialize an array of char that is not an array of pointers, you would need a 2d array of chars here. Note that the second dimension is required, and must be large enough to contain the largest string in the initializer list:
char names[][100] = { "hello", "Jordan" };
Here, each string literal is used to initialize an array of 100 chars contained within the larger 2d array of chars. Or, put another way, names[][] is an array of arrays of 100 chars, each of which is initialized by a string literal from the initializer list.
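To make the contrast between the two working declarations concrete, here is a short sketch. The function names are hypothetical, and the row length of 100 is taken from the example above.

```c
#include <stddef.h>

/* Array of two pointers; the characters live in the string literals. */
size_t pointer_array_count(void)
{
    const char *names[] = { "hello", "Jordan" };
    return sizeof names / sizeof names[0];
}

/* Array of two char[100] rows; the characters are copied into the array. */
size_t two_d_array_bytes(void)
{
    char names[][100] = { "hello", "Jordan" };
    return sizeof names;   /* 2 rows of 100 bytes each */
}

/* Rows of the 2D array are ordinary writable char arrays. */
char first_char_after_edit(void)
{
    char names[][100] = { "hello", "Jordan" };
    names[1][0] = 'j';
    return names[1][0];
}
```

The pointer version costs only two pointers but must not be written through; the 2D version reserves the full 200 bytes and can be modified freely.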
char name[] is an array of characters so you can store a word in it:
char name[] = "Muzol";
This is the same as:
char name[] = {'M', 'u', 'z', 'o', 'l', '\0'}; /* '\0' is the null character; it marks the end of the string */
And char* names[] is an array of pointers, where each element points to the first character of a separate char array.
char* names[] = {"name1", "name2"};
It behaves much like:
char name1[] = {'n', 'a', 'm', 'e', '1', '\0'}; /* or char name1[] = "name1"; */
char name2[] = {'n', 'a', 'm', 'e', '2', '\0'}; /* or char name2[] = "name2"; */
char* names[] = { name1, name2 };
So basically names[0] holds &name1[0], from where the memory can be read up to name1[5]; that is where the '\0' (null) character is found and reading stops. The same happens for name2.
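A small sketch of the array-backed form above, showing that the pointers really alias the named arrays. The function name edit_through_pointer is an assumption made for this example. Unlike pointers to string literals, these pointers may be written through.

```c
#include <string.h>

/* names[0] and name1 refer to the same storage, so an edit is visible in both. */
int edit_through_pointer(void)
{
    char name1[] = "name1";
    char name2[] = "name2";
    char *names[] = { name1, name2 };

    names[0][4] = 'X';                     /* changes name1 itself */
    return names[0] == &name1[0]
        && strcmp(name1, "nameX") == 0
        && strlen(names[1]) == 5;          /* stops at the '\0' in name2 */
}
```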

Declaring and modifying strings in C

I've recently started to try learn the C programming language. In my first program (simple hello world thing) I came across the different ways to declare a string after I realised I couldn't just do variable_name = "string data":
char *variable_name = "data";
char variable_name[] = "data";
char variable_name[5] = "data";
What I don't understand is the difference between them. I know they are different and one of them specifically allocates an amount of memory to store the data in but that's about it, and I feel like I need to understand this inside out before moving onto more complex concepts in C.
Also, why does using *variable_name let me reassign the variable name to a new string but variable_name[number] or variable_name[] does not? Surely if I assign, say, 10 bytes to it (char variable_name[10] = "data") and try reassigning it to something that is 10 bytes or smaller it should work, so why doesn't it?
What are the empty brackets and the asterisk doing?
In this declaration
char *variable_name = "data";
a pointer is declared. This pointer points to the first character of the string literal "data". The compiler places the string literal in some region of memory and initializes the pointer with the address of the first character of the literal.
You may reassign the pointer. For example
char *variable_name = "data";
char c = 'A';
variable_name = &c;
However you may not change the string literal itself. An attempt to change a string literal results in undefined behaviour of the program.
In these declarations
char variable_name[] = "data";
char variable_name[5] = "data";
two arrays are declared, whose elements are initialized by the characters of the string literals used as initializers. For example this declaration
char variable_name[] = "data";
is equivalent to the following
char variable_name[] = { 'd', 'a', 't', 'a', '\0' };
The array will have 5 elements. So this declaration is fully equivalent to the declaration
char variable_name[5] = "data";
There is a difference if you would specify some other size of the array. For example
char variable_name[7] = "data";
In this case the array would be initialized the following way
char variable_name[7] = { 'd', 'a', 't', 'a', '\0', '\0', '\0' };
That is all elements of the array that do not have explicit initializers are zero-initialized.
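The zero-filling of the remaining elements can be verified with a sketch like the following. The function name tail_is_zero_filled is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A 7-element array initialized from "data": 4 characters, then three '\0'. */
int tail_is_zero_filled(void)
{
    char variable_name[7] = "data";
    return sizeof variable_name == 7
        && strlen(variable_name) == 4
        && variable_name[4] == '\0'
        && variable_name[5] == '\0'
        && variable_name[6] == '\0';
}
```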
Note that in C you may declare a character array using a string literal in the following way
char variable_name[4] = "data";
in which case the terminating zero of the string literal is not placed in the array.
In C++ such a declaration is invalid.
Of course you may change elements of the array (if it is not defined as a constant array) if you want.
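As a rough illustration of the difference between reassigning a pointer and changing an array, consider this sketch. The function names and the replacement strings are assumptions chosen for the example; the 10-byte size comes from the question.

```c
#include <string.h>

/* A pointer can be made to point at a different string literal. */
int pointer_reassign(void)
{
    const char *p = "data";
    p = "other";                 /* only the pointer changes, not the literal */
    return strcmp(p, "other") == 0;
}

/* An array cannot be assigned to, but its elements can be overwritten. */
int array_overwrite(void)
{
    char buf[10] = "data";
    /* buf = "new text"; would not compile: arrays are not assignable */
    strcpy(buf, "new text");     /* 8 characters plus '\0' fit in 10 bytes */
    return strcmp(buf, "new text") == 0;
}
```

With strcpy the caller is responsible for making sure the new contents, including the terminator, fit in the array.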
Take into account that you may enclose a string literal used as an initializer in braces. For example
char variable_name[5] = { "data" };
In C99 you may also use so-called designated initializers. For example
char variable_name[] = { [4] = 'A', [5] = '\0' };
Here is a demonstrative program
#include <stdio.h>
#include <string.h>
int main(void)
{
char variable_name[] = { [4] = 'A', [5] = '\0' };
printf( "%zu\n", sizeof( variable_name ) );
printf( "%zu\n", strlen( variable_name ) );
return 0;
}
The program output is
6
0
When you apply the standard C function strlen, declared in the header <string.h>, it returns 0 because the elements of the array that precede the element with index 4 are zero-initialized.

Initializing a char array with an explicit size and initialized to bigger than the size

I've been reading some code and I encountered the following:
int function(){
char str[4] = "ABC\0";
int number;
/* .... */
}
Normally, when you write a string literal to initialize a char array, the string should be null terminated implicitly right? What happens in this case? Does the compiler recognize the '\0' in the string literal and make that the null terminator? or does it overflow to the int number? Is there anything wrong with this form?
The C99 standard §6.7.8.¶14 says
An array of character type may be initialized by a character string
literal, optionally enclosed in braces. Successive characters of the
character string literal (including the terminating null character if
there is room or if the array is of unknown size) initialize the
elements of the array.
This means that the following statements are equivalent.
char str[4] = "ABC\0";
// equivalent to
char str[4] = "ABC";
// equivalent to
char str[4] = {'A', 'B', 'C', '\0'};
So there's nothing wrong with the first statement above. As the standard explicitly states, only as many characters of the string literal as the array can hold are used to initialize it. Note that the string literal "ABC\0" actually contains five characters. '\0' is just like any other character, so it's fine.
However please note that there's a difference between
char str[4] = "ABC\0";
// equivalent to
char str[4] = {'A', 'B', 'C', '\0'};
char str[] = "ABC\0"; // sizeof(str) is 5
// equivalent to
char str[] = {'A', 'B', 'C', '\0', '\0'};
That's because the string literal "ABC\0" contains 5 characters and all these characters are used in the initialization of str when the size of the array str is not specified. Contrary to this, when the size of str is explicitly stated as 4, then only the first 4 characters in the literal "ABC\0" are used for its initialization as clearly mentioned in the above quoted para from the standard.
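The two cases can be compared with sizeof, as in this sketch. The function names are invented for illustration.

```c
#include <stddef.h>

/* Explicit size 4: only the first four characters of "ABC\0" are used. */
size_t sized_array_bytes(void)
{
    char str[4] = "ABC\0";
    return sizeof str;
}

/* No size given: all five characters, including the implicit '\0', are used. */
size_t unsized_array_bytes(void)
{
    char str[] = "ABC\0";
    return sizeof str;
}
```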
If the code is:
char str[3] = "ABC";
It's fine in C, but the character array str is not a string because it's not null-terminated. See C FAQ: Is char a[3] = "abc"; legal? What does it mean? for detail.
In your example:
char str[4] = "ABC\0";
The last character of the array str happens to be set to '\0', so it's fine and it's a string.
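For the char str[3] = "ABC" case, a sketch of how such an unterminated array might be handled safely. It uses memcmp and a printf precision instead of functions that look for a terminator. The function names are hypothetical, and some compilers may warn about the missing terminator at higher warning levels.

```c
#include <stdio.h>
#include <string.h>

/* The array has exactly three elements and no '\0'. */
int three_chars_no_terminator(void)
{
    char str[3] = "ABC";
    return sizeof str == 3 && memcmp(str, "ABC", sizeof str) == 0;
}

/* "%.3s" reads at most three bytes, so no terminator is required. */
int prints_three_chars(void)
{
    char str[3] = "ABC";
    char out[8];
    int n = snprintf(out, sizeof out, "%.3s", str);
    return n == 3 && strcmp(out, "ABC") == 0;
}
```

Calling strlen(str) or printing with a plain "%s" would read past the end of the array here.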

Initialize a const array with a pointer

Why is the first line valid but the rest invalid? I thought the first was a shorthand for the second.
const char *c = "abc"; // Why valid?
const char *b = { 'a' , 'b', 'c', '\0' }; // invalid
const int *a = { 1, 2, 3, 0 }; // invalid
The real difference here is that "abc" is a string literal of type char[], whereas {'a', 'b', 'c', 0} is not. You could easily use it to initialize a completely different type, for example:
struct s{
char c;
int i, j;
float f;
} x = {'a', 'b', 'c', 0};
So when you write const char *b = { 'a' , 'b', 'c', '\0' };, the compiler picks the implicit conversion {'a'} -> 'a' and tries to initialize your pointer with that. This may or may not fail depending on the compiler and the actual value, for example many compilers would interpret const char *b = { '\0' }; as initializing b to a NULL pointer instead of an empty string as one could expect.
If you want to initialize the pointer to the address of an array (or any other type) created with list initialization, you should use a compound literal, which names the type explicitly:
const char *b = (const char[]){'a', 'b', 'c', 0};
const int *a = (const int[]){'a', 'b', 'c', 0};
struct s *x = &(struct s){'a', 'b', 'c', 0};
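A runnable sketch of these compound literals, assuming C99 or later. The function names are hypothetical, and the int values 1, 2, 3, 0 are taken from the question rather than from the answer's snippet.

```c
/* Same layout as the struct used in the answer. */
struct s {
    char c;
    int i, j;
    float f;
};

/* The pointer refers to an unnamed const int[4] created by the compound literal. */
int sum_through_pointer(void)
{
    const int *a = (const int[]){ 1, 2, 3, 0 };
    return a[0] + a[1] + a[2] + a[3];
}

/* The same brace list initializes a struct when the type name says so. */
int struct_through_pointer(void)
{
    struct s *x = &(struct s){ 'a', 'b', 'c', 0 };
    return x->c == 'a' && x->i == 'b' && x->j == 'c' && x->f == 0.0f;
}
```

Inside a function these unnamed objects have automatic storage, so the pointers must not be used after the enclosing block ends.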
In the first case you have a string literal, which is an array of char; it will be converted to a pointer to char in this context.
In the next two cases you are attempting to use list initialization to initialize a pointer, which will attempt to convert the first element of the list to a pointer. This generates a warning since neither a char nor an int is a pointer, the same way this would:
const char *b = 'a' ;
If you had valid pointers in the list it would work fine for the first element but would be ill-formed since you have more initializers than variables.
An array and a pointer aren't the same thing.
const char *c = "abc";
Initialises a pointer with the address of a string constant. The string constant is stored elsewhere (not on the stack; usually in a special global constant area).
const char c[] = "abc";
Initialises an array of chars with the given characters (the contents of the given string). If declared inside a function, this one would be on the stack.
The first line is valid because the C standard allows for the creation of constant strings, because their length can be determined at compile time.
The same does not apply to pointers: the compiler can't decide whether it should allocate the memory for the array on the stack (just like a normal int[], for instance) or on the heap, as with malloc().
If you initialize the array as:
int a[] = { 1, 2, 3, 0 };
then it becomes valid, because now the compiler is sure that you want an array in stack (automatic) memory, which is released when you leave the block in which it is declared.

Char array declaring problems

Why can I do
char identifier[4] = {'A', 'B', 'C', 'D'};
and not
char identifier[4];
&identifier = {'A', 'B', 'C', 'D'}; // syntax error : '{'
?
And why can I do
char identifier[4] = "ABCD"; // ABCD\0, aren't that 5 characters??
and not
char identifier[4];
&identifier = "ABCD"; // 'char (*)[4]' differs in levels of indirection from 'char [5]'
?
Is this a joke??
You can only initialize the array when you declare it.
As for char identifier[4] = "ABCD", this is indeed possible but the syntax is used to deliberately omit the trailing NUL character. Do char identifier[] = "ABCD" to let the compiler count the characters and add the NUL ('\0') for you.
Three points:
Initialisation is not assignment
Arrays are not first-class types so cannot be assigned. You have to assign the elements individually (or use a function such as strcpy() or memcpy()).
The address of an array is provided by the array name on its own.
In your last example, the following is a valid solution:
char identifier[4];
memcpy(identifier, "ABCD", sizeof(identifier) ) ;
You cannot use strcpy() here, because that would require an array of 5 characters to allow for the nul terminator. The error message about levels of indirection is not a "joke", it is your error; note that in the above code identifier is used without the & operator. The array name already converts to a pointer to its first element (char *), whereas &identifier has type char (*)[4], a pointer to the whole array, which is what the compiler's message refers to.
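Both ways of filling the array after its declaration can be sketched as follows. The function names are made up for this example.

```c
#include <string.h>

/* Copy exactly four bytes; the '\0' of "ABCD" is deliberately left out. */
int copy_with_memcpy(void)
{
    char identifier[4];
    memcpy(identifier, "ABCD", sizeof identifier);
    return memcmp(identifier, "ABCD", sizeof identifier) == 0;
}

/* The same result by assigning the elements one at a time. */
int assign_elementwise(void)
{
    char identifier[4];
    identifier[0] = 'A';
    identifier[1] = 'B';
    identifier[2] = 'C';
    identifier[3] = 'D';
    return memcmp(identifier, "ABCD", sizeof identifier) == 0;
}
```

In both cases identifier is not null-terminated, so it must be treated as a block of four bytes and not as a string.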
What Arkku said, but also, you cannot assign to the address of something, i.e. &x = ... is never legal.

Resources