Initialization different from assignment? - c

1.char str[] = "hello"; //legal
2.char str1[];
str1 = "hello"; // illegal
I understand that "hello" returns the address of the string literal from the string literal pool which cannot be directly assigned to an array variable. And in the first case the characters from the "hello" literal are copied one by one into the array with a '\0' added at the end.
Is this because the assignment operator "=" is overloaded here to support this?
I would also like to know other interesting cases wherein initialization is different from assignment.

You cannot think of it as overloading (which doesn't exist in C anyway), because the initialization of char arrays with string literals is a special case. The type of a string literal is char[N] (const char[N] in C++), so if it were similar to overloading, you'd be able to initialize a char array with any expression whose type is an array of char. But you cannot!
const char arr[3];
const char arr1[] = arr; //compiler error. Cannot initialize array with another array.
The language standard simply says that character arrays can be initialized with string literals. Since they say nothing about assignment, the general rules apply, in particular, that an array cannot be assigned to.
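A minimal sketch of the two sides of that rule, assuming nothing beyond the standard library. The names literal_array_size and set_text are hypothetical helpers made up for illustration: the first relies on the special initialization rule, the second shows the usual replacement for the assignment that the language does not allow.

```c
#include <stddef.h>
#include <string.h>

/* Initialization: the literal's characters, including '\0', are copied in. */
size_t literal_array_size(void)
{
    char str[] = "hello";
    return sizeof str;              /* 5 characters + terminating '\0' */
}

/* Hypothetical helper standing in for "str1 = ..." after the definition.
   Copies src into dst (capacity cap); returns 0 on success, -1 if it does not fit. */
int set_text(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src) + 1;   /* count the terminating '\0' too */
    if (len > cap)
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

The capacity check is a design choice of this sketch; plain strcpy would also work when the destination is known to be large enough.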
As for other cases when initialization is different from assignment: in C++, where there are references and classes, there would be zillions of examples. In C, with no full-fledged classes or references, the only other thing I can think of off the top of my head is const variables:
const int a = 4; //OK;
const int b; //Legal in C, but b is left uninitialized (an error in C++);
b = 4; //Error;
Another example: array initialization with braces
int a[3] = {1,2,3}; //OK
int b[3];
b = {1,2,3}; //error
Same with structs
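The cases above can be put side by side in one small sketch. The function name init_only_forms and the struct pair type are made up for illustration; the commented-out lines are the ones a C compiler would reject.

```c
#include <string.h>

int init_only_forms(void)
{
    const int a = 4;                 /* a const object can only get its value here */
    /* a = 5;                           would not compile */

    int b[3] = { 1, 2, 3 };          /* brace list: initialization only */
    int c[3];
    /* c = b;  c = { 1, 2, 3 };         neither compiles: arrays are not assignable */
    memcpy(c, b, sizeof c);          /* copying is the replacement */

    struct pair { int x, y; } p = { 5, 6 }, q;
    q = p;                           /* structs, unlike arrays, can be assigned whole */

    return a + c[2] + q.y;           /* 4 + 3 + 6 */
}
```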

If you want to think of it as the operator being overloaded (even though C doesn't use the term), you can of course do that.
Do you also consider this to be overloading:
unsigned char x;
double y;
x = 2;
y = 1.243;
Those are assigning totally different types of data, after all, but using the "same operator", right?
It's just different, to be initializing or to be assigning.
Another big difference is that you used to be able to initialize structures, but there was no corresponding "struct literal" syntax for later assignments. This is no longer true as of C99, where we now have compound literals.
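A short sketch of that C99 feature, assuming a hypothetical struct point type: the brace list is still only usable in the definition, while a compound literal is an ordinary expression and can be assigned later.

```c
struct point { int x, y; };

struct point make_then_assign(void)
{
    struct point p = { 1, 2 };              /* initialization with a brace list */
    /* p = { 3, 4 };                           would not compile: not an expression */
    p = (struct point){ .x = 3, .y = 4 };   /* C99 compound literal: assignment works */
    return p;
}
```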

char str[] = "hello";
This is array initialization, using syntactic sugar that C defines because string initialization is so common. The compiler allocates some fixed memory in your program and initializes it. The name of the array (str) evaluates to the address of this memory, and it cannot be changed because there is no variable which holds that address.
Grijesh Chauhan explains more details of this.
Other cases depend on what you mean. Extending the current case, you can easily see that other initialized arrays have the same properties, for example
int a[] = { 1, 2, 3, 4 };

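An array has a non-modifiable address. You need a pointer if you want a modifiable lvalue.
When you try to assign a constant string literal, the literal is converted to the address of its first character. That address differs from the array's own, and the array cannot be made to refer to it, which is why the assignment is illegal.
"hello" occupies some space in memory and has an address. In an initialization, its characters are copied into the array; the array does not take over that address.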

Related

Why can't a defined, fix-sized array be assigned using a compound literal?

Why is it so that a struct can be assigned after defining it using a compound literal (case b) in sample code), while an array cannot (case c))?
I understand that case a) does not work as at that point compiler has no clue of the memory layout on the rhs of the assignment. It could be a cast from any type. But going with this line, in my mind case c) is a perfectly well-defined situation.
typedef struct MyStruct {
int a, b, c;
} MyStruct_t;
void function(void) {
MyStruct_t st;
int arr[3];
// a) Invalid
st = {.a=1, .b=2, .c=3};
// b) Valid since C99
st = (MyStruct_t){.a=1, .b=2, .c=3};
// c) Invalid
arr = (int[3]){[0]=1, [1]=2, [2]=3};
}
Edit:
I am aware that I cannot assign to an array - it's how C's been designed. I could use memcpy or just assign values individually.
After reading the comments and answers below, I guess now my question breaks down to the forever-debated conundrum of why you can't assign to arrays.
What's even more puzzling as suggested by this post and M.M's comment below is that the following assignments are perfectly valid (sure, it breaks strict aliasing rules). You can just wrap an array in a struct and do some nasty casting to mimic an assignable array.
typedef struct Arr3 {
int a[3];
} Arr3_t;
void function(void) {
Arr3_t a;
int arr[3];
a = (Arr3_t){{1, 2, 3}};
*(Arr3_t*)arr = a;
*(Arr3_t*)arr = (Arr3_t){{4, 5, 6}};
}
So then what's stopping developers from including a feature like this in, say, C22(?)
C does not have assignment of arrays, at all. That is, where array has any array type, array = /* something here */ is invalid regardless of the contents of "something here". Whether it's a compound literal (which you seem to have confused with designated initializer, a completely different concept) is irrelevant. array1 = array2 would be just as invalid.
As to why it's invalid, at some level that's a question of the motivations/rationale of the C language and its design and unanswerable. However, mechanically, arrays in any context except the operand of sizeof or the operand of & "decay" to pointers to their first element. So in the case of:
arr = (int[3]){[0]=1, [1]=2, [2]=3};
you are attempting to assign a pointer to the first element of the compound literal array to a non-lvalue (the rvalue produced when arr decays). And of course that is nonsense.
A compound array literal can be used anywhere that an actual array variable can be used. Since you can't assign one array to another array, it's also not valid to assign a compound literal to an array.
Since you can copy arrays using memcpy(), you could write:
memcpy(arr, (int[3]){[0]=1, [1]=2, [2]=3}, sizeof(arr));
Just like the array variable, the array literal decays to a pointer to its first element.
Compound struct literals can also be used in place of an actual struct variable. But structs can be assigned to each other, so it's valid to assign a compound struct literal to a struct variable.
That's the difference between the two cases.
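Both routes can be sketched without any pointer casts. The function names and the struct arr3 wrapper below are hypothetical: the first function copies a compound literal into the array with memcpy, the second wraps the array in a struct so that ordinary assignment applies.

```c
#include <string.h>

/* Copy a compound array literal into an existing array. */
int sum_after_copy(void)
{
    int arr[3] = { 0 };
    memcpy(arr, (int[3]){ [0] = 1, [1] = 2, [2] = 3 }, sizeof arr);
    return arr[0] + arr[1] + arr[2];        /* 1 + 2 + 3 */
}

/* Wrap the array in a struct: struct assignment copies the array member too. */
struct arr3 { int a[3]; };

int sum_after_struct_assignment(void)
{
    struct arr3 x = { { 1, 2, 3 } };
    struct arr3 y;
    y = x;                                  /* whole-object assignment */
    y = (struct arr3){ { 4, 5, 6 } };       /* also fine with a compound literal */
    return y.a[0] + y.a[1] + y.a[2];        /* 4 + 5 + 6 */
}
```

Keeping the data in the struct from the start avoids the aliasing question raised by casting an int array to a struct pointer.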

Confusion about how char *s and char s[] works at low level

I know similar questions, like this question, have been posted and answered here but those answers don't offer me the complete picture, hence I'm posting this as a new question. Hope that is ok.
See following snippets -
char s[9] = "foobar"; //ok
s[1] = 'z' //also ok
And
char s[9];
s = "foobar" //doesn't work. Why?
But see following cases -
char *s = "foobar"; //works
s[1] = 'z'; //doesn't work
char *s;
s = "foobar"; //unlike arrays, works here
It is a bit confusing. I mean I have vague understanding that we can't assign values to arrays. But we can modify it. In case of char *s, it seems we can assign values but can't modify it because it is written in read only memory. But still I can't get the full picture.
What exactly is happening at low level?
char s[9] = "foobar"; This is initialization. An array of characters of size 9 is declared and then its contents receive the string "foobar" with any remaining characters set to '\0'.
s = "foobar" is simply invalid C: an array cannot appear on the left of an assignment, so you cannot assign a string to a char array. To make s hold the value foobar, use strcpy(s,"foobar");
char *s = "foobar"; is also initialization, however, this assigns the address of the constant string foobar to the pointer variable s. Note that I say "constant string". A string literal is on most platforms constant. A better way of making this clear is to write const char *s = "foobar";
And indeed, your next assignment s[1]= 'z'; will not work because the string that s points to is constant.
You need to understand what the expressions are actually doing, then it might come clear to you.
char s[9] = "foobar"; -> Initialize the char array s by the string literal "foobar". Correct.
s[1] = 'z' -> Assign the character constant 'z' to the second elem. of char array s. Correct.
char s[9]; s = "foobar"; -> Declare the char array s, then attempt to assign the string literal "foobar" to the char array. Not permissible. You can't actually assign arrays in C; you can only initialize an array of char with a string when defining the array itself. That's the difference. If you want to copy a string into an array of char, use strcpy(s, "foobar"); instead.
char *s = "foobar"; -> Define the pointer to char s and initialize it to point to the string literal "foobar". Correct.
s[1] = 'z'; -> Attempt to modify the string literal "foobar", to which s is pointing. Not permissible. A string literal is typically stored in read-only memory, and modifying it is undefined behavior.
char *s; s = "foobar"; -> Declare the pointer to char s. Then assign the pointer to point to the string literal "foobar". Correct.
This declares array s with an initializer:
char s[9] = "foobar"; //ok
But this is an invalid assignment expression with array s on the left:
s = "foobar"; //doesn't work. Why?
Assignment expressions and declarations with initializers are not the same thing syntactically, although they both use an = in their syntax.
The reason that the assignment to the array s doesn't work is that the array decays to a pointer to its first element in the expression, so the assignment is equivalent to:
&(s[0]) = "foobar";
The assignment expression requires an lvalue on the left hand side, but the result of the & address operator is not an lvalue. Although the array s itself is an lvalue, the expression converts it to something that isn't an lvalue. Therefore, an array cannot be used on the left hand side of an assignment expression.
For the following:
char *s = "foobar"; //works
The string literal "foobar" is stored as an anonymous array of char and as an initializer it decays to a pointer to its first element. So the above is equivalent to:
char *s = &(("foobar")[0]); //works
The initializer has the same type as s (char *) so it is fine.
For the subsequent assignment:
s[1] = 'z'; //doesn't work
It is syntactically correct and the compiler need not diagnose it, but it results in undefined behavior. The rule being broken is that the anonymous arrays created by string literals must not be modified. Assignment to an element of such an array is a modification and not allowed.
The subsequent assignment:
s = "foobar"; //unlike arrays, works here
is equivalent to:
s = &(("foobar")[0]); //unlike arrays, works here
It is assigning a char * value to a variable of type char *, so it is fine.
Contrast the following use of the initializer "foobar":
char *s = "foobar"; //works
with its use in the earlier declaration:
char s[9] = "foobar"; //ok
There is a special initialization rule that allows an array of char to be initialized by a string literal optionally enclosed by braces. That initialization rule is being used to initialize char s[9].
The string literal used to initialize the array also creates an anonymous array of char (at least notionally) but there is no way to access that anonymous array of char, so it may get omitted from the output of the compiler. This is in contrast with the anonymous array of char created by the string literal used to initialize char *s which can be accessed via s.
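A few of these points can be checked in code. This is only a sketch with made-up function names: it shows that the array keeps its full size under sizeof, that it converts to a pointer to its first element in other expressions, and that the initializing literal may be wrapped in braces.

```c
#include <stddef.h>

/* sizeof is one of the places where the array does not decay. */
size_t array_size(void)
{
    char s[9] = "foobar";
    return sizeof s;                /* the whole array: 9 bytes */
}

/* In most other expressions, s converts to &s[0]. */
int decays_to_first_element(void)
{
    char s[9] = "foobar";
    char *p = s;                    /* same as p = &s[0] */
    return p == &s[0];
}

/* The initializing literal may optionally be enclosed in braces. */
int braces_allowed(void)
{
    char t[] = { "foobar" };
    return sizeof t == 7;           /* 6 characters + '\0' */
}
```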
It may help to think of C as not allowing you to do anything with arrays except for assisting in a few special cases. C originated when programming languages did little more than help you move individual bytes and “words” (2 or maybe 4 bytes) around and do simple arithmetic and operations with them. With that in mind, let’s look at your examples:
char s[9] = "foobar"; //ok
This is one of the special cases: When you define an array of characters, the compiler will help you initialize it. In a definition, you may provide a string literal, which represents an array of characters, and the compiler will initialize your array with the contents of the string literal.
s[1] = 'z' //also ok
Yes, this just moves the value of one character into one array element.
char s[9];
s = "foobar" //doesn't work. Why?
This does not work because there is no assistance here. s and "foobar" are both arrays, but C has no provision for handling an array as one whole object.
However, although C does not handle an array as a whole object, it does provide some assistance for working with arrays. Since the compiler would not work with whole arrays, programmers needed some other ways to work with arrays. So C was given a feature that, when you used an array in an expression, the compiler would automatically convert it to a pointer to the first element of the array, and that would help the programmer write code to work with elements of the array. We see that in your next example:
char *s = "foobar"; //works
char *s declares s to be a pointer to char. Next, the string literal "foobar" represents an array. Above, we saw that using a string literal to initialize an array was a special case. However, here the string literal is not used to initialize an array. It is used to initialize a pointer, so the special case rules do not apply. In this case, the array represented by the string literal is automatically converted to a pointer to its first element. So s is initialized to be a pointer to the first element of the array containing “f”, “o”, “o”, “b”, “a”, “r”, and a null character.
s[1] = 'z'; //doesn't work
The arrays defined by string literals are intended to be constants. They are “read-only” in the sense that the C standard does not define what happens when you try to modify them. In many C implementations, they are assigned to memory that is read-only because the operating system and the computer hardware do not allow writing to it by normal program means. So s[1] = 'z'; may get an exception (trap) or a warning or error message from the compiler. (Ideally, char *s = "foobar"; would be disallowed because "foobar", being a constant, would have type const char [7]. However, because const did not exist in early C, the types of string literals do not have const.)
char *s;
s = "foobar"; //unlike arrays, works here
Here s is a char *, and the string literal "foobar" is automatically converted to a pointer to its first element, and that pointer is a char *, so the assignment is fine.
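The four cases from the question, condensed into a sketch with hypothetical function names. The pointer version uses const char *, which is an assumption of this example rather than something the question did; it makes the compiler reject the write that would otherwise be undefined behavior.

```c
#include <string.h>

/* Array case: the literal's characters are copied into s, which is writable. */
int array_case(void)
{
    char s[9] = "foobar";
    s[1] = 'z';
    /* s = "foobar";                   would not compile: arrays are not assignable */
    return strcmp(s, "fzobar") == 0 && s[8] == '\0';
}

/* Pointer case: s can be re-pointed, but the literal must not be written through it. */
const char *pointer_case(void)
{
    const char *s = "foobar";
    s = "bazqux";                   /* assigning a new address is fine */
    /* s[1] = 'z';                     rejected at compile time thanks to const */
    return s;                       /* literals have static storage duration */
}
```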

Why pointers can't be used to index arrays? [duplicate]

I am trying to change value of character array components using a pointer. But I am not able to do so. Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
I tried accessing arrays using A[0] and it worked. But I am not able to change values of the array components.
{
char *A = "ab";
printf("%c\n", A[0]); //Works. I am able to access A[0]
A[0] = 'c'; //Segmentation fault. I am not able to edit A[0]
printf("%c\n", A[0]);
}
Expected output:
a
c
Actual output:
a
Segmentation fault
The difference is that char A[] defines an array and char * does not.
The most important thing to remember is that arrays are not pointers.
In this declaration:
char *A = "ab";
the string literal "ab" creates an anonymous array object of type char[3] (2 plus 1 for the terminating '\0'). The declaration creates a pointer called A and initializes it to point to the initial character of that array.
The array object created by a string literal has static storage duration (meaning that it exists through the entire execution of your program) and does not allow you to modify it. (Strictly speaking an attempt to modify it has undefined behavior.) It really should be const char[3] rather than char[3], but for historical reasons it's not defined as const. You should use a pointer to const to refer to it:
const char *A = "ab";
so that the compiler will catch any attempts to modify the array.
In this declaration:
char A[] = "ab";
the string literal does the same thing, but the array object A is initialized with a copy of the contents of that array. The array A is modifiable because you didn't define it with const -- and because it's an array object you created, rather than one implicitly created by a string literal, you can modify it.
An array indexing expression, like A[0], actually requires a pointer as one of its operands (and an integer as the other). Very often that pointer will be the result of an array expression "decaying" to a pointer, but it can also be just a pointer -- as long as that pointer points to an element of an array object.
The relationship between arrays and pointers in C is complicated, and there's a lot of misinformation out there. I recommend reading section 6 of the comp.lang.c FAQ.
You can use either an array name or a pointer to refer to elements of an array object. You ran into a problem with an array object that's read-only. For example:
#include <stdio.h>
int main(void) {
char array_object[] = "ab"; /* array_object is writable */
char *ptr = array_object; /* or &array_object[0] */
printf("array_object[0] = '%c'\n", array_object[0]);
printf("ptr[0] = '%c'\n", ptr[0]);
}
Output:
array_object[0] = 'a'
ptr[0] = 'a'
String literals like "ab" are supposed to be immutable, like any other literal (you can't alter the value of a numeric literal like 1 or 3.1419, for example). Unlike numeric literals, however, string literals require some kind of storage to be materialized. Some implementations (such as the one you're using, apparently) store string literals in read-only memory, so attempting to change the contents of the literal will lead to a segfault.
The language definition leaves the behavior undefined - it may work as expected, it may crash outright, or it may do something else.
String literals are not meant to be overwritten, think of them as read-only. It is undefined behavior to overwrite the string and your computer chose to crash the program as a result. You can use an array instead to modify the string.
char A[3] = "ab";
A[0] = 'c';
Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
Yes, because the second one is not an array but a pointer.
The type of "ab" is char /*readonly*/ [3]. It is an array with immutable content. So when you want a pointer to that string literal, you should use a pointer to char const:
char const *foo = "ab";
That keeps you from altering the literal by accident. If you however want to use the string literal to initialize an array:
char foo[] = "ab"; // the size of the array is determined by the initializer
// here: 3 - the characters 'a', 'b' and '\0'
The elements of that array can then be modified.
Array-indexing btw is nothing more but syntactic sugar:
foo[bar]; /* is the same as */ *(foo + bar);
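As a quick check of that equivalence, here is a sketch (the function name is made up) that compares the subscript form, the pointer-arithmetic form, and the commuted form that follows from the addition being commutative.

```c
int indexing_forms_agree(void)
{
    const char *s = "Hello!";
    return s[2] == *(s + 2)         /* subscript is pointer arithmetic plus dereference */
        && s[2] == 2[s]             /* *(2 + s) is the same thing */
        && "Hello!"[2] == 'l';      /* a literal can be subscripted directly */
}
```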
That's why one can do funny things like
"Hello!"[2]; /* 'l' but also */ 2["Hello!"]; // 'l'

Pointer initialization in C

In C why is it legal to do
char * str = "Hello";
but illegal to do
int * arr = {0,1,2,3};
I guess that's just how initializers work in C. However, you can do:
int *v = (int[]){1, 2, 3}; /* C99. */
As for C89:
"A string", when used outside char array initialization, is a string literal; the standard says that when you use a string literal it's as if you created a global char array initialized to that value and wrote its name instead of the literal (there's also the additional restriction that any attempt to modify a string literal results in undefined behavior). In your code you are initializing a char * with a string literal, which decays to a char pointer and everything works fine.
However, if you use a string literal to initialize a char array, several magic rules get in action, so it is no longer "as if an array ... etc" (which would not work in array initialization), but it's just a nice way to tell the compiler how the array should be initialized.
The {1, 2, 3} way to initialize arrays keeps just this semantic: it's only for initialization of array, it's not an "array literal".
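For comparison, a sketch of the C99 compound literal form mentioned at the top of this answer, with a made-up function name. The unnamed array it creates inside a function has automatic storage, so the pointer is only used within the same block here.

```c
int sum_via_pointer(void)
{
    int *v = (int[]){ 1, 2, 3 };    /* unnamed array, lives until the block ends */
    v[0] = 10;                      /* unlike a string literal, it is modifiable */
    return v[0] + v[1] + v[2];      /* 10 + 2 + 3 */
}
```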
In the case of:
char * str = "Hello";
"Hello" is a string literal. It's loaded into memory (but often read-only) when the program is run, and has a memory address that can be assigned to a pointer like char *str. String literals like this are an exception, though.
With:
int * arr = {0,1,2,3};
..you're effectively trying to point at an array that hasn't been put anywhere in particular in memory. arr is a pointer, not an array; it holds a memory address, but does not itself have storage for the array data. If you use int arr[] instead of int *arr, then it works, because an array like that is associated with storage for its contents. Though the array decays to a pointer to its data in many contexts, it's not the same thing.
Even with string literals, char *str = "Hello"; and char str[] = "Hello"; do different things. The first sets the pointer str to point at the string literal, and the second initializes the array str with the values from "Hello". The array has storage for its data associated with it, but the pointer just points at data that happens to be loaded into memory somewhere already.
Because there's no point in declaring and initializing a pointer to an int array, when the array name can be used as a pointer to the first element. After
int arr[] = { 0, 1, 2, 3 };
you can use arr like an int * in almost all contexts (the main exceptions being as the operand of sizeof or unary &).
... or you can abuse the string literals and store the numbers as a string literal, which for a little endian machine will look as follows:
int * arr = (int *)"\0\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0";

C compound literals, pointer to arrays

I'm trying to assign a compound literal to a variable, but it seems not to work, see:
int *p[] = (int *[]) {{1,2,3},{4,5,6}};
I got an error in gcc.
but if I write only this:
int p[] = (int []) {1,2,3,4,5,6};
Then it's okay.
But is not what I want.
I don't understand why the error occurs, because if I initialize it like an array, or use it with an array of pointers to char, it's okay, see:
int *p[] = (int *[]) {{1,2,3},{4,5,6}}; //I got a error
int p[][3] = {{1,2,3},{4,5,6}}; //it's okay
char *p[] = (char *[]) {"one", "two"...}; // it's okay!
Note that I don't understand why I got an error in the first one. I can't, or rather don't want to, write it in the second form, because it needs to be a compound literal and I don't want to tell the compiler how big the array is. I want something like the second one, but for int values.
Thanks in advance.
First, the casts are redundant in all of your examples and can be removed. Secondly, you are using the syntax for initializing a multidimensional array, and that requires the second dimension to be defined in order to allocate a sequential block of memory. Instead, try one of the two approaches below:
Multidimensional array:
int p[][3] = {{1,2,3},{4,5,6}};
Array of pointers to one dimensional arrays:
int p1[] = {1,2,3};
int p2[] = {4,5,6};
int *p[] = {p1,p2};
The latter method has the advantage of allowing for sub-arrays of varying length. Whereas, the former method ensures that the memory is laid out contiguously.
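A sketch of that trade-off, with hypothetical function names. The first function uses rows of different lengths through an array of pointers, which means the lengths have to be tracked separately; the second checks that the two-dimensional array occupies one block with no gaps.

```c
#include <stddef.h>

/* Array of pointers: rows may differ in length, but lengths are kept by hand. */
int sum_ragged(void)
{
    int r0[] = { 1, 2, 3 };
    int r1[] = { 4, 5, 6, 7 };
    int *rows[] = { r0, r1 };
    size_t lens[] = { sizeof r0 / sizeof r0[0], sizeof r1 / sizeof r1[0] };
    int total = 0;

    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < lens[i]; j++)
            total += rows[i][j];
    return total;                   /* 1 + 2 + ... + 7 */
}

/* Two-dimensional array: one contiguous object holding 2 * 3 ints. */
int rows_are_contiguous(void)
{
    int p[][3] = { { 1, 2, 3 }, { 4, 5, 6 } };
    return sizeof p == 2 * 3 * sizeof(int);
}
```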
Another approach that I highly recommend that you do NOT use is to encode the integers in string literals. This is a non-portable hack. Also, the data in string literals is supposed to be constant. Do your arrays need to be mutable?
int *p[] = (int *[]) {
"\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00",
"\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00"
};
That example might work on a 32-bit little-endian machine, but I'm typing this from an iPad and cannot verify it at the moment. Again, please don't use that; I feel dirty for even bringing it up.
The casting method you discovered also appears to work with a pointer to a pointer. That can be indexed like a multidimensional array as well.
int **p = (int *[]) { (int[]) {1,2,3}, (int[]) {4,5,6} };
First understand that "Arrays are not pointers".
int p[] = (int []) {1,2,3,4,5,6};
In the above case p is an array of integers. Copying the elements {1,2,3,4,5,6} to p. Typecasting is not necessary here and both the rvalue and lvalue types match which is an integer array and so no error.
int *p[] = (int *[]) {{1,2,3},{4,5,6}};
"Note I don't understand why I got a error in the first one,.."
In the above case, p is an array of integer pointers. But the {{1,2,3},{4,5,6}} is a two dimensional array ( i.e., [][] ) and cannot be type casted to array of pointers. You need to initialize as -
int p[][3] = { {1,2,3},{4,5,6} };
// ^^ First index of array is optional because with each column having 3 elements
// it is obvious that array has two rows which compiler can figure out.
But why did this statement compile ?
char *p[] = {"one", "two"...};
String literals are different from integer literals. In this case also, p is an array of character pointers. When you write "one", it can either be copied into an array or be pointed to, in which case it should be treated as read-only.
char cpy[] = "one" ;
cpy[0] = 't' ; // Not a problem
char *readOnly = "one" ;
readOnly[0] = 't' ; // Undefined behavior: no copy is made, the pointer
// refers to a (possibly) read-only location.
With string literals, either of the above case is possible. So, that is the reason the statement compiled. But -
char *p[] = {"one", "two"...}; // All the string literals are stored in
// read only locations and at each of the array index
// stores the starting index of each string literal.
I don't want to say how big is the array to the compiler.
Dynamically allocating the memory using malloc is the solution.
Hope it helps !
Since nobody's said it: If you want to have a pointer-to-2D-array, you can (probably) do something like
int (*p)[][3] = &(int[][3]) {{1,2,3},{4,5,6}};
EDIT: Or you can have a pointer to its first element via
int (*p)[3] = (int[][3]) {{1,2,3},{4,5,6}};
The reason why your example doesn't work is because {{1,2,3},{4,5,6}} is not a valid initializer for type int*[] (because {1,2,3} is not a valid initializer for int*). Note that it is not an int[2][3] — it's simply an invalid expression.
The reason why it works for strings is because "one" is a valid initializer for char[] and char[N] (for some N>3). As an expression, it's approximately equivalent to (const char[]){'o','n','e','\0'} except the compiler doesn't complain too much when it loses constness.
And yes, there's a big difference between an initializer and an expression. I'm pretty sure char s[] = (char[]){3,2,1,0}; is a compile error in C99 (and possibly C++ pre-0x). There are loads of other things too, but T foo = ...; is variable initialization, not assignment, even though they look similar. (They are especially different in C++, since the assignment operator is not called.)
And the reason for the confusion with pointers:
Type T[] is implicitly converted to type T* (a pointer to its first element) when necessary.
T arg1[] in a function argument list actually means T * arg1. You cannot pass an array to a function for Various Reasons. It is not possible. If you try, you are actually passing a pointer to its first element. (You can, however, pass a struct containing a fixed-size array to a function.)
They both can be dereferenced and subscripted with identical (I think) semantics.
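The difference between an array parameter and a struct parameter can be sketched as follows. All names here are made up for illustration: the array argument arrives as a pointer, so the callee changes the caller's data, while the struct is copied.

```c
struct box { int a[3]; };

static void bump_array(int a[3]) { a[0] += 100; }    /* a is really an int * */
static void bump_box(struct box b) { b.a[0] += 100; } /* b is a private copy */

/* Returns the caller's first element after passing the array itself. */
int array_after_call(void)
{
    int arr[3] = { 1, 2, 3 };
    bump_array(arr);                /* arr decays to &arr[0] */
    return arr[0];                  /* changed by the callee */
}

/* Returns the caller's first element after passing the wrapping struct. */
int struct_after_call(void)
{
    struct box bx = { { 1, 2, 3 } };
    bump_box(bx);                   /* the whole struct, array included, is copied */
    return bx.a[0];                 /* unchanged */
}
```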
EDIT: The observant might notice that my first example is roughly syntactically equivalent to int * p = &1;, which is invalid. This works in C99 because a compound literal inside a function "has automatic storage duration associated with the enclosing block" (ISO/IEC 9899:TC3).
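Both pointer forms from this answer, written as a sketch with made-up function names. Each pointer is used only inside the block where the compound literal is created, since that is how long the unnamed array lives.

```c
/* Pointer to the first row: each row has type int[3]. */
int last_element_via_row_pointer(void)
{
    int (*p)[3] = (int[][3]){ { 1, 2, 3 }, { 4, 5, 6 } };
    return p[1][2];
}

/* Pointer to the whole two-dimensional array. */
int last_element_via_array_pointer(void)
{
    int (*q)[][3] = &(int[][3]){ { 1, 2, 3 }, { 4, 5, 6 } };
    return (*q)[1][2];              /* dereference first, then index */
}
```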
The one that you are using is array of int pointers. You should use pointer to array :
int (*p)[3] = (int [][3]) {{1,2,3}, {4,5,6}};
Look at this answer for more details.
It seems you are confusing pointers and array. They're not the same thing! An array is the list itself, while a pointer is just an address. Then, with pointer arithmetic you can pretend pointers are array, and with the fact that the name of an array is a pointer to the first element everything sums up in a mess. ;)
int *p[] = (int *[]) {{1,2,3},{4,5,6}}; //I got a error
Here, p is an array of pointers, so each element must be initialized with a single address. The inner brace lists {1,2,3} and {4,5,6} are not addresses, so the compiler rejects them (or, at best, warns and converts the integers 1 and 4 to pointers, which cannot be safely dereferenced).
int p[][3] = {{1,2,3},{4,5,6}}; //it's okay
This is ok because this is an array of arrays, so this time 1, 2, 3, 4, 5 and 6 aren't addresses but the elements themselves.
char *p[] = (char *[]) {"one", "two"...}; // it's okay!
This is ok because each string literal ("one", "two", ...), when used as an expression, is converted to a pointer to its first character, so you're assigning to p[0] the address of the string literal "one".
BTW, this is the same as doing char abc[]; abc = "abc";. This won't compile, because you can't assign a pointer to an array, while char *def; def = "def"; solves the problem.

Resources