The behaviour of unions in C

The behaviour of unions in C - c

I am really dumb. Please help me out by explaining the output:
#include <stdio.h>
union x
{
int a;
char b;
double c;
};
int main()
{
union x x[3] = {{1}, {'a'}, {1.2}};
int i;
for(i = 0; i < 3; i++)
printf("%d , %d , %lf\n", x[i].a, x[i].b, x[i].c);
return 0;
}
Output:

When you provide an initial value for a union, the compiler assigns it to the first member of the union. So, given:
union x
{
int a;
char b;
double c;
};
and:
union x x[3] = {{1}, {'a'}, {1.2}};
the compiler initializes x[0].a to 1, x[1].a to a, and x[2].a to 1.2. The compiler does not use the types of the initial values to match them to union elements. It merely initializes the first member with the value you give it. If there is a difference in types, the compiler converts the value to the type of the first member. (If that is a legal conversion. Otherwise, the compiler should provide a warning or error message.)
To initialize specific members of the union, you can use designated initializers, in which you explicitly name the member of the union you want to initialize:
union x x[3] = { { .a = 1 }, { .b = 'a' }, { .c = 1.2 } };

think of a union as an allocated space in the memory (size of which is the largest element in the union) each union defined can be interpeted as either element of the union so you can look at that allocated memory as a different type
https://www.tutorialspoint.com/cprogramming/c_unions.htm
taken from the link:
#include <stdio.h>
#include <string.h>
union Data {
int i;
float f;
char str[20];
};
int main( ) {
union Data data;
data.i = 10;
printf( "data.i : %d\n", data.i);
data.f = 220.5;
printf( "data.f : %f\n", data.f);
strcpy( data.str, "C Programming");
printf( "data.str : %s\n", data.str);
return 0;
}
as you can see it has defined one union but it used three different times as a different type
EDIT:
by C99 when initalizing a union the first element is the one initialized hence the value 1.2 is casted to 1 where in the case of the char the casting works as expected
if you reorder the elements in your union you will get:
0 , 0 , 1.000000
0 , 0 , 97.000000
858993459 , 51 , 1.200000
for further explanation:
http://en.cppreference.com/w/c/language/struct_initialization
When initializing a struct, the first initializer in the list
initializes the first declared member (unless a designator is
specified) (since C99), and all subsequent initializers without
designators (since C99)initialize the struct members declared after
the one initialized by the previous expression.
as mentioned in other comments there is a lot of information on the issue and you should research it further, but i hope this gives you a starting point

As per C99 standard, you can still initialize your union with one-liner using designated initializers without losing your data:
union x x[3] = {{.a = 1}, {.b = 'a'}, {.c = 1.2}};
This will force compiler to use particular union's member, rather than type of first declared one (which is int in your case).
Output:
1, 1, 0.000000
97, 97, 0.000000
858993459, 51, 1.200000
Proof.

Related

Unions and values stored

I know that a union allows to store different data types in the same memory location. You can define a union with many members, but only one member can contain a value at any given time. Consider this program:
#include <stdio.h>
union integer {
short s;
int i;
long l;
};
int main() {
union integer I;
scanf("%hi", &I.s);
scanf("%d", &I.i);
scanf("%ld", &I.l);
printf("%hi - %d - %ld ", I.s, I.i, I.l );
}
Suppose we enter the values 11, 55, 13 the program will give as output
13 - 13 - 13, no problem here. However, if i were to create three different variables of type struct integer
#include <stdio.h>
union integer {
short s;
int i;
long l;
};
int main() {
union integer S;
union integer I;
union integer L;
scanf("%hi", &S.s);
scanf("%d", &I.i);
scanf("%ld", &L.l);
printf("%hi - %d - %ld ", S.s, I.i, L.l );
}
than all the values will be preserved.
How come? By using three variables am i actually using three unions, each holding just one value?

The union members s, i and l of the same variable share the same memory. Reading a different member than you have written last is undefined behavior.
If you define 3 variables of the same union type it is not much different from defining 3 variables of type int. Every variable has its own memory, and every variable can hold only one of the union members.

What output do you expect from this code? Three different values or the same one?
#include <stdio.h>
int main() {
short S;
int I;
long L;
scanf("%hi", &S);
scanf("%d", &I);
scanf("%ld", &L);
printf("%hi - %d - %ld ", S, I, L );
}
You are declaring three separate variables, even if they are unions, all of them have their storage.

Why can the struct initializer be used after initialization but not the array one?

Struct initializes can be used after initialization by casting it. Eg.
struct EST {
int a;
float b;
}est;
int main() {
est = (struct EST){23, 45.4}; //This works nicely and est gets the values
printf("a: %d\nb: %f\n", est.a, est.b);
}
But the same can't be done for arrays:
int arr[6];
arr = (int []){1, 2, 3, 4, 5};
This gives
error: assignment to expression with array type
What's more head breaking is that if there is an array in the struct, it still works.
struct Weird {
int arr[6];
}w;
int main() {
w = (struct Weird){{1, 2, 3, 4, 5}}; /* It works. The member arr gets all its elements
filled
*/
}
There seems to be something about the = operator not being valid with arrays after initializing. That is my theory.
Why is this and how can arrays be assigned after initializing?

initialization means initializing of a variable at declaration time. The following is correct and is supported by 'c':
int arr[6] = {1,2,3,4,5,6};
Assignment means assigning a value to a variable somwhere in the program after initialization. 'C' does not support assignment to a whole array. Arrays are very special members in 'c' and are treated in a special way which does not allow assignments.
However, you can always copy one array into another either by using a for loop or my using something like memcpy. Here is an example:
int arr[6], brr[6] = {1,2,3,4,5,6};
memcpy(arr,(int[6]){1,2,3,4,5},sizeof(int[6]));
BTW, the cast-like object from your example (int[]){1,2,3,4,5} is called 'compound literal' and is also allowed in c. Here it is used to initialize the parameter of the memcpy function.

Store coordinates in a union for two different ways of access [duplicate]

In the old days of pre-ISO C, the following code would have surprized nobody:
struct Point {
double x;
double y;
double z;
};
double dist(struct Point *p1, struct Point *p2) {
double d2 = 0;
double *coord1 = &p1.x;
double *coord2 = &p2.x;
int i;
for (i=0; i<3; i++) {
double d = coord2[i] - coord1[i]; // THE problem
d2 += d * d;
return sqrt(d2);
}
At that time, we all knew that alignment of double allowed the compiler to add no padding in struct Point, and we just assumed that pointer arithmetics would do the job.
Unfortunately, this problematic line uses pointer arithmetics (p[i] being by definition *(p + i)) outside of any array which is explicitely not allowed by the standard. Draft n1570 for C11 says in 6.5.6 additive operators §8:
When an expression that has integer type is added to or subtracted from a pointerpointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression...
As nothing is said when we have not two elements of the same array, it is unspecified by the standard and from there Undefined Behaviour (even if all common compilers are glad with it...)
Question:
As this idiom allowed to avoid code replication changing just x with y then z which is quite error prone, what could be a conformant way to browse the elements of a struct as if they were members of the same array?
Disclaimer: It obviously only applies to elements of same type, and padding can be detected with a simple static_assert as shown in that other question of mine, so padding, alignment and mixed types are not my problem here.

C does not define any way to specify that the compiler must not add padding between the named members of struct Point, but many compilers have an extension that would provide for that. If you use such an extension, or if you're just willing to assume that there will be no padding, then you could use a union with an anonymous inner struct, like so:
union Point {
struct {
double x;
double y;
double z;
};
double coords[3];
};
You can then access the coordinates by their individual names or via the coords array:
double dist(union Point *p1, union Point *p2) {
double *coord1 = p1->coords;
double *coord2 = p2->coords;
double d2 = 0;
for (int i = 0; i < 3; i++) {
double d = coord2[i] - coord1[i];
d2 += d * d;
}
return sqrt(d2);
}
int main(void) {
// Note: I don't think the inner braces are necessary, but they silence
// warnings from gcc 4.8.5:
union Point p1 = { { .x = .25, .y = 1, .z = 3 } };
union Point p2;
p2.x = 2.25;
p2.y = -1;
p2.z = 0;
printf("The distance is %lf\n", dist(&p1, &p2));
return 0;
}

This is mainly a complement to JohnBollinger's answer. Anonymous struct members do allow a clean and neat syntax, and C defines a union as a type consisting of a sequence
of members whose storage overlap (6.7.2.1 Structure and union specifiers §6). Accessing a member of a union is then specified in 6.5.2.3 Structure and union members:
3 A postfix expression followed by the . operator and an identifier designates a member of
a structure or union object. The value is that of the named member,95) and is an lvalue if
the first expression is an lvalue.
and the (non normative but informative) note 95 precises:
95) If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.
That means that for the current version of the standard the aliasing of a struct by an array with the help of anonymous struct member in a union is explicitly defined behaviour.

Aliasing struct and array the conformant way

In the old days of pre-ISO C, the following code would have surprized nobody:
struct Point {
double x;
double y;
double z;
};
double dist(struct Point *p1, struct Point *p2) {
double d2 = 0;
double *coord1 = &p1.x;
double *coord2 = &p2.x;
int i;
for (i=0; i<3; i++) {
double d = coord2[i] - coord1[i]; // THE problem
d2 += d * d;
return sqrt(d2);
}
At that time, we all knew that alignment of double allowed the compiler to add no padding in struct Point, and we just assumed that pointer arithmetics would do the job.
Unfortunately, this problematic line uses pointer arithmetics (p[i] being by definition *(p + i)) outside of any array which is explicitely not allowed by the standard. Draft n1570 for C11 says in 6.5.6 additive operators §8:
When an expression that has integer type is added to or subtracted from a pointerpointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression...
As nothing is said when we have not two elements of the same array, it is unspecified by the standard and from there Undefined Behaviour (even if all common compilers are glad with it...)
Question:
As this idiom allowed to avoid code replication changing just x with y then z which is quite error prone, what could be a conformant way to browse the elements of a struct as if they were members of the same array?
Disclaimer: It obviously only applies to elements of same type, and padding can be detected with a simple static_assert as shown in that other question of mine, so padding, alignment and mixed types are not my problem here.

C does not define any way to specify that the compiler must not add padding between the named members of struct Point, but many compilers have an extension that would provide for that. If you use such an extension, or if you're just willing to assume that there will be no padding, then you could use a union with an anonymous inner struct, like so:
union Point {
struct {
double x;
double y;
double z;
};
double coords[3];
};
You can then access the coordinates by their individual names or via the coords array:
double dist(union Point *p1, union Point *p2) {
double *coord1 = p1->coords;
double *coord2 = p2->coords;
double d2 = 0;
for (int i = 0; i < 3; i++) {
double d = coord2[i] - coord1[i];
d2 += d * d;
}
return sqrt(d2);
}
int main(void) {
// Note: I don't think the inner braces are necessary, but they silence
// warnings from gcc 4.8.5:
union Point p1 = { { .x = .25, .y = 1, .z = 3 } };
union Point p2;
p2.x = 2.25;
p2.y = -1;
p2.z = 0;
printf("The distance is %lf\n", dist(&p1, &p2));
return 0;
}

This is mainly a complement to JohnBollinger's answer. Anonymous struct members do allow a clean and neat syntax, and C defines a union as a type consisting of a sequence
of members whose storage overlap (6.7.2.1 Structure and union specifiers §6). Accessing a member of a union is then specified in 6.5.2.3 Structure and union members:
3 A postfix expression followed by the . operator and an identifier designates a member of
a structure or union object. The value is that of the named member,95) and is an lvalue if
the first expression is an lvalue.
and the (non normative but informative) note 95 precises:
95) If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.
That means that for the current version of the standard the aliasing of a struct by an array with the help of anonymous struct member in a union is explicitly defined behaviour.

"struct a a1 = {0};" different from "struct a a2 = {5};" why?

If struct a a1 = {0}; initializes all the elements (of different types) of a structure to zero, then struct a a2 = {5}; should initialize it to 5.. no?
#include <stdio.h>
typedef struct _a {
int i;
int j;
int k;
}a;
int main(void)
{
a a0;
a a1 = {0};
a a2 = {5};
printf("a0.i = %d \n", a0.i);
printf("a0.j = %d \n", a0.j);
printf("a0.k = %d \n", a0.k);
printf("a1.i = %d \n", a1.i);
printf("a1.j = %d \n", a1.j);
printf("a1.k = %d \n", a1.k);
printf("a2.i = %d \n", a2.i);
printf("a2.j = %d \n", a2.j);
printf("a2.k = %d \n", a2.k);
return 0;
}
The uninitialized struct contains garbage values
a0.i = 134513937
a0.j = 134513456
a0.k = 0
The initialized to 0 struct contains all elements initialized to 0
a1.i = 0
a1.j = 0
a1.k = 0
The initialized to 5 struct contains only the first element initialized to 5 and the rest of the elements initialized to 0.
a2.i = 5
a2.j = 0
a2.k = 0
Would a2.j and a2.k always guaranteed to initialize to 0 during a a2 = {5}; (or) is it an undefined behavior
OTOH, why am I not seeing all the elements of s2 initialized to 5. How is the struct initialization is done during {0} and how is it different when {5} is used?

Reference:
C99 Standard 6.7.8.21
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
[EDIT]
Static objects and implicit initialization:
The storage duration of an object determines the lifetime of an object.
There are 3 storage durations:
static, automatic, and allocated
variables declared outside of all blocks and those explicitly declared with the static storage class specifier have static storage duration. Static variables are initialized to zero by default by the compiler.
Consider the following program:
#include<stdio.h>
int main()
{
int i;
static int j;
printf("i = [%d]",i);
printf("j = [%d]",j);
return 0;
}
In the above program, i has automatic storage and since it is not explicitly initialized its value is Undefined.
While j has static storage duration and it is guaranteed to be initialized to 0 by the compiler.

The omitted values will be always initialized to zero, because the standard says so. So you have essentially
struct a a1 = { 0, 0, 0 };
and
struct a a2 = { 5, 0, 0 };
which is of course different.

No. In C, if your initializer list is incomplete, all missing indices will be filled with 0. So this:
int a[3] = {0};
int b[3] = {5};
effectively becomes:
int a[3] = {0, 0, 0};
int b[3] = {5, 0, 0};
This is why it seemingly works with {0} but fails with {5}.

It doesn't work for
struct x {
int *y;
/* ... */
};
struct x xobj = {5};

Take a look at Designated Initializers in GCC documentation.

The behavior is exactly the same in both cases. If there are fewer initializers than there are elements in the aggregate, then the remaining elements are initialized as though they were declared static, meaning they'll be initialized to 0 or NULL.
It's just that in the first case, the explicit initializer has the same value as the implicit initializer.
If you want to initialize all the elements of your aggregate to something other than 0, then you will have to provide an explicit initializer for each of them, i.e.:
a a2 = {5, 5, 5};