Union and Struct Initialization

I stumbled across some code based on unions in C. Here is the code:
union {
struct {
char ax[2];
char ab[2];
} s;
struct {
int a;
int b;
} st;
} u ={12, 1};
printf("%d %d", u.st.a, u.st.b);
I just couldn't understand why the output was 268 0. How were the values initialized?
How is the union functioning here? Shouldn't the output be 12 1? It would be great if anyone could explain in detail what exactly is happening here.
I am using a 32-bit processor on Windows 7.

The code doesn't do what you think. A brace initializer initializes the first union member, i.e. u.s. However, the initializer is then incomplete and missing braces, since u.s contains two arrays. It should be something like: u = { { {'a', 'b'}, { 'c', 'd' } } };
You should always compile with all warnings enabled; a decent compiler should have told you that something was amiss. For instance, GCC says: missing braces around initialiser (near initialisation for ‘u.s’) and missing initialiser (near initialisation for ‘u.s.ab’). Very helpful.
In C99 you can take advantage of named member initialization (designated initializers) to initialize the second union member: u = { .st = {12, 1} }; (This is not possible in C++ before C++20, by the way.) The corresponding syntax for the first case is u = { .s = { {'a', 'b'}, { 'c', 'd' } } };, which is arguably more explicit and readable!

Your code uses the default initializer for the union, which is its first member. Both 12 and 1 go into the characters of ax, hence the result that you see (which is very much compiler-dependent).
If you wanted to initialize through the second member (st) you would use a designated initializer:
union {
struct {
char ax[2];
char ab[2];
} s;
struct {
int a;
int b;
} st;
} u ={ .st = {12, 1}};
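To see the two initializer forms side by side, here is a minimal sketch. The type name union overlay and the two helper functions are made up for illustration; the fully braced form in the first helper is what the original {12, 1} amounts to once the missing braces are written out.

```c
#include <assert.h>

/* Hypothetical named version of the union from the question. */
union overlay {
    struct { char ax[2]; char ab[2]; } s;
    struct { int a; int b; } st;
};

/* Plain braces always target the first member, here u.s. */
union overlay init_first_member(void)
{
    union overlay u = { { {12, 1}, {0, 0} } };  /* s.ax = {12, 1}, s.ab = {0, 0} */
    return u;
}

/* A designated initializer (C99) selects the second member instead. */
union overlay init_second_member(void)
{
    union overlay u = { .st = {12, 1} };        /* st.a = 12, st.b = 1 */
    assert(u.st.a == 12 && u.st.b == 1);
    return u;
}
```

Reading st.a from the result of init_second_member() gives 12, while the result of init_first_member() only has its char members set to known values.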

The code sets u.s.ax[0] to 12 and u.s.ax[1] to 1. u.s.ax is overlaid onto u.st.a, so the least-significant byte of u.st.a is set to 12 and the next byte to 1 (so you must be running on a little-endian architecture), giving a value of 0x010C or 268.

A union's size is the size of its largest member. So in this case, your union type has a size of 8 bytes on a 32-bit platform where int is 4 bytes. The first member of the union, s, only takes up 4 bytes, and the initializer {12, 1} fills just its first array, ax, which overlaps the first 2 bytes of st.a. Since you are on a little-endian system, those are the two lower-order bytes of st.a. The rest of s is zero-initialized, and st.b reads as 0 here. Thus, when you print the struct containing the two int members rather than the char members of the union, you end up with your results of 268 and 0.

It probably assigned {12, 1} to the two chars in s.ax.
So as a 32-bit int it's 1*256 + 12 = 268.
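The arithmetic in the answers above can be checked with a small sketch. Both function names are invented for this illustration; the code copies the bytes 12 and 1 into the lowest addresses of an int, which is what initializing ax does to st.a.

```c
#include <assert.h>
#include <string.h>

/* Builds an int whose first two bytes in memory are b0 and b1; the rest are 0. */
int int_from_leading_bytes(unsigned char b0, unsigned char b1)
{
    unsigned char bytes[sizeof(int)] = {0};
    int value;

    assert(sizeof(int) >= 2);
    bytes[0] = b0;                        /* lowest address */
    bytes[1] = b1;
    memcpy(&value, bytes, sizeof value);  /* reinterpret the bytes as an int */
    return value;
}

/* Returns 1 when the lowest-addressed byte of an int is its least significant one. */
int is_little_endian(void)
{
    int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian machine int_from_leading_bytes(12, 1) is 1*256 + 12; on a big-endian machine the same bytes produce a different, much larger value.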

Related

typecast structure member in C during assignment

Is there a way to type cast structure member during its initiation, for instance:
struct abc {
char k;
};
int main()
{
struct abc data[] = {.k= 'TI'};
}
The above wouldn't work since k is of type char. Is there a way to cast this member k (to int) during its assignment to 'TI'?

You don't need a cast here.
struct abc {
char k;
};
int main()
{
struct abc data[] = {.k= 'TI'};
}
Your object data is an array of struct abc. The initializer is for a single object of type struct abc.
If you want data to be a 1-element array, you can do this:
struct abc data[] = {{.k= 'TI'}};
or, if you want to be more explicit:
struct abc data[] = {[0] = {.k = 'TI'}};
That's valid code, but it's likely to trigger a warning. 'TI' is a multi-character constant, an odd feature of C that in my experience is used by accident more often than it's used deliberately. Its value is implementation-defined, and it's of type int.
Using gcc on my system, its value is 21577, or 0x5449, which happens to be ('T' << 8) + 'I'. Since data[0].k is a single byte, it can't hold that value. There's an implicit conversion from int to char that determines the value that will be stored (in this case, on my system, 73, which happens to be 'I').
A cast (not a "type cast") converts a value from one type to another. It doesn't change the type of an object. k is of type char, and that's not going to change unless you modify its declaration. Maybe you want to have struct abc { int k; };?
I can't help more without knowing what you're trying to do. Why are you using a multi-character constant? Why is k of type char?

No matter what you do to the value, it doesn't change the fact that the field cannot store that much information. You will need to change the type of k.

'TI' is wrong; you probably mean the string "TI".
If by typecast you mean viewing the string "TI" as an integer, you need to use a union. This is called type punning.
#include <stdio.h>
typedef union
{
char k[2]; // deliberately no room for the null terminating character
short int x;
} union_t;
int main(void)
{
union_t u = {.k = "TI"};
printf("u.k[0]=%c (0x%x), u.k[1]=%c (0x%x) u.x = %hd (0x%hx)\n", u.k[0], u.k[0], u.k[1], u.k[1], u.x, u.x);
}
https://godbolt.org/z/EhbKG5qnW

"...I was thinking if member K which of type char can be type caste to int ..."
As mentioned in comments, struct members are static and cannot be cast.
But given your willingness to cast (even though that will not work) why not start with a type that naturally accommodates multi-byte characters, i.e. wchar_t ?
If this is acceptable for your work, then when working with multi-byte char in C, you can do the following:
#include <wchar.h>
struct abc {
wchar_t k[2]; // or k[3], depending on whether k will be used as a C string
};
...
//inside function somewhere:
//assignment can be done as follows:
struct abc buf = {.k[0] = L'T', .k[1] = L'I'};
// Or:
struct abc buf = {.k = L"TI"};//Note, no null terminator, so not C string
Note: regarding 2nd method, by definition, L"TI" contains three characters, T, I and \0. If k is to be used as a string, k must be defined with space for the null terminator: wchar_t k[3];.
Inspecting the two elements of k then shows 84 and 73, the ASCII values for T and I respectively.
Note: compiled using GNU GCC, set to follow C99 rules

How can mixed data types (int, float, char, etc) be stored in an array?

I want to store mixed data types in an array. How could one do that?

You can make the array elements a discriminated union, aka tagged union.
struct {
enum { is_int, is_float, is_char } type;
union {
int ival;
float fval;
char cval;
} val;
} my_array[10];
The type member records which member of the union should be used for each array element. So if you want to store an int in the first element, you would do:
my_array[0].type = is_int;
my_array[0].val.ival = 3;
When you want to access an element of the array, you must first check the type, then use the corresponding member of the union. A switch statement is useful:
switch (my_array[n].type) {
case is_int:
// Do stuff for integer, using my_array[n].val.ival
break;
case is_float:
// Do stuff for float, using my_array[n].val.fval
break;
case is_char:
// Do stuff for char, using my_array[n].val.cval
break;
default:
// Report an error, this shouldn't happen
break;
}
It's left up to the programmer to ensure that the type member always corresponds to the last value stored in the union.
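As a rough illustration of the check-the-tag-then-read pattern, the sketch below gives the anonymous struct a name and formats one element into a buffer. struct value, enum value_type and format_value are hypothetical names chosen for this example, not part of the answer.

```c
#include <assert.h>
#include <stdio.h>

enum value_type { is_int, is_float, is_char };

/* Same layout as the array element above, with names added for reuse. */
struct value {
    enum value_type type;
    union {
        int ival;
        float fval;
        char cval;
    } val;
};

/* Formats one element according to its tag.
   Returns the length snprintf reports, or -1 for an unknown tag. */
int format_value(const struct value *v, char *buf, size_t size)
{
    assert(v != NULL && buf != NULL);
    switch (v->type) {
    case is_int:
        return snprintf(buf, size, "int %d", v->val.ival);
    case is_float:
        return snprintf(buf, size, "float %.2f", v->val.fval);
    case is_char:
        return snprintf(buf, size, "char %c", v->val.cval);
    default:
        return -1;  /* tag and union contents are out of sync */
    }
}
```

An array such as struct value my_array[10] can then be walked with a loop that calls format_value on each element, so the switch lives in one place.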

Use a union:
union {
int ival;
float fval;
void *pval;
} array[10];
You will have to keep track of the type of each element, though.

Array elements need to have the same size, which is why this is not directly possible. You could work around it by creating a variant type:
#include <stdio.h>
#define SIZE 3
typedef enum VarType { // names starting with a double underscore are reserved
V_INT,
V_CHAR,
V_FLOAT,
} VarType;
typedef struct Var { // names starting with a double underscore are reserved
VarType type;
union {
int i;
char c;
float f;
};
} Var;
void var_init_int(Var *v, int i) {
v->type = V_INT;
v->i = i;
}
void var_init_char(Var *v, char c) {
v->type = V_CHAR;
v->c = c;
}
void var_init_float(Var *v, float f) {
v->type = V_FLOAT;
v->f = f;
}
int main(int argc, char **argv) {
Var v[SIZE];
int i;
var_init_int(&v[0], 10);
var_init_char(&v[1], 'C');
var_init_float(&v[2], 3.14);
for( i = 0 ; i < SIZE ; i++ ) {
switch( v[i].type ) {
case V_INT : printf("INT %d\n", v[i].i); break;
case V_CHAR : printf("CHAR %c\n", v[i].c); break;
case V_FLOAT: printf("FLOAT %f\n", v[i].f); break;
}
}
return 0;
}
The size of the union is the size of its largest member, 4 bytes here (assuming a 4-byte int and float).

There's a different style of defining the tag-union (by whatever name) that IMO makes it much nicer to use, by removing the internal union. This is the style used in the X Window System for things like Events.
The example in Barmar's answer gives the name val to the internal union. The example in Sp.'s answer uses an anonymous union to avoid having to specify the .val. every time you access the variant record. Unfortunately, "anonymous" internal structs and unions are not available in C89 or C99 (they were only standardized in C11). There they are a compiler extension, and therefore inherently non-portable.
A better way IMO is to invert the whole definition. Make each data type its own struct, and put the tag (type specifier) into each struct.
typedef struct {
int tag;
int val;
} integer;
typedef struct {
int tag;
float val;
} real;
Then you wrap these in a top-level union.
typedef union {
int tag;
integer int_;
real real_;
} record;
enum types { INVALID, INT, REAL };
Now it may appear that we're repeating ourselves, and we are. But consider that this definition is likely to be isolated to a single file, and we've eliminated the noise of specifying the intermediate .val. before you get to the data.
record i;
i.tag = INT;
i.int_.val = 12;
record r;
r.tag = REAL;
r.real_.val = 57.0;
Instead, it goes at the end, where it's less obnoxious. :D
Another thing this allows is a form of inheritance. Edit: this part is not standard C, but uses a GNU extension.
if (r.tag == INT) {
integer x = r;
x.val = 36;
} else if (r.tag == REAL) {
real x = r;
x.val = 25.0;
}
integer g = { INT, 100 };
record rg = g;
Up-casting and down-casting.
Edit: One gotcha to be aware of is if you're constructing one of these with C99 designated initializers. All member initializers should be through the same union member.
record problem = { .tag = INT, .int_.val = 3 };
problem.tag; // may not be initialized
The .tag initializer can be ignored by an optimizing compiler, because the .int_ initializer that follows aliases the same data area. Even though we know the layout (!), and it should be ok. No, it ain't. Use the "internal" tag instead (it overlays the outer tag, just like we want, but doesn't confuse the compiler).
record not_a_problem = { .int_.tag = INT, .int_.val = 3 };
not_a_problem.tag; // == INT
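A minimal sketch of this inverted layout, using only standard C, might look as follows. The constructor-style helpers make_integer and make_real and the reader record_as_double are hypothetical additions; each constructor initializes tag and val through the same union member, as the gotcha above recommends.

```c
#include <assert.h>

enum types { INVALID, INT, REAL };

typedef struct { int tag; int val; } integer;
typedef struct { int tag; float val; } real;

typedef union {
    int tag;        /* overlays the tag field of every variant */
    integer int_;
    real real_;
} record;

/* Both fields are set through .int_, so the outer tag reads back as INT. */
record make_integer(int v)
{
    record r = { .int_ = { .tag = INT, .val = v } };
    return r;
}

/* Both fields are set through .real_, so the outer tag reads back as REAL. */
record make_real(float v)
{
    record r = { .real_ = { .tag = REAL, .val = v } };
    return r;
}

/* Dispatches on the shared tag and widens the payload to double. */
double record_as_double(record r)
{
    assert(r.tag == INT || r.tag == REAL);
    return r.tag == INT ? (double)r.int_.val : (double)r.real_.val;
}
```

With these helpers a caller never writes the tag by hand, which keeps the tag and the payload from drifting apart.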

You can use a void * array, with a separate array of size_t. But you lose the type information.
If you need to keep the type information in some way, keep a third array of int (where the int is an enumerated value). Then code the function that casts depending on the enum value.

Union is the standard way to go. But you have other solutions as well. One of those is tagged pointer, which involves storing more information in the "free" bits of a pointer.
Depending on architectures you can use the low or high bits, but the safest and most portable way is using the unused low bits by taking the advantage of aligned memory. For example in 32-bit and 64-bit systems, pointers to int must be multiples of 4 (assuming int is a 32-bit type) and the 2 least significant bits must be 0, hence you can use them to store the type of your values. Of course you need to clear the tag bits before dereferencing the pointer. For example if your data type is limited to 4 different types then you can use it like below
void* tp; // tagged pointer
enum { is_int, is_double, is_char_p, is_char } type;
// ...
uintptr_t addr = (uintptr_t)tp & ~(uintptr_t)0x03; // clear the 2 low bits in the pointer
switch ((uintptr_t)tp & 0x03) // check the tag (2 low bits) for the type
{
case is_int: // data is int
printf("%d\n", *((int*)addr));
break;
case is_double: // data is double
printf("%f\n", *((double*)addr));
break;
case is_char_p: // data is char*
printf("%s\n", (char*)addr);
break;
case is_char: // data is char
printf("%c\n", *((char*)addr));
break;
}
If you can make sure that the data is 8-byte aligned (like for pointers in 64-bit systems, or long long and uint64_t...), you'll have one more bit for the tag.
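The snippet above only reads a tagged pointer. A possible set of helpers for creating and taking apart such pointers is sketched below; tag_pointer, pointer_tag, untag_pointer and TAG_MASK are names invented here, and the sketch assumes that converting a pointer to uintptr_t and back preserves it, which holds on mainstream platforms but is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x03)  /* the 2 low bits carry the tag */

/* Stores a tag in the low bits of a pointer that is at least 4-byte aligned. */
void *tag_pointer(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);  /* low bits must be free */
    assert(tag <= TAG_MASK);
    return (void *)((uintptr_t)p | tag);
}

/* Extracts the tag without touching the address bits. */
unsigned pointer_tag(const void *tp)
{
    return (unsigned)((uintptr_t)tp & TAG_MASK);
}

/* Clears the tag bits so the pointer can be dereferenced again. */
void *untag_pointer(void *tp)
{
    return (void *)((uintptr_t)tp & ~TAG_MASK);
}
```

The assertions in tag_pointer catch the case where the pointed-to object is not aligned enough to leave the low bits free, for example a pointer to a single char.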
This has one disadvantage that you'll need more memory if the data have not been stored in a variable elsewhere. Therefore in case the type and range of your data is limited, you can store the values directly in the pointer. This technique has been used in the 32-bit version of Chrome's V8 engine, where it checks the least significant bit of the address to see if that's a pointer to another object (like double, big integers, string or some object) or a 31-bit signed value (called smi - small integer). If it's an int, Chrome simply does an arithmetic right shift 1 bit to get the value, otherwise the pointer is dereferenced.
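Storing a small integer directly in the word can be sketched like this. It is only an illustration of the idea, not V8's actual code: the function names are made up, and the convention that a low bit of 0 marks an integer is an assumption of this sketch. Decoding divides by 2, which gives the same result as an arithmetic right shift for these even values while staying portable for negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: low bit 0 means small integer, 1 means pointer. */
int word_is_small_int(uintptr_t word)
{
    return (word & 1u) == 0;
}

/* Packs a 31-bit signed value into the upper bits, leaving the low bit 0. */
uintptr_t small_int_encode(int32_t v)
{
    assert(v >= -(INT32_C(1) << 30) && v < (INT32_C(1) << 30));
    return (uintptr_t)(intptr_t)v * 2u;
}

/* Recovers the value; relies on the usual two's complement conversion
   from uintptr_t back to intptr_t. */
int32_t small_int_decode(uintptr_t word)
{
    assert(word_is_small_int(word));
    return (int32_t)((intptr_t)word / 2);
}
```

A word with the low bit set would instead be treated as a tagged pointer and have that bit cleared before dereferencing.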
On most current 64-bit systems the virtual address space is still much narrower than 64 bits, hence the most significant bits can also be used as tags. Depending on the architecture you have different ways to use those as tags. ARM, 68k and many others can be configured to ignore the top bits, allowing you to use them freely without worrying about segfaults or anything. From the Wikipedia article linked below:
A significant example of the use of tagged pointers is the Objective-C runtime on iOS 7 on ARM64, notably used on the iPhone 5S. In iOS 7, virtual addresses are 33 bits (byte-aligned), so word-aligned addresses only use 30 bits (3 least significant bits are 0), leaving 34 bits for tags. Objective-C class pointers are word-aligned, and the tag fields are used for many purposes, such as storing a reference count and whether the object has a destructor.
Early versions of MacOS used tagged addresses called Handles to store references to data objects. The high bits of the address indicated whether the data object was locked, purgeable, and/or originated from a resource file, respectively. This caused compatibility problems when MacOS addressing advanced from 24 bits to 32 bits in System 7.
https://en.wikipedia.org/wiki/Tagged_pointer#Examples
On x86_64 you can still use the high bits as tags with care. Of course you don't need to use all 16 of those bits and can leave some out for future-proofing.
Prior versions of Mozilla Firefox also used small-integer optimizations like V8, with the 3 low bits used to store the type (int, string, object, etc.). But since JägerMonkey they have taken another path (see Mozilla’s New JavaScript Value Representation). The value is now always stored in a 64-bit double-precision variable. When the double is a normalized one, it can be used directly in calculations. However, if the high 16 bits of it are all 1s, which denotes a NaN, the low 32 bits store either the address of the value (on a 32-bit computer) or the value directly, and the remaining 16 bits store the type. This technique is called NaN-boxing or nun-boxing. It's also used in 64-bit WebKit's JavaScriptCore and Mozilla's SpiderMonkey, with the pointer being stored in the low 48 bits. If your main data type is floating-point, this is the best solution and delivers very good performance.
Read more about the above techniques: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations

Copying an array in a designated initializer

I'm trying to initialize a const struct with a designated initializer. However, one of the struct elements is a fixed-width array. I already have the contents I would like to initialize the array with in another fixed-width array of appropriate size.
Is there any way to do this with a designated initializer? A simple (failing example) of what I'm trying to accomplish is demonstrated below.
struct foo {
uint8_t array1[4];
uint8_t array2[4];
};
uint8_t array[4] = {
1, 2, 3, 4
};
struct foo const bar = {
.array1 = array, // incompatible pointer to integer conversion
.array2 = { *array } // only copies the first element
};

Short answer: you can't. C does not copy arrays (without the use of standard library functions). The warnings come from the fact that you cannot assign an array as a whole, even when it is static or constant. When an array is used as an rvalue in an assignment it decays to a pointer and thus cannot be assigned to another array (as a whole).
The easiest way to go would be to use memcpy, but obviously that must be inside a function.
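One way the memcpy route might look is sketched below. make_foo is a hypothetical helper, and struct foo is repeated from the question so the block stands on its own; because the copy happens in a function, the result can only initialize an object with automatic storage, not a file-scope const.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct foo {
    uint8_t array1[4];
    uint8_t array2[4];
};

/* Builds a struct foo whose two arrays are both copies of src[0..3]. */
struct foo make_foo(const uint8_t *src)
{
    struct foo f;

    assert(src != NULL);
    memcpy(f.array1, src, sizeof f.array1);
    memcpy(f.array2, src, sizeof f.array2);
    return f;
}
```

Inside a function this allows struct foo const bar = make_foo(array);, since structs, unlike bare arrays, can be returned and copied as a whole.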

If bar has global scope, or is declared static, then you won't be able to use designated initializers to initialize from non-immediate values, regardless of whether or not the members in question are arrays.
However, if:
bar is declared on the stack of some function, and
Your fixed-size array really does only have 4 elements,
then you might be able to get away with something like this:
#include <stdio.h>
#include <stdint.h>
struct foo {
uint8_t array1[4];
uint8_t array2[4];
};
#define ARRAY_INIT(a) { a[0], a[1], a[2], a[3] }
int main (int argc, char **argv) {
uint8_t arr_init[4] = {
1, 2, 3, 4
};
struct foo const bar = {
.array1 = ARRAY_INIT(arr_init),
.array2 = ARRAY_INIT(arr_init),
};
printf("%d, %d\n", bar.array1[0], bar.array2[3]);
return (0);
}
The initializer array must appear before what is being initialized in the stack frame. Or it may come from a function parameter.
Of course if your array is much bigger than this, then using a macro like this will get very messy indeed.

While you may not be able to initialize the array by copying from another array, it may be helpful to use a preprocessor macro:
#define ARRAY_INIT {1, 2, 3, 4}
struct foo const bar = {
.array1 = ARRAY_INIT,
.array2 = ARRAY_INIT
};

sizeof abuse : get the size of a const table

When declaring a const table, it is possible to get the size of the table using sizeof. However, once you stop using the symbol name, it does not work anymore.
Is there a way to have the following program output the correct size for table A, instead of 0?
#include <stdio.h>
struct mystruct {
int a;
short b;
};
const struct mystruct tableA[] ={
{
.a = 1,
.b = 2,
},
{
.a = 2,
.b = 2,
},
{
.a = 3,
.b = 2,
},
};
const struct mystruct tableB[] ={
{
.a = 1,
.b = 2,
},
{
.a = 2,
.b = 2,
},
};
int main(int argc, char * argv[]) {
int tbl_sz;
const struct mystruct * table;
table = tableA;
tbl_sz = sizeof(table)/sizeof(struct mystruct);
printf("size of table A : %d\n", tbl_sz);
table = tableB;
tbl_sz = sizeof(tableB)/sizeof(struct mystruct);
printf("size of table B : %d\n", tbl_sz);
return 0;
}
Output is :
size of table A : 0
size of table B : 2
This is the intended behavior of sizeof. But is there a way for a compiler to know the size of a const table, given a pointer to the table instead of the symbol name ?

You are asking for the sizeof a pointer. That is always the pointer size (i.e. usually 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine). In the second attempt you are asking for the sizeof the array, and hence you get the result you'd expect.

Is there a way for a compiler to know the size of a const table, given a pointer to the table instead of the symbol name?
No, because sizeof() is evaluated at compile-time (unless it is a VLA, but a VLA is not a constant table), and the compiler cannot, in general, tell which table the pointer is pointing to. Granted, in the scenario shown, it might be possible in some hypothetical variation of the C language, but that would mean varying definitions of what sizeof() returns, which would be a bigger problem than not getting the answer you might like but do not get.
So, as everyone else ably pointed out, when you take the size of a pointer, you get the size of the pointer. Assuming a standard 32-bit machine since the results are consistent with that assumption, your structure is 8 bytes and your pointers are 4 bytes, so the result of the division is zero, as expected.

No - you're asking for the sizeof() a pointer. But since what you're really trying to get is the number of elements in an array, you can use a macro that will return that value but will generally give you an error if you pass a pointer instead of an array:
#define COUNT_OF(x) ((sizeof(x)/sizeof(0[x])) / ((size_t)(!(sizeof(x) % sizeof(0[x])))))
See this SO answer for more details: Is there a standard function in C that would return the length of an array?
For an even safer solution when using C++ instead of C, see this SO answer that uses templates to ensure that trying to get an array count on a pointer will always generate an error: Compile time sizeof_array without using a macro

Short answer is no; if all you have is a pointer, then there's no (standard) way to get the size of the thing being pointed to through that pointer.
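Since the count cannot be recovered from the pointer, a common workaround is to capture it while the array name is still in scope and carry it alongside the pointer. The sketch below assumes a hypothetical struct table_ref and select_table function; the COUNT_OF used here is the plain form of the macro, without the pointer check shown in the other answer.

```c
#include <assert.h>
#include <stddef.h>

struct mystruct {
    int a;
    short b;
};

/* Only valid on a real array, never on a pointer. */
#define COUNT_OF(x) (sizeof(x) / sizeof((x)[0]))

static const struct mystruct tableA[] = { {1, 2}, {2, 2}, {3, 2} };
static const struct mystruct tableB[] = { {1, 2}, {2, 2} };

/* A pointer to the rows together with the element count taken at the source. */
struct table_ref {
    const struct mystruct *rows;
    size_t count;
};

/* Picks a table and records its size while the array type is still known. */
struct table_ref select_table(int use_a)
{
    struct table_ref ref;

    if (use_a) {
        ref.rows = tableA;
        ref.count = COUNT_OF(tableA);
    } else {
        ref.rows = tableB;
        ref.count = COUNT_OF(tableB);
    }
    assert(ref.count > 0);
    return ref;
}
```

Code that previously held only a const struct mystruct * can then loop from 0 to ref.count without ever applying sizeof to the pointer.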

Although syntactically correct, your sample is more conventionally written as:
const struct mystruct tableA[] = {
{1, 2},
{2, 2},
{3, 2},
};
Which is less verbose and therefore more readable.
