When should unions be used? Why do we need them?
Unions are often used to convert between the binary representations of integers and floats:
union
{
int i;
float f;
} u;
// Convert floating-point bits to integer:
u.f = 3.14159f;
printf("As integer: %08x\n", (unsigned int)u.i);
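Reading a member other than the one most recently written is undefined behavior in C++, and early C standards were unclear about it. Since C99 (Technical Corrigendum 3) the C standard says the stored bytes are reinterpreted as the type of the member being read, and in practice it acts in a well-defined manner in virtually any compiler.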
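For comparison, here is a minimal sketch of the same conversion done with memcpy, which does not depend on reading a union member that was not the last one written. The helper name float_bits is made up for illustration, and the sketch assumes float is 32 bits wide (IEEE 754 single precision on most platforms).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
   Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* guard the size assumption */
    memcpy(&bits, &f, sizeof bits);    /* copy the bytes, no type punning */
    return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) gives 0x3f800000. Compilers usually turn the memcpy into a plain register move, so this form should not cost anything over the union.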
Unions are also sometimes used to implement pseudo-polymorphism in C, by giving a structure some tag indicating what type of object it contains, and then unioning the possible types together:
enum Type { INTS, FLOATS, DOUBLE };
struct S
{
enum Type s_type;
union
{
int s_ints[2];
float s_floats[2];
double s_double;
};
};
void do_something(struct S *s)
{
switch(s->s_type)
{
case INTS: // do something with s->s_ints
break;
case FLOATS: // do something with s->s_floats
break;
case DOUBLE: // do something with s->s_double
break;
}
}
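On a typical platform this brings struct S down to 12 or 16 bytes (depending on the alignment of double), instead of the 28 or more it would take if each field had its own storage.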
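To make the dispatch concrete, here is a small sketch that fills in the switch with an actual operation. The function s_total is hypothetical (it is not part of the original answer); it just adds up whatever the object currently holds.

```c
#include <assert.h>
#include <stddef.h>

enum Type { INTS, FLOATS, DOUBLE };

struct S
{
    enum Type s_type;
    union
    {
        int s_ints[2];
        float s_floats[2];
        double s_double;
    };  /* anonymous union: needs C11 or a compiler extension */
};

/* Hypothetical example operation: sum the values stored in the object. */
static double s_total(const struct S *s)
{
    assert(s != NULL);
    switch (s->s_type)
    {
    case INTS:   return (double)s->s_ints[0] + s->s_ints[1];
    case FLOATS: return (double)s->s_floats[0] + s->s_floats[1];
    case DOUBLE: return s->s_double;
    }
    return 0.0;  /* only reached if s_type holds an invalid tag */
}
```

The caller is responsible for setting s_type whenever it writes one of the union members; nothing in the language enforces that.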
Unions are particularly useful in Embedded programming or in situations where direct access to the hardware/memory is needed. Here is a trivial example:
typedef union
{
struct {
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
unsigned char byte4;
} bytes;
unsigned int dword;
} HW_Register;
HW_Register reg;
Then you can access the reg as follows:
reg.dword = 0x12345678;
reg.bytes.byte3 = 4;
Endianness (byte order) and processor architecture are of course important: which byte of dword is overlapped by byte3 depends on them.
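When the byte order matters, one option is to address bytes by significance using shifts and masks instead of by position in memory. The following is only a sketch; reg_get_byte and reg_set_byte are hypothetical helper names, and index 0 is taken to mean the least significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: index 0 is the least significant byte, whatever the
   byte order of the machine. */
static uint8_t reg_get_byte(uint32_t dword, unsigned index)
{
    assert(index < 4);
    return (uint8_t)(dword >> (8u * index));
}

static uint32_t reg_set_byte(uint32_t dword, unsigned index, uint8_t value)
{
    assert(index < 4);
    uint32_t mask = (uint32_t)0xFF << (8u * index);
    return (dword & ~mask) | ((uint32_t)value << (8u * index));
}
```

For example, reg_set_byte(0x12345678, 2, 4) gives 0x12045678 on every platform, whereas the result of reg.bytes.byte3 = 4 depends on whether the machine is little-endian or big-endian.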
Another useful feature is the bit-field:
typedef union
{
struct {
unsigned char b1:1;
unsigned char b2:1;
unsigned char b3:1;
unsigned char b4:1;
unsigned char reserved:4;
} bits;
unsigned char byte;
} HW_RegisterB;
HW_RegisterB reg;
With this code you can directly access a single bit in the register/memory address:
x = reg.bits.b2;
Low level system programming is a reasonable example.
IIRC, I've used unions to break hardware registers down into their component bits, so you can access an 8-bit register (as it was, back in the day when I did this ;-) either as a whole or bit by bit.
(I forget the exact syntax but...) This structure would allow a control register to be accessed as a control_byte or via the individual bits. It would be important to ensure the bits map on to the correct register bits for a given endianness.
typedef union {
unsigned char control_byte;
struct {
unsigned int nibble : 4;
unsigned int nmi : 1;
unsigned int enabled : 1;
unsigned int fired : 1;
unsigned int control : 1;
};
} ControlRegister;
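Because the order in which a compiler allocates bit-fields is implementation-defined, a mask-based version is sometimes used when the exact bit positions matter. This is a sketch under an assumed layout (nibble in bits 0-3, nmi in bit 4, enabled in bit 5, fired in bit 6, control in bit 7); the CR_ names and helper functions are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, not taken from any real device:
   nibble = bits 0-3, nmi = bit 4, enabled = bit 5, fired = bit 6, control = bit 7. */
enum {
    CR_NIBBLE_MASK = 0x0F,
    CR_NMI         = 0x10,
    CR_ENABLED     = 0x20,
    CR_FIRED       = 0x40,
    CR_CONTROL     = 0x80
};

static_assert((CR_NIBBLE_MASK | CR_NMI | CR_ENABLED | CR_FIRED | CR_CONTROL) == 0xFF,
              "masks must cover the whole 8-bit register");

/* Return the register value with the enabled bit set or cleared. */
static uint8_t cr_set_enabled(uint8_t reg, int on)
{
    return on ? (uint8_t)(reg | CR_ENABLED) : (uint8_t)(reg & ~CR_ENABLED);
}

/* Non-zero if the fired bit is set. */
static int cr_is_fired(uint8_t reg)
{
    return (reg & CR_FIRED) != 0;
}

/* Extract the low four bits. */
static unsigned cr_nibble(uint8_t reg)
{
    return reg & CR_NIBBLE_MASK;
}
```

The union with bit-fields reads more nicely; the masks trade that for a layout that does not change between compilers.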
I've seen it in a couple of libraries as a replacement for object oriented inheritance.
E.g.
Connection
/ | \
Network USB VirtualConnection
If you want the Connection "class" to be either one of the above, you could write something like:
struct Connection
{
int type;
union
{
struct Network network;
struct USB usb;
struct Virtual virtual;
};
};
Example use in libinfinity: http://git.0x539.de/?p=infinote.git;a=blob;f=libinfinity/common/inf-session.c;h=3e887f0d63bd754c6b5ec232948027cbbf4d61fc;hb=HEAD#l74
Unions allow data members which are mutually exclusive to share the same memory. This is quite important when memory is more scarce, such as in embedded systems.
In the following example:
union {
int a;
int b;
int c;
} myUnion;
This union will take up the space of a single int, rather than 3 separate int values. If the user set the value of a, and then set the value of b, it would overwrite the value of a since they are both sharing the same memory location.
Lots of uses. Just do grep union /usr/include/* or in similar directories. In most cases the union is wrapped in a struct, and one member of the struct tells which element of the union to access. For example, check out man elf for real-life implementations.
This is the basic principle:
struct _mydata {
int which_one;
union _data {
int a;
float b;
char c;
} foo;
} bar;
switch (bar.which_one)
{
case INTEGER : /* access bar.foo.a;*/ break;
case FLOATING : /* access bar.foo.b;*/ break;
case CHARACTER: /* access bar.foo.c;*/ break;
}
Here's an example of a union from my own codebase (from memory and paraphrased so it may not be exact). It was used to store language elements in an interpreter I built. For example, the following code:
set a to b times 7.
consists of the following language elements:
symbol[set]
variable[a]
symbol[to]
variable[b]
symbol[times]
constant[7]
symbol[.]
Language elements were defined as '#define' values thus:
#define ELEM_SYM_SET 0
#define ELEM_SYM_TO 1
#define ELEM_SYM_TIMES 2
#define ELEM_SYM_FULLSTOP 3
#define ELEM_VARIABLE 100
#define ELEM_CONSTANT 101
and the following structure was used to store each element:
typedef struct {
int typ;
union {
char *str;
int val;
};
} tElem;
The size of each element is then the size of typ plus the size of the largest union member (typically 4 bytes for typ and 4 bytes for the union, though the actual sizes depend on the implementation).
In order to create a "set" element, you would use:
tElem e;
e.typ = ELEM_SYM_SET;
In order to create a "variable[b]" element, you would use:
tElem e;
e.typ = ELEM_VARIABLE;
e.str = strdup ("b"); // make sure you free this later
In order to create a "constant[7]" element, you would use:
tElem e;
e.typ = ELEM_CONSTANT;
e.val = 7;
and you could easily expand it to include floats (float flt) or rationals (struct ratnl {int num; int denom;}) and other types.
The basic premise is that the str and val are not contiguous in memory, they actually overlap, so it's a way of getting a different view on the same block of memory, illustrated here, where the structure is based at memory location 0x1010 and integers and pointers are both 4 bytes:
+-----------+
0x1010 | |
0x1011 | typ |
0x1012 | |
0x1013 | |
+-----+-----+
0x1014 | | |
0x1015 | str | val |
0x1016 | | |
0x1017 | | |
+-----+-----+
If it were just in a structure, it would look like this:
+-------+
0x1010 | |
0x1011 | typ |
0x1012 | |
0x1013 | |
+-------+
0x1014 | |
0x1015 | str |
0x1016 | |
0x1017 | |
+-------+
0x1018 | |
0x1019 | val |
0x101A | |
0x101B | |
+-------+
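The overlap pictured above can be checked at compile time with offsetof. This is a minimal sketch; elem_union_size is a hypothetical helper, and the anonymous union requires C11 or a compiler extension.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int typ;
    union {
        char *str;
        int val;
    };              /* anonymous union (C11) */
} tElem;

/* Both union members start at the same offset inside the struct. */
static_assert(offsetof(tElem, str) == offsetof(tElem, val),
              "str and val must overlap");

/* Hypothetical helper: bytes occupied by the union part, including any
   trailing padding. */
static size_t elem_union_size(void)
{
    return sizeof(tElem) - offsetof(tElem, str);
}
```

On the 32-bit layout in the diagram both offsets are 4 and elem_union_size() is 4; on a 64-bit platform the numbers are typically 8 and 8.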
I'd say it makes it easier to reuse memory that might be used in different ways, i.e. saving memory. E.g. you'd like to do some "variant" struct that's able to save a short string as well as a number:
struct variant {
int type;
double number;
char *string;
};
On a 32-bit system this would result in at least 128 bits or 16 bytes being used for each instance of variant: 4 for the int, 8 for the double and 4 for the pointer (24 bytes where double is aligned to 8 bytes).
Using a union you can reduce the size to 96 bits or 12 bytes (16 where double is aligned to 8 bytes):
struct variant {
int type;
union {
double number;
char *string;
} value;
};
You can save even more if you add more variable types. It's true that you can do similar things by casting a void pointer, but the union makes it a lot more accessible as well as type safe. Such savings don't sound massive, but you're saving a quarter to a third of the memory used for all instances of this struct.
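The exact numbers depend on the platform, so it can be worth letting the compiler report them. A small sketch, with the two layouts renamed to variant_plain and variant_union (names invented here) so that they can coexist in one file:

```c
#include <assert.h>
#include <stddef.h>

struct variant_plain {      /* every field gets its own storage */
    int type;
    double number;
    char *string;
};

struct variant_union {      /* number and string share storage */
    int type;
    union {
        double number;
        char *string;
    } value;
};

static_assert(sizeof(struct variant_union) <= sizeof(struct variant_plain),
              "the union version should never be larger");

/* Hypothetical helper: bytes saved per instance on the current platform. */
static size_t variant_bytes_saved(void)
{
    return sizeof(struct variant_plain) - sizeof(struct variant_union);
}
```

Printing variant_bytes_saved() on the target shows the real saving there, padding included.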
Many of these answers deal with casting from one type to another. I get the most use from unions with the same types, just more of them (i.e. when parsing a serial data stream). They make parsing or constructing a framed packet trivial.
typedef union
{
UINT8 buffer[PACKET_SIZE]; // Where the packet size is large enough for
// the entire set of fields (including the payload)
struct
{
UINT8 size;
UINT8 cmd;
UINT8 payload[PAYLOAD_SIZE];
UINT8 crc;
} fields;
}PACKET_T;
// This should be called every time a new byte of data is ready
// and point to the packet's buffer:
// packet_builder(packet.buffer, new_data);
void packet_builder(UINT8* buffer, UINT8 data)
{
static UINT8 received_bytes = 0;
// All range checking etc removed for brevity
buffer[received_bytes] = data;
received_bytes++;
// Doing this with only the struct adds lots of logic that relates "byte 0" to size
// "byte 1" to cmd, etc...
}
void packet_handler(PACKET_T* packet)
{
// Process the fields in a readable manner
if(packet->fields.size > TOO_BIG)
{
// handle error...
}
if(packet->fields.cmd == CMD_X)
{
// do stuff..
}
}
Edit
The comments about endianness and struct padding are valid, and great, concerns. I have used this body of code almost entirely in embedded software, in most of which I had control of both ends of the pipe.
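One way to address the padding concern is to have the compiler verify the layout. The sketch below assumes a 4-byte payload and uses uint8_t in place of UINT8; packet_push is a hypothetical, range-checked variant of packet_builder that keeps its byte counter outside the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAYLOAD_SIZE = 4, PACKET_SIZE = PAYLOAD_SIZE + 3 };  /* assumed sizes */

typedef union {
    uint8_t buffer[PACKET_SIZE];
    struct {
        uint8_t size;
        uint8_t cmd;
        uint8_t payload[PAYLOAD_SIZE];
        uint8_t crc;
    } fields;
} PACKET_T;

/* All fields are single bytes, so no padding is expected; fail the build if
   a compiler ever lays the struct out differently. */
static_assert(sizeof(PACKET_T) == PACKET_SIZE, "unexpected padding in PACKET_T");
static_assert(offsetof(PACKET_T, fields.crc) == PACKET_SIZE - 1,
              "crc must be the last byte");

/* Hypothetical builder: stores one byte, returns 1 once the packet is full. */
static int packet_push(PACKET_T *packet, size_t *received, uint8_t data)
{
    assert(packet != NULL && received != NULL);
    if (*received < (size_t)PACKET_SIZE) {
        packet->buffer[(*received)++] = data;
    }
    return *received == (size_t)PACKET_SIZE;
}
```

Multi-byte fields would still need explicit byte-order handling; the checks above only cover padding.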
It's difficult to think of a specific occasion when you'd need this type of flexible structure, perhaps in a message protocol where you would be sending different sizes of messages, but even then there are probably better and more programmer friendly alternatives.
Unions are a bit like variant types in other languages - they can only hold one thing at a time, but that thing could be an int, a float etc. depending on how you declare it.
For example:
typedef union MyUnion MYUNION;
union MyUnion
{
int MyInt;
float MyFloat;
};
MyUnion will only contain an int OR a float, depending on which you most recently set. So doing this:
MYUNION u;
u.MyInt = 10;
u now holds an int equal to 10;
u.MyFloat = 1.0;
u now holds a float equal to 1.0. It no longer holds an int. If you now do printf("MyInt=%d", u.MyInt); you won't get an error; you'll get the bytes of the float reinterpreted as an int (1065353216 on a typical platform with IEEE 754 floats and 32-bit ints), which is rarely what you want.
The size of the union is dictated by the size of its largest field, in this case the float.
Unions are used when you want to model structs defined by hardware, devices or network protocols, or when you're creating a large number of objects and want to save space. You really don't need them 95% of the time though, stick with easy-to-debug code.
In school, I used unions like this:
typedef union
{
unsigned char color[4];
int new_color;
} u_color;
I used it to handle colors more easily: instead of using the >> and << operators, I just had to index into my char array.
Unions are used to save memory, especially on devices where memory is limited.
Example:
union _Union{
int a;
double b;
char c;
};
For example, let's say we need the above 3 data types (int, double, char) in a system where memory is limited. If we don't use a union, we need to define these 3 variables separately, and at least sizeof(a) + sizeof(b) + sizeof(c) bytes will be allocated. If we use a union, only one block of memory is allocated, sized for the largest of the 3 types, because all members of a union share the same memory.
For example:
union _Union{
int a;
double b;
char c;
};
int main() {
union _Union uni;
uni.a = 44;
uni.b = 144.5;
printf("a:%d\n",uni.a);
printf("b:%lf\n",uni.b);
return 0;
}
Output (on a typical little-endian machine) is:
a:0
b:144.500000
Why is a zero? Because a union has only one memory area and all of its members share it, the last assigned value overwrites the old one. What a prints after that is just part of the bytes of b.
One more example:
union _Union{
char name[15];
int id;
};
int main(){
union _Union uni;
char choice;
printf("You can enter name or id value.");
printf("Do you want to enter the name(y or n):");
scanf("%c",&choice);
if(choice == 'Y' || choice == 'y'){
printf("Enter name:");
scanf("%14s",uni.name); // leave room for the terminating '\0'
printf("\nName:%s",uni.name);
}else{
printf("Enter Id:");
scanf("%d",&uni.id);
printf("\nId:%d",uni.id);
}
return 0;
}
Note: The size of a union is at least the size of its largest field, because enough bytes must be reserved to store the largest field.
In early versions of C, all structure declarations would share a common set of fields. Given:
struct x {int x_mode; int q; float x_f;};
struct y {int y_mode; int q; int y_l;};
struct z {int z_mode; char name[20];};
a compiler would essentially produce a table of structures' sizes (and possibly alignments), and a separate table of structures' members' names, types, and offsets. The compiler didn't keep track of which members belonged to which structures, and would allow two structures to have a member with the same name only if the type and offset matched (as with member q of struct x and struct y). If p was a pointer to any structure type, p->q would add the offset of "q" to pointer p and fetch an "int" from the resulting address.
Given the above semantics, it was possible to write a function that could perform some useful operations on multiple kinds of structure interchangeably, provided that all the fields used by the function lined up with useful fields within the structures in question. This was a useful feature, and changing C to validate members used for structure access against the types of the structures in question would have meant losing it in the absence of a means of having a structure that can contain multiple named fields at the same address. Adding "union" types to C helped fill that gap somewhat (though not, IMHO, as well as it should have been).
An essential part of unions' ability to fill that gap was the fact that a pointer to a union member could be converted into a pointer to any union containing that member, and a pointer to any union could be converted to a pointer to any member. While the C89 Standard didn't expressly say that casting a T* directly to a U* was equivalent to casting it to a pointer to any union type containing both T and U, and then casting that to U*, no defined behavior of the latter cast sequence would be affected by the union type used, and the Standard didn't specify any contrary semantics for a direct cast from T to U. Further, in cases where a function received a pointer of unknown origin, the behavior of writing an object via T*, converting the T* to a U*, and then reading the object via U* would be equivalent to writing a union via member of type T and reading as type U, which would be standard-defined in a few cases (e.g. when accessing Common Initial Sequence members) and Implementation-Defined (rather than Undefined) for the rest. While it was rare for programs to exploit the CIS guarantees with actual objects of union type, it was far more common to exploit the fact that pointers to objects of unknown origin had to behave like pointers to union members and have the behavioral guarantees associated therewith.
Unions are great. One clever use of unions I've seen is to use them when defining an event. For example, you might decide that an event is 32 bits.
Now, within those 32 bits, you might like to designate the first 8 bits as an identifier of the sender of the event... Sometimes you deal with the event as a whole, sometimes you dissect it and compare its components. Unions give you the flexibility to do both.
union Event
{
unsigned long eventCode;
unsigned char eventParts[4];
};
What about VARIANT that is used in COM interfaces? It has two fields - "type" and a union holding an actual value that is treated depending on "type" field.
I used a union when I was coding for embedded devices. On that platform a C int was 16 bits wide, and I needed to retrieve the upper 8 bits and the lower 8 bits when reading from or storing to EEPROM. So I did it this way:
union data {
int data;
struct {
unsigned char higher;
unsigned char lower;
} parts;
};
It doesn't require shifting, so the code is easier to read (though which member ends up holding the upper byte depends on the platform's endianness).
On the other hand, I saw some old C++ STL code that used a union for the STL allocator. If you are interested, you can read the SGI STL source code. Here is a piece of it:
union _Obj {
union _Obj* _M_free_list_link;
char _M_client_data[1]; /* The client sees this. */
};
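The idea behind that union is that a free block does not need its client data, so the same bytes can hold the link to the next free block. The following is a simplified C sketch of the technique, not the SGI implementation; the pool size, slot size and function names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_SIZE = 16, SLOT_COUNT = 4 };   /* assumed sizes for the sketch */

/* A slot is either a link in the free list or storage handed to the client. */
union slot {
    union slot *next_free;
    unsigned char client_data[SLOT_SIZE];
};

static union slot pool[SLOT_COUNT];
static union slot *free_list = NULL;
static int pool_ready = 0;

static void pool_init(void)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        pool[i].next_free = free_list;   /* push each slot on the free list */
        free_list = &pool[i];
    }
    pool_ready = 1;
}

static void *pool_alloc(void)
{
    if (!pool_ready) pool_init();
    if (free_list == NULL) return NULL;  /* pool exhausted */
    union slot *s = free_list;
    free_list = s->next_free;            /* pop the first free slot */
    return s->client_data;
}

static void pool_free(void *p)
{
    assert(p != NULL);
    union slot *s = (union slot *)p;     /* client_data sits at offset 0 */
    s->next_free = free_list;            /* reuse the bytes as a list link */
    free_list = s;
}
```

No extra memory is spent on bookkeeping: the list lives entirely inside the blocks that are currently unused.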
A file containing different record types.
A network interface containing different request types.
Take a look at this: X.25 buffer command handling
One of the many possible X.25 commands is received into a buffer and handled in place by using a UNION of all the possible structures.
A simple and very useful example:
Imagine you have a uint32_t array[2] and want to access the 3rd and 4th bytes of the byte sequence.
You could do ((uint16_t *)array)[1].
But this sadly breaks the strict aliasing rules!
Compilers do, however, allow you to do the following:
union un
{
uint16_t array16[4];
uint32_t array32[2];
};
and then read array16[1] after writing array32. In C (C99 and later) reading a member other than the one last written is allowed and reinterprets the stored bytes. In C++ it is technically still a violation of the rules, but the common compilers support this usage.
Use a union when you have some function where you return a value that can be different depending on what the function did.
Related
I have a code like this
typedef struct
{
unsigned char arr[15]; //size = 15bytes
unsigned char str_cks; //size = 1byte
}iamstruct; //Total Size = 16bytes
typedef union
{
iamstruct var;
unsigned char union_cks[16];
}iamunion; //Total Size = 16bytes
static iamunion var[2];
int main()
{
printf("The size of struct is %zu\n",sizeof(iamstruct)); //Output = 16
printf("The size of union is %zu\n",sizeof(iamunion)); //Output = 16
var[1].union_cks[1] = 2;
printf("%d",var[1].union_cks[1] ); // Output =2
return 0;
}
I'm confused by the struct variable declared inside the union. How does it work?
What is the main purpose of doing this, and how does it improve accessibility?
Please share your ideas.
Thanks in advance.
I understood something now from the below code. Here memory allocated is 16bytes and its all shared by an individual member of union.
typedef struct
{
unsigned char str_cks1;
unsigned char str_cks2;
unsigned char str_cks3;
unsigned char str_cks4;
unsigned char str_cks5;
unsigned char str_cks6;
unsigned char str_cks7;
}iamstruct;
typedef union
{
iamstruct var;
unsigned char union_cks[7];
}iamunion;
static iamunion var[7];
int main()
{
int i = 0;
printf("The size of struct is %zu\n",sizeof(iamstruct));
for(i=0;i<7;i++)
{
var[i].var.str_cks1 = (i*1);
var[i].var.str_cks2 = (i*2);
var[i].var.str_cks3 = (i*3);
var[i].var.str_cks4 = (i*4);
var[i].var.str_cks5 = (i*5);
var[i].var.str_cks6 = (i*6);
var[i].var.str_cks7 = (i*7);
}
for(i=0;i<7;i++)
{
printf("%d\t",var[i].var.str_cks1);
printf("%d\t",var[i].var.str_cks2);
printf("%d\t",var[i].var.str_cks3);
printf("%d\t",var[i].var.str_cks4);
printf("%d\t",var[i].var.str_cks5);
printf("%d\t",var[i].var.str_cks6);
printf("%d\t",var[i].var.str_cks7);
printf("\n");
}
return 0;
}
A struct represents the values corresponding to the cartesian product of the types of all its fields. These values are the ordered concatenation of the field values and all are present in every single struct value. In this sense, you'll see all the values in the fields as a tuple, or sequence, of values, each of the type of the field they represent.
On the contrary, a union represents an alternative of every field inside, so the whole set of values of the type is the plain union of each of the types inside it.
So, composition (of union and struct) ensures that you can set an ordered sequence of values (struct) or see the struct field as a single alternative to the union. Easy, right! :) (You can also have a union as a field of a struct, meaning this time that the field in the sequence is an alternative of possible sets of values.)
Let's see it with an example. Let's assume you have a variable which is supposed to store Real or Complex values. For Real you just use a plain float value (I'm overcomplicating this on purpose, to show how it expands) and for Complex we'll use two double values (this choice has nothing to do with the other alternative or with the loss of precision of a float against a double). You can use:
struct complex {
double real_part, imaginary_part;
};
and then
union {
float real_number;
struct complex complex_number;
} my_variable;
then you can access my_variable.real_number as the single precision float value, and my_variable.complex_number.real_part and my_variable.complex_number.imaginary_part as the double precision double real part and imaginary part of a complex number.
Beware that this is not a way to convert values from real to complex or vice versa. Indeed, in this example, both types of values have different internal representations, and you'll mangle your data if you store a single precision float real number in the variable and try to access it as a complex number (you'll have to externally keep track of the kind of value you have stored in the variable in order to know how to access it). The set of values storable in the variable will be the whole set of float values for real numbers, plus (or also) the whole set of double pairs of real and imaginary parts that make up the complex numbers. This is where the union reserved word was taken from.
It is important to consider that a type represents the set of values storable in a variable of that type. In this way, a struct allows you to store a value of each of the types that the fields represent, and you can store all of them at the same time on the variable, while a union only allows you to decide which type (and which field) you'll use to store only a single value of any of those field alternatives, and no more than one.
In the C programming language described in the second edition of K&R's book, a struct or union is a sequence of bytes, and a struct or union member is a means of interpreting some of the storage within the struct or union as that type. Within a struct, all members are assigned to disjoint regions of storage, while all union members use the same storage.
If the members of the structures are stored consecutively, then given:
struct s1 {char a[3],b[5]; };
struct s2 {char a[5],b[3]; };
union { struct s1 v1; struct s2 v2; } u;
the storage assigned to u.v1.b[3] would also be assigned to u.v2.b[1], so a write to either would effectively set the value in both.
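That overlap can be demonstrated directly. The sketch below uses a named union type (overlay) and a helper (write_v1_read_v2) that are not in the original; it assumes, as the text does, that the char arrays are stored consecutively without padding.

```c
#include <assert.h>
#include <stddef.h>

struct s1 { char a[3], b[5]; };
struct s2 { char a[5], b[3]; };
union overlay { struct s1 v1; struct s2 v2; };

/* Both elements sit 6 bytes into the union when the arrays are consecutive. */
static_assert(offsetof(struct s1, b) + 3 == offsetof(struct s2, b) + 1,
              "v1.b[3] and v2.b[1] must share a byte");

/* Hypothetical helper: write through v1, read the same byte back through v2. */
static char write_v1_read_v2(char value)
{
    union overlay u = { .v1 = { {0}, {0} } };
    u.v1.b[3] = value;
    return u.v2.b[1];
}
```

Since every access here goes through a character type, the read returns whatever was just written.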
The C Standard allows for other dialects of C which would impose additional restrictions on when objects may be read or written, either in cases where it would allow implementations to generate more efficient code or otherwise benefit their customers, or in cases where it would hurt an implementation's customers but the implementer doesn't care. There has never been any consensus about exactly what additional restrictions should be expected, and because the authors of the C Standard assume that implementers will seek to benefit their customers, there was never any real effort to formulate rules that would not rely upon such benevolence.
For example, for a structure:
struct name{
int a;
char b;
float c;
}g;
g.b='X';
Now I would like to access structure member b using bitwise operators(<<,>> etc.) and change it to 'A'.
Is it possible to access structure members using such operators??
Bitwise operations on structs don't make much sense, because of padding and, more importantly, because they defeat the purpose of having a struct in the first place. Bitwise operations are, as the name says, meant to operate on the bits of a variable. Struct members are usually padded (unless the struct is packed), so without packing you have no guarantee where they sit. If you are asking whether you can: yes, but you would have to reinterpret the struct as, say, a 32-bit value; if the members you want fall within that space, you can then use bit operations on that value. If necessary, you can create a union of your struct and a raw variable, and then do the bitwise manipulation on the raw variable.
You can change the data by getting the offset of b. I know this code does not look good, but it serves your purpose.
#include <stddef.h> // offsetof
#include <stdio.h>
struct name
{
int a;
char b;
float c;
}g;
int main(void)
{
g.b='X';
size_t offset = offsetof(struct name, b); // Find offset of b
unsigned char *p = (unsigned char *)&g + offset; // Point at the byte holding b
*p ^= ('X' ^ 'A'); // Flip exactly the bits that differ between 'X' and 'A'
printf("........%c\n",g.b); // g.b is now 'A'
return 0;
}
There's a way: copy the structure's contents into a buffer the size of your struct, manipulate the buffer, and finally copy the buffer's contents back into the struct using memcpy.
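A minimal sketch of that copy-out, modify, copy-back approach follows. The helper xor_b_via_bytes is hypothetical, and it uses memcpy for both copies and offsetof to locate b inside the byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct name {
    int a;
    char b;
    float c;
};

/* Hypothetical helper: flip bits of member b through a raw byte copy. */
static void xor_b_via_bytes(struct name *s, unsigned char mask)
{
    unsigned char raw[sizeof *s];
    assert(s != NULL);
    memcpy(raw, s, sizeof raw);             /* struct -> byte buffer */
    raw[offsetof(struct name, b)] ^= mask;  /* bitwise change on the raw byte */
    memcpy(s, raw, sizeof raw);             /* byte buffer -> struct */
}
```

Calling xor_b_via_bytes(&g, 'X' ^ 'A') on a struct whose b is 'X' leaves b equal to 'A' and the other members untouched. In practice g.b ^= mask does the same thing with far less ceremony.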
I have two different structures and two const lookup tables as below
typedef const struct
{
unsigned int num;
unsigned char name[100];
unsigned int value1;
unsigned int value2;
unsigned int value3;
}st_Table1;
typedef const struct
{
unsigned int num;
unsigned char name[100];
}st_Table2;
st_Table1 stTable1[] =
{
{ 1, "Name1", 12, 13, 14 },
{ 2, "Name2", 22, 23, 24 },
{ 3, "Name3", 32, 33, 34 },
{ 4, "Name4", 42, 43, 44 }
};
st_Table2 stTable2[] =
{
{ 1, "India1" },
{ 2, "India2" },
{ 3, "India3" }
};
Could it be possible to have single pointer that can point to both the lookup tables stTable1 and stTable2?
When I have to make the decision for selection of either of the two tables we can assign the address of the table (either of).
But after that I wanted to use the single pointer in the remaining code.
Please reply for any logic ... hint ... clue
Arvind
well you could create a struct
typedef struct _tableChoice
{
st_Table1* table1;
st_Table2* table2;
} tableChoice_t,*p_tableChoice_t;
then you could pass along an argument of type p_tableChoice_t until you need to specifically access one of the tables. If you need to decide during runtime what pointer to use you would need to have both pointers available at the decision point.
You can get the same effect as the casts others have suggested with a union as well
union table_ptr {
st_Table1*table1;
st_Table2*table2;
} my_table_ptr;
and just assign/access the desired member.
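To use such a union safely through the rest of the code, it helps to store which member is active next to it. Here is a sketch under that assumption; struct table_ref, enum table_kind and table_name are names invented for this example, and the tables are shortened copies of the ones in the question.

```c
#include <assert.h>
#include <stddef.h>

typedef const struct {
    unsigned int num;
    unsigned char name[100];
    unsigned int value1, value2, value3;
} st_Table1;

typedef const struct {
    unsigned int num;
    unsigned char name[100];
} st_Table2;

static st_Table1 stTable1[] = { { 1, "Name1", 12, 13, 14 }, { 2, "Name2", 22, 23, 24 } };
static st_Table2 stTable2[] = { { 1, "India1" }, { 2, "India2" }, { 3, "India3" } };

enum table_kind { TABLE1, TABLE2 };

/* Hypothetical handle: a single object that can refer to either table. */
struct table_ref {
    enum table_kind kind;   /* which union member is valid */
    size_t count;           /* number of rows in the table */
    union {
        st_Table1 *table1;
        st_Table2 *table2;
    } ptr;
};

/* Return the name stored in row i, or NULL if i is out of range. */
static const char *table_name(const struct table_ref *ref, size_t i)
{
    assert(ref != NULL);
    if (i >= ref->count) return NULL;
    switch (ref->kind) {
    case TABLE1: return (const char *)ref->ptr.table1[i].name;
    case TABLE2: return (const char *)ref->ptr.table2[i].name;
    }
    return NULL;
}
```

The decision about which table to use is made once, when the table_ref is filled in; after that the rest of the code only passes the handle around.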
You could create a void pointer:
void * ptr;
Like any other pointer this can point to any memory address but it doesn't specify what type of data is stored there, so you could dereference it but you'd have to depend on your own knowledge of the structures to access different elements etc.. If you have some other flag indicating what type of record it's pointing to, you could cast it back to the required type of pointer.
Given that both of your structs have common ground you'd be better off just using the first one and accepting the small overhead on memory for instances where you only need num and name. I hate software bloat and wasted memory, but you're talking about an extra 12 bytes (on most platforms) for records where value1-3 are not needed, so unless we're talking about billions of records chances are the extra code required to deal with more will consume more memory than the wasted space.
Another option could be a union, where you lose 12 bytes from name for records of the second type so that all records take up the same amount of space, though of course compiler padding may make a difference on some platforms.
You may want to follow a similar pattern to the sockaddr family of structures. In that setup, all of the structures must have the same size and the same initial layout, something like:
typedef const struct
{
unsigned int num;
unsigned char name[100];
char buffer_[sizeof(unsigned int) * 3];
} st_TableBase;
typedef const struct
{
unsigned int num;
unsigned char name[100];
unsigned int value1;
unsigned int value2;
unsigned int value3;
} st_Table1;
typedef const struct
{
unsigned int num;
unsigned char name[100];
char buffer_[sizeof(unsigned int) * 3];
} st_Table2;
With this structure, you would normally maintain a pointer of type st_TableBase. With this pointer you'll be able to access the num or name members directly, since all of the types have a consistent initial layout. If you need to access additional fields, you can cast the st_TableBase pointer to a pointer to one of the "derived" types.
For more information on the sockaddr structures see http://www.retran.com/beej/sockaddr_inman.html.
It's impossible. A pointer must know its own range in order to read and write correctly during runtime, or you can do nothing with it. If you don't know whether it's table1 or table2, how can you write code to read it? (what type is the value in value = *pointer; ?) On the other hand, if you do know it's a table1/table2, why not use a pointer of specific type?
OR (if you insist)
Just use st_Table1 in place of st_Table2 and accept the waste of memory (3*sizeof(unsigned int) for each record). It won't be a big waste unless you have billions of records, and you don't want to hold billions of data records in C structures anyway.
OR (well, you hate waste)
typedef struct
{
int num;
char name[100];
int *value;
}st_table;
well, you have a unified structure now. Allocate value during runtime if you need it (value = (int *)malloc(3 * sizeof(int));). Don't forget the NULL check before you read value.
If I declare a Union as:
union TestUnion
{
struct
{
unsigned int Num;
unsigned char Name[5];
}TestStruct;
unsigned char Total[7];
};
Now, How can I know that whether Total[7] is used or TestStruct is used?
I am using C!
I was revisiting unions and structures and this question came to my mind.
"sizeof" can't be used as both are of same size i.e. 7 bytes. (And Here comes another question)
When I filled only "Total" with a Character 'a' and Tried sizeof(TestUnionInstance), it returned 12 (Size of Char is 1 byte, Right?). So I isolated the structure from it and found that Size of Structure is 12 bytes not 5+2=7 bytes.... Strange!!
Anybody can explain??
P.S. I am using Visual Studio 2008.
You can't. That's part of the point of unions.
If you need to be able to tell, you can use something called a tagged union. Some languages have built-in support for these, but in C, you have to do it yourself. The idea is to include a tag along with the union which you can use to tell which version it is. Like:
enum TestUnionTag {NUM_NAME, TOTAL};
struct {
enum TestUnionTag tag;
union {
struct {
unsigned int Num;
unsigned char Name[5];
} TestStruct;
unsigned char Total[7];
} value;
} TestUnion;
Then in your code, you make sure you always set the tag to say how the union is being used.
About the sizeof: the struct is 12 bytes because there are 4 bytes for the int (most modern compilers have a 4-byte int; on Visual Studio that is the same as a long int), then five bytes for the chars, and then three bytes of padding. The padding is there so that the struct's size is a multiple of its alignment, which keeps every element of an array of these structs aligned. Because the struct is 12 bytes long, the union has to be 12 bytes long to hold it; the union doesn't change size according to what's in it.
The member to use is the one you last wrote to; the other(s) are off limits. You know which member you last wrote to, don't you? After all, it was you who wrote the program :-)
As for your secondary question: the compiler is allowed to insert 'padding bytes' in the structure to avoid unaligned accesses and make it more performant.
example of a possible distribution of bytes inside your structure
Num |Name |pad
- - - -|- - - - -|x x x
0 1 2 3|4 5 6 7 8|9 a b
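The layout can be confirmed with sizeof and offsetof instead of guessed. A short sketch follows; the struct is given the tag TestStruct here so it can be named on its own, and test_struct_tail_padding is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct TestStruct {
    unsigned int Num;
    unsigned char Name[5];
};

union TestUnion {
    struct TestStruct TestStruct;
    unsigned char Total[7];
};

/* The struct is padded up to a multiple of its strictest member alignment. */
static_assert(sizeof(struct TestStruct) % _Alignof(unsigned int) == 0,
              "struct size must be a multiple of the int alignment");
/* The union is at least as large as its largest member. */
static_assert(sizeof(union TestUnion) >= sizeof(struct TestStruct),
              "union must be able to hold the struct");

/* Hypothetical helper: number of padding bytes after Name. */
static size_t test_struct_tail_padding(void)
{
    return sizeof(struct TestStruct)
         - (offsetof(struct TestStruct, Name)
            + sizeof(((struct TestStruct *)0)->Name));
}
```

With a 4-byte, 4-byte-aligned unsigned int the helper returns 3, matching the three x bytes in the diagram.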
Short answer: there is no way except by adding an enum somewhere in your struct outside the union.
enum TestUnionPart
{
TUP_STRUCT,
TUP_TOTAL
};
struct TestUnionStruct
{
enum TestUnionPart Part;
union
{
struct TestStruct /* tagged so that init_with_struct below can name the type */
{
unsigned int Num;
unsigned char Name[5];
} TestStruct;
unsigned char Total[7];
} TestUnion;
};
Now you'll need to control creation of your union to make sure the enum is correctly set, for example with functions similar to:
void init_with_struct(struct TestUnionStruct* tus, struct TestStruct const * ts)
{
tus->Part = TUP_STRUCT;
memcpy(&tus->TestUnion.TestStruct, ts, sizeof(*ts));
}
Dispatch on the correct values is now a single switch:
void print(struct TestUnionStruct const * tus)
{
switch (tus->Part)
{
case TUP_STRUCT:
printf("Num = %u, Name = %s\n",
tus->TestUnion.TestStruct.Num,
tus->TestUnion.TestStruct.Name);
break;
case TUP_TOTAL:
printf("Total = %s\n", tus->TestUnion.Total);
break;
default:
/* Compiler can't make sure you'll never reach this case */
assert(0);
}
}
As a side note, I'd like to mention that these constructs are best handled in languages of the ML family.
type test_struct = { num: int; name: string }
type test_union = Struct of test_struct | Total of string
First, sizeof(int) on most architectures nowadays is going to be 4. If you want 2 you should look at short, or int16_t in the stdint.h header in C99 if you want to be specific.
Second, C uses padding bytes to make sure each struct is aligned to a word-boundary (4). So your struct looks like this:
+---+---+---+---+---+---+---+---+---+---+---+---+
| Num | N a m e | | | |
+---+---+---+---+---+---+---+---+---+---+---+---+
There are 3 padding bytes at the end. Otherwise, the next struct in an array would have its Num field in an awkwardly-aligned place, which would make it less efficient to access.
Third, the sizeof a union is going to be at least the sizeof its largest member. Even if all that space isn't used, sizeof is going to return the largest result.
You need, as other answers have mentioned, some other way (like an enum) to determine which field of your union is used.
There is no way to tell. You should have some additional flags (or other means external to your union) saying which of the union parts is really used.
Here is another example of pairing the union with an enum to determine what is stored. I found it much clearer and more to the point.
from:
https://www.cs.uic.edu/~jbell/CourseNotes/C_Programming/Structures.html
author:
Dr. John T. Bell
In order to know which union field is actually stored, unions are often nested inside of structs, with an enumerated type indicating what is actually stored there. For example:
typedef struct Flight {
enum { PASSENGER, CARGO } type;
union {
int npassengers;
double tonnages; // Units are not necessarily tons.
} cargo;
} Flight;
Flight flights[ 1000 ];
flights[ 42 ].type = PASSENGER;
flights[ 42 ].cargo.npassengers = 150;
flights[ 20 ].type = CARGO;
flights[ 20 ].cargo.tonnages = 356.78;
When should unions be used? Why do we need them?
Unions are often used to convert between the binary representations of integers and floats:
union
{
int i;
float f;
} u;
// Convert floating-point bits to integer:
u.f = 3.14159f;
printf("As integer: %08x\n", u.i);
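Reading a member other than the one most recently written is often described as undefined behavior. In C it has in fact been permitted since C99: the bytes are reinterpreted as the other type, although the result may be a trap representation. In C++ it is formally undefined, but virtually every compiler handles it as expected.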
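If you would rather not rely on union punning at all, copying the bytes with memcpy is a commonly used alternative that is valid in both C and C++. A minimal sketch, assuming float is a 32-bit IEEE-754 value (the function name float_bits is made up):

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* Returns the raw bit pattern of f without going through a union. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* byte-wise copy, no aliasing issue */
    return bits;
}
```

Compilers generally turn the memcpy into a plain register move, so this tends to cost nothing at run time.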
Unions are also sometimes used to implement pseudo-polymorphism in C, by giving a structure some tag indicating what type of object it contains, and then unioning the possible types together:
enum Type { INTS, FLOATS, DOUBLE };
struct S
{
enum Type s_type; /* C (unlike C++) needs the enum keyword here */
union
{
int s_ints[2];
float s_floats[2];
double s_double;
};
};
void do_something(struct S *s)
{
switch(s->s_type)
{
case INTS: // do something with s->s_ints
break;
case FLOATS: // do something with s->s_floats
break;
case DOUBLE: // do something with s->s_double
break;
}
}
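This lets struct S occupy only 12 bytes instead of 28 on a platform with 4-byte ints and floats where double needs only 4-byte alignment. Where double is 8-byte aligned, padding makes it 16 bytes instead of 32.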
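The saving is easy to check on your own compiler. A small sketch with two made-up structs, one that overlaps the members in a union and one that stores them side by side:

```c
#include <stddef.h>

/* Members share storage: room for the tag plus the largest member. */
struct WithUnion {
    int tag;
    union {
        int    ints[2];
        float  floats[2];
        double d;
    } u;
};

/* Members stored one after another: room for all of them. */
struct WithoutUnion {
    int    tag;
    int    ints[2];
    float  floats[2];
    double d;
};

/* Bytes saved per object by using the union (platform dependent). */
size_t bytes_saved(void)
{
    return sizeof(struct WithoutUnion) - sizeof(struct WithUnion);
}
```

The exact figures depend on the sizes and alignment rules of the target, so treat any specific number as an example rather than a guarantee.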
Unions are particularly useful in embedded programming or in situations where direct access to the hardware/memory is needed. Here is a trivial example:
typedef union
{
struct {
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
unsigned char byte4;
} bytes;
unsigned int dword;
} HW_Register;
HW_Register reg;
Then you can access the reg as follows:
reg.dword = 0x12345678;
reg.bytes.byte3 = 4;
Endianness (byte order) and processor architecture are of course important.
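Which of the byte members ends up holding the least significant byte can be found out at run time. A rough sketch, assuming a 32-bit register type; the names Reg32 and lsb_index are hypothetical:

```c
#include <stdint.h>

typedef union {
    unsigned char bytes[4];
    uint32_t      dword;
} Reg32;

/* Returns the index within bytes[] that holds the least significant
   byte of dword: 0 on little-endian, 3 on big-endian machines. */
int lsb_index(void)
{
    Reg32 r;
    r.dword = 1u;
    for (int i = 0; i < 4; i++) {
        if (r.bytes[i] == 1u) {
            return i;
        }
    }
    return -1;  /* not expected on ordinary byte orders */
}
```

Code that has to run on both kinds of machine can use a check like this, or avoid the question by extracting bytes with shifts and masks.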
Another useful feature is the bit-field:
typedef union
{
struct {
unsigned char b1:1;
unsigned char b2:1;
unsigned char b3:1;
unsigned char b4:1;
unsigned char reserved:4;
} bits;
unsigned char byte;
} HW_RegisterB;
HW_RegisterB reg;
With this code you can directly access a single bit in the register/memory address:
x = reg.bits.b2;
Low level system programming is a reasonable example.
IIRC, I've used unions to break down hardware registers into their component bits, so you can access an 8-bit register (as it was, in the day I did this ;-) either as a whole or bit by bit.
(I forget the exact syntax but...) This structure would allow a control register to be accessed as a control_byte or via the individual bits. It would be important to ensure the bits map on to the correct register bits for a given endianness.
typedef union {
unsigned char control_byte;
struct {
unsigned int nibble : 4;
unsigned int nmi : 1;
unsigned int enabled : 1;
unsigned int fired : 1;
unsigned int control : 1;
};
} ControlRegister;
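Because the order in which a compiler allocates bit-fields is implementation-defined, a mask-based version is sometimes preferred when the exact bit positions matter. A sketch under the assumption that the low nibble comes first and the flags follow; the bit positions and all names here are invented for illustration, a real device's datasheet would define them:

```c
#include <stdint.h>

/* Hypothetical bit assignments for an 8-bit control register. */
#define CTRL_NIBBLE_MASK 0x0Fu
#define CTRL_NMI         (1u << 4)
#define CTRL_ENABLED     (1u << 5)
#define CTRL_FIRED       (1u << 6)
#define CTRL_CONTROL     (1u << 7)

/* Returns reg with the bits in mask set. */
uint8_t ctrl_set(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

/* Returns reg with the bits in mask cleared. */
uint8_t ctrl_clear(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & (uint8_t)~mask);
}

/* Non-zero if any bit in mask is set in reg. */
int ctrl_test(uint8_t reg, uint8_t mask)
{
    return (reg & mask) != 0;
}

/* Extracts the low 4-bit field. */
uint8_t ctrl_nibble(uint8_t reg)
{
    return (uint8_t)(reg & CTRL_NIBBLE_MASK);
}
```

This is more verbose than the bit-field union, but the mapping from names to bits no longer depends on the compiler.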
I've seen it in a couple of libraries as a replacement for object oriented inheritance.
E.g.
Connection
/ | \
Network USB VirtualConnection
If you want the Connection "class" to be either one of the above, you could write something like:
struct Connection
{
int type;
union
{
struct Network network;
struct USB usb;
struct Virtual virtual;
};
};
Example use in libinfinity: http://git.0x539.de/?p=infinote.git;a=blob;f=libinfinity/common/inf-session.c;h=3e887f0d63bd754c6b5ec232948027cbbf4d61fc;hb=HEAD#l74
Unions allow data members which are mutually exclusive to share the same memory. This is quite important when memory is more scarce, such as in embedded systems.
In the following example:
union {
int a;
int b;
int c;
} myUnion;
This union will take up the space of a single int, rather than 3 separate int values. If the user set the value of a, and then set the value of b, it would overwrite the value of a since they are both sharing the same memory location.
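Both claims can be checked directly. A small sketch (the union tag Three and the helper function are made up):

```c
/* Three members of the same type sharing one storage location. */
union Three {
    int a;
    int b;
    int c;
};

/* Writes through b and reads back through a. Since both members are
   ints at the same address, the value written to b is what a shows. */
int write_b_read_a(int value)
{
    union Three u;
    u.a = 0;
    u.b = value;   /* overwrites the storage that a also uses */
    return u.a;
}
```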
Lots of usages. Just do grep union /usr/include/* or in similar directories. In most cases the union is wrapped in a struct and one member of the struct tells which element in the union to access. For example, check out man elf for real-life implementations.
This is the basic principle:
struct _mydata {
int which_one;
union _data {
int a;
float b;
char c;
} foo;
} bar;
switch (bar.which_one)
{
case INTEGER : /* access bar.foo.a;*/ break;
case FLOATING : /* access bar.foo.b;*/ break;
case CHARACTER: /* access bar.foo.c;*/ break;
}
Here's an example of a union from my own codebase (from memory and paraphrased so it may not be exact). It was used to store language elements in an interpreter I built. For example, the following code:
set a to b times 7.
consists of the following language elements:
symbol[set]
variable[a]
symbol[to]
variable[b]
symbol[times]
constant[7]
symbol[.]
Language elements were defined as '#define' values thus:
#define ELEM_SYM_SET 0
#define ELEM_SYM_TO 1
#define ELEM_SYM_TIMES 2
#define ELEM_SYM_FULLSTOP 3
#define ELEM_VARIABLE 100
#define ELEM_CONSTANT 101
and the following structure was used to store each element:
typedef struct {
int typ;
union {
char *str;
int val;
};
} tElem;
The size of each element was then the size of typ plus the size of the largest union member (typically 4 bytes for typ and 4 bytes for the union, though the actual sizes depend on the implementation).
In order to create a "set" element, you would use:
tElem e;
e.typ = ELEM_SYM_SET;
In order to create a "variable[b]" element, you would use:
tElem e;
e.typ = ELEM_VARIABLE;
e.str = strdup ("b"); // make sure you free this later
In order to create a "constant[7]" element, you would use:
tElem e;
e.typ = ELEM_CONSTANT;
e.val = 7;
and you could easily expand it to include floats (float flt) or rationals (struct ratnl {int num; int denom;}) and other types.
The basic premise is that str and val are not contiguous in memory; they actually overlap, so the union is a way of getting a different view on the same block of memory. This is illustrated here, where the structure is based at memory location 0x1010 and integers and pointers are both 4 bytes:
+-----------+
0x1010 | |
0x1011 | typ |
0x1012 | |
0x1013 | |
+-----+-----+
0x1014 | | |
0x1015 | str | val |
0x1016 | | |
0x1017 | | |
+-----+-----+
If it were just in a structure, it would look like this:
+-------+
0x1010 | |
0x1011 | typ |
0x1012 | |
0x1013 | |
+-------+
0x1014 | |
0x1015 | str |
0x1016 | |
0x1017 | |
+-------+
0x1018 | |
0x1019 | val |
0x101A | |
0x101B | |
+-------+
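The overlap shown in the first diagram can be confirmed with offsetof. A minimal sketch using a made-up type Elem whose union member is named u so that it also compiles before C11:

```c
#include <stddef.h>

/* Hypothetical element type: a tag followed by overlapping members. */
typedef struct {
    int typ;
    union {
        char *str;
        int   val;
    } u;
} Elem;

/* Returns 1 if str and val start at the same offset inside Elem. */
int members_overlap(void)
{
    return offsetof(Elem, u.str) == offsetof(Elem, u.val);
}
```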
I'd say it makes it easier to reuse memory that might be used in different ways, i.e. saving memory. E.g. you'd like to do some "variant" struct that's able to save a short string as well as a number:
struct variant {
int type;
double number;
char *string;
};
On a 32-bit system with a 4-byte int, an 8-byte double and a 4-byte pointer, this takes at least 16 bytes per instance of variant (more if the compiler aligns double to 8 bytes).
Using a union you can reduce that to 12 bytes (4 for type plus 8 for the larger of the two union members):
struct variant {
int type;
union {
double number;
char *string;
} value;
};
You're able to save even more if you'd like to add more different variable types etc. It might be true that you can do similar things by casting a void pointer, but the union makes it a lot more accessible as well as type safe. Such savings don't sound massive, but you're saving a quarter to a third of the memory used for all instances of this struct.
Many of these answers deal with casting from one type to another. I get the most use from unions with the same types, just more of them (i.e. when parsing a serial data stream). They allow the parsing / construction of a framed packet to become trivial.
typedef union
{
UINT8 buffer[PACKET_SIZE]; // Where the packet size is large enough for
// the entire set of fields (including the payload)
struct
{
UINT8 size;
UINT8 cmd;
UINT8 payload[PAYLOAD_SIZE];
UINT8 crc;
} fields;
}PACKET_T;
// This should be called every time a new byte of data is ready
// and point to the packet's buffer:
// packet_builder(packet.buffer, new_data);
void packet_builder(UINT8* buffer, UINT8 data)
{
static UINT8 received_bytes = 0;
// All range checking etc removed for brevity
buffer[received_bytes] = data;
received_bytes++;
// Using the struct-only way adds lots of logic that relates "byte 0" to size,
// "byte 1" to cmd, etc...
}
void packet_handler(PACKET_T* packet)
{
// Process the fields in a readable manner
if(packet->fields.size > TOO_BIG)
{
// handle error...
}
if(packet->fields.cmd == CMD_X)
{
// do stuff..
}
}
Edit
The comments about endianness and struct padding are valid, and great, concerns. I have used this body of code almost entirely in embedded software, in most of which I had control of both ends of the pipe.
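The padding concern can at least be turned into a compile-time check, and the range checking that was left out can be added back. The following is only a sketch: PAYLOAD_SIZE, the type name Packet and the helper functions are hypothetical, and C11 is assumed for _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_SIZE 4                    /* hypothetical payload length */
#define PACKET_SIZE  (PAYLOAD_SIZE + 3)   /* size + cmd + payload + crc  */

typedef union {
    uint8_t buffer[PACKET_SIZE];
    struct {
        uint8_t size;
        uint8_t cmd;
        uint8_t payload[PAYLOAD_SIZE];
        uint8_t crc;
    } fields;
} Packet;

/* Fails to compile if the compiler padded the struct view. */
_Static_assert(sizeof(Packet) == PACKET_SIZE, "unexpected padding in Packet");

/* Stores one received byte; returns 0 on success, -1 if the packet is full. */
int packet_put(Packet *p, size_t *count, uint8_t data)
{
    if (*count >= PACKET_SIZE) {
        return -1;
    }
    p->buffer[(*count)++] = data;
    return 0;
}

/* Demo: feed the first two bytes, then read cmd back through the struct view. */
uint8_t demo_cmd(uint8_t size, uint8_t cmd)
{
    Packet p = { {0} };
    size_t n = 0;
    packet_put(&p, &n, size);
    packet_put(&p, &n, cmd);
    return p.fields.cmd;
}
```

With single-byte fields, byte order is not an issue; it would become one as soon as a field wider than a byte is added.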
It's difficult to think of a specific occasion when you'd need this type of flexible structure, perhaps in a message protocol where you would be sending different sizes of messages, but even then there are probably better and more programmer friendly alternatives.
Unions are a bit like variant types in other languages - they can only hold one thing at a time, but that thing could be an int, a float etc. depending on how you declare it.
For example:
typedef union MyUnion MYUNION;
union MyUnion
{
int MyInt;
float MyFloat;
};
MyUnion will only contain an int OR a float, depending on which you most recently set. So doing this:
MYUNION u;
u.MyInt = 10;
u now holds an int equal to 10;
u.MyFloat = 1.0;
u now holds a float equal to 1.0. It no longer holds an int. If you now do printf("MyInt=%d", u.MyInt); you will not get an error: in C the bytes of the float are reinterpreted as an int, so with a 32-bit int and IEEE-754 floats this prints MyInt=1065353216 rather than anything related to 10.
The size of the union is dictated by the size of its largest field; here int and float are usually both 4 bytes, so the union is 4 bytes.
Unions are used when you want to model structs defined by hardware, devices or network protocols, or when you're creating a large number of objects and want to save space. You really don't need them 95% of the time though; stick with easy-to-debug code.
In school, I used unions like this:
typedef union
{
unsigned char color[4];
int new_color;
} u_color;
I used it to handle colors more easily, instead of using >> and << operators, I just had to go through the different index of my char array.
Unions are used to save memory, especially on devices where memory is limited.
Example:
union _Union{
int a;
double b;
char c;
};
For example, let's say we need the above 3 data types (int, double, char) in a system where memory is limited. If we don't use a union, we need to define these 3 variables separately, and at least sizeof(a) + sizeof(b) + sizeof(c) bytes will be allocated. But if we use a union, only one memory area is allocated, sized for the largest of the 3 data types, because all members of a union use the same memory. That area is the common space for all of the members.
For example:
union _Union{
int a;
double b;
char c;
};
int main() {
union _Union uni;
uni.a = 44;
uni.b = 144.5;
printf("a:%d\n",uni.a);
printf("b:%lf\n",uni.b);
return 0;
}
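Output (on a typical little-endian machine) is:
a:0
b:144.500000
Why is a zero? Because the union has only one memory area, which all members share, so the last assigned value overwrites the old one. After uni.b = 144.5, reading uni.a reinterprets the first bytes of the double, and for 144.5 those bytes happen to be all zero on a little-endian machine.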
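To see why the int comes out as zero rather than as some other number, it helps to look at the bit pattern of the double. A short sketch, assuming a 64-bit IEEE-754 double; the function name double_bits is made up:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double and uint64_t have the same size on this platform. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits");

/* Returns the raw bit pattern of d. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For 144.5 the pattern is 0x4062100000000000: the low 32 bits are all zero, and on a little-endian machine those are the bytes an int member at the start of the union sees.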
One more example:
union _Union{
char name[15];
int id;
};
int main(){
union _Union uni;
char choice;
printf("You can enter name or id value.\n");
printf("Do you want to enter the name(y or n):");
scanf(" %c",&choice);
if(choice == 'Y' || choice == 'y'){
printf("Enter name:");
scanf("%14s",uni.name); /* width limit keeps the input inside name[15] */
printf("\nName:%s",uni.name);
}else{
printf("Enter Id:");
scanf("%d",&uni.id);
printf("\nId:%d",uni.id);
}
return 0;
}
Note: The size of a union is at least the size of its largest field, because enough bytes must be reserved to store the largest field.
In early versions of C, all structure declarations would share a common set of fields. Given:
struct x {int x_mode; int q; float x_f;};
struct y {int y_mode; int q; int y_l;};
struct z {int z_mode; char name[20];};
a compiler would essentially produce a table of structures' sizes (and possibly alignments), and a separate table of structures' members' names, types, and offsets. The compiler didn't keep track of which members belonged to which structures, and would allow two structures to have a member with the same name only if the type and offset matched (as with member q of struct x and struct y). If p was a pointer to any structure type, p->q would add the offset of "q" to pointer p and fetch an "int" from the resulting address.
Given the above semantics, it was possible to write a function that could perform some useful operations on multiple kinds of structure interchangeably, provided that all the fields used by the function lined up with useful fields within the structures in question. This was a useful feature, and changing C to validate members used for structure access against the types of the structures in question would have meant losing it in the absence of a means of having a structure that can contain multiple named fields at the same address. Adding "union" types to C helped fill that gap somewhat (though not, IMHO, as well as it should have been).
An essential part of unions' ability to fill that gap was the fact that a pointer to a union member could be converted into a pointer to any union containing that member, and a pointer to any union could be converted to a pointer to any member. While the C89 Standard didn't expressly say that casting a T* directly to a U* was equivalent to casting it to a pointer to any union type containing both T and U, and then casting that to U*, no defined behavior of the latter cast sequence would be affected by the union type used, and the Standard didn't specify any contrary semantics for a direct cast from T to U. Further, in cases where a function received a pointer of unknown origin, the behavior of writing an object via T*, converting the T* to a U*, and then reading the object via U* would be equivalent to writing a union via member of type T and reading as type U, which would be standard-defined in a few cases (e.g. when accessing Common Initial Sequence members) and Implementation-Defined (rather than Undefined) for the rest. While it was rare for programs to exploit the CIS guarantees with actual objects of union type, it was far more common to exploit the fact that pointers to objects of unknown origin had to behave like pointers to union members and have the behavioral guarantees associated therewith.
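The Common Initial Sequence guarantee mentioned above can be illustrated with a modern-C sketch. The struct and function names are invented; the point is only that mode may be read through either struct because both begin with an int member and the union type is visible:

```c
struct X { int mode; float f; };
struct Y { int mode; int   l; };

/* Both structs start with an int, so that member is part of their
   common initial sequence when they are combined in a union. */
union XY {
    struct X x;
    struct Y y;
};

/* Reads mode through the x view, whichever struct was stored. */
int mode_of(const union XY *u)
{
    return u->x.mode;
}

/* Demo: store a struct Y, then inspect it through the x view. */
int demo_mode(int m)
{
    union XY u;
    u.y.mode = m;
    u.y.l = 0;
    return mode_of(&u);
}
```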
Unions are great. One clever use of unions I've seen is to use them when defining an event. For example, you might decide that an event is 32 bits.
Now, within those 32 bits, you might like to designate the first 8 bits as an identifier of the sender of the event... Sometimes you deal with the event as a whole, sometimes you dissect it and compare its components. Unions give you the flexibility to do both.
union Event
{
unsigned long eventCode;
unsigned char eventParts[4];
};
What about VARIANT that is used in COM interfaces? It has two fields - "type" and a union holding an actual value that is treated depending on "type" field.
I used a union when I was coding for embedded devices. There the C int is 16 bits long, and I needed to retrieve the higher 8 bits and the lower 8 bits when reading from or storing to EEPROM. So I did it this way:
union data {
int data;
struct {
unsigned char higher;
unsigned char lower;
} parts;
};
It doesn't require shifting so the code is easier to read.
On the other hand, I saw some old C++ STL code that used a union for the STL allocator. If you are interested, you can read the SGI STL source code. Here is a piece of it:
union _Obj {
union _Obj* _M_free_list_link;
char _M_client_data[1]; /* The client sees this. */
};
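The idea behind that union is that a free block needs no payload, so its storage can hold the link to the next free block instead. The following is not the SGI code, just a much simplified C sketch of the same trick with a fixed pool; every name and size in it is an assumption.

```c
#include <stddef.h>

#define BLOCK_SIZE  16   /* hypothetical payload size per block */
#define BLOCK_COUNT 8    /* hypothetical number of blocks       */

/* While free, a block stores the link; while in use, the client's data. */
union Block {
    union Block *next_free;
    char         data[BLOCK_SIZE];
};

static union Block  pool[BLOCK_COUNT];
static union Block *free_list = NULL;
static int          pool_ready = 0;

/* Threads every block of the pool onto the free list. */
static void pool_init(void)
{
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

/* Hands out one block, or NULL if the pool is exhausted. */
void *pool_alloc(void)
{
    if (!pool_ready) {
        pool_init();
    }
    if (free_list == NULL) {
        return NULL;
    }
    union Block *b = free_list;
    free_list = b->next_free;
    return b->data;
}

/* Returns a block obtained from pool_alloc to the free list. */
void pool_free(void *p)
{
    union Block *b = (union Block *)p;
    b->next_free = free_list;
    free_list = b;
}

/* Demo: a freed block is the next one handed out again. */
int pool_reuses_freed_block(void)
{
    void *first = pool_alloc();
    pool_free(first);
    return pool_alloc() == first;
}
```

No separate bookkeeping memory is needed: the list of free blocks lives inside the free blocks themselves.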
A file containing different record types.
A network interface containing different request types.
Take a look at this: X.25 buffer command handling
One of the many possible X.25 commands is received into a buffer and handled in place by using a UNION of all the possible structures.
A simple and very useful example:
Imagine you have a uint32_t array[2] and want to access the 3rd and 4th bytes of the byte sequence.
You could do *((uint16_t*) array + 1).
But this sadly breaks the strict aliasing rules!
Compilers do, however, allow you to do the following:
union un
{
uint16_t array16[4];
uint32_t array32[2];
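};
In C (C99 and later), reading a union member other than the one last written is allowed and simply reinterprets the bytes. In C++ it is technically still a violation of the rules, but the major compilers support this usage.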
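Put together as C code, reading the 3rd and 4th bytes through the union might look like this. It is a sketch only; the helper name second_halfword is made up, and which 16-bit value comes back depends on the byte order of the machine:

```c
#include <stdint.h>

union un {
    uint16_t array16[4];
    uint32_t array32[2];
};

/* Stores two 32-bit words and returns the 16-bit unit that covers
   the 3rd and 4th bytes of the underlying storage. */
uint16_t second_halfword(uint32_t first, uint32_t second)
{
    union un u;
    u.array32[0] = first;
    u.array32[1] = second;
    return u.array16[1];
}
```

For first = 0x11223344 this yields 0x1122 on a little-endian machine and 0x3344 on a big-endian one.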
Use a union when you have some function where you return a value that can be different depending on what the function did.
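For instance, a function can return either a value or an error description, with a tag saying which one is valid. A minimal sketch; the type and function names are invented for illustration:

```c
enum ParseStatus { PARSE_OK, PARSE_ERR };

/* Either a parsed value or an error message, never both. */
struct ParseResult {
    enum ParseStatus status;
    union {
        int         value;   /* valid when status == PARSE_OK  */
        const char *error;   /* valid when status == PARSE_ERR */
    } u;
};

/* Converts one decimal digit character to its numeric value. */
struct ParseResult parse_digit(char c)
{
    struct ParseResult r;
    if (c >= '0' && c <= '9') {
        r.status  = PARSE_OK;
        r.u.value = c - '0';
    } else {
        r.status  = PARSE_ERR;
        r.u.error = "not a decimal digit";
    }
    return r;
}
```

The caller checks status first and only then touches the matching union member.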