I know the differences between union and structure.
But from a design and coding perspective what are the various use cases of using a union instead of a structure? One is a space optimization. Are there any more advantages of using them?
There are really only two major uses. The first is to create a discriminated union. That's probably what you were thinking of by "space optimization," but there's a bit more to it. You need an extra piece of data to know which member of the union is "alive" (has valid data in it), since the compiler does not track this for you. You'd usually see this as a union inside a struct, something like:
struct mixed {
    enum { TYPE_INT, TYPE_FLOAT } type;
    union {
        int int_value;
        float float_value;
    } data;
};
When assigning to either data.int_value or data.float_value, you'd be expected to set the type member to the appropriate enumeration value as well. Then your code using the mixed value can figure out whether to read the int_value or float_value.
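As a minimal sketch of that convention, the helpers below set the tag and the matching member together, and a reader dispatches on the tag. The helper names (mixed_from_int, mixed_from_float, mixed_as_double) are made up for illustration and are not part of any API.

```c
#include <assert.h>

struct mixed {
    enum { TYPE_INT, TYPE_FLOAT } type;
    union {
        int int_value;
        float float_value;
    } data;
};

/* Hypothetical helper: stores an int and records that the int member is alive. */
static struct mixed mixed_from_int(int v)
{
    struct mixed m;
    m.type = TYPE_INT;
    m.data.int_value = v;
    return m;
}

/* Hypothetical helper: stores a float and records that the float member is alive. */
static struct mixed mixed_from_float(float v)
{
    struct mixed m;
    m.type = TYPE_FLOAT;
    m.data.float_value = v;
    return m;
}

/* Reads only the member that the tag says is alive. */
static double mixed_as_double(const struct mixed *m)
{
    switch (m->type) {
    case TYPE_INT:   return (double)m->data.int_value;
    case TYPE_FLOAT: return (double)m->data.float_value;
    }
    assert(!"unknown tag");
    return 0.0;
}
```

Funnelling every write through helpers like these is one way to keep the tag and the data from drifting apart.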
The second significant use of unions is to offer an API that allows the user to access the same data in multiple ways. This is pretty limited, as it requires all types in the union to be laid out in memory in compatible ways. For example, an int and a float are represented entirely differently, and accessing a float as an integer does not give you any particularly meaningful data.
For a useful example of this use case, see how many networking APIs will define a union for IPv4 addresses, something like:
union ipv4addr {
    unsigned address;
    unsigned char octets[4]; /* unsigned, so octets above 127 do not read as negative */
};
Most code just wants to pass around the 32-bit integer value, but some code wants to read the individual octets (bytes). This is all doable with pointer casts, but it's a bit easier, more self-documenting, and hence slightly safer to use a union in such a fashion.
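A rough sketch of how such a union might be used follows. It assumes uint32_t from <stdint.h> for the integer view, and the helper names ipv4_from_octets and ipv4_equal are hypothetical. Note that the numeric value of address for a given set of octets depends on the machine's byte order, so the sketch only compares addresses and never prints the integer.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

union ipv4addr {
    uint32_t address;
    unsigned char octets[4];
};

/* Both views must cover exactly the same four bytes. */
static_assert(sizeof(union ipv4addr) == sizeof(uint32_t),
              "integer and octet views must be the same size");

/* Hypothetical helper: fills the octets in memory order (a.b.c.d). */
static union ipv4addr ipv4_from_octets(unsigned char a, unsigned char b,
                                       unsigned char c, unsigned char d)
{
    union ipv4addr ip;
    ip.octets[0] = a;
    ip.octets[1] = b;
    ip.octets[2] = c;
    ip.octets[3] = d;
    return ip;
}

/* Hypothetical helper: compares two addresses through the integer view. */
static int ipv4_equal(union ipv4addr x, union ipv4addr y)
{
    return x.address == y.address;
}
```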
In general though, if you're not sure if you need a union, you almost certainly do not need one.
You use a union when your "thing" can be one of many different things but only one at a time.
You use a structure when your "thing" should be a group of other things.
For example, a union can be used for representing widgets or gizmos (with a field allowing us to know what it is), something like:
struct {
    int isAGizmo;
    union {
        widget w;
        gizmo g;
    };
};
In this example, w and g will overlap in memory.
I've seen this used in compilers where a token can be a numeric constant, keyword, variable name, string, and many other lexical elements (but, of course, each token is one of those things, you cannot have a single token that is both a variable name and a numeric constant).
Alternately, it may be illegal for you to process gizmos without widgets, in which case you could use:
struct {
    widget w;
    gizmo g;
};
In this case, g would be at a distinct memory location, somewhere after w (no overlap).
Use cases for this abound, such as structures containing record layouts for your phone book application which will no doubt earn you gazillions of dollars from your preferred app store :-)
In some cases we only need one variable at a time, but the different results have to be stored under different names. In such cases a union helps by giving every member the same storage, at one location, sized for the largest member.
For example, if we use an int and a char, the union will allocate enough space for the int (the larger of the two), and the char will be stored in that same space, so writing one member overwrites whatever was stored last.
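The size relationship can be checked at compile time. The sketch below uses made-up type names (int_or_char, int_and_char) and C11 static_assert; it only asserts inequalities that hold regardless of the exact sizes on a given platform.

```c
#include <assert.h>   /* static_assert (C11) */

union int_or_char   { int i; char c; };  /* members share one location */
struct int_and_char { int i; char c; };  /* members live side by side  */

/* The union must be able to hold its largest member. */
static_assert(sizeof(union int_or_char) >= sizeof(int),
              "union must fit its largest member");

/* The struct must hold both members at once, so it cannot be smaller than the union. */
static_assert(sizeof(struct int_and_char) >= sizeof(int) + sizeof(char),
              "struct must fit all of its members");
static_assert(sizeof(union int_or_char) <= sizeof(struct int_and_char),
              "union should not be larger than the equivalent struct");

/* Writing one member replaces the other: after this, only u.c is meaningful. */
static int overwrite_demo(void)
{
    union int_or_char u;
    u.i = 1000;   /* the int member is alive */
    u.c = 'A';    /* now the char member is alive instead */
    return u.c;
}
```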
You cannot compare unions to structures; it's like comparing apples to oranges, because they are used for different things. Unions are typically used in situations where space is at a premium but, more importantly, for mutually exclusive data. Unions help eliminate typos and ensure that mutually exclusive states remain mutually exclusive, because errors in programming logic surface more quickly when we use unions for mutually exclusive data.
Unions can also lead to much easier pointer arithmetic when passing complex data between components. A case in point: when developing compilers with LEX and YACC, values are passed from the lexer to the parser in a union. The parser implementation and the subsequent typecasting are made significantly easier by the use of the union.
This might be a naive question. I have a clear idea about structures and unions, and I have seen unions nested inside structures in network programming and user interface programming. But since this code was written long ago, I am not able to figure out the design choice or advantage that led to this decision. Why would I nest a union inside a structure, and what would be the advantages? Or is this a legacy carry-over? The code I have seen is mainly C.
The sample code looks like this
typedef struct
{
    int address1;
    int addresslocation;
    int addresstype;
} ITEM;
typedef struct application
{
    union
    {
        ITEM const *arr1;
        ITEM *arr2;
    } initial_array;
} APPLN;
You seem not to have any concern about unions in general, so it's unclear why you're asking about unions appearing as structure members. If they're useful data structures in general (and I contend that they are), then putting them inside a structure does not reduce their usefulness.
In fact, putting unions inside structures can increase their usefulness. One of the key restrictions on the use of unions is that they hold only one member at a time, and (with some caveats) you're permitted to read only the member most recently stored. But how does one know which member that is? One way is to put the union in a structure along with another member that indicates which union member currently contains a value. This pattern goes by various names, among them "tagged union" and "variant record".
But the specific case you asked about is murkier:
typedef struct application
{
    union
    {
        ITEM const *arr1;
        ITEM *arr2;
    } initial_array;
} APPLN;
A structure type with only one member has pretty limited usefulness. It has the same alignment requirement as all other structure types (which could, in principle, be different from the alignment requirement for unions), so perhaps there's an application for that. Also, if the one member is an array then wrapping it in a structure permits passing and returning it by value, but that's not your case.
For the most part, instead of defining a structure type with only one member, I would prefer to use the member's type directly. Especially so if I'm defining a typedef alias to refer to the type.
Also, the specific form of the union in your example is a little concerning. The only members are pointers to const and non-const versions of the same data type, and, especially without a tag field, the main use I see for that is to conceal non-conforming program actions from the compiler, so as to avoid warnings and / or errors that in fact are well justified.
Sorry if the title is a bit skew, I couldn't think of a concise explanation of what I'm on about!
Anyway, we have an embedded system that stores its settings data in a small SPI EEPROM/Flash chip. In a very basic form it's a struct containing the settings data, a simplified version might look like:
struct settings_data
{
    struct factory_data
    {   // Data set at the factory
        uint32 serial_number;
        uint32 calibration;
    };
    struct user_data
    {   // User-configured data follows:
        uint8 user_data_1;
        uint8 user_data_2;
        char somestring[10];
        // etc...
    };
};
All fine and dandy until we need to stick an extra value into factory_data, at which point everything after it moves.
Now, there are many ways to handle this, but this question is not about finding a different method. It's about whether it is reasonable to pad out the data structures so that they don't move when you add things:
struct settings_data
{
    union factory_union
    {
        uint8 padding[100]; // Effectively reserve 100 bytes space
        struct factory_data
        {   // Data set at the factory
            uint32 serial_number;
            uint32 calibration;
        };
    };
    union user_union
    {
        uint8 padding[100]; // Effectively reserve 100 bytes space
        struct user_data
        {   // User-configured data follows:
            uint8 user_data_1;
            uint8 user_data_2;
            char somestring[10];
            // etc...
        };
    };
};
If I understand unions correctly, this will reserve 100 bytes storage in the settings_data structure, and any new members we add to the "real" data struct inside the union will not cause the union to grow unless we exceed 100 bytes.
The question is: is this a reasonable way to achieve this, given that we have relatively limited resources?
It is reasonable, but it is possible for the size of the union to change when your structure changes even if the structure is still smaller than the padding element.
As it is in your question, the union is likely 100 bytes. Suppose you add a double to the structure, which (we assume) requires eight-byte alignment. Then the compiler makes the union 104 bytes, so that its size will be a multiple of eight (which is necessary so that arrays of the union would maintain the required alignment).
You can avoid this by making the padding a multiple of the alignment requirement of all types you might add in the future.
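One way to catch an accidental size change is a compile-time check. The sketch below is only an illustration under stated assumptions: it uses the <stdint.h> types in place of the question's uint32 and uint8, names the union members (data, factory, user), and picks a hypothetical RESERVED_BYTES of 104 so that the reserved size is a multiple of 8.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: 104 instead of 100, so the size stays a multiple of 8. */
#define RESERVED_BYTES 104

struct factory_data {
    uint32_t serial_number;
    uint32_t calibration;
};

struct user_data {
    uint8_t user_data_1;
    uint8_t user_data_2;
    char somestring[10];
};

union factory_union {
    uint8_t padding[RESERVED_BYTES];   /* reserves the space */
    struct factory_data data;          /* the "real" contents */
};

union user_union {
    uint8_t padding[RESERVED_BYTES];
    struct user_data data;
};

struct settings_data {
    union factory_union factory;
    union user_union user;
};

/* The build fails if a new member pushes a union past its reserved size,
   or if alignment rounds the size up. */
static_assert(sizeof(union factory_union) == RESERVED_BYTES,
              "factory area changed size");
static_assert(sizeof(union user_union) == RESERVED_BYTES,
              "user area changed size");
static_assert(offsetof(struct settings_data, user) == RESERVED_BYTES,
              "user area moved");
```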
"If I understand unions correctly, this will reserve 100 bytes storage in the settings_data structure, and any new members we add to the "real" data struct inside the union will not cause the union to grow unless we exceed 100 bytes."
Correct. The size of a union is determined by its largest member (plus any trailing padding the compiler adds for alignment).
I don't see any problems with your solution, provided your target platform does not change. However, if the target platform changes, the union and struct may "behave" differently due to padding and alignment done by the compiler/linker. Little endian versus big endian, on the other hand, would not matter in this case, as long as you only use the "real" part of the union.
You should also consider the padding the compiler/linker inserts to achieve a specific alignment. That is, the "real" part of your union may become bigger than expected. For example, to keep 32-bit values aligned, 0 to 3 octets of padding may be added automatically before such an entry in your struct. This depends on the target platform and the compiler/linker, so please refer to the manual of your compiler/linker.
Hope this helps.
Cheers,
Michael
Yes, using the union for padding should work.
And I can't resist giving this little bit of advice. Put a version number in each struct. Then new versions of the software will be able to identify old versions of the structs.
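A minimal sketch of that idea, with everything in it hypothetical: the version field comes first so any layout can find it, SETTINGS_VERSION is an invented constant, and user_data_3 stands in for a field that was added in a later version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SETTINGS_VERSION 2u   /* hypothetical current layout version */

struct user_data {
    uint32_t version;      /* first member, so every layout version can read it */
    uint8_t user_data_1;
    uint8_t user_data_2;
    uint8_t user_data_3;   /* hypothetical field that version 1 did not have */
};

/* Returns 1 if the record is usable (upgrading it in place if needed),
   or 0 if it was written by an unknown version. */
static int user_data_upgrade(struct user_data *d)
{
    assert(d != NULL);
    if (d->version == SETTINGS_VERSION)
        return 1;
    if (d->version == 1u) {
        d->user_data_3 = 0;             /* default for the field v1 lacked */
        d->version = SETTINGS_VERSION;
        return 1;
    }
    return 0;
}
```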
I'm looking for a scenario where using a union is a better option than a structure in C.
I'm not looking for the difference between the two. I'm aware of the structure and union concepts in C, and of the difference.
I also looked at the question Difference between a Structure and a Union in C, of which this is in no way a duplicate.
Well, consider the situation where you would like to be able to change each byte of an integer. You could use a union of the integer, and, for example, an array of 4 characters.
union Example
{
    int x;
    char array[4];
};
That way, by modifying one of the characters, you would also modify a corresponding byte (union members share memory space!).
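Here is a small sketch of that effect. It deviates from the snippet above in two labeled ways: the array is sized with sizeof(int) rather than a fixed 4, and it uses unsigned char. The function name with_byte_replaced is hypothetical. Which byte of the integer changes for a given index depends on the platform's byte order.

```c
#include <assert.h>
#include <stddef.h>

union Example {
    int x;
    unsigned char array[sizeof(int)];   /* one entry per byte of x */
};

/* Hypothetical helper: returns value with the byte at the given
   memory index replaced by the given byte. */
static int with_byte_replaced(int value, size_t index, unsigned char byte)
{
    union Example e;
    assert(index < sizeof e.array);
    e.x = value;             /* store through the int member */
    e.array[index] = byte;   /* patch one byte through the array member */
    return e.x;              /* read the modified int back */
}
```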
However, that does not mean unions are better than structs, they're very different and comparing the two doesn't really make sense. It's just an example of how unions can be suitable for doing certain things.
A union is a type that enables you to store different data types in the same memory space (but not simultaneously). A typical use is a table designed to hold a mixture of types in some order that is neither regular nor known in advance. By using an array of unions, you can create an array of equal-sized units, each of which can hold a variety of data types.
Unions are set up in much the same way as structures.
Another place you might use a union is in a structure for which the stored information depends on one of the members. For example, suppose you have a structure representing an automobile. If the automobile is owned by the user, you want a structure member describing the owner. If the automobile is leased, you want the member to describe the leasing company. Then you can do something along the following lines:
struct owner {
    char socsecurity[12];
    ...
};
struct leasecompany {
    char name[40];
    char headquarters[40];
    ...
};
union data {
    struct owner owncar;
    struct leasecompany leasecar;
};
struct car_data {
    char make[15];
    int status; /* 0 = owned, 1 = leased */
    union data ownerinfo;
    ...
};
Suppose flits is a car_data structure. Then if flits.status were 0, the program could use flits.ownerinfo.owncar.socsecurity, and if flits.status were 1, the program could use flits.ownerinfo.leasecar.name.
This is all taken from the book C Primer Plus 5th Edition
A union is useful when you have a data structure which can be interpreted in different ways, but always using the same memory.
A good example is a 32-bit value (DWORD). You can read it as two 16-bit values, one 32-bit value, or four 8-bit values, so if you need to address these parts individually, it is useful to create a union. This way you don't have to work with bitmasks or the like. You could even define individual bits, or sets of bits, and access them as individual variables using a union.
Using it to preserve memory is IMO not really needed, because you could always cast to different structures.
A union is used in cases where one needs to read a blob of data in multiple ways, or read data in a different format than it was written in. This is something a struct cannot handle, unless one considers casting the data to different structs.
Casting can be prohibited by some coding conventions (especially casting through a void pointer), since it makes static analysis difficult. Even then, the comparison would be between union and cast, not between union and struct.
Suppose we have a structure that represents the state of some object, and a function that sets the values in that struct. Side effects of the assignment are important -- changing the state of the object affects hardware, for instance -- which is why the assignment is a function and not simply done inline with '='.
typedef struct foo_s {
    int a;
    int b;
    int c;
} foo_t;
void foo_create (foo_id_t* id, ...);
void foo_set (foo_id_t id, foo_t* new_values);
After creation, perhaps clients want to change their foo a bit, so they fill in a foo_t struct and call foo_set. The question is what elegant idioms are there in C for allowing a partial structure assignment, changing a specified subset of the fields and leaving the rest as they were before?
Ways I've thought of:
1) Read-modify-write: Call get, change some fields, call set. Requires the implementation of set to compare every field to detect actual changes. Potential locking issues between the read and write. Potential performance issues depending on the storage used for the foo_t.
2) Accessor functions: one set function per field would let you call just those functions you need. Drawbacks include a tremendous proliferation of individual functions; locking issues for an entire transaction; and difficulties coordinating sequence if several fields must change together to make sense.
3) Bitmap of fields: Add a bitmap as a parameter to set, or embedded in the foo_t, to indicate which fields are valid. Client code sets the appropriate bits, fills in corresponding fields, and calls set(). Drawbacks include manual maintenance for parallel bit definitions for each field; small bit of extra work for the client. Locking and sequence can be handled by the set implementation.
4) List of offsets: Similar to (3), but passing a variable-length array of the offsets into the foo_t of the fields that were changed (offsetof() comes in handy). Implementation of set() iterates down the list, comparing offsets against the structure to know which fields were changed. Eliminates the manual duplication of (3), but requires the array to be passed (pointer and length). Forces the client to declare or malloc() such an array, which is a bit clumsy.
5) Property list: Rather than fields in a structure, the object could be represented as a list of names of object properties (that is, an enum). Individual properties could be set with a function like foo_property_set (foo_id, foo_property, void* property_value, int property_len); This style allows arbitrary access to individual fields and future expansion. Disadvantages arise when the goal is to change multiple fields together -- transactional locking would be required; some actions might require several related properties to be changed together; and there's extra overhead in repeated function calls for many properties.
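To make option (3) concrete, here is a rough sketch. The mask constants and the function foo_set_fields are invented for illustration, and the function takes a pointer to the current state rather than a foo_id_t, since the lookup from id to object is not shown in the question. A real implementation would take its lock and apply the hardware side effects inside this one call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct foo_s {
    int a;
    int b;
    int c;
} foo_t;

/* Hypothetical mask bits, one per field of foo_t. */
enum {
    FOO_FIELD_A = 1 << 0,
    FOO_FIELD_B = 1 << 1,
    FOO_FIELD_C = 1 << 2
};

/* Hypothetical partial setter: copies only the fields selected in mask
   from new_values into current and leaves the others untouched. */
static void foo_set_fields(foo_t *current, const foo_t *new_values, unsigned mask)
{
    assert(current != NULL && new_values != NULL);
    if (mask & FOO_FIELD_A) current->a = new_values->a;
    if (mask & FOO_FIELD_B) current->b = new_values->b;
    if (mask & FOO_FIELD_C) current->c = new_values->c;
}
```

A client would fill in only the fields it cares about and pass, for example, FOO_FIELD_A | FOO_FIELD_C as the mask.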
What coding patterns do you use to handle this problem?
How about an accessor macro?
#define SETFOO(fooptr, member, value) ((fooptr)->member = (value))
This is like approach 2), except you don't proliferate functions; you have just the one macro. It can't help you with the locking issues; you'll just have to provide functions to lock and unlock before making any changes. As for "coordinating sequence if several fields must change together to make sense", you can't change multiple fields atomically in C anyway. Even a whole structure assignment is basically a memcpy().
I am surprised that the most natural solution is not on your list: object orientation. I am not talking about C++ here, I am talking about using the paradigm of object orientation in C.
You can view your structure as a class, and you can define any number of methods that modify it in the ways you need. Just define one method for each high level operation you need, these methods are the only ones that directly modify your struct foo_s; all other code simply makes sequences of calls to these methods. You can even call your functions using a scheme like foo_methodName() to signal to which "class" they belong.
With that, you don't need to create any complex scheme to modify several fields at once.
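As a loose sketch of what that might look like for foo_t: the operations below (foo_init, foo_setPair, foo_setC) are hypothetical names following the foo_methodName() scheme, each standing for one high-level change that client code is allowed to request.

```c
#include <assert.h>
#include <stddef.h>

typedef struct foo_s {
    int a;
    int b;
    int c;
} foo_t;

/* "Constructor": puts the object into a known state. */
static void foo_init(foo_t *self)
{
    assert(self != NULL);
    self->a = 0;
    self->b = 0;
    self->c = 0;
}

/* Hypothetical operation for two fields that must always change together. */
static void foo_setPair(foo_t *self, int a, int b)
{
    assert(self != NULL);
    self->a = a;
    self->b = b;
}

/* Hypothetical operation that touches only c and leaves a and b alone. */
static void foo_setC(foo_t *self, int c)
{
    assert(self != NULL);
    self->c = c;
}
```

Because only these functions write to the struct, locking and hardware side effects would have a single place to live per operation.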
I am fairly new to C programming. I was writing a C program that has 3 integers to handle. I had all of them in an array, and suddenly I wondered why I shouldn't use a structure instead.
My question is: when is it best to use a structure, and when an array? And is there any difference in memory usage between the two in this particular case?
Any help regarding this is appreciated. Thanks!
An array is best when you want to loop through the values (which, essentially, means they're strongly related). Otherwise a structure allows you to give them meaningful names and avoids the need to document that array, e.g. myVar[1] is the name of the company and myVar[0] is its phone number, etc. as opposed to companyName, companyPhone.
The difference is about semantic information. If you want to store your information as a list where there is no semantic distinction between different members of that list, then use an array. Perhaps each member of the list represents a different value for the same thing.
If each of those integers represents something special or different, use a struct. Note the implications of using a struct, such as the fact that people expect the members to be closely related semantically.
struct has other advantages over array which can make it more powerful. For example, its ability to encapsulate multiple data types.
If you are passing this information between many functions, a structure is likely more practical (because there is no need to pass the size). It would be bad to pass an array (which decays to a pointer) and expect the callee to know how many items are in the array. Using a struct implicitly makes this part of the function contract.
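The difference in the function contract can be seen in a short sketch. The names here (struct reading, reading_sum, array_sum) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct holding three related integers. */
struct reading {
    int x;
    int y;
    int z;
};

/* The struct version needs no size: the type itself says there are three values. */
static int reading_sum(struct reading r)
{
    return r.x + r.y + r.z;
}

/* The array version receives only a pointer, so the count must be passed alongside it. */
static int array_sum(const int *values, size_t count)
{
    int total = 0;
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```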
In terms of size, there is no difference. A 4 byte int would typically be 4-byte aligned.
You can think of structure like an object in OOP languages, a structure ties related data into a single type and allows you to access each member of the structure using the member's name instead of array indices. If you can think of a singular name that could unify the related data then you should be using a structure.
An array can be thought of as a list of items. If the name you thought of above contains the word list or collection, or is a plural, then you should be using an array or another collection type. The primary use of an array is to loop over it and apply the same operation to every item in the array or to a range of items. If you used an array but never looped over it, that's an indication that an array may not be the best data type.
I would suggest using an array if the different things you store are logically the same kind of data, just different instances of it (like a list of telephone numbers or ages), and using a struct when they mean different things (like age and size) that are bound together because they relate to the same thing (a person).
The size is equal, since both store 3 integers without anything else. You could actually cast the struct to an array and use it like that (although you shouldn't, because it is ugly).
You could test that with this simple program:
#include <stdio.h>
struct three_numbers {
    int x;
    int y;
    int z;
};
int main(void) {
    int test[3];
    /* sizeof yields a size_t, which is printed with %zu */
    printf("struct: %zu, array: %zu\n", sizeof(struct three_numbers), sizeof(test));
    return 0;
}
prints on my system:
struct: 12, array: 12
In my opinion, you should first think from the perspective of the design to decide which one to use. In your question you mention that "I have three integers to handle". The point here is: how did you arrive at three integers?
As many others have noted, let's say you need to store the details of a person. First you need to think of the person as an object, then decide what information relevant to that person you will need, and then decide what data type to use for each of those details. What you are doing instead is deciding on the data types first and then trying to work your way up.
To put the difference between a structure and an array in simple words: a structure is a composite data type (a user-defined data type), whereas an array is just a collection of similar data.
Use structures to group information about a single object. Use arrays to group information about multiple objects.