I am trying to declare a generic variable type in C (I can't use C++), and I have the following options in mind.
Option 1
typedef struct
{
void *value;
ElementType_e type;
} Data_t;
Option 2
typedef struct {
ElementType_e type;
union {
int a;
float b;
char c;
} my_union;
} my_struct;
where ElementType_e is an enum that holds all the possible types of variables (e.g. int, char, unsigned int, etc.). I am leaning toward option 1, because I don't believe casting will add extra computational time compared to a switch, right?
I am just wondering which type is more useful. I know option 1 will require casting every time it is used/accessed. Are there any possible issues that could happen with casting (especially when running/compiling the code on different platforms, e.g. 32-bit and 16-bit micros)?
Option 2, on the other hand, requires a switch() to do any operation (e.g. addition, ...).
The following link explains that option 2 is better (from a readability point of view), but I am mainly concerned about code size and computational cost.
Generic data type in C [ void * ]
Are there any possible issues that could happen with casting
No, because you do not need a cast: in C there is no need to cast when assigning from/to a void pointer.
I am just wondering which type is more useful?
Both are, so it depends:
1 is for the lazy (as it's less typing, with fewer variable names to remember).
2 is for the cautious (as it's type-safe, as opposed to option 1, where the "real" type info is lost, so you can even assign the address of a variable whose type is not in ElementType_e).
Regarding a comment:
Regarding performance, I expect no major difference between the two approaches (if implemented sanely), as both options need conditional statements when assigning to/from them (the exception being pointer variables, which for option 1 need no conditional statements for assignment).
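A minimal sketch of both options side by side, assuming a hypothetical two-member ElementType_e (ELEM_INT, ELEM_FLOAT) and hypothetical helper names data_as_double and my_struct_as_double. It illustrates the two points above: converting from void * needs no cast in C, and both variants still branch on the type tag before they can do arithmetic.

```c
#include <stddef.h>

/* Hypothetical enum; the real ElementType_e would list every supported type. */
typedef enum { ELEM_INT, ELEM_FLOAT } ElementType_e;

/* Option 1: tag plus pointer to storage owned elsewhere. */
typedef struct {
    void *value;
    ElementType_e type;
} Data_t;

/* Option 2: tag plus in-place storage. */
typedef struct {
    ElementType_e type;
    union {
        int a;
        float b;
    } my_union;
} my_struct;

/* Option 1: the void * converts implicitly (no cast needed),
   but we still have to branch on the tag to pick the right type. */
static double data_as_double(const Data_t *d)
{
    if (d->type == ELEM_INT) {
        const int *p = d->value;
        return *p;
    }
    const float *p = d->value;
    return *p;
}

/* Option 2: the same branch, but the value is read straight from the union. */
static double my_struct_as_double(const my_struct *s)
{
    switch (s->type) {
    case ELEM_INT:   return s->my_union.a;
    case ELEM_FLOAT: return s->my_union.b;
    }
    return 0.0; /* only reached for an invalid tag */
}
```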
I'd recommend using a union. In fact, I've used one myself in a similar situation:
union sockaddr_u {
struct sockaddr_storage ss;
struct sockaddr_in sin;
struct sockaddr_in6 sin6;
};
I use this union in socket code where I could be working with either IPv4 or IPv6 addresses. In this particular case, the "type" field is actually the first field in each of the inner structs (ss.ss_family, sin.sin_family, and sin6.sin6_family).
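A short sketch of how such a union is typically consumed, assuming a POSIX system and a hypothetical helper name sockaddr_u_port. It reads the family through ss and then uses the matching member; this relies on the family field lining up across all members, which is what struct sockaddr_storage is designed for.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

union sockaddr_u {
    struct sockaddr_storage ss;
    struct sockaddr_in sin;
    struct sockaddr_in6 sin6;
};

/* Returns the port in host byte order, or -1 for an unsupported family.
   The family is read through ss no matter which member was filled in. */
static int sockaddr_u_port(const union sockaddr_u *a)
{
    switch (a->ss.ss_family) {
    case AF_INET:  return ntohs(a->sin.sin_port);
    case AF_INET6: return ntohs(a->sin6.sin6_port);
    default:       return -1;
    }
}
```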
I think the problem is not well posed, since there are infinitely many data types a programmer can define. Consider, for example, the following sequence:
typedef char S0_t;
typedef struct { S0_t x; } S1_t;
typedef struct { S1_t x; } S2_t;
typedef struct { S2_t x; } S3_t;
It's pretty clear that this can continue indefinitely, defining as many new types as we want.
So there is no straightforward way to handle all these possibilities.
On the other hand, since pointers are more flexible, you could decide to define a generic type concerned only with pointer types.
Thus, the types used in your project would all have to be pointers.
In this way, something very simple like the following definition could probably work:
typedef void* generic_t;
Other than memory casting tricks, is there any way to use an untagged union
(a data type that explicitly holds one of a set of types and isn't a tagged union,
i.e. one that is forced by the compiler to hold an associated type tag and possibly only allowed by the language to get the value of the proper type)
without an associated type tag in the container that holds it?
Are there any other advantages an untagged union has over a typed union?
Edit: to show what I mean, here is an example of a tagged union in Haskell:
data U = I Int | S String
A manually tagged union in C:
enum u_types {INT,STRING};
typedef struct {
u_types tag;
union u{
int i;
char s[STRING_BUFFER_SIZE];
} d;
}tagged union;
An untagged union in C:
union u{
int i;
char s[STRING_BUFFER_SIZE];
} d;
One use for untagged unions is to allow easy access to smaller parts of a larger type:
union reg_a {
uint32_t full;
struct { /* little-endian in this example */
uint16_t low;
uint16_t high;
} __attribute__((__packed__));
};
union reg_a a;
a.full = 0x12345678; /* set all 32 bits */
a.high = 0xffff; /* change the upper 16 bits */
union pix_rgba {
uint32_t pix; /* to access the whole 32-bit pixel at once */
struct {
uint8_t red; /* red component only */
uint8_t green; /* green component only */
uint8_t blue; /* blue only */
uint8_t alpha; /* alpha only */
} __attribute__((__packed__));
};
These sorts of uses are not necessarily completely portable though, since they may depend on specific representations of types, endianness, etc. Still, they're often portable enough that one or two alternate versions will cover all the platforms one cares about, and they can be quite useful.
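To make the portability caveat concrete, here is a small sketch that reuses the reg_a layout (GCC/Clang-specific, because of the packed attribute and anonymous struct) and adds a hypothetical runtime endianness probe. On a little-endian target, writing high changes the upper half of full; on a big-endian target the same code changes the lower half instead.

```c
#include <stdint.h>

union reg_a {
    uint32_t full;
    struct __attribute__((__packed__)) {   /* layout assumes little-endian */
        uint16_t low;
        uint16_t high;
    };
};

/* Hypothetical helper: 1 if the lowest-addressed byte is the least significant. */
static int is_little_endian(void)
{
    union { uint32_t word; uint8_t bytes[4]; } probe = { .word = 1 };
    return probe.bytes[0] == 1;
}

static uint32_t set_high_half(uint32_t initial, uint16_t high)
{
    union reg_a a;
    a.full = initial;   /* write all 32 bits */
    a.high = high;      /* overwrite the 16 bits that 'high' maps to */
    return a.full;
}
```

With initial = 0x12345678 and high = 0xffff, the result is 0xffff5678 on little-endian machines and 0x1234ffff on big-endian ones, which is why one or two alternate layouts are usually needed.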
Untagged unions are also useful when what is stored in the union will be known anyway, even without checking the tag, and you don't want the extra overhead of storing and updating a tag. Possibly information in another place, which may also serve another purpose, indicates what kind of data should be in the union, in which case there's no need to tag the union itself.
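One way this looks in practice, sketched with hypothetical names (union cell, struct column, column_sum): the kind of data is recorded once in the container, so every element can be a plain untagged union and no per-element tag is stored or updated.

```c
#include <stddef.h>

/* Untagged union: no per-element tag. */
union cell {
    int i;
    double d;
};

enum column_kind { COL_INT, COL_DOUBLE };

/* The single 'kind' field says what every cell holds. */
struct column {
    enum column_kind kind;
    size_t count;
    union cell *cells;
};

static double column_sum(const struct column *c)
{
    double sum = 0.0;
    for (size_t n = 0; n < c->count; n++)
        sum += (c->kind == COL_INT) ? c->cells[n].i : c->cells[n].d;
    return sum;
}
```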
You don't need the tag if you only plan to define one variable, or use it inside a struct.
For example:
union
{
int x;
int y;
} u;
void test(void)
{
u.x = 10;
}
You only need the tag if you plan to use it in more than one place, if you need to create a pointer to it etc.
Note: The answer above assumed that the question was about what the standard calls a tag. However, after the answer was given, the question was updated to indicate that the tag in question was an extra type field used to record which of the fields in the union was active.
What you have posted as "manually tagged" is not valid C syntax; I suppose you meant it to be:
typedef enum {INT,STRING} u_types;
typedef struct {
u_types tag;
union u{
int i;
char s[1];
} d;
}tagged_union;
Please note that the formal definition of a struct/union tag in C is the name after the struct/union keyword. In your second example, u is a union tag. This is what had me quite confused.
What you describe as "tagged union" is known as a variant in computer science: a variable which can hold multiple types. Variants are generally frowned upon in programming, and in C in particular. They are banned in MISRA-C:2004, rules 18.3 and 18.4.
Languages with support for variants, like VB (and probably Haskell?), typically present them as: this variable can hold anything, but you should be careful with using it, because it is very inefficient.
In C, variants are not only inefficient, they are a safety hazard. MISRA-C recognizes this in rule 18.3:
For example: a program might try to access data of one type from the location when actually it is storing a value of the other type (e.g. due to an interrupt). The two types of data may align differently in the storage, and encroach upon other data. Therefore the data may not be correctly initialised every time the usage switches. The practice is particularly dangerous in concurrent systems.
So the question should rather be, are there any uses for tagged unions (variants)? No, there is not. I haven't used a single one in any C program I have ever written, there is no use for them. Since C has void pointers, there are far better and safer ways in C to create generic data types:
void ADT_add_generic_type (void* data, some_enum_t type, size_t size);
Take a look at how the C standard implements the functions qsort() and bsearch() for some good examples of generic C programming (ISO 9899:1999 7.20.5.1):
void *bsearch (const void *key,
const void *base,
size_t nmemb,
size_t size,
int (*compar)(const void *, const void *));
Description The bsearch function searches an array of nmemb objects,
the initial element of which is pointed to by base, for an element
that matches the object pointed to by key. The size of each element of
the array is specified by size.
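A minimal example of that style of generic programming, using only the standard qsort and bsearch; the helper names compare_ints and sort_and_find are illustrative. The library sees only const void * and an element size, and the comparison callback converts back to the real element type.

```c
#include <stdlib.h>

/* Comparison callback: converts the untyped pointers back to int. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Sorts the array, then returns a pointer to 'key' inside it, or NULL. */
static int *sort_and_find(int *arr, size_t n, int key)
{
    qsort(arr, n, sizeof *arr, compare_ints);
    return bsearch(&key, arr, n, sizeof *arr, compare_ints);
}
```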
The uses for "untagged" unions are several, however: data protocols, packing, hardware register access, etc. See Dmitri's answer for a good example.
Here's a trick:
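#include <stdint.h>
/* GCC/Clang attributes; always_inline expects the function to be inline. */
static inline __attribute__((const, always_inline))
int32_t floatToIntBits(float f) {
    /* C (unlike C++) has no anonymous unions at block scope, so name it.
       Reading 'bits' after writing 'value' reinterprets the bytes. */
    union {
        float value;
        int32_t bits;
    } u;
    u.value = f;
    return u.bits;
}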
Wherever you have a generic implementation using void *, you could use an untagged union instead. Since you are using void * the true object type has to be known from context.
This is a maximally portable way to implement a generic datastructure that can store union { void *ptr; unsigned x; } for example (on C platforms where there is no uintptr_t).
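A sketch of such a container, assuming a hypothetical fixed-capacity stack (slot, slot_stack, stack_push, stack_pop). Each slot holds either a pointer or an unsigned integer without ever converting one into the other, so no uintptr_t is needed; as with void *, the caller must know from context which member is valid.

```c
#include <stddef.h>

/* One slot holds either a pointer or an unsigned integer. */
typedef union {
    void *ptr;
    unsigned x;
} slot;

#define STACK_CAP 16

typedef struct {
    slot items[STACK_CAP];
    size_t top;
} slot_stack;

/* Returns 1 on success, 0 if the stack is full. */
static int stack_push(slot_stack *s, slot v)
{
    if (s->top == STACK_CAP)
        return 0;
    s->items[s->top++] = v;
    return 1;
}

/* The caller must make sure the stack is not empty. */
static slot stack_pop(slot_stack *s)
{
    return s->items[--s->top];
}
```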
I have two structures, with values that should compute a weighted average, like this simplified version:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;
And then I use them to calculate:
average = v_move*qtt_move + v_read*qtt_read + v_suck*qtt_suck + v_flush*qtd_flush + v_nop*qtd_nop + v_call*qtt_call;
Every now and then I need to include another variable. Now, for instance, I need to include v_clean and qtt_clean. I can't change the structures to arrays like this:
typedef struct
{
int v[6];
} values;
typedef struct
{
int qtt[6];
} quantities;
That would simplify my work a lot, but they are part of an API that needs the variable names to be clear.
So, I'm looking for a way to access the members of those structures, maybe using sizeof(), so I can treat them as an array but still keep the API unchanged. It is guaranteed that all values are int, but I can't guarantee the size of an int.
While writing the question, it came to my mind: can a union do the job? Is there another clever way to automate the task of adding another member?
Thanks,
Beco
What you are trying to do is not possible to do in any elegant way. It is not possible to reliably access consecutive struct members as an array. The currently accepted answer is a hack, not a solution.
The proper solution would be to switch to an array, regardless of how much work it is going to require. If you use enum constants for array indexing (as #digEmAll suggested in his now-deleted answer), the names and the code will be as clear as what you have now.
If you still don't want to or can't switch to an array, the only more-or-less acceptable way to do what you are trying to do is to create an "index-array" or "map-array" (see below). C++ has a dedicated language feature that helps one implement it elegantly: pointers-to-members. In C you are forced to emulate that C++ feature using the offsetof macro:
static const size_t values_offsets[] = {
offsetof(values, v_move),
offsetof(values, v_read),
offsetof(values, v_suck),
/* and so on */
};
static const size_t quantities_offsets[] = {
offsetof(quantities, qtt_move),
offsetof(quantities, qtt_read),
offsetof(quantities, qtt_suck),
/* and so on */
};
And if now you are given
values v;
quantities q;
and index
int i;
you can generate the pointers to individual fields as
int *pvalue = (int *) ((char *) &v + values_offsets[i]);
int *pquantity = (int *) ((char *) &q + quantities_offsets[i]);
*pvalue += *pquantity;
Of course, you can now iterate over i in any way you want. This is also far from being elegant, but at least it bears some degree of reliability and validity, as opposed to any ugly hack. The whole thing can be made to look more elegantly by wrapping the repetitive pieces into appropriately named functions/macros.
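A complete version of the map-array approach for the structures in the question, under these assumptions: the table and function names are illustrative, and the _Static_assert (C11) is an addition that catches tables of mismatched length. Adding v_clean/qtt_clean would only mean adding one entry to each table.

```c
#include <stddef.h>

typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

typedef struct {
    int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;

/* Entry i of each table names matching members. */
static const size_t values_offsets[] = {
    offsetof(values, v_move),  offsetof(values, v_read),
    offsetof(values, v_suck),  offsetof(values, v_flush),
    offsetof(values, v_nop),   offsetof(values, v_call),
};

static const size_t quantities_offsets[] = {
    offsetof(quantities, qtt_move),  offsetof(quantities, qtt_read),
    offsetof(quantities, qtt_suck),  offsetof(quantities, qtd_flush),
    offsetof(quantities, qtd_nop),   offsetof(quantities, qtt_call),
};

_Static_assert(sizeof values_offsets == sizeof quantities_offsets,
               "offset tables must have the same length");

#define N_FIELDS (sizeof values_offsets / sizeof values_offsets[0])

/* Sum of v_x * qtt_x over every mapped pair of members. */
static int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;
    for (size_t i = 0; i < N_FIELDS; i++) {
        const int *pv = (const int *)((const char *)v + values_offsets[i]);
        const int *pq = (const int *)((const char *)q + quantities_offsets[i]);
        sum += *pv * *pq;
    }
    return sum;
}
```

For values {1, 2, 3, 4, 5, 6} and quantities {10, 20, 30, 40, 50, 60}, weighted_sum returns 910.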
If all members are guaranteed to be of type int, you can use a pointer to int and increment it:
int *value = &(values.v_move);
int *quantity = &(quantities.qtt_move);
size_t i;
average = 0;
// although it should work, a good practice many times IMHO is to add a sentinel (e.g. 0) as the last member of the struct and change the condition to quantity[i] != 0.
for (i = 0; i < sizeof(quantities) / sizeof(*quantity); i++)
average += value[i] * quantity[i];
(Since the order of members in a struct is guaranteed to be as declared)
While writing the question, it came to my mind: can a union do the job? Is there another clever way to automate the task of adding another member?
Yes, a union can certainly do the job:
union
{
values v; /* As defined by OP */
int array[6];
} u;
You can use a pointer to u.v in your API, and work with u.array in your code.
Personally, I think that all the other answers break the rule of least surprise. When I see a plain struct definition, I assume that the structure will be accessed using normal access methods. With a union, it's clear that the application will access it in special ways, which prompts me to pay extra attention to the code.
It really sounds as if this should have been an array since the beginning, with accessor methods or macros enabling you to still use pretty names like move, read, etc. However, as you mentioned, this isn't feasible due to API breakage.
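A sketch of the union approach with one safeguard added: a C11 _Static_assert (an addition, not part of the answer) that fails to compile if the struct contains padding, which is the assumption the int[6] view depends on. The names values_u and sum_values are illustrative.

```c
typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

#define N_VALUES 6

/* The array view only matches if values is laid out exactly like int[6]. */
_Static_assert(sizeof(values) == N_VALUES * sizeof(int),
               "values has padding; the union trick is unsafe here");

typedef union {
    values v;              /* named access, as used by the API */
    int array[N_VALUES];   /* indexed access, used internally */
} values_u;

static int sum_values(const values_u *u)
{
    int sum = 0;
    for (int i = 0; i < N_VALUES; i++)
        sum += u->array[i];
    return sum;
}
```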
The two solutions that come to my mind are:
Use a compiler specific directive to ensure that your struct is packed (and thus, that casting it to an array is safe)
Evil macro black magic.
How about using __attribute__((packed)) if you are using gcc?
So you could declare your structures as:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} __attribute__((packed)) values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} __attribute__((packed)) quantities;
According to the gcc manual, your structures will then use the minimum amount of memory possible for storing the structure, omitting any padding that might have normally been there. The only issue would then be to determine the sizeof(int) on your platform which could be done through either some compiler macros or using <stdint.h>.
One more thing is that there will be a performance penalty for unpacking and re-packing the structure when it needs to be accessed and then stored back into memory. But at least you can be assured then that the layout is consistent, and it could be accessed like an array using a cast to a pointer type like you were wanting (i.e., you won't have to worry about padding messing up the pointer offsets).
Thanks,
Jason
This problem is common, and has been solved in many ways in the past. None of them is completely safe or clean; it depends on your particular application. Here's a list of possible solutions:
1) You can redefine your structures so fields become array elements, and use macros to map each particular element as if it was a structure field. E.g:
struct values { int varray[6]; };
#define v_read varray[1]
The disadvantage of this approach is that most debuggers don't understand macros. Another problem is that in theory a compiler could choose a different alignment for the original structure and the redefined one, so binary compatibility is not guaranteed.
2) Count on the compiler's behaviour and treat all the fields as if they were array fields (oops, while I was writing this, someone else wrote the same; +1 for him)
3) create a static array of element offsets (initialized at startup) and use them to "map" the elements. It's quite tricky, and not so fast, but has the advantage that it's independent of the actual disposition of the field in the structure. Example (incomplete, just for clarification):
int position[10];
position[0] = ((char *)(&((values*)NULL)->v_move)-(char *)NULL);
position[1] = ((char *)(&((values*)NULL)->v_read)-(char *)NULL);
//...
values *v = ...;
int vread;
vread = *(int *)(((char *)v)+position[1]);
Ok, not at all simple. Macros like "offsetof" may help in this case.
If I have structure definitions, for example, like these:
struct Base {
int foo;
};
struct Derived {
int foo; // int foo is common for both definitions
char *bar;
};
Can I do something like this?
void foobar(void *ptr) {
((struct Base *)ptr)->foo = 1;
}
struct Derived s;
foobar(&s);
In other words, can I cast the void pointer to Base * to access its foo member when its type is actually Derived *?
You should do
struct Base {
int foo;
};
struct Derived {
struct Base base;
char *bar;
};
to avoid breaking strict aliasing; it is a common misconception that C allows arbitrary casts of pointer types: although it will work as expected in most implementations, it's non-standard.
This also avoids any alignment incompatibilities due to usage of pragma directives.
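A small sketch of that layout in use, with illustrative helper names. Because a pointer to a structure, suitably converted, points to its first member (C99/C11 6.7.2.1), passing &s.base and casting &s to struct Base * reach the same int.

```c
#include <stddef.h>

struct Base {
    int foo;
};

struct Derived {
    struct Base base;   /* must be the first member */
    char *bar;
};

/* Works on any object that starts with a struct Base. */
static void foobar(struct Base *b)
{
    b->foo = 1;
}

static int demo(void)
{
    struct Derived s = { { 0 }, NULL };
    foobar(&s.base);               /* no cast needed */
    int first = s.base.foo;
    s.base.foo = 0;
    foobar((struct Base *)&s);     /* converted pointer to the first member */
    return first + s.base.foo;     /* 2 */
}
```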
Many real-world C programs assume the construct you show is safe, and there is an interpretation of the C standard (specifically, of the "common initial sequence" rule, C99 §6.5.2.3 p5) under which it is conforming. Unfortunately, in the five years since I originally answered this question, all the compilers I can easily get at (viz. GCC and Clang) have converged on a different, narrower interpretation of the common initial sequence rule, under which the construct you show provokes undefined behavior. Concretely, experiment with this program:
#include <stdio.h>
#include <string.h>
typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; float z; } B;
typedef struct C { A a; float z; } C;
int testAB(A *a, B *b)
{
b->x = 1;
a->x = 2;
return b->x;
}
int testAC(A *a, C *c)
{
c->a.x = 1;
a->x = 2;
return c->a.x;
}
int main(void)
{
B bee;
C cee;
int r;
memset(&bee, 0, sizeof bee);
memset(&cee, 0, sizeof cee);
r = testAB((A *)&bee, &bee);
printf("testAB: r=%d bee.x=%d\n", r, bee.x);
r = testAC(&cee.a, &cee);
printf("testAC: r=%d cee.x=%d\n", r, cee.a.x);
return 0;
}
When compiling with optimization enabled (and without -fno-strict-aliasing), both GCC and Clang will assume that the two pointer arguments to testAB cannot point to the same object, so I get output like
testAB: r=1 bee.x=2
testAC: r=2 cee.x=2
They do not make that assumption for testAC, but, having previously been under the impression that testAB was required to be compiled as if its two arguments could point to the same object, I am no longer confident enough in my own understanding of the standard to say whether or not that is guaranteed to keep working.
That will work in this particular case. The foo field is the first member of both structures and has the same type in both. However, this is not true in the general case of fields within a struct that are not the first member: things like alignment and packing can make this break in subtle ways.
As you seem to be aiming at Object Oriented Programming in C I can suggest you to have a look at the following link:
http://www.planetpdf.com/codecuts/pdfs/ooc.pdf
It goes into detail about ways of handling oop principles in ANSI C.
In particular cases this could work, but in general, no, because of structure alignment.
You could use different #pragmas to make (or rather, attempt to make) the alignment identical, and then, yes, that would work.
If you're using Microsoft Visual Studio, you might find this article useful.
There is another little thing that might be helpful or related to what you are doing:
#define SHARED_DATA int id
typedef struct window_t {
    SHARED_DATA;
    int something;
    void* blah;
} window_t;
typedef struct list_t {
    SHARED_DATA;
    int size;
} list_t;
typedef struct button_t {
    SHARED_DATA;
    int clicked;
} button_t;
/* the "superclass": its first member overlaps the first member of every variant */
typedef union base_t {
    SHARED_DATA;
    window_t win;
    list_t list;
    button_t button;
} base_t;
Now you can put the shared properties into SHARED_DATA and handle the different types via the "superclass" packed into the union. You could use SHARED_DATA to store just a 'class identifier' or store a pointer. Either way, it turned out handy for generic handling of event types for me at some point. I hope I'm not going too far off-topic with this.
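A self-contained sketch of the generic handling described above, assuming hypothetical class identifiers (ID_WINDOW, ID_BUTTON) stored in the shared id field and a hypothetical handle function. It reads the id through the union's own member, then switches to the matching variant.

```c
#define SHARED_DATA int id

/* Hypothetical class identifiers kept in the shared 'id' field. */
enum { ID_WINDOW = 1, ID_BUTTON = 2 };

typedef struct { SHARED_DATA; int something; void *blah; } window_t;
typedef struct { SHARED_DATA; int clicked; } button_t;

typedef union {
    SHARED_DATA;        /* overlaps the first member of every variant */
    window_t win;
    button_t button;
} base_t;

/* Generic handler: dispatch on the shared id, then use the right variant. */
static int handle(const base_t *obj)
{
    switch (obj->id) {
    case ID_WINDOW: return obj->win.something;
    case ID_BUTTON: return obj->button.clicked;
    default:        return -1;   /* unknown class */
    }
}
```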
I know this is an old question, but in my view there is more that can be said and some of the other answers are incorrect.
Firstly, this cast:
(struct Base *)ptr
... is allowed, but only if the alignment requirements are met. On many compilers your two structures will have the same alignment requirements, and it's easy to verify in any case. If you get past this hurdle, the next is that the result of the cast is mostly unspecified - that is, there's no requirement in the C standard that the pointer once cast still refers to the same object (only after casting it back to the original type will it necessarily do so).
However, in practice, compilers for common systems usually make the result of a pointer cast refer to the same object.
(Pointer casts are covered in section 6.3.2.3 of both the C99 standard and the more recent C11 standard. The rules are essentially the same in both, I believe).
Finally, you've got the so called "strict aliasing" rules to contend with (C99/C11 6.5 paragraph 7); basically, you are not allowed to access an object of one type via a pointer of another type (with certain exceptions, which don't apply in your example). See "What is the strict-aliasing rule?", or for a very in-depth discussion, read my blog post on the subject.
In conclusion, what you attempt in your code is not guaranteed to work. It might be guaranteed to always work with certain compilers (and with certain compiler options), and it might work by chance with many compilers, but it certainly invokes undefined behavior according to the C language standard.
What you could do instead is this:
*((int *)ptr) = 1;
... I.e. since you know that the first member of the structure is an int, you just cast directly to int, which bypasses the aliasing problem since both types of struct do in fact contain an int at this address. You are relying on knowing the struct layout that the compiler will use, and you are still relying on the non-standard semantics of pointer casting, but in practice this is significantly less likely to give you problems.
The great/bad thing about C is that you can cast just about anything -- the problem is, it might not work. :) However, in your case, it will*, since you have two structs whose first members are both of the same type; see this program for an example. Now, if struct derived had a different type as its first element -- for example, char *bar -- then no, you'd get weird behavior.
* I should qualify that with "almost always", I suppose; there are a lot of different C compilers out there, so some may behave differently. However, I know it'll work in GCC.
What would be the differences between using simply a void* as opposed to a union? Example:
struct my_struct {
short datatype;
void *data;
};
struct my_struct {
short datatype;
union {
char* c;
int* i;
long* l;
};
};
Both of those can be used to accomplish the exact same thing, is it better to use the union or the void* though?
I had exactly this case in our library. We had a generic string mapping module that could use different sizes for the index, 8, 16 or 32 bit (for historic reasons). So the code was full of code like this:
if(map->idxSiz == 1)
return ((BYTE *)map->idx)[Pos] = ...whatever
else
if(map->idxSiz == 2)
return ((WORD *)map->idx)[Pos] = ...whatever
else
return ((LONG *)map->idx)[Pos] = ...whatever
There were 100 lines like that. As a first step, I changed it to a union and I found it to be more readable.
switch(map->idxSiz) {
case 1: return map->idx.u8[Pos] = ...whatever
case 2: return map->idx.u16[Pos] = ...whatever
case 4: return map->idx.u32[Pos] = ...whatever
}
This allowed me to see more clearly what was going on. I could then decide to completely remove the idxSiz variants using only 32-bit indexes. But this was only possible once the code got more readable.
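The union itself is not shown in the answer, so here is a hypothetical reconstruction (map_t, map_set) of what such an index could look like: one field per element width, chosen by idxSiz in bytes. Each member is set to point at a buffer of its own type, so no cast appears at the use sites.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical index: one buffer seen as 8-, 16- or 32-bit entries. */
typedef struct {
    int idxSiz;              /* element size in bytes: 1, 2 or 4 */
    union {
        uint8_t  *u8;
        uint16_t *u16;
        uint32_t *u32;
    } idx;
} map_t;

static void map_set(map_t *map, size_t pos, uint32_t value)
{
    switch (map->idxSiz) {
    case 1: map->idx.u8[pos]  = (uint8_t)value;  break;
    case 2: map->idx.u16[pos] = (uint16_t)value; break;
    case 4: map->idx.u32[pos] = value;           break;
    }
}
```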
PS: That was only a minor part of our project, which is about several hundred thousand lines of code written by people who are no longer around. The changes to the code have to be gradual, in order not to break the applications.
Conclusion: Even if people are less used to the union variant, I prefer it because it can make the code much lighter to read. On big projects, readability is extremely important, even if it is just you yourself, who will read the code later.
Edit: Added the comment, as comments do not format code:
The change to switch came before (this is now the real code as it was)
switch(this->IdxSiz) {
case 2: ((uint16_t*)this->iSort)[Pos-1] = (uint16_t)this->header.nUz; break;
case 4: ((uint32_t*)this->iSort)[Pos-1] = this->header.nUz; break;
}
was changed to
switch(this->IdxSiz) {
case 2: this->iSort.u16[Pos-1] = this->header.nUz; break;
case 4: this->iSort.u32[Pos-1] = this->header.nUz; break;
}
I shouldn't have combined all the beautification I did in the code and only show that step. But I posted my answer from home where I had no access to the code.
In my opinion, the void pointer and explicit casting is the better way, because it is obvious for every seasoned C programmer what the intent is.
Edit to clarify: If I see the said union in a program, I would ask myself if the author wanted to restrict the types of the stored data. Perhaps some sanity checks are performed which make sense only on integral number types.
But if I see a void pointer, I directly know that the author designed the data structure to hold arbitrary data. Thus I can use it for newly introduced structure types, too.
Note that it could be that I cannot change the original code, e.g. if it is part of a 3rd party library.
It's more common to use a union to hold actual objects rather than pointers.
I think most C developers that I respect would not bother to union different pointers together; if a general-purpose pointer is needed, just using void * certainly is "the C way". The language sacrifices a lot of safety in order to allow you to deliberately alias the types of things; considering what we have paid for this feature we might as well use it when it simplifies the code. That's why the escapes from strict typing have always been there.
The union approach requires that you know a priori all the types that might be used. The void * approach allows storing data types that might not even exist when the code in question is written (though doing much with such an unknown data type can be tricky, such as requiring passing a pointer to a function to be invoked on that data instead of being able to process it directly).
Edit: Since there seems to be some misunderstanding about how to use an unknown data type: in most cases, you provide some sort of "registration" function. In a typical case, you pass in pointers to functions that can carry out all the operations you need on an item being stored. It generates and returns a new index to be used for the value that identifies the type. Then when you want to store an object of that type, you set its identifier to the value you got back from the registration, and when the code that works with the objects needs to do something with that object, it invokes the appropriate function via the pointer you passed in. In a typical case, those pointers to functions will be in a struct, and it'll simply store (pointers to) those structs in an array. The identifier value it returns from registration is just the index into the array of those structs where it has stored this particular one.
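A minimal sketch of that registration scheme. Everything here is hypothetical (type_ops with a single describe callback, register_type, item, and an example point type); a real table would hold every operation the container needs. The type id returned by registration is simply the index into the registry array.

```c
#include <stdio.h>
#include <stddef.h>

/* Operation table a client registers for its own type. */
typedef struct {
    int (*describe)(const void *obj, char *buf, size_t len);
} type_ops;

#define MAX_TYPES 8
static const type_ops *registry[MAX_TYPES];
static int n_types;

/* Stores the table; its index becomes the type id (-1 if full). */
static int register_type(const type_ops *ops)
{
    if (n_types == MAX_TYPES)
        return -1;
    registry[n_types] = ops;
    return n_types++;
}

/* A stored object: type id plus untyped pointer to the data. */
typedef struct {
    int type_id;
    void *data;
} item;

/* Generic code calls through the registered pointer. */
static int item_describe(const item *it, char *buf, size_t len)
{
    return registry[it->type_id]->describe(it->data, buf, len);
}

/* Example client type, unknown to the generic code above. */
typedef struct { int x, y; } point;

static int point_describe(const void *obj, char *buf, size_t len)
{
    const point *p = obj;
    return snprintf(buf, len, "(%d, %d)", p->x, p->y);
}

static const type_ops point_ops = { point_describe };
```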
Although unions are not commonly used nowadays, a union suits your usage scenario well, since it is more explicit. In the first code sample, the content of data is not apparent.
My preference would be to go the union route. The cast from void* is a blunt instrument and accessing the datum through a properly typed pointer gives a bit of extra safety.
Toss a coin. Union is more commonly used with non-pointer types, so it looks a bit odd here. However the explicit type specification it provides is decent implicit documentation. void* would be fine so long as you always know you're only going to access pointers. Don't start putting integers in there and relying on sizeof(void*) == sizeof (int).
I don't feel like either way has any advantage over the other in the end.
It's a bit obscured in your example, because you're using pointers and hence indirection. But union certainly does have its advantages.
Imagine:
struct my_struct {
short datatype;
union {
char c;
int i;
long l;
};
};
Now you don't have to worry about where the allocation for the value part comes from. No separate malloc() or anything like that. And you might find that accesses to ->c, ->i, and ->l are a bit faster. (Though this might only make a difference if there are lots of these accesses.)
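A sketch of that by-value layout, assuming hypothetical tag constants (DT_CHAR, DT_INT, DT_LONG) for the datatype field and a C11 compiler for the anonymous union. The value lives inside the struct, so no separate allocation is needed.

```c
/* Hypothetical tag values for the 'datatype' field. */
enum { DT_CHAR, DT_INT, DT_LONG };

struct my_struct {
    short datatype;
    union {            /* anonymous union (C11): accessed as s.c, s.i, s.l */
        char c;
        int i;
        long l;
    };
};

/* Reads whichever member the tag says is active. */
static long as_long(const struct my_struct *s)
{
    switch (s->datatype) {
    case DT_CHAR: return s->c;
    case DT_INT:  return s->i;
    case DT_LONG: return s->l;
    }
    return 0;   /* unknown tag */
}
```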
It really depends on the problem you're trying to solve. Without that context it's really impossible to evaluate which would be better.
For example, if you're trying to build a generic container like a list or a queue that can handle arbitrary data types, then the void pointer approach is preferable. OTOH, if you're limiting yourself to a small set of primitive data types, then the union approach can save you some time and effort.
If you build your code with -fstrict-aliasing (gcc) or similar options on other compilers, then you have to be very careful with how you do your casting. You can cast a pointer as much as you want, but when you dereference it, the pointer type that you use for the dereference must match the original type (with some exceptions). You can't for example do something like:
void foo(void * p)
{
short * pSubSetOfInt = (short *)p ;
*pSubSetOfInt = 0xFFFF ;
}
void goo()
{
int intValue = 0 ;
foo( &intValue ) ;
printf( "0x%X\n", intValue ) ;
}
Don't be surprised if this prints 0 (say) instead of 0xFFFF or 0xFFFF0000 as you may expect when building with optimization. One way to make this code work is to do the same thing using a union, and the code will probably be easier to understand too.
A union reserves enough space for its largest member, and the members don't have to be the same size. A void* has a fixed size, whereas a union can hold members of arbitrary size.
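A sketch of the union version, under two assumptions made loudly here: int is twice the size of short (checked with a C11 _Static_assert), and unsigned types are used so 0xFFFF fits without an implementation-defined conversion. Reading a different union member reinterprets the bytes, which C99 (TC3) and C11 permit, so the optimizer cannot assume the two views are unrelated. The names int_view and set_first_half are illustrative.

```c
_Static_assert(sizeof(unsigned int) == 2 * sizeof(unsigned short),
               "this sketch assumes int is twice the size of short");

typedef union {
    unsigned int whole;
    unsigned short halves[2];
} int_view;

/* Writes the lowest-addressed unsigned short, then reads the int back. */
static unsigned int set_first_half(void)
{
    int_view u;
    u.whole = 0;
    u.halves[0] = 0xFFFF;
    return u.whole;   /* 0xFFFF on little-endian, 0xFFFF0000 on big-endian */
}
```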
#include <stdio.h>
#include <stdlib.h>
struct m1 {
union {
char c[100];
};
};
struct m2 {
void * c;
};
int
main(void)
{
printf("sizeof m1 is %zu ",sizeof(struct m1));
printf("sizeof m2 is %zu",sizeof(struct m2));
exit(EXIT_SUCCESS);
}
Output (on a platform with 4-byte pointers):
sizeof m1 is 100 sizeof m2 is 4
EDIT: assuming you only use pointers of the same size as void*, I think the union is better, as you will gain a bit of error detection when trying to set .c with an integer pointer, etc.
void*, unless you're creating your own allocator, is definitely quick and dirty, for better or for worse.