union versus void pointer - c

What would be the differences between using simply a void* as opposed to a union? Example:
struct my_struct {
short datatype;
void *data;
};
struct my_struct {
short datatype;
union {
char* c;
int* i;
long* l;
};
};
Both of those can be used to accomplish the exact same thing, is it better to use the union or the void* though?

I had exactly this case in our library. We had a generic string mapping module that could use different sizes for the index, 8, 16 or 32 bit (for historic reasons). So the code was full of code like this:
if(map->idxSiz == 1)
return ((BYTE *)map->idx)[Pos] = ...whatever
else
if(map->idxSiz == 2)
return ((WORD *)map->idx)[Pos] = ...whatever
else
return ((LONG *)map->idx)[Pos] = ...whatever
There were 100 lines like that. As a first step, I changed it to a union and I found it to be more readable.
switch(map->idxSiz) {
case 1: return map->idx.u8[Pos] = ...whatever
case 2: return map->idx.u16[Pos] = ...whatever
case 4: return map->idx.u32[Pos] = ...whatever
}
This allowed me to see more clearly what was going on. I could then decide to completely remove the idxSiz variants using only 32-bit indexes. But this was only possible once the code got more readable.
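As a rough, self-contained sketch of that refactoring (the names Map, IndexPtr and map_set are hypothetical, not the library's real identifiers), the union of typed pointers might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction: one index buffer, reachable through
 * whichever typed pointer matches the element width. */
typedef union {
    uint8_t  *u8;
    uint16_t *u16;
    uint32_t *u32;
} IndexPtr;

typedef struct {
    int      idxSiz;  /* element size in bytes: 1, 2 or 4 */
    IndexPtr idx;     /* set the member that matches idxSiz */
} Map;

/* Store value at position pos and return what was actually stored
 * (the value is truncated to the element width). */
static uint32_t map_set(Map *map, size_t pos, uint32_t value)
{
    switch (map->idxSiz) {
    case 1:  return map->idx.u8[pos]  = (uint8_t)value;
    case 2:  return map->idx.u16[pos] = (uint16_t)value;
    default: return map->idx.u32[pos] = value;
    }
}
```

The casts that used to appear at every access are now confined to the narrowing of the stored value.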
PS: That was only a minor part of our project, which consists of several hundred thousand lines of code written by people who are no longer around. The changes to the code have to be gradual, in order not to break the applications.
Conclusion: Even if people are less used to the union variant, I prefer it because it can make the code much easier to read. On big projects, readability is extremely important, even if you are the only one who will read the code later.
Edit: Added the comment, as comments do not format code:
The change to switch came before (this is now the real code as it was)
switch(this->IdxSiz) {
case 2: ((uint16_t*)this->iSort)[Pos-1] = (uint16_t)this->header.nUz; break;
case 4: ((uint32_t*)this->iSort)[Pos-1] = this->header.nUz; break;
}
was changed to
switch(this->IdxSiz) {
case 2: this->iSort.u16[Pos-1] = this->header.nUz; break;
case 4: this->iSort.u32[Pos-1] = this->header.nUz; break;
}
I shouldn't have combined all the beautification I did in the code and only shown that step. But I posted my answer from home, where I had no access to the code.

In my opinion, the void pointer and explicit casting is the better way, because it is obvious for every seasoned C programmer what the intent is.
Edit to clarify: If I see the said union in a program, I would ask myself if the author wanted to restrict the types of the stored data. Perhaps some sanity checks are performed which make sense only on integral number types.
But if I see a void pointer, I directly know that the author designed the data structure to hold arbitrary data. Thus I can use it for newly introduced structure types, too.
Note that it could be that I cannot change the original code, e.g. if it is part of a 3rd party library.

It's more common to use a union to hold actual objects rather than pointers.
I think most C developers that I respect would not bother to union different pointers together; if a general-purpose pointer is needed, just using void * certainly is "the C way". The language sacrifices a lot of safety in order to allow you to deliberately alias the types of things; considering what we have paid for this feature we might as well use it when it simplifies the code. That's why the escapes from strict typing have always been there.

The union approach requires that you know a priori all the types that might be used. The void * approach allows storing data types that might not even exist when the code in question is written (though doing much with such an unknown data type can be tricky, such as requiring passing a pointer to a function to be invoked on that data instead of being able to process it directly).
Edit: Since there seems to be some misunderstanding about how to use an unknown data type: in most cases, you provide some sort of "registration" function. In a typical case, you pass in pointers to functions that can carry out all the operations you need on an item being stored. It generates and returns a new index to be used for the value that identifies the type. Then when you want to store an object of that type, you set its identifier to the value you got back from the registration, and when the code that works with the objects needs to do something with that object, it invokes the appropriate function via the pointer you passed in. In a typical case, those pointers to functions will be in a struct, and it'll simply store (pointers to) those structs in an array. The identifier value it returns from registration is just the index into the array of those structs where it has stored this particular one.
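A minimal sketch of such a registration scheme follows. Every name here (TypeOps, register_type, item_as_long, MAX_TYPES) is made up for illustration, and a real interface would carry more operations than the single one shown:

```c
#include <stddef.h>

/* Operations each registered type must supply (hypothetical interface). */
typedef struct {
    const char *name;
    long (*as_long)(const void *data);  /* one example operation */
} TypeOps;

#define MAX_TYPES 16
static const TypeOps *registry[MAX_TYPES];
static int registry_count = 0;

/* Stores the ops pointer and returns its index as the new type id,
 * or -1 if the table is full. */
static int register_type(const TypeOps *ops)
{
    if (registry_count >= MAX_TYPES)
        return -1;
    registry[registry_count] = ops;
    return registry_count++;
}

struct my_struct {
    short datatype;  /* id returned by register_type() */
    void *data;
};

/* Dispatch through the registered function pointer; 0 for unknown ids. */
static long item_as_long(const struct my_struct *item)
{
    if (item->datatype < 0 || item->datatype >= registry_count)
        return 0;
    return registry[item->datatype]->as_long(item->data);
}

/* A type supplied later by client code. */
static long int_as_long(const void *data) { return *(const int *)data; }
static const TypeOps int_ops = { "int", int_as_long };
```

The container code never needs to know what an int is; it only follows the pointer it was given at registration time.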

Although unions are not common nowadays, a union suits your usage scenario well because it is more explicit. In the first code sample, the content of data cannot be understood from the declaration.

My preference would be to go the union route. The cast from void* is a blunt instrument and accessing the datum through a properly typed pointer gives a bit of extra safety.

Toss a coin. Union is more commonly used with non-pointer types, so it looks a bit odd here. However the explicit type specification it provides is decent implicit documentation. void* would be fine so long as you always know you're only going to access pointers. Don't start putting integers in there and relying on sizeof(void*) == sizeof (int).
I don't feel like either way has any advantage over the other in the end.

It's a bit obscured in your example, because you're using pointers and hence indirection. But union certainly does have its advantages.
Imagine:
struct my_struct {
short datatype;
union {
char c;
int i;
long l;
};
};
Now you don't have to worry about where the allocation for the value part comes from. No separate malloc() or anything like that. And you might find that accesses to ->c, ->i, and ->l are a bit faster. (Though this might only make a difference if there are lots of these accesses.)
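As a small illustration of working with such a value-holding struct, here is one possible sketch. The enum constants and the helper my_struct_as_long are hypothetical additions, and the anonymous union requires C11:

```c
/* Hypothetical tags for the datatype field. */
enum datatype { TYPE_CHAR, TYPE_INT, TYPE_LONG };

struct my_struct {
    short datatype;
    union {          /* anonymous union (C11): members are accessed directly */
        char c;
        int  i;
        long l;
    };
};

/* Widen whichever member is active to long; returns 0 for an unknown tag. */
static long my_struct_as_long(const struct my_struct *s)
{
    switch (s->datatype) {
    case TYPE_CHAR: return s->c;
    case TYPE_INT:  return s->i;
    case TYPE_LONG: return s->l;
    default:        return 0;
    }
}
```

The struct can live on the stack or inside an array, with no separate allocation for the payload.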

It really depends on the problem you're trying to solve. Without that context it's really impossible to evaluate which would be better.
For example, if you're trying to build a generic container like a list or a queue that can handle arbitrary data types, then the void pointer approach is preferable. OTOH, if you're limiting yourself to a small set of primitive data types, then the union approach can save you some time and effort.

If you build your code with -fstrict-aliasing (gcc) or similar options on other compilers, then you have to be very careful with how you do your casting. You can cast a pointer as much as you want, but when you dereference it, the pointer type that you use for the dereference must match the original type (with some exceptions). You can't for example do something like:
void foo(void * p)
{
short * pSubSetOfInt = (short *)p ;
*pSubSetOfInt = 0xFFFF ;
}
void goo()
{
int intValue = 0 ;
foo( &intValue ) ;
printf( "0x%X\n", intValue ) ;
}
Don't be surprised if, when building with optimization, this prints 0 (say) instead of 0xFFFF or 0xFFFF0000 as you might expect. One way to make this code work is to do the same thing using a union, and the code will probably be easier to understand too.
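A sketch of the union variant, assuming a 16-bit short and a 32-bit int (the names int_or_shorts and overwrite_first_short are invented for this example). In C, reading a union member other than the one last written reinterprets the stored bytes, so the compiler cannot assume the two accesses are unrelated:

```c
/* The same storage viewed either as one int or as its short-sized pieces. */
union int_or_shorts {
    int            whole;
    unsigned short parts[sizeof(int) / sizeof(unsigned short)];
};

/* Overwrite the first short-sized piece of an int and return the result.
 * Which half of the int changes depends on the platform's byte order. */
static int overwrite_first_short(int start)
{
    union int_or_shorts u;
    u.whole = start;
    u.parts[0] = 0xFFFFu;
    return u.whole;
}
```

On a little-endian machine the low half changes; on a big-endian machine the high half does.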

The union reserves enough space for its largest member, and the members don't have to be the same size. A void* has a fixed size, whereas a union can be of arbitrary size.
#include <stdio.h>
#include <stdlib.h>
struct m1 {
union {
char c[100];
};
};
struct m2 {
void * c;
};
int
main(void)
{
    /* sizeof yields a size_t, which is printed with %zu */
    printf("sizeof m1 is %zu ",sizeof(struct m1));
    printf("sizeof m2 is %zu",sizeof(struct m2));
    exit(EXIT_SUCCESS);
}
Output:
sizeof m1 is 100 sizeof m2 is 4
EDIT: assuming you only use pointers of the same size as void*, I think the union is better, as you gain a bit of error detection when, for example, trying to set .c with an integer pointer.
void*, unless you're creating your own allocator, is definitely quick and dirty, for better or for worse.

Related

C function that returns a pointer to an array correct syntax?

In C you can declare a variable that points to an array like this:
int int_arr[4] = {1,2,3,4};
int (*ptr_to_arr)[4] = &int_arr;
Although practically it is the same as just declaring a pointer to int:
int *ptr_to_arr2 = int_arr;
But syntactically it is something different.
Now, what would a function look like that returns such a pointer to an array (of int, for example)?
A declaration of int is int foo;.
A declaration of an array of 4 int is int foo[4];.
A declaration of a pointer to an array of 4 int is int (*foo)[4];.
A declaration of a function returning a pointer to an array of 4 int is int (*foo())[4];. The () may be filled in with parameter declarations.
As already mentioned, the correct syntax is int (*foo(void))[4]; And as you can tell, it is very hard to read.
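To make the declaration concrete, here is a short sketch that defines such a function and uses its result. The helper sum4 is a made-up name for the example:

```c
static int int_arr[4] = {1, 2, 3, 4};

/* foo is a function taking no arguments and returning
 * a pointer to an array of 4 int. */
static int (*foo(void))[4]
{
    return &int_arr;
}

/* The caller must dereference the pointer first, then index the array. */
static int sum4(void)
{
    int (*p)[4] = foo();
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += (*p)[i];
    return total;
}
```

Unlike a plain int*, the returned pointer keeps the array length in its type, so sizeof *foo() is the size of the whole array.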
Questionable solutions:
Use the syntax as C would have you write it. This is in my opinion something you should avoid, since it's incredibly hard to read, to the point where it is completely useless. This should simply be outlawed in your coding standard, just like any sensible coding standard enforces function pointers to be used with a typedef.
Oh, so we just typedef this, just like when using function pointers? One might indeed be tempted to hide all this goo behind a typedef, but that's problematic as well. This is because both arrays and pointers are fundamental "building blocks" in C, with a specific syntax that the programmer expects to see whenever dealing with them. The absence of that syntax suggests an object that can be addressed, "lvalue accessed" and copied like any other variable. Hiding them behind a typedef might in the end create even more confusion than the original syntax.
Take this example:
typedef int(*arr)[4];
...
arr a = create(); // calls malloc etc
...
// somewhere later, lets make a hard copy! (or so we thought)
arr b = a;
...
cleanup(a);
...
print(b); // mysterious crash here
So this "hide behind typedef" system heavily relies on us naming types somethingptr to indicate that it is a pointer. Or let's say... LPWORD... and there it is, "Hungarian notation", the heavily criticized type system of the Windows API.
A slightly more sensible work-around is to return the array through one of the parameters. This isn't exactly pretty either, but at least somewhat easier to read since the strange syntax is centralized to one parameter:
void foo (int(**result)[4])
{
...
*result = &arr;
}
That is: a pointer to a pointer-to-array of int[4].
If one is prepared to throw type safety out the window, then of course void* foo (void) solves all of these problems... but creates new ones. Very easy to read, but now the problem is type safety and uncertainty regarding what the function actually returns. Not good either.
So what to do then, if these versions are all problematic? There are a few perfectly sensible approaches.
Good solutions:
Leave allocation to the caller. This is by far the best method, if you have the option. Your function would become void foo (int arr[4]); which is readable and type safe both.
Old school C. Just return a pointer to the first item in the array and pass the size along separately. This may or may not be acceptable from case to case.
Wrap it in a struct. For example this could be a sensible implementation of some generic array type:
typedef struct
{
size_t size;
int arr[];
} array_t;
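array_t* alloc (size_t items)
{
    /* header plus room for `items` ints in the flexible array member */
    array_t* result = malloc(sizeof *result + items * sizeof(int));
    if (result != NULL)
        result->size = items; /* record the length for later bounds checks */
    return result;
}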
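For comparison, the first of the good solutions, leaving allocation to the caller, might be sketched like this. The function name fill_squares and what it computes are placeholders:

```c
#include <stddef.h>

/* The caller owns the storage. The parameter is adjusted to int*,
 * so the [4] documents the expected length rather than enforcing it. */
static void fill_squares(int arr[4])
{
    for (size_t i = 0; i < 4; i++)
        arr[i] = (int)(i * i);
}
```

The caller simply declares int a[4]; and passes a, with no awkward return type and no ownership questions.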
The typedef keyword can make things a lot clearer/simpler in this case:
int int_arr[4] = { 1,2,3,4 };
typedef int(*arrptr)[4]; // Define a pointer to an array of 4 ints ...
arrptr func(void) // ... and use that for the function return type
{
return &int_arr;
}
Note: As pointed out in the comments and in Lundin's excellent answer, using a typedef to hide/bury a pointer is a practice that is frowned-upon by (most of) the professional C programming community – and for very good reasons. There is a good discussion about it here.
However, although, in your case, you aren't defining an actual function pointer (which is an exception to the 'rule' that most programmers will accept), you are defining a complicated (i.e. difficult to read) function return type. The discussion at the end of the linked post delves into the "too complicated" issue, which is what I would use to justify use of a typedef in a case like yours. But, if you should choose this road, then do so with caution.

Using different struct definitions to simulate public and private fields in C

I have been writing C for a decent amount of time, and obviously am aware that C does not have any support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without the use of any macros or voodoo, and I am looking to gain more insight into possible issues I may have overlooked.
The folder structure isn't all that important here but I'll list it anyway because it gives clarity as to the import names (and is also what CLion generates for me).
- example-project
- cmake-build-debug
- example-lib-name
- include
- example-lib-name
- example-header-file.h
- src
- example-lib-name
- example-source-file.c
- CMakeLists.txt
- CMakeLists.txt
- main.c
Let's say that example-header-file.h contains:
typedef struct ExampleStruct {
int data;
} ExampleStruct;
ExampleStruct* new_example_struct(int, double);
which just contains a definition for a struct and a function that returns a pointer to an ExampleStruct.
Obviously, now if I import ExampleStruct into another file, such as main.c, I will be able to create and return a pointer to an ExampleStruct by calling
ExampleStruct* new_struct = new_example_struct(<int>, <double>);,
and will be able to access the data property like: new_struct->data.
However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify the internals of it. I.e. if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.
So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.
In our implementation file (example-source-file.c), let's say we have the following code:
#include <stdlib.h>
#include <stdbool.h>
typedef struct ExampleStruct {
int data;
double val;
} ExampleStruct;
ExampleStruct* new_example_struct(int data, double val) {
ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
example_struct->data=data;
example_struct->val=val;
return example_struct;
}
double get_val(ExampleStruct* e) {
return e->val;
}
This file simply implements that constructor method for getting a new pointer to an ExampleStruct that was defined in the header file. However, this file also defines its own version of ExampleStruct, that has a new member field not present in the header file's definition: double val, as well as a getter which gets that value. Now, if I import the same header file into main.c, which contains:
#include <stdio.h>
#include "example-lib-name/example-header-file.h"
int main() {
printf("Hello, World!\n");
ExampleStruct* test = new_example_struct(6, 7.2);
printf("%d\n", test->data); // <-- THIS WORKS
double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
printf("%f\n", x); //
// printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
return 0;
}
I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.
Some things I see that may be cause for concern:
This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.
Questions I'd like answers on:
Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Yes: your approach produces undefined behavior.
C requires that
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(C17 6.2.7/2)
and that
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
[...]
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")
Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions relying on different of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between TUs using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
Other alternatives:
Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
As long as the public has a complete definition for ExampleStruct, it can make code like:
ExampleStruct a = *new_example_struct(42, 1.234);
Then the below will certainly fail.
printf("%g\n", get_val(&a));
I recommend instead creating an opaque pointer and providing public accessor functions for the information in .data and .val.
Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)
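A minimal sketch of the opaque-pointer model, shown in a single listing with comments marking what would go in the header and what would go in the source file. The accessor get_data and the destructor free_example_struct are hypothetical additions to the names used in the question:

```c
#include <stdlib.h>

/* --- public header: the type is declared but never defined here --- */
typedef struct ExampleStruct ExampleStruct;
ExampleStruct *new_example_struct(int data, double val);
int    get_data(const ExampleStruct *e);
double get_val(const ExampleStruct *e);
void   free_example_struct(ExampleStruct *e);

/* --- source file: the only translation unit that sees the members --- */
struct ExampleStruct {
    int    data;
    double val;
};

ExampleStruct *new_example_struct(int data, double val)
{
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->data = data;
    e->val  = val;
    return e;
}

int    get_data(const ExampleStruct *e)      { return e->data; }
double get_val(const ExampleStruct *e)       { return e->val; }
void   free_example_struct(ExampleStruct *e) { free(e); }
```

Client code that includes only the header can hold and pass ExampleStruct pointers, but it can neither dereference them nor create an instance by value, so there is a single definition of the struct and no type incompatibility.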
Are there significant performance penalties I may suffer as a result of writing code this way?
Probably:
Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.
i.e. is there a simpler way to do this
Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:
#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)
typedef struct ExampleStruct {
int data;
_Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, _Alignas() specifier as well (if you want the private data not to need _Alignas(), you will need to use the real alignment quantum for the type-erased version).
The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
An example of the use of such an approach can be found in CUDA's driver API:
Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_peer
The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.
This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.
In example.h, one defines the public-facing elements. struct example is not meant to be instantiated; in a sense, it is abstract. Only pointers obtained from one of its constructors (in this case, the only one) are valid.
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)
#include <stddef.h> /* offsetof */
#include <stdlib.h>
#include "example.h"
struct private_example {
struct example public;
double val;
};
struct example *new_example(int data, double val) {
struct private_example *const example = malloc(sizeof *example);
if(!example) return 0;
example->public.data = data;
example->val = val;
return &example->public;
}
/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
return (struct private_example *)(void *)
((char *)example - offsetof(struct private_example, public));
}
double example_val(struct example *e) {
return example_upcast(e)->val;
}
Then one can use the object as in main.c. This is used frequently in Linux kernel code for container abstraction. Note that offsetof(struct private_example, public) is zero, ergo example_upcast does nothing and a cast is sufficient: ((struct private_example *)e)->val. If one builds structures in a way that always allows casting, one is limited to single inheritance.

How union is helpful over a variable? [duplicate]

This question already has answers here:
Purpose of Unions in C and C++
(16 answers)
Closed 2 years ago.
I have looked into this post and explored the use cases of union. Every source is saying that it is memory efficient over structures, I understand this.
But confusion arises when you say:
"Variables of Union share same memory, when you change the value of one variable, then other's value gets changed, and only one variable is accessible at a time"
So why is there a need to declare union and its child variables, why can not I use 1 simple variable by allocating the biggest memory as union biggest variable?
Consider this example:
union {
int length;
int breadth;
char status;
} dimension;
Versus
int dimension; /* Now this can store the highest value, similar to length and breadth. */
So why is there a need to declare union and its child variables, why can not I use 1 simple variable by allocating the biggest memory as union biggest variable?
You could do that, but a union makes it much easier.
Your particular example does not make much sense, mostly because you have two of the same type, but also because all are integers. So let's take an example from the post you linked:
union {
int i;
float f;
} u;
This means that you can use u.i when you want to treat the memory as an int, or u.f if you want to treat it as a float.
So let's say you want to solve this without a union and just declare a variable "big enough". Which type do you pick? int is at least 16 bit, and float is at least 32 bit. So we pick a float then? Nope, because it might be the case that on the target system, an int is 64 bit and a float 32. So let's overdo things and pick the largest type that exists? Well, you could, but that kind of defeats the purpose of saving memory.
And how do we access them as different types if declared as variables? Consider this code:
float x;
int y = *(int*) &x;
Should work nicely, right? Nope, apart from the problem that sizes may vary, you will also run into problems with representations. There are a number of different ways of representing both integers and floats. You may also encounter problems with endianness.
You can mimic the behavior of unions without actually using them, but it will require A LOT of extra work. The resulting code is very likely to contain a lot of bugs, and is probably much slower and less portable too.
One use case is to achieve polymorphism. Here is a very simple example, and to be honest, it does not look like it makes things much easier, but that's commonly the case with examples. Suppose we have this:
void print_float(float f)
{
printf("Value: %f\n", f);
}
void print_int(int i)
{
printf("Value: %d\n", i);
}
That could be replaced by this:
struct multitype {
union {
int i;
float f;
} data;
enum { INT, FLOAT } type;
};
void print(struct multitype x)
{
switch(x.type) {
case INT: printf("Value: %d\n", x.data.i); break;
case FLOAT: printf("Value: %f\n", x.data.f); break;
}
}
I think we use a union when the type of a variable is not known in advance.
I used a union when converting assembly language to C code. Sometimes a variable was used as an int and sometimes as a pointer, so I used a union:
union {
int data;
void* handle;
};
I hope it's helpful for your understanding.

How to check if a void* pointer can be safely cast to something else?

Let's say I have this function, which is part of some gui toolkit:
typedef struct _My_Struct My_Struct;
/* struct ... */
void paint_handler( void* data )
{
if ( IS_MY_STRUCT(data) ) /* <-- can I do something like this? */
{
My_Struct* str = (My_Struct*) data;
}
}
/* in main() */
My_Struct s;
signal_connect( SIGNAL_PAINT, &paint_handler, (void*) &s ); /* sent s as a void* */
Since the paint_handler will also be called by the GUI toolkit's main loop with other arguments, I cannot always be sure that the parameter I am receiving will always be a pointer to s.
Can I do something like IS_MY_STRUCT in the paint_handler function to check that the parameter I am receiving can be safely cast back to My_Struct* ?
Your void pointer loses all its type information, so by that alone, you cannot check if it can be cast safely. It's up to the programmer to know if a void* can be cast safely to a type.
Unfortunately there is no function to check what the pointer was before it appears in that context (void).
The one solution I can think of is if you place an int _struct_id as the first member of all of your structs. This id member can then be safely checked regardless of the type but this will fail if you pass pointers that don't implement this member (or int, char, ... pointers).
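A sketch of that id-as-first-member idea, under the stated caveat that every pointer passed in really does point at a struct beginning with the id. The enum values, the extra fields and is_my_struct are hypothetical:

```c
#include <stddef.h>

/* Hypothetical ids, one per struct type that may reach the handler. */
enum struct_id { ID_MY_STRUCT = 1, ID_OTHER_STRUCT = 2 };

typedef struct {
    int _struct_id;    /* must be the first member */
    int width, height;
} My_Struct;

typedef struct {
    int _struct_id;    /* must be the first member */
    double ratio;
} Other_Struct;

/* A pointer to a struct may be converted to a pointer to its first member,
 * so reading the id is valid for any struct that follows the convention.
 * It is NOT valid for pointers to anything else (plain int, char, ...). */
static int is_my_struct(const void *data)
{
    return data != NULL && *(const int *)data == ID_MY_STRUCT;
}
```

The check only distinguishes between types that follow the convention; it cannot detect a pointer to unrelated memory.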
The best you could do would be to look at what data points to to see if it has telltale signs of being what you want, although a) it wouldn't be anywhere close to a guarantee and b) might be dangerous, as you don't know how big the thing data actually points to is. I suppose it isn't any more dangerous than just casting it and using it, but (as has been suggested) a redesign would be better.
If you are creating the type that is being used, you could include as part of the type some kind of identifying information that would help you rule out some void pointers as not being of the type you are looking for. While you would run the chance that some random area of memory would contain the same data or signature as what you are looking for, at least you would know when something was not the type you were looking for.
This approach would require that the struct was initialized in such a way that the signature members, used to determine if the memory area is not valid, is initialized to the signature value.
An example:
typedef struct {
ULONG ulSignature1;
// .. data elements that you want to have
ULONG ulSignature2;
} MySignedStruct;
#define MYSIGNEDSTRUCT_01 0x1F2E3D4C
#define MYSIGNEDSTRUCT_02 0xF1E2D3C4
#define IS_MY_STRUCT(sAdr) ( (((MySignedStruct *)(sAdr))->ulSignature1 == MYSIGNEDSTRUCT_01 ) && (((MySignedStruct *)(sAdr))->ulSignature2 == MYSIGNEDSTRUCT_02))
This is kind of a rough approach however it can help. Naturally using a macro like IS_MY_STRUCT() where the argument is used twice can be problematic if the argument has a side effect so you would have to be careful of something like IS_MY_STRUCT(xStruct++) where xStruct is a pointer to a MySignedStruct.
There really isn't in C. Void pointers are typeless, and should only ever be cast when you truly know what they point to.
Perhaps you should instead reconsider your design; rewrite your code so that no inspection is necessary. This is the same reason Google disallows RTTI in its style guide.
I know the question is 3 years old but here I go,
How about using a simple global enum to distinguish where the function is called from? Then you can switch on it to decide which type to cast the void pointer to.

Access struct members as if they are a single array?

I have two structures with values that are used to compute a weighted average, like this simplified version:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;
And then I use them to calculate:
average = v_move*qtt_move + v_read*qtt_read + v_suck*qtt_suck + v_flush*qtd_flush + v_nop*qtd_nop + v_call*qtt_call;
Every now and then I need to include another variable. Now, for instance, I need to include v_clean and qtt_clean. I can't change the structures to arrays:
typedef struct
{
int v[6];
} values;
typedef struct
{
int qtt[6];
} quantities;
That would simplify my work a lot, but they are part of an API that needs the variable names to be clear.
So, I'm looking for a way to access the members of those structures, maybe using sizeof(), so I can treat them as an array but still keep the API unchanged. It is guaranteed that all values are int, but I can't guarantee the size of an int.
Writing the question came to my mind... Can a union do the job? Is there another clever way to automatize the task of adding another member?
Thanks,
Beco
What you are trying to do is not possible to do in any elegant way. It is not possible to reliably access consecutive struct members as an array. The currently accepted answer is a hack, not a solution.
The proper solution would be to switch to an array, regardless of how much work it is going to require. If you use enum constants for array indexing (as @digEmAll suggested in his now-deleted answer), the names and the code will be as clear as what you have now.
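A sketch of the enum-indexed variant, with hypothetical constant names derived from the member names in the question:

```c
/* One constant per field; OP_COUNT is automatically the array length. */
enum op { OP_MOVE, OP_READ, OP_SUCK, OP_FLUSH, OP_NOP, OP_CALL, OP_COUNT };

typedef struct { int v[OP_COUNT]; }   values;
typedef struct { int qtt[OP_COUNT]; } quantities;

/* Sum of value * quantity over all fields. */
static int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;
    for (int i = 0; i < OP_COUNT; i++)
        sum += v->v[i] * q->qtt[i];
    return sum;
}
```

Individual fields remain readable as v.v[OP_FLUSH], and adding a field means adding one constant before OP_COUNT.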
If you still don't want to or can't switch to an array, the only more-or-less acceptable way to do what you are trying to do is to create an "index-array" or "map-array" (see below). C++ has a dedicated language feature that helps one to implement it elegantly: pointers-to-members. In C you are forced to emulate that C++ feature using the offsetof macro:
static const size_t values_offsets[] = {
offsetof(values, v_move),
offsetof(values, v_read),
offsetof(values, v_suck),
/* and so on */
};
static const size_t quantities_offsets[] = {
offsetof(quantities, qtt_move),
offsetof(quantities, qtt_read),
offsetof(quantities, qtt_suck),
/* and so on */
};
And if now you are given
values v;
quantities q;
and index
int i;
you can generate the pointers to individual fields as
int *pvalue = (int *) ((char *) &v + values_offsets[i]);
int *pquantity = (int *) ((char *) &q + quantities_offsets[i]);
*pvalue += *pquantity;
Of course, you can now iterate over i in any way you want. This is also far from elegant, but at least it bears some degree of reliability and validity, as opposed to any ugly hack. The whole thing can be made to look more elegant by wrapping the repetitive pieces into appropriately named functions/macros.
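One way the wrapping could look is sketched below. It reuses the struct definitions from the question and fills the offset tables with all six members; the helper names value_at, quantity_at and weighted_sum, and the FIELD_COUNT macro, are assumptions made for illustration.

```c
#include <stddef.h>   /* offsetof, size_t */

typedef struct { int v_move, v_read, v_suck, v_flush, v_nop, v_call; } values;
typedef struct { int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call; } quantities;

/* Byte offset of every member, in the order they should be paired up. */
static const size_t values_offsets[] = {
    offsetof(values, v_move),  offsetof(values, v_read), offsetof(values, v_suck),
    offsetof(values, v_flush), offsetof(values, v_nop),  offsetof(values, v_call),
};
static const size_t quantities_offsets[] = {
    offsetof(quantities, qtt_move),  offsetof(quantities, qtt_read),
    offsetof(quantities, qtt_suck),  offsetof(quantities, qtd_flush),
    offsetof(quantities, qtd_nop),   offsetof(quantities, qtt_call),
};

#define FIELD_COUNT (sizeof values_offsets / sizeof values_offsets[0])

/* Read the i-th member through its recorded offset. */
static int value_at(const values *v, size_t i)
{
    return *(const int *)((const char *)v + values_offsets[i]);
}

static int quantity_at(const quantities *q, size_t i)
{
    return *(const int *)((const char *)q + quantities_offsets[i]);
}

/* Sum of value * quantity over all mapped members. */
static long weighted_sum(const values *v, const quantities *q)
{
    long sum = 0;
    for (size_t i = 0; i < FIELD_COUNT; i++)
        sum += (long)value_at(v, i) * quantity_at(q, i);
    return sum;
}
```

With this layout, adding v_clean and qtt_clean would mean one new struct member and one new offsetof line per table; the loop itself stays untouched.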
If all members are guaranteed to be of type int, you can use a pointer to int and index from it:
values v; /* as declared by the OP */
quantities q;
int *value = &v.v_move;
int *quantity = &q.qtt_move;
size_t i;
average = 0;
/* Although this should work, it is often good practice (IMHO) to add a zero sentinel as the last struct member and change the condition to quantity[i] != 0. */
for (i = 0; i < sizeof(quantities) / sizeof(*quantity); i++)
    average += value[i] * quantity[i];
(This relies on struct members being laid out in declaration order, which C guarantees. It also assumes there is no padding between the int members, which the standard does not guarantee.)
While writing the question, it occurred to me: can a union do the job? Is there another clever way to automate the task of adding another member?
Yes, a union can certainly do the job:
union
{
values v; /* As defined by OP */
int array[6];
} u;
You can use a pointer to u.v in your API, and work with u.array in your code.
Personally, I think that all the other answers break the rule of least surprise. When I see a plain struct definition, I assume that the structure will be accessed using normal access methods. With a union, it's clear that the application will access it in special ways, which prompts me to pay extra attention to the code.
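A slightly fuller sketch of the union approach follows. The union tag values_view, the VALUES_COUNT macro, and the helper sum_values are hypothetical names. The static assertion is an added safeguard, not part of the original answer: it stops compilation if the struct and the array ever differ in size, for example because of padding or a forgotten member.

```c
#include <assert.h>   /* static_assert (C11) */

typedef struct { int v_move, v_read, v_suck, v_flush, v_nop, v_call; } values;

#define VALUES_COUNT 6

/* Two views of the same storage: named members and a plain array. */
union values_view {
    values v;
    int array[VALUES_COUNT];
};

/* Fails to compile if the two views do not cover exactly the same bytes. */
static_assert(sizeof(values) == VALUES_COUNT * sizeof(int),
              "values has padding or a different number of members");

/* Sum all members through the array view. */
static long sum_values(const union values_view *u)
{
    long sum = 0;
    for (int i = 0; i < VALUES_COUNT; i++)
        sum += u->array[i];
    return sum;
}
```

The API can keep taking a values * (pointing at the v member), while internal code loops over array.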
It really sounds as if this should have been an array from the beginning, with accessor methods or macros enabling you to still use pretty names like move, read, etc. However, as you mentioned, this isn't feasible due to API breakage.
The two solutions that come to my mind are:
Use a compiler specific directive to ensure that your struct is packed (and thus, that casting it to an array is safe)
Evil macro black magic.
How about using __attribute__((packed)) if you are using gcc?
So you could declare your structures as:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} __attribute__((packed)) values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} __attribute__((packed)) quantities;
According to the gcc manual, your structures will then use the minimum amount of memory possible for storing the structure, omitting any padding that might have normally been there. The only issue would then be to determine the sizeof(int) on your platform which could be done through either some compiler macros or using <stdint.h>.
One more thing: there may be a performance penalty when accessing members of a packed structure, since the compiler can no longer assume they are naturally aligned. But at least you can be assured that the layout is consistent, and that the structure can be accessed like an array through a cast to a pointer type, as you wanted (i.e., you won't have to worry about padding messing up the pointer offsets).
Thanks,
Jason
This problem is common, and has been solved in many ways in the past. None of them is completely safe or clean. It depends on your particular application. Here's a list of possible solutions:
1) You can redefine your structures so fields become array elements, and use macros to map each particular element as if it were a structure field. E.g.:
struct values { int varray[6]; };
#define v_read varray[1]
The disadvantage of this approach is that most debuggers don't understand macros. Another problem is that, in theory, a compiler could choose a different alignment for the original structure and the redefined one, so binary compatibility is not guaranteed.
2) Count on the compiler's behaviour and treat all the fields as if they were array elements (oops, while I was writing this, someone else wrote the same: +1 for him).
3) Create a static array of element offsets (initialized at startup) and use them to "map" the elements. It's quite tricky, and not so fast, but has the advantage that it's independent of the actual arrangement of the fields in the structure. Example (incomplete, just for clarification):
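int positions[10];
/* Hand-rolled offsetof: byte offset of each member from the start of the struct. */
positions[0] = ((char *)(&((values*)NULL)->v_move)-(char *)NULL);
positions[1] = ((char *)(&((values*)NULL)->v_read)-(char *)NULL);
//...
values *v = ...;
int vread;
/* Read v_read through its recorded offset. */
vread = *(int *)(((char *)v)+positions[1]);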
Ok, not at all simple. Macros like "offsetof" may help in this case.
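For comparison, option 1 above (array storage plus name-mapping macros) might look like the following sketch. Only varray and v_read come from the answer; the remaining macros, their index order, and the helper sum_values are assumptions filled in for illustration.

```c
/* Storage is a plain array; the old member names become macros. */
typedef struct { int varray[6]; } values;

/* Hypothetical mapping: each former member name expands to one array element. */
#define v_move  varray[0]
#define v_read  varray[1]
#define v_suck  varray[2]
#define v_flush varray[3]
#define v_nop   varray[4]
#define v_call  varray[5]

/* Internal code can loop over the array directly. */
static long sum_values(const values *v)
{
    long sum = 0;
    for (int i = 0; i < 6; i++)
        sum += v->varray[i];
    return sum;
}
```

Existing code such as v.v_read = 7 keeps compiling because the preprocessor rewrites it to v.varray[1] = 7. The cost is that the macros replace those identifiers everywhere they appear, including in unrelated code that happens to use the same names.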
