union types and flexible array member - c

I have a question about the flexible-length arrays in C structures (http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html).
typedef struct {
size_t N;
int elems[];
} A_t;
Now the general approach is quite obvious,
A_t * a = malloc(sizeof(A_t) + sizeof(int) * N);
a->N = N;
....
Now this seems awkward when trying to embed such a struct in other structs or to allocate it on the stack. Something like the following snippet is bound to fail for N != 0:
struct {
A_t a;
A_t b; /// !!!!!
double c; /// !!!!!
};
Now I think it should be possible to allow for usages like this by defining another type
typedef struct {
size_t N;
int elems[5];
} A_5_t;
struct {
A_5_t a;
A_5_t b;
double c; // should work here now.
} mystruct;
and then use it as if it were an A_t structure. When calling a function void foo(A_t * arg1);, one would need to use something like foo((A_t*) (&mystruct.b)). Which -- to me -- appears to be a bit clumsy. I therefore wonder whether there is a better way to do this. I wonder whether one could employ a union type for this somehow?
I am asking this question, because the flexible-length array makes it possible to have data in one piece in the structure, therefore one can copy a struct with a single command instead of having to worry about deep and shallow copies, etc.
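To make the single-copy argument concrete, here is a minimal sketch of allocating and duplicating such a struct. The helper names a_bytes, a_new and a_clone are made up for illustration, and the sketch assumes N is small enough that the size computation does not overflow.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t N;
    int elems[];   /* flexible array member (C99) */
} A_t;

/* Total bytes occupied by an A_t that holds n elements. */
size_t a_bytes(size_t n) {
    return sizeof(A_t) + n * sizeof(int);
}

/* Hypothetical helper: allocate an A_t with room for n zeroed elements. */
A_t *a_new(size_t n) {
    A_t *a = calloc(1, a_bytes(n));
    if (a != NULL) {
        a->N = n;
    }
    return a;
}

/* Hypothetical helper: header and elements are contiguous,
   so a single memcpy copies both. */
A_t *a_clone(const A_t *src) {
    A_t *dst = malloc(a_bytes(src->N));
    if (dst != NULL) {
        memcpy(dst, src, a_bytes(src->N));
    }
    return dst;
}
```

Both helpers return NULL on allocation failure, and the caller releases each object with a single free().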

You have a multi-layered question.
In this one example:
struct {
A_t b;
double c; /// fails
};
I would try:
struct {
double c;
A_t b;
};
Always place the variable portion of a struct at the end. Note that I don't use GCC, so I haven't tested this; try it and see whether it works.
To follow up on a requirement given by @wirrbel, the following struct is NOT variable length, but it does define and provide access to a variable-length array of integers.
typedef struct {
size_t N;
int *elems; // pointer to a separately allocated array of N ints
} A_t;
A_t *a = malloc //etc.
a->elems = malloc(sizeof(int) * N);
In this fashion several A_t structures can be included in a more general structure.

No, in general your two struct, A_t and A_5_t, are not interchangeable. The reason is that the version with the flexible array can have different padding in front of the elems field than versions with a fixed field length.
Whether or not your compiler implements a different padding or not, you can test by using the offsetof macro. But even if the offsets are the same for your particular compiler and platform, you'd better not rely on that if you want portable code.
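A minimal sketch of that offsetof check, using the two types from the question. The function names are made up for illustration, and the result only describes the compiler and platform the code is built on.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { size_t N; int elems[];  } A_t;
typedef struct { size_t N; int elems[5]; } A_5_t;

/* Returns 1 when elems sits at the same offset in both types
   on this compiler/platform, 0 otherwise. */
int elems_offsets_match(void) {
    return offsetof(A_t, elems) == offsetof(A_5_t, elems);
}

/* Prints both offsets so the result can be inspected by hand. */
void print_elems_offsets(void) {
    printf("A_t: %zu, A_5_t: %zu\n",
           offsetof(A_t, elems), offsetof(A_5_t, elems));
}
```

With a C11 compiler the same comparison could be wrapped in _Static_assert so that a mismatch stops the build instead of being discovered at run time.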

I have figured it out now (the solution is actually described in the GNU documentation linked above). By appending an array declaration right after the struct member, one creates a contiguous memory range that is directly adjacent to the "empty" flexible array. Therefore b.a.elems[i] references the same data as b.elems_[i].
It is probably advisable to choose an identifier that tells you that the memory of this array actually belongs to the structure. At least that's how I would use it.
#include <stdio.h>
#include <stddef.h>
typedef struct {
size_t N;
double elems[];
} A_t;
typedef struct {
A_t a;
double elems_[4];
} B_t;
void foo(A_t * arg1) {
for (size_t i=0; i < arg1->N; ++i) {
printf("%f\n", arg1->elems[i]);
}
}
int main(int argc, char *argv[]) {
B_t b;
b.a.N = 4;
for (int i=0; i < 4; ++i) {
b.elems_[i] = 12.4;
}
foo(&b.a);
}

Related

How to expose variable sized arrays inside C struct in swig?

I'm struggling for a few days now to find a solution to wrap a C struct containing multiple variable-sized int arrays (stored as pointers) in swig.
Suppose the following minimal example:
typedef struct {
size_t length;
int *a;
int *b;
} mystruct;
where both a and b are pointers to int arrays allocated somewhere in C. The size of both arrays is stored in the length member.
Now, what I would really like to have is two-fold:
access to a and b members in objects of type mystruct should be safe, i.e. exceptions should be thrown if index is out-of-bounds.
the data in a and b must not be copied-over into a python list or tuple but I want to provide __getitem__ methods instead. The reason for that is that the actual struct consists of many such arrays and they get really huge and I don't want to waste any memory by duplicating them.
I've seen examples how to accomplish this with fixed-sized arrays by writing wrapper classes and templates for each member that internally store the size/length of each array individually, e.g.: SWIG interfacing C library to Python (Creating 'iterable' Python data type from C 'sequence' struct) and SWIG/python array inside structure.
However, I assume once I would wrap a and b into a class to enable them to be extended with __getitem__ methods, I won't have access to the length member of mystruct, i.e. the 'container' of a and b.
One thing I tried without success was to write explicit _get and _set methods
typedef struct {
size_t length;
} mystruct;
%extend mystruct {
int *a;
};
%{
int *mystruct_a_get(mystruct *s) {
return s->a;
}
int *mystruct_b_get(mystruct *s) {
return s->b;
}
...
%}
But here, the entire arrays a and b would be returned without any control of the maximum index...
My target languages are Python and Perl 5, so I guess one could start writing complicated typemaps for each language. I've done that before for other wrappers and hope there is a more generic solution to my situation that involves only C++ wrapper classes and such.
Any help or idea is appreciated!
Edit for possible solution
So, I couldn't let it go and came up with the following (simplified) solution that more or less combines the solutions I already saw elsewhere. The idea was to redundantly store the array lengths for each of the wrapped arrays:
%{
/* wrapper for variable sized arrays */
typedef struct {
size_t length;
int *data;
} var_array_int;
/* convenience constructor for variable sized array wrapper */
var_array_int *
var_array_int_new(size_t length,
int *data)
{
var_array_int *a = (var_array_int *)malloc(sizeof(var_array_int));
a->length = length;
a->data = data;
return a;
}
/* actual structure I want to wrap */
typedef struct {
size_t length;
int *a;
int *b;
} mystruct;
%}
/* hide all struct members in scripting language */
typedef struct {} var_array_int;
typedef struct {} mystruct;
/* extend variable sized arrays with __len__ and __getitem__ */
%extend var_array_int {
size_t __len__() const {
return $self->length;
}
const int __getitem__(int i) const throw(std::out_of_range) {
if ((i < 0) ||
(i >= $self->length))
throw std::out_of_range("Index out of bounds");
return $self->data[i];
}
};
/* add read-only variable sized array members to container struct */
%extend mystruct {
var_array_int *const a;
var_array_int *const b;
};
/* implement explicit _get() methods for the variable sized array members */
%{
var_array_int *
mystruct_a_get(mystruct *s)
{
return var_array_int_new(s->length, s->a);
}
var_array_int *
mystruct_b_get(mystruct *s)
{
return var_array_int_new(s->length, s->b);
}
%}
The above solution only provides read access to the variable sized arrays and does not include any NULL checks for the wrapped int * pointers. My actual solution of course does that and also makes use of templates to wrap variable sized arrays of different types. But I refrained from showing that here for the sake of clarity.
I wonder if there is an easier way to do the above. Also the solution only seems to work in Python so far. Implementing something similar for Perl 5 already gives me a headache.
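One way to keep the per-language part small might be to do the bounds check in plain C and let each target language only translate an error code. The sketch below assumes the var_array_int layout from above; the function var_array_int_get and its return convention are made up for illustration and are not part of SWIG.

```c
#include <stddef.h>

/* Same layout as the wrapper struct shown above. */
typedef struct {
    size_t length;
    int *data;
} var_array_int;

/* Hypothetical helper: stores element i in *out and returns 0,
   or returns -1 on a NULL pointer or an out-of-range index. */
int var_array_int_get(const var_array_int *a, long i, int *out) {
    if (a == NULL || a->data == NULL || out == NULL) {
        return -1;
    }
    if (i < 0 || (size_t)i >= a->length) {
        return -1;
    }
    *out = a->data[i];
    return 0;
}
```

The Python and Perl 5 wrappers would then only need to turn a nonzero return value into their own exception, and no array data is copied.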

Kind of polymorphism in C

I'm writing a C program in which I define two types:
typedef struct {
uint8_t array[32];
/* struct A's members */
...
} A;
typedef struct {
uint8_t array[32];
/* struct B's members, different from A's */
...
} B;
Now I would like to build a data structure which is capable of managing both types without having to write one for type A and one for type B, assuming that both have a uint8_t [32] as their first member.
I read how to implement a sort of polymorphism in C here and I also read here that the order of struct members is guaranteed to be kept by the compiler as written by the programmer.
I came up with the following idea, what if I define the following structure:
typedef struct {
uint8_t array[32];
} Element;
and define a data structure which only deals with data that have type Element? Would it be safe to do something like:
void f(Element * e){
int i;
for(i = 0; i < 32; i++) do_something(e->array[i]);
}
...
A a;
B b;
...
f(((Element *)&a));
...
f(((Element *)&b));
At a first glance it looks unclean, but I was wondering whether there are any guarantees that it will not break?
If array is always the first member of your struct, you can simply access it by casting pointers. There is no need for a struct Element. Your data structure can store void pointers.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char array[32];
} A;
typedef struct {
void* elements;
size_t elementSize;
size_t num;
} Vector;
char* getArrayPtr(Vector* v, int i) {
return (char*)(v->elements) + v->elementSize*i;
}
int main()
{
A* pa = calloc(10, sizeof(A)); // zero-filled, so every array is a valid empty string
if (pa == NULL) return 1;
pa[3].array[0] = 's';
Vector v;
v.elements = pa;
v.num = 10;
v.elementSize = sizeof(A);
printf("%s\n", getArrayPtr(&v, 3));
free(pa);
return 0;
}
but why not have a function that works with the array directly
void f(uint8_t array[32]){
int i;
for(i = 0; i < 32; i++) do_something(array[i]);
}
and call it like this
f(a.array)
f(b.array)
Polymorphism makes sense when you want to keep a and b in a container of some sort and iterate over them without caring that they are different types.
This should work fine if you, you know, don't make any mistakes. A pointer to the A struct can be cast to a pointer to the element struct, and so long as they have a common prefix, access to the common members will work just fine.
A pointer to the A struct, which is then cast to a pointer to the element struct can also be cast back to a pointer to the A struct without any problems. If element struct was not originally an A struct, then casting the pointer back to A will be undefined behavior. And this you will need to manage manually.
One gotcha (that I've run into) is, gcc will also allow you to cast the struct back and forth (not just pointer to struct) and this is not supported by the C standard. It will appear to work fine until your (my) friend tries to port the code to a different compiler (suncc) at which point it will break. Or rather, it won't even compile.
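A variation that avoids relying on a merely similar prefix is to make Element itself the first member of A and B. The sketch below assumes that change; the members a_only and b_only and the function sum_element are placeholders, not names from the question.

```c
#include <stdint.h>

typedef struct {
    uint8_t array[32];
} Element;

/* Element is the first member, so a pointer to A or B, suitably
   converted, points to that Element. */
typedef struct { Element elem; int a_only;    } A;  /* a_only is a placeholder */
typedef struct { Element elem; double b_only; } B;  /* b_only is a placeholder */

/* Works on the shared part only. */
unsigned sum_element(const Element *e) {
    unsigned total = 0;
    for (int i = 0; i < 32; i++) {
        total += e->array[i];
    }
    return total;
}
```

A call then reads sum_element(&a.elem) with no cast at all. The form sum_element((Element *)&b) is also covered, because C guarantees that a pointer to a struct converts to a pointer to its first member.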

C inheritance through type punning, without containment?

I'm in a position where I need to get some object oriented features working in C, in particular inheritance. Luckily there are some good references on stack overflow, notably this Semi-inheritance in C: How does this snippet work? and this Object-orientation in C. The idea is to contain an instance of the base class within the derived class and typecast it, like so:
struct base {
int x;
int y;
};
struct derived {
struct base super;
int z;
};
struct derived d;
d.super.x = 1;
d.super.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
This is great, but it becomes cumbersome with deep inheritance trees - I'll have chains of about 5-6 "classes" and I'd really rather not type derived.super.super.super.super.super.super all the time. What I was hoping was that I could typecast to a struct of the first n elements, like this:
struct base {
int x;
int y;
};
struct derived {
int x;
int y;
int z;
};
struct derived d;
d.x = 1;
d.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
I've tested this on the C compiler that comes with Visual Studio 2012 and it works, but I have no idea if the C standard actually guarantees it. Is there anyone that might know for sure if this is ok? I don't want to write mountains of code only to discover it's broken at such a fundamental level.
What you describe here is a construct that was fully portable and would have been essentially guaranteed to work by the design of the language, except that the authors of the Standard didn't think it was necessary to explicitly mandate that compilers support things that should obviously work. C89 specified the Common Initial Sequence rule for unions, rather than pointers to structures, because given:
struct s1 {int x; int y; ... other stuff... };
struct s2 {int x; int y; ... other stuff... };
union u { struct s1 v1; struct s2 v2; };
code which received a struct s1* to an outside object that was either a union u or a malloc'ed object could legally cast it to a union u* if it was aligned for that type, and it could legally cast the resulting pointer to struct s2*, and the effect of accessing either struct s1* or struct s2* would have to be the same as accessing the union via either the v1 or v2 member. Consequently, the only way for a compiler to make all of the indicated rules work would be to say that converting a pointer of one structure type into a pointer of another type and using that pointer to inspect members of the Common Initial Sequence would work.
Unfortunately, compiler writers have said that the CIS rule is only applicable in cases where the underlying object has a union type, notwithstanding the fact that such a thing represents a very rare usage case (compared with situations where the union type exists for the purpose of letting the compiler know that pointers to the structures should be treated interchangeably for purposes of inspecting the CIS), and further since it would be rare for code to receive a struct s1* or struct s2* that identifies an object within a union u, they think they should be allowed to ignore that possibility. Thus, even if the above declarations are visible, gcc will assume that a struct s1* will never be used to access members of the CIS from a struct s2*.
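For contrast, here is a minimal sketch of the one case that compilers do agree on: the object really is a union u and every access goes through the union. The extra members and the function name sum_xy_via_v2 are made up for illustration.

```c
struct s1 { int x; int y; double extra1; };  /* extra1 is a placeholder */
struct s2 { int x; int y; char extra2; };    /* extra2 is a placeholder */
union u { struct s1 v1; struct s2 v2; };

/* Writes x and y through v1, then reads the common initial
   sequence through v2. The object is a union u and the complete
   union type is visible here, which is the case the rule covers. */
int sum_xy_via_v2(union u *p, int x, int y) {
    p->v1.x = x;
    p->v1.y = y;
    return p->v2.x + p->v2.y;
}
```

Passing &obj.v1 to a function that takes a struct s1* and then casts it to struct s2* is the pattern that, as described above, gcc does not promise to honor.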
By using pointers you can always create references to base classes at any level in the hierarchy. And if you use some kind of description of the inheritance structure, you can generate both the "class definitions" and factory functions needed as a build step.
#include <stdio.h>
#include <stdlib.h>
struct foo_class {
int a;
int b;
};
struct bar_class {
struct foo_class foo;
struct foo_class* base;
int c;
int d;
};
struct gazonk_class {
struct bar_class bar;
struct bar_class* base;
struct foo_class* Foo;
int e;
int f;
};
struct gazonk_class* gazonk_factory() {
struct gazonk_class* new_instance = malloc(sizeof(struct gazonk_class));
new_instance->bar.base = &new_instance->bar.foo;
new_instance->base = &new_instance->bar;
new_instance->Foo = &new_instance->bar.foo;
return new_instance;
}
int main(int argc, char* argv[]) {
struct gazonk_class* object = gazonk_factory();
object->Foo->a = 1;
object->Foo->b = 2;
object->base->c = 3;
object->base->d = 4;
object->e = 5;
object->f = 6;
fprintf(stdout, "%d %d %d %d %d %d\n",
object->base->base->a,
object->base->base->b,
object->base->c,
object->base->d,
object->e,
object->f);
return 0;
}
In this example you can either use base pointers to work your way back or directly reference a base class.
The address of a struct is the address of its first element, guaranteed.

C generic type as function argument input

So I have two different structs in which all the properties that I will be accessing are the same. I also have a function whose argument I want to be able to accept either of the two. Example:
typedef struct{
int whatnot = 14;
int thing[11];
} TH_CONFIG;
typedef struct{
int whatnot = 3;
int thing[5];
} TH_CONFIG_2;
*_CONFIG var;
void fun(*_CONFIG input)
{
input.whatnot = 5;
}
int main(){
fun(var);
}
I have an inkling that I should use void * as the type, which I could then typecast or something, but my searching has only yielded things about function pointers, templates, and C#.
EDIT: *_CONFIG is not meant to be syntactically correct; it signifies that I don't know what to put there, but it's supposed to be the _CONFIG type.
Possible solutions.
Just use an array of length 11 for both of them. Did you really run out of those last 6 bytes on your OS?
Make it a dynamic array.
Just write in assembly, you clearly don't care about C's higher-level-ness.
Use a language like C++ that supports templates or polymorphism.
Just pass in the arguments of the struct you care about.
void fun(int* whatnot) {
*whatnot = 5;
}
int main() {
fun(&myStruct.whatnot);
return 0;
}
Factor into a quasi-OO design.
typedef struct {
int whatnot;
} Common;
struct TH_CONFIG_1 {
Common common;
int thing[11];
};
struct TH_CONFIG_2 {
Common common;
int thing[5];
};
But if you insist...
void fun(void* input) {
*(int *)input = 5; // whatnot is the first member of both structs
}
or...
void fun(void* input) {
( (TH_CONFIG*) input)->whatnot = 5; // may have been a TH_CONFIG_2, but who cares?
}
Note: this would not pass code review at any C shop.
You can use any pointer type and cast it.
If all the properties you're accessing are the same, I'm guessing one's an extension of the other (since the properties need to have the same offset from the beginning of the struct). In that case you may want to use this pattern:
struct base {
int foo;
char **strings;
};
struct extended {
struct base super;
double other_stuff;
};
Since super is at the start of struct extended, you can cast a struct extended * to struct base * without problems. Of course, you could do that by repeating the same fields in the beginning of struct extended instead, but then you're repeating yourself.

typecheck for return value

I have a list in which i want to be able to put different types. I have a function that returns the current value at index:
void *list_index(const List * list, int index) {
assert(index < list->size);
return list->data[index];
}
In the array there are multiple types, for example:
typedef struct structA { List *x; char *y; List *z; } structA;
typedef struct structB { List *u; char *w; } structB;
Now in order to get data from the array:
structA *A;
structB *B;
for(j=0... ) {
A = list_index(list, j);
B = list_index(list, j);
}
But now how do I find out the type of the return value? Is this possible with typeof (I'm using GCC btw)?
And is this even possible or do i have to make some sort of different construction?
You'll have to use unions like shown here.
The best way to solve this would be to use unions.
Another way would be to memcpy() the list item to an actual struct (i.e., not a pointer) of the appropriate type. This would have the advantage of making each List item as small as possible.
A third way would be to just cast the pointer types as in type punning. C allows this as long as the object is dereferenced with either its correct type or char.
Either way, you will need to put a code in each structure that identifies the type of object. There is no way the compiler can figure out what a pointer points to for you. And even if you could use typeof, you shouldn't. It's not C99.
Technically, if you don't use a union, you will have a problem making a legal C99 access to the type code, because you will need to make a temporary assumption about the type and this will violate the rule that objects must be dereferenced as their actual type, via a union, or via a char *. However, since the type code must by necessity be in the same position in every type (in order to be useful) this common technical violation of the standard will not actually cause an aliasing optimization error in practice.
Actually, if you make the type code a char, make it the first thing in the struct, and access it via a char *, I think you will end up with code that is a bit confusing to read but is perfectly conforming C99.
Here is an example, this passes gcc -Wall -Wextra
#include <stdio.h>
#include <stdlib.h>
struct A {
char typeCode;
int something;
};
struct B {
char typeCode;
double somethingElse;
};
void *getMysteryList();
int main()
{
void **list = getMysteryList();
int i;
for (i = 0; i < 2; ++i)
switch (*(char *) list[i]) {
case 'A':
printf("%d\n", ((struct A *) list[i])->something);
break;
case 'B':
printf("%7.3f\n", ((struct B *) list[i])->somethingElse);
break;
}
return 0;
}
void *getMysteryList()
{
void **v = malloc(sizeof(void *) * 2);
struct A *a = malloc(sizeof(struct A));
struct B *b = malloc(sizeof(struct B));
a->typeCode = 'A';
a->something = 789;
b->typeCode = 'B';
b->somethingElse = 123.456;
v[0] = a;
v[1] = b;
return v;
}
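The union-based alternative mentioned at the start of this answer might look like the sketch below. The names item_kind, item and item_format are made up for illustration; the two payloads mirror struct A and struct B from the example above.

```c
#include <stdio.h>

enum item_kind { ITEM_INT, ITEM_DOUBLE };

/* One list element: a tag plus a union of the possible payloads. */
struct item {
    enum item_kind kind;
    union {
        int i;
        double d;
    } value;
};

/* Formats the item into buf according to its tag and returns
   snprintf's result, or -1 for an unknown tag. */
int item_format(const struct item *it, char *buf, size_t len) {
    switch (it->kind) {
    case ITEM_INT:
        return snprintf(buf, len, "%d", it->value.i);
    case ITEM_DOUBLE:
        return snprintf(buf, len, "%.3f", it->value.d);
    }
    return -1;
}
```

The tag is read with its real type here, so no aliasing question comes up. The price is that every element is as large as the biggest payload.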
C handles types and typing entirely at compile time (no dynamic typing), so once you've cast a pointer to a 'void *' it has lost any information about the original type. You can cast it back to the original type, but you need to know what that is through some other method.
The usual way to do this is with some kind of type tag or descriptor in the beginning of all the objects that might be stored in your list type. eg:
typedef struct structA { int tag; List *x; char *y; List *z; } structA;
typedef struct structB { int tag; List *u; char *w; } structB;
enum tags { structAtype, structBtype };
You need to ensure that every time you create a structA or a structB, you set the tag field properly. Then, you can cast the void * you get back from list_index to an int * and use that to read the tag.
void *elem = list_index(list, index);
switch (*(int *)elem) {
case structAtype:
/* elem is a structA */
break;
case structBtype:
/* elem is a structB */
break;
}
Make the elements you want to put into the list all inherit from a common base class. Then you can have your base class contain members that identify the actual type.
class base {
public:
typedef enum {
type1,
type2,
type3
} realtype;
virtual realtype whatAmI()=0;
};
class type_one : public base {
public:
virtual base::realtype whatAmI() { return base::type1; };
};
class type_two : public base {
public:
virtual base::realtype whatAmI() { return base::type2; };
};
After that, you'd declare your list type like:
std::list<base *> mylist;
and you can stuff pointers to any of the derived types into the list. Then when you take them out, you can just call 'whatAmI()' to find out what to cast it to.
Please note: Trying to do this in C++ means you are doing something in a way that's not a good match for C++. Any time you deliberately evade the C++ type system like this, it means you're giving up most of the usefulness of C++ (static type checking), and generally means you're creating large amounts of work for yourself later on, not only as you debug the first iteration of this app, but especially at maintenance time.
You have some choices. Keep in mind that C is basically not a dynamically typed language.
You can make a common base for the structs, and put a simple type indicator of your own in it.
struct base {
int type_indication;
};
};
then
struct structA {
struct base base;
...
};
and then you can cast the pointer to (struct base *).
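Put together, the common-base approach could be sketched like this. The enum values, the payload members and the function payload_as_double are placeholders made up for illustration.

```c
enum kind { KIND_A, KIND_B };

struct base {
    int type_indication;
};

/* base is the first member of each concrete type. */
struct structA { struct base base; int payload;    };  /* payload is a placeholder */
struct structB { struct base base; double payload; };  /* payload is a placeholder */

/* Reads the tag through the common base, then casts to the
   concrete type that the tag names. */
double payload_as_double(void *elem) {
    struct base *b = elem;
    switch (b->type_indication) {
    case KIND_A:
        return ((struct structA *)elem)->payload;
    case KIND_B:
        return ((struct structB *)elem)->payload;
    default:
        return 0.0;
    }
}
```

This only works if every object put into the list has its type_indication set correctly when it is created.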
