generic API to initialize structs with common fields - c

I have a number of structures whose first three fields are the same. Below is a
simplified example:
struct my_struct1 {
/* common fields. */
int a;
int b;
int c;
};
struct my_struct2 {
int a;
int b;
int c;
uint32_t d;
uint32_t e;
};
struct my_struct3 {
int a;
int b;
int c;
uint16_t d;
char e;
};
static void func1(struct my_struct1 *s)
{
/* ... */
}
static void func2(struct my_struct2 *s)
{
/* ... */
}
static void func3(struct my_struct3 *s)
{
/* ... */
}
int main(void)
{
struct my_struct1 s = {1, 2, 3};
struct my_struct2 p = {1, 2, 3, 4, 5};
struct my_struct3 q = {1, 2, 3, 4, 'a'};
func1(&s);
func2(&p);
func3(&q);
/* XXX */
func3((struct my_struct3 *)&s);
return 0;
}
Is it safe to cast &s to struct my_struct3 * and pass it to func3, with no risk that s or other objects allocated on the stack get corrupted?
The reason is that I would like to write a generic API that takes a pointer and initializes the fields that are common to all of these structures. A second function is specific to each my_struct* and sets the rest of the fields.
I'm not sure if void * can solve this.
UPDATE
I should mention that, unfortunately, I can't change the layout of the structures, i.e. factoring out a common part isn't an option, because the code I'm working with is pretty old and I'm not allowed to change its core structures.
The only (ugly) workaround I can see is to pass void * and enum struct_type parameters to a generic_init function and, based on struct_type, cast the void * to the appropriate structure.
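The workaround described in the update could look roughly like the sketch below. The names generic_init and enum struct_type come from the question; the enumerators (ST_STRUCT1 and so on), the SET_COMMON macro, the return convention, and the values 1, 2, 3 are assumptions made up for illustration. The macro assigns the common fields by name through a pointer of the object's real type, so no struct is ever accessed through a pointer to a different struct type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct my_struct1 { int a; int b; int c; };
struct my_struct2 { int a; int b; int c; uint32_t d; uint32_t e; };
struct my_struct3 { int a; int b; int c; uint16_t d; char e; };

/* Hypothetical tag telling generic_init() what the void * really points to. */
enum struct_type { ST_STRUCT1, ST_STRUCT2, ST_STRUCT3 };

/* Hypothetical helper: set the common fields via a correctly typed pointer. */
#define SET_COMMON(ptr) \
    do { (ptr)->a = 1; (ptr)->b = 2; (ptr)->c = 3; } while (0)

/* Returns 0 on success, -1 for a null pointer or an unknown tag. */
static int generic_init(void *obj, enum struct_type type)
{
    if (obj == NULL)
        return -1;
    switch (type) {
    case ST_STRUCT1: { struct my_struct1 *s = obj; SET_COMMON(s); return 0; }
    case ST_STRUCT2: { struct my_struct2 *s = obj; SET_COMMON(s); return 0; }
    case ST_STRUCT3: { struct my_struct3 *s = obj; SET_COMMON(s); return 0; }
    }
    return -1;
}
```

A call would then be generic_init(&q, ST_STRUCT3); the caller is responsible for passing the tag that matches the object, since the compiler cannot check it.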

As far as I can interpret the standard, casting a pointer of type my_struct1 * to a pointer of type my_struct3 * or vice versa may yield undefined behaviour because of the pointer conversion rules (cf. the C99 draft ISO/IEC 9899:TC2):
6.3.2.3 Pointers ... (7) A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If
the resulting pointer is not correctly aligned for the pointed-to
type, the behavior is undefined. ...
Hence, as my_struct1 and my_struct3 may have different alignment requirements, a pointer that is correctly aligned for my_struct1 is not necessarily correctly aligned for my_struct3.
But even if you can guarantee that all structs have the same alignment, passing a pointer to an object of type my_struct1 to func3 is, in my opinion, not safe, even if the common members are the first ones in each struct and even if func3 accesses only the common members.
The reason is that a compiler may introduce padding between members:
6.7.2.1 Structure and union specifiers ... (13) Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are
declared. A pointer to a structure object, suitably converted, points
to its initial member (or if that member is a bit-field, then to the
unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
Hence, as my_struct1 and my_struct3 have different sets of members, the rules of how a compiler introduces padding may vary between these two structs. I think that it is unlikely that this happens, but I did not find any statement in the standard that guarantees that padding - even for the first three members - is the same for my_struct1 and my_struct3.
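Since the standard does not promise identical padding, one option is to state the assumption explicitly and let the compiler check it for the implementation at hand. The following is a minimal sketch using offsetof and C11 static_assert; the SAME_OFFSET macro and the alignment_compatible function are hypothetical names. Note that passing these checks only confirms the layout on one compiler and target; it does not make access through the converted pointer well-defined.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct my_struct1 { int a; int b; int c; };
struct my_struct3 { int a; int b; int c; uint16_t d; char e; };

/* Hypothetical helper: fail the build if member m sits at different offsets. */
#define SAME_OFFSET(T1, T2, m) \
    static_assert(offsetof(T1, m) == offsetof(T2, m), \
                  "member " #m " has different offsets")

SAME_OFFSET(struct my_struct1, struct my_struct3, a);
SAME_OFFSET(struct my_struct1, struct my_struct3, b);
SAME_OFFSET(struct my_struct1, struct my_struct3, c);

/* Nonzero if a my_struct1 object is aligned strictly enough for my_struct3. */
static int alignment_compatible(void)
{
    return _Alignof(struct my_struct1) >= _Alignof(struct my_struct3);
}
```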

To flesh out what the comments of EOF and Eugene Sh. already explain:
It would not be safe to cast a my_struct1 pointer to a my_struct3 pointer: my_struct3 has more members than my_struct1, and the compiler would not warn at all about accessing those additional members (d and e), which would overwrite whatever lies behind the my_struct1 object. Doing it the other way around may work, as long as my_struct1 corresponds exactly to the start of my_struct3. I am not sure if there is any guarantee in the standard that would cover you there, but I would not bet on it.
Separating the common part into its own structure type has the following advantages:
It reduces repetition in your code, so you can change the common part in one place, which reduces the risk of errors.
The compiler can check the types passed around; by casting the structs you would effectively disable such checks.
The common struct does not need to be at the start of the enclosing struct; because it is a struct member, the compiler works out the correct offsets for you.
struct common {
int a;
int b;
int c;
};
struct my_struct1 {
struct common com;
};
struct my_struct2 {
struct common com;
uint32_t d;
uint32_t e;
};
struct my_struct3 {
struct common com;
uint16_t d;
char e;
};
void init_common(struct common *com)
{
com->a = 1;
com->b = 2;
/* ... */
}
struct my_struct1 s = {{1, 2, 3}};
struct my_struct2 p = {{1, 2, 3}, 4, 5};
struct my_struct3 q = {{1, 2, 3}, 4, 'a'};
init_common(&s.com);
init_common(&p.com);
init_common(&q.com);

It's semi-safe. If the first member is identical, you are guaranteed that a pointer to the structure is also a pointer to the first member. Technically, a compiler could insert arbitrary padding after the first member. In practice, compilers do not pad identical leading members differently, so if two structs share their first and second members, the second members also have the same offset. However, the address of your member "b" is not necessarily the struct's address + sizeof(int); ints might be padded to 8 bytes for performance.
To avoid ambiguity, you can move the common members into an explicit struct "common".

Related

"Inheritance" in C's structs?

Here I'm a bit confused about this code:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
struct test_struct {
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
printf("%u\n", test_in.weird[0]); // 0
test_in.tst.weird[0] = 1;
printf("%u\n", test_in.weird[0]); // 1
return 0;
}
I didn't know that it is possible to use struct's fields this way, so I have two questions:
How is it called in C?
And, of course, how does it work? (Why weird field was changed when I don't change it directly, I thought these are two different fields?)
Here I'm a bit confused about this code:
The short answer is: the code has undefined behavior.
How is it called in C? How does it work?
struct test_struct is defined with its last member as an array of unspecified length: uint8_t weird[]; This member is called a flexible array member, not to be confused with a variable length array.
6.7.2.1 Structure and union specifiers
[...]
20     As a special case, the last member of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
If you allocate such a structure from the heap with extra space for array elements, these elements can be accessed via the weird member up to the number of elements thus allocated.
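As a minimal sketch of that intended use, the hypothetical helper test_struct_new below allocates the structure plus room for n elements of the flexible array member; the helper name and the zero-initialization are assumptions, not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct test_struct {
    uint8_t f;
    uint8_t weird[];   /* flexible array member */
};

/* Hypothetical helper: allocate a test_struct with room for n elements
   of weird, all zeroed. Returns NULL on overflow or allocation failure. */
static struct test_struct *test_struct_new(size_t n)
{
    struct test_struct *p;

    if (n > SIZE_MAX - sizeof *p)
        return NULL;
    p = malloc(sizeof *p + n * sizeof p->weird[0]);
    if (p != NULL) {
        p->f = 0;
        memset(p->weird, 0, n);
    }
    return p;
}
```

After struct test_struct *p = test_struct_new(256);, the elements p->weird[0] through p->weird[255] may be accessed, and the object is released with free(p).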
The C Standard does not allow a structure with a flexible array member to be a member of another structure or an element of an array (C11 6.7.2.1p3); GCC and Clang accept such a member as an extension, as the warnings below show. In the posted code, the programmer violates this constraint, so accessing elements of test_in.tst.weird has undefined behavior, and so does accessing elements of test_in.weird.
The programmer also assumes that the test_in.tst.weird array and the test_in.weird array overlap exactly, which may be the case but is not guaranteed, nor supported: code relying on this type of aliasing has undefined behavior as well.
In your example, assuming the compiler accepts the empty initializer {} (part of the next C Standard and borrowed from C++), it seems to work as expected, but this is not guaranteed and alignment issues may cause it to fail as shown in the modified version below:
#include <stdint.h>
#include <stdio.h>
struct test_struct {
uint8_t f;
uint8_t weird[];
};
struct test_struct1 {
int x;
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
struct {
struct test_struct1 tst;
uint8_t weird[256];
} test_in1 = {};
printf("modifying test_in.weird[0]:\n");
printf("%u\n", test_in.weird[0]); // 0
test_in.tst.weird[0] = 1;
printf("%u\n", test_in.weird[0]); // 1
printf("modifying test_in1.weird[0]:\n");
printf("%u\n", test_in1.weird[0]); // 0
test_in1.tst.weird[0] = 1;
printf("%u\n", test_in1.weird[0]); // 0?
return 0;
}
Output:
chqrlie$ make 220930-flexible.run
clang -O3 -std=c11 -Weverything -o 220930-flexible 220930-flexible.c
220930-flexible.c:17:28: warning: field 'tst' with variable sized type 'struct test_struct' not at
the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct test_struct tst;
^
220930-flexible.c:19:17: warning: use of GNU empty initializer extension [-Wgnu-empty-initializer]
} test_in = {};
^
220930-flexible.c:22:29: warning: field 'tst' with variable sized type 'struct test_struct1' not
at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
struct test_struct1 tst;
^
220930-flexible.c:24:18: warning: use of GNU empty initializer extension [-Wgnu-empty-initializer]
} test_in1 = {};
^
4 warnings generated.
modifying test_in.weird[0]:
0
1
modifying test_in1.weird[0]:
0
0
struct test_struct {
uint8_t f;
uint8_t weird[];
};
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
Effectively, before there were FAM's in the language, what you've declared is:
int main(void) {
struct {
struct { uint8_t f; } tst;
union {
uint8_t weird0[1]; // any non-zero size up to 256
uint8_t weird1[256];
} overlay;
} test_in = {};
In contrast, as discussed in the comments above, a declaration like
int array[];
is not a variable-length array (VLA); it is called either an array of unknown size (cppreference) or an array of length zero (GCC).
An example of a VLA would be:
void foo(size_t n)
{
int array[n]; //n is not available at compile time
}
Based on the comment below (from the cppreference - see provided link):
Within a struct definition, an array of unknown size may appear as the last member (as long as there is at least one other named member), in which case it is a special case known as flexible array member. See struct (section Explanation) for details:
struct s { int n; double d[]; }; // s.d is a flexible array member
struct s *s1 = malloc(sizeof (struct s) + (sizeof (double) * 8)); // as if d was double d[8]
The provided code is just invalid.
You declared a structure with a flexible array member
struct test_struct {
uint8_t f;
uint8_t weird[];
};
From the C Standard (6.7.2.1 Structure and union specifiers)
18 As a special case, the last element of a structure with more than
one named member may have an incomplete array type; this is called a
flexible array member.
As the quote shows, such a member must be the last element of a structure, so the above structure declaration is correct.
However then in main you declared another unnamed structure
int main(void) {
struct {
struct test_struct tst;
uint8_t weird[256];
} test_in = {};
//...
that contains, as a member, a structure with a flexible array member, and that member is not the last element of the unnamed structure. So such a declaration is invalid.
Secondly, you are using empty braces to initialize an object of the unnamed structure. Unlike C++, C (prior to C23) does not allow empty braces as an initializer.

struct similarity in C

Consider the two structs below:
struct A {
double x[3];
double y[3];
int z[3];
struct A *a;
int b;
struct A *c;
unsigned d[10];
};
struct B {
double x[3];
double y[3];
int z[3];
};
Notice that struct B is a strict subset of struct A. Now, I want to copy the members .x, .y and .z from an instance of struct A to an instance of struct B. My question is: according to the standards, is it valid to do:
struct A s_a = ...;
struct B s_b;
memcpy(&s_b, &s_a, sizeof s_b);
I.e. is it guaranteed that the paddings for the members, in their sequence of appearance, will be the same, so that I can "partially" memcpy struct A to struct B?
It is not guaranteed that struct A's layout starts off the same as struct B's layout.
However, if and only if they were both members of a union:
union X
{
struct A a;
struct B b;
};
then it is guaranteed that the common initial sequence has the same layout.
I've never heard of any compiler that would lay out a struct differently if it detected that the struct were a member of a union, so in practice you should be safe!
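To illustrate the union guarantee, here is a small sketch using the struct definitions from the question. The function copy_common is a hypothetical name; it reads the common initial members x, y and z through the b view of a union that currently holds a struct A, which is the access the common-initial-sequence rule covers as long as the union type is visible.

```c
#include <assert.h>
#include <string.h>

struct A {
    double x[3];
    double y[3];
    int z[3];
    struct A *a;
    int b;
    struct A *c;
    unsigned d[10];
};

struct B {
    double x[3];
    double y[3];
    int z[3];
};

union X {
    struct A a;
    struct B b;
};

/* Hypothetical helper: copy the shared members out of a union X whose
   active member is a, reading them through the b view. */
static void copy_common(struct B *dst, const union X *src)
{
    memcpy(dst->x, src->b.x, sizeof dst->x);
    memcpy(dst->y, src->b.y, sizeof dst->y);
    memcpy(dst->z, src->b.z, sizeof dst->z);
}
```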
How about using struct B as an anonymous struct member of struct A. This requires, however, -fms-extensions for gcc (there should be a similar extension for VC as the name implies):
struct B {
double x[3];
double y[3];
int z[3];
};
struct A {
struct B;
struct A *a;
int b;
struct A *c;
unsigned d[10];
};
This allows using the fields of struct A like this:
struct A as;
as.x[2] = as.y[0];
etc. This guarantees identical layout (the standard allows no padding at the beginning of a struct, so the inner struct is guaranteed to start at the same address as the outer one) and makes struct A cast-compatible with struct B.
Also:
struct A as;
struct B bs;
memcpy(&as, &bs, sizeof(bs));
I do not think the Standard would prohibit an implementation from including so much more padding in s_a than s_b that the former is actually larger even though its members are a subset of s_b's. Such behavior would be very weird, and I can't think of any reason why a compiler would do such a thing, but I don't think it would be prohibited.
If the number of bytes copied is the lesser of sizeof s_a and sizeof s_b, then the memcpy operation will be guaranteed to copy all of the common fields, but would not necessarily leave the later fields of s_b undisturbed. On a typical machine, if the declarations had been:
struct A { uint32_t x; char y; };
struct B { uint32_t x; char y,p; uint16_t q; };
the first structure would contain five bytes of data and three bytes of padding, while the second would contain eight bytes of data with no padding. Using memcpy as shown in your code would copy the padding from s_a over the data in s_b.
If you need to copy the initial structure members while leaving the balance of the structure undisturbed, you should add the offset and the size of the last member of interest, and use that sum as the number of bytes to copy. In the example I give above, the offset of y would be 4 and its size would be 1, so the memcpy would copy 5 bytes and thus ignore the parts of the structure that are used as padding in A but might hold data in B.
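The byte count described above can be computed with offsetof and sizeof rather than hard-coded. The sketch below uses the two example structures from this answer; COMMON_BYTES and copy_common are hypothetical names, and the static_assert records the assumption that y sits at the same offset in both structures.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

struct A { uint32_t x; char y; };
struct B { uint32_t x; char y, p; uint16_t q; };

/* Bytes from the start of struct A through the end of its last member y. */
#define COMMON_BYTES (offsetof(struct A, y) + sizeof ((struct A *)0)->y)

static_assert(offsetof(struct A, y) == offsetof(struct B, y),
              "y must be at the same offset in both structs");

/* Copy only x and y; p and q in *dst are left untouched. */
static void copy_common(struct B *dst, const struct A *src)
{
    memcpy(dst, src, COMMON_BYTES);
}
```

Copying sizeof(struct A) bytes instead would also copy A's trailing padding over p and q.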

If only using the first element, do I have to allocate mem for the whole struct?

I have a structure where the first element is tested and dependent on its value the rest of the structure will or will not be read. In the cases where the first element's value dictates that the rest of the structure will not be read, do I have to allocate enough memory for the entire structure or just the first element?
struct element
{
int x;
int y;
};
int foo(struct element* e)
{
if(e->x > 3)
return e->y;
return e->x;
}
in main:
int i = 0;
int z = foo((struct element*)&i);
I assume that if only allocating for the first element is valid, then I will have to be wary of anything that may attempt to copy the structure. i.e. passing the struct to a function.
Don't force your information into structs where it's not needed: don't use the struct as the parameter of your function.
Either pass the member of your struct to the function or use inheritance:
typedef struct {
int foo;
} BaseA;
typedef struct {
int bar;
} BaseB;
typedef struct {
BaseA a;
BaseB b;
} Derived;
void foo(BaseB* info) { ... }
...
Derived d;
foo(&d.b);
BaseB b;
foo(&b);
If you're just curious (and seriously, don't use this): you may.
typedef struct {
int foo, goo, hoo, joo;
} A;
typedef struct {
int unused, goo;
} B;
int foo(A* a) { return a->goo; }
...
B b;
int goo = foo((A*)&b);
In general you'll have to allocate a block of memory at least as many bytes as are required to fully read the accessed member with the largest offset in your structure. In addition when writing to this block you have to make sure to use the same member offsets as in the original structure.
The point being, a structure is only a block of memory with different areas assigned different interpretations (int, char, other structs etc...) and accessing a member of a struct (after reordering and alignment) boils down to simply reading from or writing to a bit of memory.
I do not think the code as given is legitimate. To understand why, consider:
struct CHAR_AND_INT { unsigned char c; int i; };
struct CHAR_AND_INT *p;
A compiler would be entitled to assume that p->c will be word-aligned and have whatever padding would be necessary for p->i to also be word-aligned. On some processors, writing a byte may be slower than writing a word. For example, a byte-store instruction may require the processor to read a word from memory, update one byte within it, and write the whole thing back, while a word-store instruction could simply store the new data without having to read anything first. A compiler that knew that p->c would be word-aligned and padded could implement p->c = 12; by using a word store to write the value 12. Such behavior wouldn't yield desired results, however, if the byte following p->c wasn't padding but instead held useful data.
While I would not expect a compiler to impose "special" alignment or padding requirements on any part of the structure shown in the original question (beyond those which apply to int) I don't think anything in the standard would forbid a compiler from doing so.
You need to only check that the structure itself is allocated; not the members (in that case at least)
int foo(struct element* e)
{
if ( e != 0) // check that the e pointer is valid
{
if(e->x != 0) // here you only check to see if x is different than zero (values, not pointers)
return e->y;
}
return 0;
}
In your edit, I think this is poor coding:
int i = 0;
int z = foo((struct element*)&i);
In that case, i will be allocated on the stack, so its address is valid and will remain valid in foo; but since you cast it to something different, the members will be garbage (at best).
Why do you want to cast an int into a structure?
What is your intent?

Is it possible to store different kinds of structs into a flexible-length array?

Let's say we have two struct types as follows:
struct A {
int a;
};
struct B {
int b;
int c;
};
Would it be possible to initialize a flexible-length array to contain instances of both A and B using designated initializers, e.g:
<sometype> my_array[] = {
((struct A){ .a = 10, }),
((struct B){ .b = 1, .c = 5, }),
};
And since I need to know the type of elements in the array, a way to put some char before the structs would be nice too. :)
I know this looks terribly broken, but I am trying to pack some bytecode-like data structures together and this looks like an elegant way to define them (well, with the help of some macros at least).
Edit: To clarify a few points:
Dynamic allocation is not an option
Neither are unions - I want the elements to occupy exactly the space needed by their type
"Variable length array" in the question could have been misleading - the exact term would be "flexible length array", according to http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html. The example code is ideally how I'd like it to look.
So what I'd basically like is to be able to pack some arbitrary, structured data into a memory area that is allocated in the .data segment of the binary. I do not need random access to elements, just to pack the data from structs - the use of a flexible length array in my example is because this construct seems to be the closest to what I want to achieve. But the declaration could be anything else that does the job (except assembler, I need to retain C portability).
The best way to do this would be to use unions. You could define all your types within a union, and wrap this union, together with the char you want to use to identify the actual type, in a struct.
struct TypesAB {
char type;
union {
struct {
int a;
} A;
struct {
int b;
int c;
} B;
};
};
enum {
TypeA,
TypeB
};
With this struct, you can define your array, and then set the elements.
struct TypesAB array[10];
array[0].type = TypeA;
array[0].A.a = 10;
array[1].type = TypeB;
array[1].B.b = 1;
array[1].B.c = 5;
Note that this memory layout will make you lose some space if your A and B types are not the same size. Indeed, with the above definition, sizeof(struct TypesAB) is large enough to hold the larger of A and B, plus the char (and any padding). If you use an element as an A, the memory space that would have been used for the c member is lost. The same memory space is used for the a member of A and the b member of B.
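With the tag in place, code that consumes the array can switch on type to decide which union member to read. The sketch below repeats the definitions from this answer; sum_elements and my_array are hypothetical names. It also shows that designated initializers, as asked for in the question, work with this layout (the anonymous union requires C11 or a compiler extension).

```c
#include <assert.h>
#include <stddef.h>

enum { TypeA, TypeB };

struct TypesAB {
    char type;
    union {
        struct { int a; } A;
        struct { int b; int c; } B;
    };
};

/* Designated initializers, close to the syntax used in the question. */
static const struct TypesAB my_array[] = {
    { .type = TypeA, .A = { .a = 10 } },
    { .type = TypeB, .B = { .b = 1, .c = 5 } },
};

/* Hypothetical consumer: add up every int stored in the first n elements. */
static int sum_elements(const struct TypesAB *arr, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        switch (arr[i].type) {
        case TypeA: total += arr[i].A.a; break;
        case TypeB: total += arr[i].B.b + arr[i].B.c; break;
        default:    assert(!"unknown element type"); break;
        }
    }
    return total;
}
```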

Mocking "inheritance" in accessing members of structures in C

I have a question about C and attempting to mock a partial type of "inheritance", only in accessing members of structures. Look at the following example:
#pragma pack(push,1)
typedef struct foo
{
int value;
int value2;
}foo;
typedef struct foo_extended
{
// "inherits" foo
int value;
int value2;
// "inherits" foo stops
//we also have some additional data
float additional;
}foo_extended;
#pragma pack(pop)
//! This function works for both foo types
void workboth(void* objP)
{
foo* obj = (foo*)objP;
obj->value = 5;
obj->value2 = 15;
}
//! This works only for the extended
void workextended(foo_extended* obj)
{
obj->value = 25;
obj->value2 = 35;
obj->additional = 3.14;
}
int main()
{
foo a;
foo_extended b;
workboth(&a);
workboth(&b);
workextended(&b);
return 0;
}
This works on my system, but my question is whether it can be portable as long as the involved structures are packed correctly (depending on the compiler). I suppose it would need #ifndefs that correctly invoke the tight packing in other compilers too.
Of course the obvious problem is total lack of type checking and putting all of the responsibility of correct usage to the programmer but I am wondering if this is portable or not. Thanks!
P.S.: Forgot to mention that the standard I attempt to adhere to is C99
I have a slightly different method. Instead of using the same values, I create a struct in the struct, and this way the packing is unnecessary:
typedef struct foo
{
int value;
int value2;
}foo;
typedef struct foo_extended
{
foo father;
float additional;
}foo_extended;
Now the rest is pretty much as you showed, with a small difference:
void workextended(foo_extended* obj)
{
obj->father.value = 25;
obj->father.value2 = 35;
obj->additional = 3.14;
}
But I would add an id as a field of the first object in the hierarchy to make sure the cast is done to the correct object.
This method is guaranteed to work by the C standard.
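A short sketch of why this works: a pointer to foo_extended, suitably converted, points to its initial member father, so a function that takes foo * can serve both types. The helper as_foo is a hypothetical name added for illustration; writing &ext->father directly avoids the cast altogether.

```c
#include <assert.h>

typedef struct foo {
    int value;
    int value2;
} foo;

typedef struct foo_extended {
    foo father;        /* must stay the first member for the cast below */
    float additional;
} foo_extended;

/* Works on a plain foo and on the foo embedded at the start of foo_extended. */
static void workboth(foo *obj)
{
    obj->value = 5;
    obj->value2 = 15;
}

/* Hypothetical helper: view a foo_extended through its initial member. */
static foo *as_foo(foo_extended *ext)
{
    foo *base = (foo *)ext;

    assert(base == &ext->father);
    return base;
}
```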
As of C11, and also supported by some existing compilers as an extension to older standards, you can use an anonymous struct for that:
struct foo_extended {
struct {
int value;
int value2;
};
//we also have some additional data
float additional;
};
This way your substructure has exactly the same layout as foo, in particular as far as the alignment of its parts is concerned: to be compatible between different compilation units, structs that have exactly the same fields in the same order must be laid out identically.
(The impact of your pack pragma is not so clear to me.)
Since your foo structure is the first member of foo_extended, it must always be at offset 0 within it.
By reading 6.7.2.1/12 and /13 in the C99 draft, I think it can be assumed that two different structs with the same initial members are compatible up to the first different member.
6.7.2.1/12
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
6.7.2.1/13
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

Resources