Is it a good practice to hide structure definition in C?

In my opinion, hiding the definition of a structure in C generally makes code safer, as you enforce—with the help of the compiler—that no member of the structure can be accessed directly.
However, it has a downside in that a user of the structure cannot declare variables of its type to be put on the stack, because the size of the structure becomes unavailable this way (and, therefore, the user has to resort to allocating on the heap via malloc() even when it is undesirable).
This can be (partially) solved via the alloca(3) function that is present in all major libc implementations, even though it does not conform to POSIX.
Keeping these pros and cons in mind, can such design be considered good in general?
In lib.h:
#include <stddef.h> /* size_t */
struct foo;
extern size_t foo_size;
int foo_get_bar(struct foo *);
In lib.c:
struct foo {
int bar;
};
size_t foo_size = sizeof(struct foo);
int foo_get_bar(struct foo *foo)
{
return foo->bar;
}
In example.c:
#include <alloca.h> /* alloca(); non-standard header */
#include "lib.h"
int bar(void)
{
struct foo *foo = alloca(foo_size);
foo_init(foo);
return foo_get_bar(foo);
}

Yes, it is a good practice to hide data.
As an alternative to the alloca(foo_size); pattern, one can declare an aligned character array and perform a pointer conversion. The pointer conversion is not fully portable, though. The character array needs to be a VLA if the size is defined by a variable and not a compile-time constant:
#include <stddef.h>
extern size_t size;
struct sfoo;
int main(void) {
unsigned char _Alignas (max_align_t) cptr[size];
// or unsigned char _Alignas (_Complex long double) cptr[size]; // some widest type
struct sfoo *sfooptr = (struct sfoo *) cptr;
...
If VLAs are not desired or available, declare the size as a constant (#define foo_N 100) that is guaranteed to be at least as much as needed.
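A minimal sketch of the constant-size variant, collapsed into one file for brevity. FOO_STORAGE_SIZE, foo_init_in_place and the int bar member are assumptions made for illustration; they do not come from the answer itself. The _Static_assert would sit in the library's own source file, where the structure is complete, so that an undersized constant fails at compile time rather than overflowing the buffer. The pointer conversion carries the same portability caveat as above.

```c
#include <assert.h>
#include <stddef.h>

/* Public side (would be in the header): opaque type plus a size bound. */
#define FOO_STORAGE_SIZE 16   /* hypothetical upper bound chosen by the library */
struct foo;
struct foo *foo_init_in_place(void *storage, int bar);
int foo_get_bar(const struct foo *f);

/* Private side (would be in the .c file): the real definition. */
struct foo {
    int bar;
};

/* Refuses to compile if the advertised bound is too small. */
_Static_assert(sizeof(struct foo) <= FOO_STORAGE_SIZE,
               "FOO_STORAGE_SIZE is too small for struct foo");

struct foo *foo_init_in_place(void *storage, int bar)
{
    struct foo *f = storage;   /* caller guarantees size and alignment */
    assert(f != NULL);
    f->bar = bar;
    return f;
}

int foo_get_bar(const struct foo *f)
{
    return f->bar;
}
```

A caller would declare `_Alignas(max_align_t) unsigned char buf[FOO_STORAGE_SIZE];` and pass `buf` to foo_init_in_place, never touching the bytes directly afterwards.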

Function bar invokes undefined behavior: the structure pointed to by foo is uninitialized.
If you are going to hide the structure details, provide a foo_create() that allocates one and initializes it and foo_finalize that releases any resources and frees it.
What you are proposing could be made to work, but is error prone and is not a general solution.
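As a rough sketch of the create/finalize approach, the header and implementation are shown here in one file. The names foo_create and foo_finalize come from the answer above; the int argument to foo_create and the const-qualified getter are assumptions added for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Public side (would be in lib.h): the structure stays opaque. */
struct foo;
struct foo *foo_create(int bar);       /* returns NULL on allocation failure */
void foo_finalize(struct foo *f);
int foo_get_bar(const struct foo *f);

/* Private side (would be in lib.c). */
struct foo {
    int bar;
};

struct foo *foo_create(int bar)
{
    struct foo *f = malloc(sizeof *f);
    if (f != NULL)
        f->bar = bar;                  /* the object is never left uninitialized */
    return f;
}

void foo_finalize(struct foo *f)
{
    free(f);                           /* free(NULL) is a no-op */
}

int foo_get_bar(const struct foo *f)
{
    assert(f != NULL);
    return f->bar;
}
```

Because allocation and initialization happen in one call, user code cannot obtain a struct foo in an indeterminate state.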

Related

C - Structure and user variables share same memory - possible?

Sorry for newbie question in C. I have following problem.
Let's say I have a structure of:
struct foo {
char var_a;
char var_b;
char var_c;
};
And a list of variables:
struct foo j;
char a, b, c;
I want to make sure that j.var_a is always equal to a, j.var_b to b, and j.var_c to c. That is, I want the structure members to also be accessible like ordinary variables at any time. I thought the struct members just need to share the same memory locations as the user-defined variables, so I assumed that something here has to be defined as a pointer and tried this:
struct foo *j;
char a, b, c;
And in main() function:
j = &a;
The aim was to point the structure pointer at the address of a, on the assumption that a, b and c are located in adjacent memory. But the compiler obviously throws an error, because I can't assign the address of one type to a pointer of another type. I also feel this is unsafe, as it depends on the order of variables in memory.
So is there a safe way to achieve this without manually reassigning every time any of the variables changes, and without additional memory copying? I have an embedded target, so I would like to save memory and processor time.
You're correct that j = &a will not do what you expect for the reasons you gave.
What you can do instead is define the members of struct foo to be pointers:
struct foo {
char *var_a;
char *var_b;
char *var_c;
};
char a, b, c;
struct foo j = { &a, &b, &c };
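A small sketch of how the pointer-member version behaves. The type name foo_ref and the helper foo_ref_bind are hypothetical, chosen here to avoid clashing with the original struct foo. Note that each pointer is typically larger than the char it refers to, which may matter on a memory-constrained target.

```c
#include <assert.h>
#include <stddef.h>

struct foo_ref {
    char *var_a;
    char *var_b;
    char *var_c;
};

/* Bind the three members to existing variables. */
struct foo_ref foo_ref_bind(char *a, char *b, char *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    struct foo_ref j = { a, b, c };
    return j;
}
```

After binding, a write to a is visible through *j.var_a and a write through *j.var_b is visible in b, with no copying in either direction.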
Local variables such as char a, b, c; aren't guaranteed to be contiguous, to be in order, or even to exist in memory.
There's no way to robustly overlay a struct over them.
The closest thing you could do to pretending that some local variables overlay a struct is to make those "local variables" macros:
struct foo {
char var_a;
char var_b;
char var_c;
};
int main(void)
{
struct foo j;
#define a j.var_a
#define b j.var_b
#define c j.var_c
/*now a,b,c are sort of like contiguous locals
that overlay j */
#undef a
#undef b
#undef c
}
Although there are some circumstances when it would be useful to be able to use "ordinary variable" syntax to access members of a structure (among other things, it may improve performance, allow code that saves/restores a program's state to be reduced to a single structure write/read, etc.) the closest C comes to allowing that is allowing the use of #define macros to replace the "variable" name with a reference to the structure.
For example,
struct foo { int foo_woozle, foo_moozle; } myFoo;
#define woozle myFoo.foo_woozle
#define moozle myFoo.foo_moozle
Unfortunately, this approach requires that the "variable" names not appear as tokens in any other context within the compilation unit. If, for example, one tried to declare a structure with a member named woozle, the structure declaration would fail because its name would get replaced with myFoo.foo_woozle.
On some implementations it may be possible to do something like:
extern int woozle, moozle;
extern volatile struct foo myFoo;
and then use code written in another language (e.g. assembly) to overlay the objects as desired. Unfortunately, this is likely to be unreliable unless optimizations are disabled or the objects are all qualified volatile. The approach using #define will thus be able to generate more efficient code, but is unfortunately saddled with the ugly semantic limitations of macro substitution.

Is struct copying with memcpy() legal?

Let's say I have two structs:
typedef struct {
uint64_t type;
void(*dealloc)(void*);
} generic_t;
typedef struct {
uint64_t type;
void(*dealloc)(void*);
void* sth_else;
} specific_t;
The common way to copy to the simpler struct would be:
specific_t a = /* some code */;
generic_t b = *(generic_t*)&a;
But this is illegal because it violates strict aliasing rules.
However, if I memcpy the struct I only have void pointers which are not affected by strict aliasing rules:
extern void *memcpy(void *restrict dst, const void *restrict src, size_t n);
specific_t a = /* some code */;
generic_t b;
memcpy(&b, &a, sizeof(b));
Is it legal to copy a struct with memcpy like this?
An example use case would be a generic deallocator:
void dealloc_any(void* some_specific_struct) {
// Get the deallocator
generic_t b;
memcpy(&b, some_specific_struct, sizeof(b));
// Call the deallocator with the struct to deallocate
b.dealloc(some_specific_struct);
}
specific_t a = /* some code */;
dealloc_any(&a);
Legal.
According to memcpy manual: The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
So it doesn't care about types at all.
It just does exactly what you tell it to do.
So use it with caution: if you had used sizeof(a) instead of sizeof(b), you might have overwritten some other variables on the stack.
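The following sketch combines the memcpy copy with compile-time checks that generic_t really is a prefix of specific_t. The checks, the helper generic_from_specific and the counting_dealloc function are assumptions added for illustration; only the two typedefs and dealloc_any come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
} generic_t;

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
    void *sth_else;
} specific_t;

/* Compile-time checks that the leading members line up. */
_Static_assert(offsetof(generic_t, type) == offsetof(specific_t, type),
               "type offset differs");
_Static_assert(offsetof(generic_t, dealloc) == offsetof(specific_t, dealloc),
               "dealloc offset differs");
_Static_assert(sizeof(generic_t) <= sizeof(specific_t),
               "generic_t is larger than specific_t");

/* Copy the leading bytes; the copy is sized by the destination. */
generic_t generic_from_specific(const specific_t *s)
{
    generic_t g;
    assert(s != NULL);
    memcpy(&g, s, sizeof g);
    return g;
}

/* Hypothetical deallocator that only counts its calls. */
int dealloc_calls = 0;
void counting_dealloc(void *p)
{
    (void)p;
    dealloc_calls++;
}

void dealloc_any(void *some_specific_struct)
{
    generic_t b;
    memcpy(&b, some_specific_struct, sizeof b);
    b.dealloc(some_specific_struct);
}
```

Sizing every copy with sizeof applied to the destination object is what keeps the copy from running past the smaller structure.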
Although this is not a direct answer it may help.
If you want overlapping structs, you could use a union:
typedef union
{
generic_t generic;
specific_t specific;
} combined_t;
This avoids the need to cast. But you need to be very careful to make sure that you don't access sth_else if it's not initialised. You would need data & logic to determine which of the union members has been set plus access functions/macros. This is working towards building class inheritance in C.
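One way the "data and logic" could look, as a sketch: the existing type field doubles as the tag, and an access function refuses to read sth_else unless the tag says it is valid. The tag values TYPE_GENERIC and TYPE_SPECIFIC and the function combined_sth_else are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
} generic_t;

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
    void *sth_else;
} specific_t;

typedef union {
    generic_t generic;
    specific_t specific;
} combined_t;

enum { TYPE_GENERIC = 0, TYPE_SPECIFIC = 1 };   /* hypothetical tag values */

/* Returns sth_else only when the tag says the specific member is in use. */
void *combined_sth_else(const combined_t *c)
{
    assert(c != NULL);
    /* type belongs to the common initial sequence of both structs,
       so it can be inspected through either union member. */
    if (c->generic.type == TYPE_SPECIFIC)
        return c->specific.sth_else;
    return NULL;
}
```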
In the past I've built a Java-like exception handling mechanism in C (so try, catch, ...). This featured link-time exception 'class' inheritance. So a compiled library could define a SomeException 'class', user code could then 'subclass' this exception, and the new 'subclass' exception would still be caught by a SomeException catch clause. If you need to stay in C, there's a lot you can do with smart macros and a few well chosen uncomplicated C constructions.

Am I violating strict aliasing rules by creating dummy struct data types?

I have these two functions:
static inline void *ether_payload(void *pkt)
{
return ((char*)pkt) + 14;
}
static inline uint16_t ip_id(const void *pkt)
{
const char *cpkt = pkt;
uint16_t id;
memcpy(&id, &cpkt[4], sizeof(id));
return ntohs(id);
}
Now, there's a type safety issue. For the first function, void pointer means Ethernet header. For the second function, void pointer means IPv4 header. This creates a huge possibility that somebody accidentally calls the second function for an Ethernet header directly. If somebody does so, the compiler gives no warning.
I would like to eliminate this type safety issue through two dummy structs the contents of which are never defined:
struct etherhdr;
struct ipv4hdr;
Now the functions would be:
static inline struct ipv4hdr *ether_payload(struct etherhdr *pkt)
{
return (struct ipv4hdr*)(((char*)pkt) + 14);
}
static inline uint16_t ip_id(const struct ipv4hdr *pkt)
{
const char *cpkt = (const char*)pkt;
uint16_t id;
memcpy(&id, &cpkt[4], sizeof(id));
return ntohs(id);
}
This solves the type safety issue. Note I'm not actually accessing the Ethernet or IP headers through a struct which would be very bad practice indeed.
My question is, am I violating strict aliasing rules by defining such an API? Note the data is never accessed via the struct; the data is just accessed via memcpy using a char pointer. My understanding is that char pointer can alias to anything.
Let's leave the fact that Ethernet packet can contain IPv6 as irrelevant, as this was just a very simple example.
As for your question: as Cornstalks already answered, no, you are not violating any strict aliasing rules.
You may convert a pointer to a char pointer. You may convert a char pointer to another pointer type if you are sure that an object of that other type really is there.
See Strict aliasing rule and 'char *' pointers
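A usage sketch of the typed API from the question, assuming a POSIX system for ntohs. The helper frame_as_ether is hypothetical; it marks the single place where a raw buffer is given a header type. The data itself is still only read through memcpy on a char pointer.

```c
#include <arpa/inet.h>   /* ntohs; POSIX, not ISO C */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct etherhdr;   /* never defined: used only as a type tag */
struct ipv4hdr;

struct ipv4hdr *ether_payload(struct etherhdr *pkt)
{
    return (struct ipv4hdr *)((char *)pkt + 14);
}

uint16_t ip_id(const struct ipv4hdr *pkt)
{
    const char *cpkt = (const char *)pkt;
    uint16_t id;
    memcpy(&id, &cpkt[4], sizeof id);   /* bytes 4..5 hold the IPv4 ID */
    return ntohs(id);
}

/* Hypothetical helper: view a raw frame buffer as an Ethernet header. */
struct etherhdr *frame_as_ether(unsigned char *frame)
{
    assert(frame != NULL);
    return (struct etherhdr *)frame;
}
```

With these declarations, passing a struct etherhdr * straight to ip_id draws an incompatible-pointer diagnostic from the compiler, which is the type safety the question is after.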
The Standard allows implementations to impose alignment restrictions for structures which are coarser than those of any items contained therein. This would allow an implementation for a platform that only supports aligned accesses, that was given e.g.
#include <string.h>
#include <stdint.h>
struct foo {uint32_t dat[1]; };
struct bar {uint16_t dat[2]; };
void test1(struct foo *dest, struct foo *src)
{
memcpy(dest, src, 4);
}
void test2(struct bar *dest, struct bar *src)
{
memcpy(dest, src, 4);
}
to generate code for test2 which is just as efficient as for test1 [using one 32-bit read and write, instead of two 16-bit reads and writes]. If an implementation were to always pad all structures out to a multiple of four bytes and align them to four-byte boundaries, such an implementation would be allowed to perform the aforementioned optimization on test2 without having to know or care about how or even if struct bar is ever defined anywhere.
I don't know whether any present implementations would ever do such a thing, but I can hardly rule out the possibility that a future implementation might do so, since there are some circumstances where it could allow more efficient code generation.

Can aligned structs inside a union be cast to the union to access aligned fields?

I'm trying to grok what exactly you get from the easement on aligned variables in C99:
Exception to strict aliasing rule in C from 6.5.2.3 Structure and union members
Does it give you carte blanche on casting to that union, if the original write was done through a pointer to one of the aligned structs as below?
#include <stdio.h>
#include <stdlib.h>
struct Foo { char t; int i; };
struct Bar { char t; float f; };
union FooBar {
struct Foo foo;
struct Bar bar;
};
void detector(union FooBar *foobar) {
if (((struct Foo*)foobar)->t == 'F')
printf("Foo %d\n", ((struct Foo*)foobar)->i);
else
printf("Bar %f\n", ((struct Bar*)foobar)->f);
}
int main() {
struct Foo *foo = (struct Foo*)malloc(sizeof(struct Foo));
struct Bar *bar = (struct Bar*)malloc(sizeof(struct Bar));
foo->t = 'F';
foo->i = 1020;
detector((union FooBar*)foo);
bar->t = 'B';
bar->f = 3.04;
detector((union FooBar*)bar);
return 0;
}
Note that in the second call, t was written as a "bar's t", but then, in order to discern which kind it has, the detector reads it as a "foo's t".
My reaction coming from C++ would be that you'd only be able to do it if you had "allocated it as a FooBar union in the first place". It's counter-intuitive to me to imagine this as legal, but for dynamic allocations in C there's no such thing. So if you can't do that, what exactly can you do with a dynamic memory allocation such as the above under this exception?
If you did something like this:
struct Foo foo;
struct Bar bar;
...
detector((union FooBar*)&foo);
detector((union FooBar*)&bar);
Then you might have issues with alignment, since the compiler could place each of these structs on the stack in a way that might not align properly for the other.
But because in your case you're dynamically allocating the memory for each struct, alignment is not an issue.
From the man page for malloc:
For calloc() and malloc(), the value returned is a pointer to the
allocated memory, which is suitably aligned for any kind of
variable, or NULL if the request fails.
But if you want to be sure that this won't be an issue, just declare an instance of the union instead of the containing struct anyplace where a function expecting the union would be called.
If Foo and Bar have different alignment, you shouldn't do that already for that reason alone. The union will have the maximum alignment of the two, and casting the one with the smaller value will give you a union that is not correctly aligned.
Your code is not a good example for the aliasing rules, because you basically don't have aliasing here. But in general, casts to another type are always bad in cases where you may have aliasing. Your compiler may make assumptions about two (or more) pointers that a piece of code sees. If they are of different types (with the exception of character types), the compiler can assume that they never point to the same object.

Is it possible to cast pointers from a structure type to another structure type extending the first in C?

If I have structure definitions, for example, like these:
struct Base {
int foo;
};
struct Derived {
int foo; // int foo is common for both definitions
char *bar;
};
Can I do something like this?
void foobar(void *ptr) {
((struct Base *)ptr)->foo = 1;
}
struct Derived s;
foobar(&s);
In other words, can I cast the void pointer to Base * to access its foo member when its type is actually Derived *?
You should do
struct Base {
int foo;
};
struct Derived {
struct Base base;
char *bar;
};
to avoid breaking strict aliasing; it is a common misconception that C allows arbitrary casts of pointer types: although it will work as expected in most implementations, it's non-standard.
This also avoids any alignment incompatibilities due to usage of pragma directives.
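A brief sketch of how the embedded-base layout is used. The helper derived_as_base is hypothetical; foobar is the question's function, retyped here to take a struct Base * instead of void *.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int foo;
};

struct Derived {
    struct Base base;   /* must stay the first member */
    char *bar;
};

void foobar(struct Base *b)
{
    assert(b != NULL);
    b->foo = 1;
}

/* Upcast without any pointer cast: take the address of the member. */
struct Base *derived_as_base(struct Derived *d)
{
    return &d->base;
}
```

Taking &d->base yields a pointer to a real struct Base object, so the access inside foobar goes through an lvalue of the object's own type.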
Many real-world C programs assume the construct you show is safe, and there is an interpretation of the C standard (specifically, of the "common initial sequence" rule, C99 §6.5.2.3 p5) under which it is conforming. Unfortunately, in the five years since I originally answered this question, all the compilers I can easily get at (viz. GCC and Clang) have converged on a different, narrower interpretation of the common initial sequence rule, under which the construct you show provokes undefined behavior. Concretely, experiment with this program:
#include <stdio.h>
#include <string.h>
typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; float z; } B;
typedef struct C { A a; float z; } C;
int testAB(A *a, B *b)
{
b->x = 1;
a->x = 2;
return b->x;
}
int testAC(A *a, C *c)
{
c->a.x = 1;
a->x = 2;
return c->a.x;
}
int main(void)
{
B bee;
C cee;
int r;
memset(&bee, 0, sizeof bee);
memset(&cee, 0, sizeof cee);
r = testAB((A *)&bee, &bee);
printf("testAB: r=%d bee.x=%d\n", r, bee.x);
r = testAC(&cee.a, &cee);
printf("testAC: r=%d cee.x=%d\n", r, cee.a.x);
return 0;
}
When compiling with optimization enabled (and without -fno-strict-aliasing), both GCC and Clang will assume that the two pointer arguments to testAB cannot point to the same object, so I get output like
testAB: r=1 bee.x=2
testAC: r=2 cee.x=2
They do not make that assumption for testAC, but — having previously been under the impression that testAB was required to be compiled as if its two arguments could point to the same object — I am no longer confident enough in my own understanding of the standard to say whether or not that is guaranteed to keep working.
That will work in this particular case. The foo field is the first member of both structures and it has the same type. However, this is not true in the general case of fields within a struct (that are not the first member). Items like alignment and packing can make this break in subtle ways.
As you seem to be aiming at object-oriented programming in C, I suggest you have a look at the following link:
http://www.planetpdf.com/codecuts/pdfs/ooc.pdf
It goes into detail about ways of handling oop principles in ANSI C.
In particular cases this could work, but in general - no, because of the structure alignment.
You could use #pragmas to make (or rather, attempt to make) the alignment identical, and then, yes, that would work.
If you're using microsoft visual studio, you might find this article useful.
There is another little thing that might be helpful or related to what you are doing ..
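#define SHARED_DATA int id /* no trailing semicolon; each use supplies it */
typedef struct window_t {
    SHARED_DATA;
    int something;
    void *blah;
} window_t;
typedef struct list_t {
    SHARED_DATA;
    int size;
} list_t;
typedef struct button_t {
    SHARED_DATA;
    int clicked;
} button_t;
/* The union comes last so that all member types are already complete. */
typedef union base_t {
    SHARED_DATA;
    window_t win;
    list_t list;
    button_t button;
} base_t;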
Now you can put the shared properties into SHARED_DATA and handle the different types via the "superclass" packed into the union. You could use SHARED_DATA to store just a 'class identifier' or to store a pointer. Either way, it turned out handy for generic handling of event types for me at some point. I hope I'm not going too far off-topic with this.
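A sketch of the generic handling described here, using the id as a 'class identifier'. The ID_* constants and the function base_value are hypothetical, and SHARED_DATA is defined without a trailing semicolon so that each use supplies its own.

```c
#include <assert.h>
#include <stddef.h>

#define SHARED_DATA int id   /* shared leading member */

enum { ID_WINDOW = 1, ID_LIST = 2, ID_BUTTON = 3 };   /* hypothetical ids */

typedef struct { SHARED_DATA; int something; void *blah; } window_t;
typedef struct { SHARED_DATA; int size; } list_t;
typedef struct { SHARED_DATA; int clicked; } button_t;

typedef union {
    SHARED_DATA;
    window_t win;
    list_t list;
    button_t button;
} base_t;

/* Dispatch on the shared id and return the type-specific int field,
   or -1 for an id this function does not know. */
int base_value(const base_t *b)
{
    assert(b != NULL);
    switch (b->id) {
    case ID_WINDOW: return b->win.something;
    case ID_LIST:   return b->list.size;
    case ID_BUTTON: return b->button.clicked;
    default:        return -1;
    }
}
```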
I know this is an old question, but in my view there is more that can be said and some of the other answers are incorrect.
Firstly, this cast:
(struct Base *)ptr
... is allowed, but only if the alignment requirements are met. On many compilers your two structures will have the same alignment requirements, and it's easy to verify in any case. If you get past this hurdle, the next is that the result of the cast is mostly unspecified - that is, there's no requirement in the C standard that the pointer once cast still refers to the same object (only after casting it back to the original type will it necessarily do so).
However, in practice, compilers for common systems usually make the result of a pointer cast refer to the same object.
(Pointer casts are covered in section 6.3.2.3 of both the C99 standard and the more recent C11 standard. The rules are essentially the same in both, I believe).
Finally, you've got the so called "strict aliasing" rules to contend with (C99/C11 6.5 paragraph 7); basically, you are not allowed to access an object of one type via a pointer of another type (with certain exceptions, which don't apply in your example). See "What is the strict-aliasing rule?", or for a very in-depth discussion, read my blog post on the subject.
In conclusion, what you attempt in your code is not guaranteed to work. It might be guaranteed to always work with certain compilers (and with certain compiler options), and it might work by chance with many compilers, but it certainly invokes undefined behavior according to the C language standard.
What you could do instead is this:
*((int *)ptr) = 1;
... I.e. since you know that the first member of the structure is an int, you just cast directly to int *, which bypasses the aliasing problem since both types of struct do in fact contain an int at this address. You are relying on knowing the struct layout that the compiler will use and you are still relying on the non-standard semantics of pointer casting, but in practice this is significantly less likely to give you problems.
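The suggestion can be sketched as a small function applied to both structure types from the question. The name set_leading_int is hypothetical. As far as I can tell, C11 6.7.2.1 states that a pointer to a structure, suitably converted, points to its initial member, which is what this relies on.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int foo;
};

struct Derived {
    int foo;   /* common to both definitions */
    char *bar;
};

/* Intended for any structure whose first member is an int. */
void set_leading_int(void *ptr, int value)
{
    assert(ptr != NULL);
    *(int *)ptr = value;   /* the access is made through an int lvalue */
}
```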
The great/bad thing about C is that you can cast just about anything -- the problem is, it might not work. :) However, in your case, it will*, since you have two structs whose first members are both of the same type; see this program for an example. Now, if struct derived had a different type as its first element -- for example, char *bar -- then no, you'd get weird behavior.
* I should qualify that with "almost always", I suppose; there are a lot of different C compilers out there, so some may have different behavior. However, I know it'll work in GCC.
