Is struct copying with memcpy() legal? - c

Let's say I have two structs:
typedef struct {
uint64_t type;
void(*dealloc)(void*);
} generic_t;
typedef struct {
uint64_t type;
void(*dealloc)(void*);
void* sth_else;
} specific_t;
The common way to copy to the simpler struct would be:
specific_t a = /* some code */;
generic_t b = *(generic_t*)&a;
But this is illegal because it violates strict aliasing rules.
However, if I memcpy the struct I only have void pointers which are not affected by strict aliasing rules:
extern void *memcpy(void *restrict dst, const void *restrict src, size_t n);
specific_t a = /* some code */;
generic_t b;
memcpy(&b, &a, sizeof(b));
Is it legal to copy a struct with memcpy like this?
An example use case would be a generic deallocator:
void dealloc_any(void* some_specific_struct) {
// Get the deallocator
generic_t b;
memcpy(&b, some_specific_struct, sizeof(b));
// Call the deallocator with the struct to deallocate
b.dealloc(some_specific_struct);
}
specific_t a = /* some code */;
dealloc_any(&a);

Legal.
According to memcpy manual: The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
So it doesn't care about types at all.
It just does exactly what you tell it to do.
So use it with caution: if you had used sizeof(a) instead of sizeof(b), you might have overwritten other variables on the stack.
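As a minimal sketch of the copy discussed above (assuming a C11 compiler for static_assert; count_dealloc, dealloc_calls and demo_dealloc are hypothetical names added only for illustration), the compile-time checks guard the two things memcpy will not check for you: that the shared members sit at the same offsets, and that the destination is no larger than the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
} generic_t;

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
    void *sth_else;
} specific_t;

/* memcpy ignores types, so the layout assumptions are checked explicitly. */
static_assert(offsetof(generic_t, type) == offsetof(specific_t, type),
              "type must be at the same offset in both structs");
static_assert(offsetof(generic_t, dealloc) == offsetof(specific_t, dealloc),
              "dealloc must be at the same offset in both structs");
static_assert(sizeof(generic_t) <= sizeof(specific_t),
              "generic_t must fit inside specific_t");

/* Hypothetical deallocator that only counts how often it was called. */
static int dealloc_calls;

static void count_dealloc(void *obj)
{
    (void)obj;
    ++dealloc_calls;
}

static void dealloc_any(void *some_specific_struct)
{
    generic_t head;

    /* Copy only the generic prefix, sized by the destination. */
    memcpy(&head, some_specific_struct, sizeof head);
    head.dealloc(some_specific_struct);
}

/* Returns how many times the deallocator ran for one object. */
static int demo_dealloc(void)
{
    specific_t a = { 7u, count_dealloc, NULL };

    dealloc_calls = 0;
    dealloc_any(&a);
    return dealloc_calls;
}
```

Sizing the copy with sizeof head (the destination) rather than the source is what keeps the copy from running past the end of the smaller struct.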

Although this is not a direct answer it may help.
If you want overlapping structs, you could use a union:
typedef union
{
generic_t generic;
specific_t specific;
} combined_t;
This avoids the need to cast. But you need to be very careful to make sure that you don't access sth_else if it's not initialised. You would need data & logic to determine which of the union members has been set plus access functions/macros. This is working towards building class inheritance in C.
In the past I've built a Java-like exception handling mechanism in C (so try, catch, ...). This featured link-time exception 'class' inheritance. So a compiled library could define a SomeException 'class', user code could then 'subclass' this exception, and the new 'subclass' exception would still be caught by a SomeException catch clause. If you need to stay in C, there's a lot you can do with smart macros and a few well-chosen, uncomplicated C constructions.
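The "data & logic to determine which of the union members has been set" could look roughly like the sketch below. It assumes the shared type member doubles as the tag; the KIND_* values and the names combined_sth_else and demo_combined are hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
} generic_t;

typedef struct {
    uint64_t type;
    void (*dealloc)(void *);
    void *sth_else;
} specific_t;

typedef union {
    generic_t generic;
    specific_t specific;
} combined_t;

/* Hypothetical tag values stored in the shared `type` member. */
enum { KIND_GENERIC = 1, KIND_SPECIFIC = 2 };

/* Access function: only hands out sth_else when the tag says it exists. */
static void *combined_sth_else(const combined_t *c)
{
    assert(c != NULL);
    /* `type` belongs to the common initial sequence of both members,
       so it may be inspected through either one while inside the union. */
    if (c->generic.type != KIND_SPECIFIC)
        return NULL;
    return c->specific.sth_else;
}

/* Returns 1 when both the generic and the specific case behave as expected. */
static int demo_combined(void)
{
    int payload = 0;
    combined_t g, s;

    g.generic.type = KIND_GENERIC;
    g.generic.dealloc = NULL;

    s.specific.type = KIND_SPECIFIC;
    s.specific.dealloc = NULL;
    s.specific.sth_else = &payload;

    return combined_sth_else(&g) == NULL &&
           combined_sth_else(&s) == (void *)&payload;
}
```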

Related

Am I violating strict aliasing rules by creating dummy struct data types?

I have these two functions:
static inline void *ether_payload(void *pkt)
{
return ((char*)pkt) + 14;
}
static inline uint16_t ip_id(const void *pkt)
{
const char *cpkt = pkt;
uint16_t id;
memcpy(&id, &cpkt[4], sizeof(id));
return ntohs(id);
}
Now, there's a type safety issue. For the first function, void pointer means Ethernet header. For the second function, void pointer means IPv4 header. This creates a huge possibility that somebody accidentally calls the second function for an Ethernet header directly. If somebody does so, the compiler gives no warning.
I would like to eliminate this type safety issue through two dummy structs the contents of which are never defined:
struct etherhdr;
struct ipv4hdr;
Now the functions would be:
static inline struct ipv4hdr *ether_payload(struct etherhdr *pkt)
{
return (struct ipv4hdr*)(((char*)pkt) + 14);
}
static inline uint16_t ip_id(const struct ipv4hdr *pkt)
{
const char *cpkt = (const char*)pkt;
uint16_t id;
memcpy(&id, &cpkt[4], sizeof(id));
return ntohs(id);
}
This solves the type safety issue. Note I'm not actually accessing the Ethernet or IP headers through a struct which would be very bad practice indeed.
My question is, am I violating strict aliasing rules by defining such an API? Note the data is never accessed via the struct; the data is just accessed via memcpy using a char pointer. My understanding is that char pointer can alias to anything.
Let's leave the fact that Ethernet packet can contain IPv6 as irrelevant, as this was just a very simple example.
To answer your question (as Cornstalks already did): no, you are not violating any strict aliasing rules.
You may convert a pointer to a char pointer. You may convert a char pointer to another pointer type if you are sure that an object of that other type is really there.
See Strict aliasing rule and 'char *' pointers
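A small self-contained sketch of the API from the question, exercised on a hand-built frame. Two things here are assumptions rather than the asker's code: the big-endian field is assembled by hand instead of calling ntohs (to avoid a platform-specific header), and ether_payload takes a const pointer. demo_ip_id is a hypothetical helper. Note that the bytes are only ever touched through unsigned char, never through the tag types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag types: declared but never defined, used only to tell pointers apart. */
struct etherhdr;
struct ipv4hdr;

static const struct ipv4hdr *ether_payload(const struct etherhdr *pkt)
{
    /* Skip the 14-byte Ethernet header. */
    return (const struct ipv4hdr *)((const unsigned char *)pkt + 14);
}

static uint16_t ip_id(const struct ipv4hdr *pkt)
{
    unsigned char raw[2];

    assert(pkt != NULL);
    /* The identification field sits at byte offset 4, in network byte order. */
    memcpy(raw, (const unsigned char *)pkt + 4, sizeof raw);
    return (uint16_t)((raw[0] << 8) | raw[1]);
}

/* Builds a zeroed Ethernet + IPv4 header and stores 0x1234 as the IP id. */
static uint16_t demo_ip_id(void)
{
    unsigned char frame[14 + 20] = { 0 };

    frame[14 + 4] = 0x12;
    frame[14 + 5] = 0x34;
    return ip_id(ether_payload((const struct etherhdr *)frame));
}
```

Passing a struct etherhdr pointer to ip_id now fails to compile cleanly, which is the type-safety gain the question is after. The pointer conversions themselves still assume the buffer is suitably aligned for the tag types.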
The Standard allows implementations to impose alignment restrictions for structures which are coarser than those of any items contained therein. This would allow an implementation for a platform that only supports aligned accesses, that was given e.g.
#include <string.h>
#include <stdint.h>
struct foo {uint32_t dat[1]; };
struct bar {uint16_t dat[2]; };
void test1(struct foo *dest, struct foo *src)
{
memcpy(dest, src, 4);
}
void test2(struct bar *dest, struct bar *src)
{
memcpy(dest, src, 4);
}
to generate code for test2 which is just as efficient as for test1 [using one 32-bit read and write, instead of two 16-bit reads and writes]. If an implementation were to always pad all structures out to a multiple of four bytes and align them to four-byte boundaries, such an implementation would be allowed to perform the aforementioned optimization on test2 without having to know or care about how or even if struct bar is ever defined anywhere.
I don't know whether any present implementations would ever do such a thing, but I can hardly rule out the possibility that a future implementation might do so, since there are some circumstances where it could allow more efficient code generation.

Is it a good practice to hide structure definition in C?

In my opinion, hiding the definition of a structure in C generally makes code safer, as you enforce—with the help of the compiler—that no member of the structure can be accessed directly.
However, it has a downside in that a user of the structure cannot declare variables of its type to be put on the stack, because the size of the structure becomes unavailable this way (and, therefore, the user has to resort to allocating on the heap via malloc() even when it is undesirable).
This can be (partially) solved via the alloca(3) function that is present in all major libc implementations, even though it does not conform to POSIX.
Keeping these pros and cons in mind, can such design be considered good in general?
In lib.h:
struct foo;
extern size_t foo_size;
int foo_get_bar(struct foo *);
In lib.c:
struct foo {
int bar;
};
size_t foo_size = sizeof(struct foo);
int foo_get_bar(struct foo *foo)
{
return foo->bar;
}
In example.c:
#include "lib.h"
int bar(void)
{
struct foo *foo = alloca(foo_size);
foo_init(foo);
return foo_get_bar(foo);
}
Yes, it is a good practice to hide data.
As an alternative to the alloca(foo_size); pattern, one can declare an aligned character array and perform a pointer conversion. The pointer conversion is not fully portable, though. The character array needs to be a VLA if the size is defined by a variable and not a compile-time constant:
#include <stddef.h>
extern size_t size;
struct sfoo;
int main(void) {
unsigned char _Alignas (max_align_t) cptr[size];
// or unsigned char _Alignas (_Complex long double) cptr[size]; // some widest type
struct sfoo *sfooptr = (struct sfoo *) cptr;
...
If VLAs are not desired or available, declare the size as a constant (#define foo_N 100) that is guaranteed to be at least as much as needed.
Function bar invokes undefined behavior: the structure pointed to by foo is uninitialized.
If you are going to hide the structure details, provide a foo_create() that allocates one and initializes it and foo_finalize that releases any resources and frees it.
What you are proposing could be made to work, but is error prone and is not a general solution.
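A minimal sketch of the foo_create()/foo_finalize() pairing suggested above, shown in one file for brevity. The split into lib.h and lib.c is indicated by comments, and the bar parameter of foo_create and the helper demo_foo are assumptions added for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in lib.h: the type stays incomplete for users ---- */
struct foo;
struct foo *foo_create(int bar);
int foo_get_bar(const struct foo *f);
void foo_finalize(struct foo *f);

/* ---- would live in lib.c: the only place that sees the members ---- */
struct foo {
    int bar;
};

struct foo *foo_create(int bar)
{
    struct foo *f = malloc(sizeof *f);

    if (f != NULL)
        f->bar = bar;   /* the object is never handed out uninitialized */
    return f;
}

int foo_get_bar(const struct foo *f)
{
    assert(f != NULL);
    return f->bar;
}

void foo_finalize(struct foo *f)
{
    free(f);            /* free(NULL) is a no-op, so no check is needed */
}

/* ---- user code: create, use, release ---- */
static int demo_foo(void)
{
    struct foo *f = foo_create(5);
    int bar;

    if (f == NULL)
        return -1;
    bar = foo_get_bar(f);
    foo_finalize(f);
    return bar;
}
```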

Opaque types allocatable on stack in C

When designing a C interface, it is common to expose in the public interface (.h) only what the user program needs to know.
Hence for example, the inner components of structures should remain hidden if the user program does not need to know them. This is indeed good practice, as the content and behavior of the struct could change in the future, without affecting the interface.
A great way to achieve that objective is to use incomplete types.
typedef struct foo opaqueType;
Now an interface using only pointers to opaqueType can be built, without the user program ever needing to know the inner working of struct foo.
But sometimes, it can be required to allocate such a structure statically, typically on the stack, for performance and memory fragmentation reasons. Obviously, with the above construction, opaqueType is incomplete, so its size is unknown, so it cannot be statically allocated.
A workaround is to allocate a "shell type", such as:
typedef struct { int faketable[8]; } opaqueType;
Above construction enforces a size and an alignment, but doesn't go farther into describing what the structure really contains. So it matches the objective of keeping the type "opaque".
It mostly works. But in one circumstance (GCC 4.4), the compiler complains that it breaks strict-aliasing, and it generates buggy binary.
Now, I've read a ton of things about strict aliasing, so I guess I understand now what it means.
The question is: is there a way to define an opaque type which can nonetheless be allocated on the stack, without breaking the strict aliasing rule?
Note that I've attempted the union method described in this excellent article but it still generates the same warning.
Note also that Visual Studio, clang and gcc 4.6 and later don't complain and work fine with this construction.
[Edit] Additional information:
According to tests, the problem only happens in the following circumstances:
Private and public type different. I'm casting the public type to private inside the .c file. It doesn't matter apparently if they are part of the same union. It doesn't matter if the public type contains char.
If all operations on private type are just reads, there's no problem. Only writes cause problems.
I also suspect that only functions which are automatically inlined get into trouble.
Problem only happens on gcc 4.4 at -O3 setting. -O2 is fine.
Finally, my target is C90. Maybe C99 if there really is no choice.
You can force the alignment with max_align_t and you can avoid the strict aliasing issues using an array of char since char is explicitly allowed to alias any other type.
Something along the lines of:
#include <stddef.h>
struct opaque
{
union
{
max_align_t a;
char b[32]; // or whatever size you need.
} u;
};
If you want to support compilers that do not have max_align_t, or if you know the alignment requirements of the real type, then you can use any other type for the a union member.
UPDATE: If you are targeting C11, then you may also use alignas():
#include <stddef.h>
#include <stdalign.h>
struct opaque
{
alignas(max_align_t) char b[32];
};
Of course, you can replace the max_align_t with whatever type you think appropriate. Or even an integer.
UPDATE #2:
Then, the use of this type in the library would be something along the lines of:
void public_function(struct opaque *po)
{
struct private *pp = (struct private *)po->b;
//use pp->...
}
This way, since you are type-punning a pointer to char you are not breaking the strict aliasing rules.
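If the cast itself is a concern, a more conservative variant (a different technique from the one above, sketched here under the assumption of a C11 compiler) never forms a pointer to the private type at all: it copies the private state out of the byte buffer with memcpy, works on the copy, and copies it back. struct private_state, opaque_init and opaque_increment are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

/* Public shell: fixed size and alignment, no visible members of interest. */
struct opaque {
    alignas(max_align_t) unsigned char b[32];
};

/* Private layout, known only to the implementation file (hypothetical). */
struct private_state {
    int counter;
    double scale;
};

/* Catch a private type that outgrows the public shell at compile time. */
static_assert(sizeof(struct private_state) <= sizeof(struct opaque),
              "opaque buffer too small for private_state");
static_assert(alignof(struct private_state) <= alignof(struct opaque),
              "opaque buffer not aligned strictly enough");

static void opaque_init(struct opaque *po)
{
    struct private_state st = { 0, 1.0 };

    memcpy(po->b, &st, sizeof st);
}

static int opaque_increment(struct opaque *po)
{
    struct private_state st;

    memcpy(&st, po->b, sizeof st);   /* read the state out as bytes */
    st.counter += 1;
    memcpy(po->b, &st, sizeof st);   /* write the updated state back */
    return st.counter;
}
```

The extra copies are usually optimized away for small structs, but that is a property of the compiler, not a guarantee.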
What you desire is some kind of equivalent of the C++ private access control in C. As you know, no such equivalent exists. The approach you give is approximately what I would do. However, I would make the opaqueType opaque to the inner components implementing the type, so I would be forced to cast it to the real type within the inner components. The forced cast should not generate the warning you are mentioning.
Although cumbersome to use, you can define an interface that provides "stack allocated" memory to an opaque type without exposing a sized structure. The idea is that the implementation code is in charge of the stack allocation, and the user passes in a callback function to get a pointer to the allocated type.
typedef struct opaqueType_raii_callback opaqueType_raii_callback;
struct opaqueType_raii_callback {
void (*func)(opaqueType_raii_callback *, opaqueType *);
};
extern void opaqueType_raii (opaqueType_raii_callback *);
extern void opaqueType_raii_v (opaqueType_raii_callback *, size_t);
void opaqueType_raii (opaqueType_raii_callback *cb) {
opaqueType_raii_v(cb, 1);
}
void opaqueType_raii_v (opaqueType_raii_callback *cb, size_t n) {
opaqueType x[n];
cb->func(cb, x);
}
The definitions above look a bit esoteric, but it is the way I normally implement a callback interface.
struct foo_callback_data {
opaqueType_raii_callback cb;
int my_data;
/* other data ... */
};
void foo_callback_function (opaqueType_raii_callback *cb, opaqueType *x) {
struct foo_callback_data *data = (void *)cb;
/* use x ... */
}
void foo () {
struct foo_callback_data data;
data.cb.func = foo_callback_function;
opaqueType_raii(&data.cb);
}
For me this seems to be something which just shouldn't be done.
The point of having an opaque pointer is to hide the implementation details. The type and alignment of memory where the actual structure is allocated, or whether the library manages additional data beyond what's pointed to are also implementation details.
Of course, you could document that one thing or another is possible, but the C language uses this approach (strict aliasing), which you can only more or less hack around as in Rodrigo's answer (using max_align_t). As a rule, you can't know from the interface what kind of constraints the particular compiler would impose on the actual structure within the implementation (for some esoteric microcontrollers, even the type of memory may matter), so I don't think this can be done reliably in a truly cross-platform manner.

"Private" struct members in C with const

To keep code clean, using some OO concepts can be useful, even in C.
I often write modules made of a pair of .h and .c files. The problem is that the user of the module has to be careful, since private members don't exist in C. Using the pimpl idiom or abstract data types is OK, but it adds code and/or files and makes the code heavier. I hate using an accessor when I don't need one.
Here is an idea that makes the compiler complain about invalid access to "private" members with only a little extra code. The idea is to define the same structure twice, but with some extra 'const' added for the user of the module.
Of course, writing to "private" members is still possible with a cast. But the point is only to avoid mistakes by the user of the module, not to safely protect memory.
/*** 2DPoint.h module interface ***/
#ifndef H_2D_POINT
#define H_2D_POINT
/* 2D_POINT_IMPL need to be defined in implementation files before #include */
#ifdef 2D_POINT_IMPL
#define _cst_
#else
#define _cst_ const
#endif
typedef struct 2DPoint
{
/* public members: read and write for user */
int x;
/* private members: read only for user */
_cst_ int y;
} 2DPoint;
2DPoint *new_2dPoint(void);
void delete_2dPoint(2DPoint **pt);
void set_y(2DPoint *pt, int newVal);
#endif /* H_2D_POINT */
/*** 2dPoint.c module implementation ***/
#define 2D_POINT_IMPL
#include "2dPoint.h"
#include <stdlib.h>
#include <string.h>
2DPoint *new_2dPoint(void)
{
2DPoint *pt = malloc(sizeof(2DPoint));
pt->x = 42;
pt->y = 666;
return pt;
}
void delete_2dPoint(2DPoint **pt)
{
free(*pt);
*pt = NULL;
}
void set_y(2DPoint *pt, int newVal)
{
pt->y = newVal;
}
/*** main.c user's file ***/
#include "2dPoint.h"
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
2DPoint *pt = new_2dPoint();
pt->x = 10; /* ok */
pt->y = 20; /* Invalid access, y is "private" */
set_y(pt, 30); /* accessor needed */
printf("pt.x = %d, pt.y = %d\n", pt->x, pt->y); /* no accessor needed for reading "private" members */
delete_2dPoint(&pt);
return EXIT_SUCCESS;
}
And now, here is the question: is this trick OK with the C standard?
It works fine with GCC, and the compiler doesn't complain about anything, even with some strict flags, but how can I be sure that this is really OK?
This is almost certainly undefined behavior.
Writing/modifying an object declared as const is prohibited and doing so results in UB. Furthermore, the approach you take re-declares struct 2DPoint as two technically different types, which is also not permitted.
Note that this (as undefined behavior in general) does not mean that it "certainly won't work" or "it must crash". In fact, I find it quite logical that it works, because anyone who reads the source intelligently can easily work out what its purpose is and why it might be regarded as correct. However, the compiler is not intelligent - at best, it's a finite automaton which has no knowledge about what the code is supposed to do; it only obeys (more or less) the syntactic and semantic rules of the grammar.
This violates C 2011 6.2.7 1.
6.2.7 1 requires that two definitions of the same structure in different translation units have compatible type. It is not permitted to have const in one and not the other.
In one module, you may have a reference to one of these objects, and the members appear to be const to the compiler. When the compiler writes calls to functions in other modules, it may hold values from the const members in registers or other cache or in partially or fully evaluated expressions from later in the source code than the function call. Then, when the function modifies the member and returns, the original module will not have the changed value. Worse, it may use some combination of the changed value and the old value.
This is highly improper programming.
In Bjarne Stroustrup's words: C is not designed to support OOP, although it enables OOP, which means it is possible to write OOP programs in C, but only very hard to do so. As such, if you have to write OOP code in C, there seems nothing wrong with using this approach, but it is preferable to use a language better suited for the purpose.
By trying to write OOP code in C, you have already entered a territory where "common sense" has to be overridden, so this approach is fine as long as you take responsibility to use it properly. You also need to ensure that it is thoroughly and rigorously documented and everyone concerned with the code is aware of it.
Edit Oh, you may have to use a cast to get around the const. I fail to recall if the C-style cast can be used like C++ const_cast.
You can use different approach - declare two structs, one for user without private members (in header) and one with private members for internal use in your implementation unit. All private members should be placed after public ones.
You always pass around the pointer to the struct and cast it to internal-use when needed, like this:
/* user code */
struct foo {
int public;
};
int bar(void) {
struct foo *foo = new_foo();
foo->public = 10;
}
/* implementation */
struct foo_internal {
int public;
int private;
};
struct foo *new_foo(void) {
struct foo_internal *foo = malloc(sizeof(*foo));
foo->public = 1;
foo->private = 2;
return (struct foo*)foo; // to suppress warning
}
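C11 allows anonymous structure members, and GCC has supported them for some time. Reusing an already defined, tagged struct as an anonymous member, as below, goes beyond plain C11 and needs GCC's -fms-extensions option. With that, you can declare the internal structure as: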
struct foo_internal {
struct foo;
int private;
};
so no extra effort is required to keep the structure definitions in sync.
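A variant that needs no compiler extension is to embed the public struct as a named first member of the internal one. C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa, so the conversions below are well defined. This is a sketch, not the answer's exact code: the member names and the helpers foo_private_value, delete_foo and demo_foo_internal are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public part: what the header would expose to users. */
struct foo {
    int public_value;
};

/* Internal part: known only to the implementation file. */
struct foo_internal {
    struct foo pub;       /* must stay the first member */
    int private_value;
};

static_assert(offsetof(struct foo_internal, pub) == 0,
              "pub must be the first member");

struct foo *new_foo(void)
{
    struct foo_internal *in = malloc(sizeof *in);

    if (in == NULL)
        return NULL;
    in->pub.public_value = 1;
    in->private_value = 2;
    return &in->pub;      /* users only ever see the public part */
}

int foo_private_value(struct foo *f)
{
    /* Valid because f points at the first member of a foo_internal. */
    struct foo_internal *in = (struct foo_internal *)f;

    return in->private_value;
}

void delete_foo(struct foo *f)
{
    free((struct foo_internal *)f);
}

/* User-side view: writes the public member, reads the private one via the API. */
static int demo_foo_internal(void)
{
    struct foo *f = new_foo();
    int sum;

    if (f == NULL)
        return -1;
    f->public_value = 10;
    sum = f->public_value + foo_private_value(f);
    delete_foo(f);
    return sum;
}
```

The conversion back to struct foo_internal is only valid for objects that were actually created by new_foo; a struct foo declared directly by the user has no private part behind it.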

Is it possible to cast pointers from a structure type to another structure type extending the first in C?

If I have structure definitions, for example, like these:
struct Base {
int foo;
};
struct Derived {
int foo; // int foo is common for both definitions
char *bar;
};
Can I do something like this?
void foobar(void *ptr) {
((struct Base *)ptr)->foo = 1;
}
struct Derived s;
foobar(&s);
In other words, can I cast the void pointer to Base * to access its foo member when its type is actually Derived *?
You should do
struct Base {
int foo;
};
struct Derived {
struct Base base;
char *bar;
};
to avoid breaking strict aliasing; it is a common misconception that C allows arbitrary casts of pointer types: although it will work as expected in most implementations, it's non-standard.
This also avoids any alignment incompatibilities due to usage of pragma directives.
Many real-world C programs assume the construct you show is safe, and there is an interpretation of the C standard (specifically, of the "common initial sequence" rule, C99 §6.5.2.3 p5) under which it is conforming. Unfortunately, in the five years since I originally answered this question, all the compilers I can easily get at (viz. GCC and Clang) have converged on a different, narrower interpretation of the common initial sequence rule, under which the construct you show provokes undefined behavior. Concretely, experiment with this program:
#include <stdio.h>
#include <string.h>
typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; float z; } B;
typedef struct C { A a; float z; } C;
int testAB(A *a, B *b)
{
b->x = 1;
a->x = 2;
return b->x;
}
int testAC(A *a, C *c)
{
c->a.x = 1;
a->x = 2;
return c->a.x;
}
int main(void)
{
B bee;
C cee;
int r;
memset(&bee, 0, sizeof bee);
memset(&cee, 0, sizeof cee);
r = testAB((A *)&bee, &bee);
printf("testAB: r=%d bee.x=%d\n", r, bee.x);
r = testAC(&cee.a, &cee);
printf("testAC: r=%d cee.x=%d\n", r, cee.a.x);
return 0;
}
When compiling with optimization enabled (and without -fno-strict-aliasing), both GCC and Clang will assume that the two pointer arguments to testAB cannot point to the same object, so I get output like
testAB: r=1 bee.x=2
testAC: r=2 cee.x=2
They do not make that assumption for testAC, but — having previously been under the impression that testAB was required to be compiled as if its two arguments could point to the same object — I am no longer confident enough in my own understanding of the standard to say whether or not that is guaranteed to keep working.
That will work in this particular case. The foo field is the first member of both structures and it has the same type. However, this is not true in the general case of fields within a struct (that are not the first member). Items like alignment and packing can make this break in subtle ways.
As you seem to be aiming at object-oriented programming in C, I suggest you have a look at the following link:
http://www.planetpdf.com/codecuts/pdfs/ooc.pdf
It goes into detail about ways of handling OOP principles in ANSI C.
In particular cases this could work, but in general - no, because of the structure alignment.
You could use different #pragmas to make (actually, attempt to) the alignment identical - and then, yes, that would work.
If you're using Microsoft Visual Studio, you might find this article useful.
There is another little thing that might be helpful or related to what you are doing:
#define SHARED_DATA int id
typedef struct window_t {
SHARED_DATA;
int something;
void* blah;
} window_t;
typedef struct list_t {
SHARED_DATA;
int size;
} list_t;
typedef struct button_t {
SHARED_DATA;
int clicked;
} button_t;
typedef union base_t {
SHARED_DATA;
window_t win;
list_t list;
button_t button;
} base_t;
Now you can put the shared properties into SHARED_DATA and handle the different types via the "superclass" packed into the union. You could use SHARED_DATA to store just a 'class identifier' or store a pointer. Either way, it turned out handy for generic handling of event types for me at some point. Hope I'm not going too far off-topic with this.
I know this is an old question, but in my view there is more that can be said and some of the other answers are incorrect.
Firstly, this cast:
(struct Base *)ptr
... is allowed, but only if the alignment requirements are met. On many compilers your two structures will have the same alignment requirements, and it's easy to verify in any case. If you get past this hurdle, the next is that the result of the cast is mostly unspecified - that is, there's no requirement in the C standard that the pointer once cast still refers to the same object (only after casting it back to the original type will it necessarily do so).
However, in practice, compilers for common systems usually make the result of a pointer cast refer to the same object.
(Pointer casts are covered in section 6.3.2.3 of both the C99 standard and the more recent C11 standard. The rules are essentially the same in both, I believe).
Finally, you've got the so called "strict aliasing" rules to contend with (C99/C11 6.5 paragraph 7); basically, you are not allowed to access an object of one type via a pointer of another type (with certain exceptions, which don't apply in your example). See "What is the strict-aliasing rule?", or for a very in-depth discussion, read my blog post on the subject.
In conclusion, what you attempt in your code is not guaranteed to work. It might be guaranteed to always work with certain compilers (and with certain compiler options), and it might work by chance with many compilers, but it certainly invokes undefined behavior according to the C language standard.
What you could do instead is this:
*((int *)ptr) = 1;
... I.e. since you know that the first member of the structure is an int, you just cast directly to int *, which bypasses the aliasing problem since both types of struct do in fact contain an int at this address. You are relying on knowing the struct layout that the compiler will use and you are still relying on the non-standard semantics of pointer casting, but in practice this is significantly less likely to give you problems.
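A short sketch of that suggestion applied to the structs from the question. The static_assert lines (which assume C11) only document the layout assumption, and demo_foobar is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int foo;
};

struct Derived {
    int foo;   /* int foo is common to both definitions */
    char *bar;
};

/* Both structs must start with the int that foobar writes. */
static_assert(offsetof(struct Base, foo) == 0, "foo must come first in Base");
static_assert(offsetof(struct Derived, foo) == 0, "foo must come first in Derived");

static void foobar(void *ptr)
{
    /* The access is made through an int lvalue, and an int really lives here. */
    *(int *)ptr = 1;
}

/* Sets foo in one object of each type and returns the sum of both members. */
static int demo_foobar(void)
{
    struct Base b = { 0 };
    struct Derived d = { 0, NULL };

    foobar(&b);
    foobar(&d);
    return b.foo + d.foo;
}
```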
The great/bad thing about C is that you can cast just about anything -- the problem is, it might not work. :) However, in your case, it will*, since you have two structs whose first members are both of the same type; see this program for an example. Now, if struct derived had a different type as its first element -- for example, char *bar -- then no, you'd get weird behavior.
* I should qualify that with "almost always", I suppose; there are a lot of different C compilers out there, so some may have different behavior. However, I know it'll work in GCC.
