I'm currently limited to coding in C and I want to do C object oriented programming.
One thing that comes to mind is how to correctly downcast a type in C without violating strict aliasing.
Imagine I have an animal struct with a vtable (meaning a struct of function pointers) and a dog like this:
typedef struct animal animal_t;
typedef struct animal_vtable animal_vtable_t;
typedef void (*sound_func)(const animal_t *animal);
struct animal_vtable {
sound_func sound;
};
struct animal {
animal_vtable_t * vtable;
int size;
};
typedef struct dog {
animal_t animal;
} dog_t;
There will be cases when I want to know whether my animal is a dog. This is how I currently plan to convert an animal instance into a dog, but I'm unsure whether this will trigger undefined behavior:
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) (animal_offset - offset);
}
The key part here is that the dog_t * and the animal_t * point to the same memory location, for obvious reasons, but will this be a problem for optimizers? Currently I have -fno-strict-aliasing enabled and thus I know it works, but is it safe to turn that off?
Below is the full working example, which does not trigger errors when compiled with the address and undefined behavior sanitizers.
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/*
* Animal section
*/
struct animal_vtable;
typedef struct animal_vtable animal_vtable_t;
typedef struct animal {
animal_vtable_t * vtable;
int size;
} animal_t;
typedef void (*sound_func)(const animal_t *animal);
struct animal_vtable {
sound_func sound;
};
void animal_sound(const animal_t* animal) {
animal->vtable->sound(animal);
}
int animal_size(const animal_t* animal) {
return animal->size;
}
/*
* dog section
*/
void dog_bark(const animal_t *animal);
static animal_vtable_t dog_table = {
.sound = dog_bark
};
typedef struct dog {
animal_t animal;
} dog_t;
dog_t* make_dog(int size) {
dog_t* dog = malloc(sizeof(dog_t));
if (dog == NULL) {
return dog;
}
dog->animal = (animal_t) { .vtable = &dog_table, .size = size };
return dog;
}
void dog_bark(const animal_t *animal) {
printf("wuff!\n");
}
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) animal_offset - offset;
}
/*
* main tests
*/
int main(int argc, char** argv) {
dog_t *dog = make_dog(10);
if (dog == NULL) {
exit(-1);
}
animal_t *animal = &(dog->animal);
animal_sound(animal);
dog_t *d2 = to_dog(animal);
printf("dog addr: %p, d2 addr: %p\n", (void *)dog, (void *)d2);
printf("dog size: %d\n", animal_size(&d2->animal));
printf("dog size: %d\n", animal_size(&dog->animal));
free(dog);
}
I'm unsure if this will trigger undefined behavior or not.
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) animal_offset - offset;
}
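The expression (dog_t *) animal_offset - offset does not mean what you think it means. It is equivalent to ((dog_t *) animal_offset) - offset, whereas what you appear to want is (dog_t *) (animal_offset - offset). These are different: the first is pointer arithmetic on a dog_t *, so it moves back offset * sizeof(dog_t) bytes, while the second moves back offset bytes. They only coincide in your example because offset is 0.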
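For the general case, where the embedded animal is not the first member, the subtraction has to be done in bytes. A minimal sketch, assuming a hypothetical dog_t with an extra leading name member so that offsetof(dog_t, animal) is non-zero; it subtracts on a char * (the same idiom as the container_of macro used in the Linux kernel) instead of round-tripping through uintptr_t:

```c
#include <stddef.h>
#include <stdio.h>

typedef struct animal_vtable animal_vtable_t;

typedef struct animal {
    animal_vtable_t *vtable;
    int size;
} animal_t;

struct animal_vtable {
    void (*sound)(const animal_t *animal);
};

/* Hypothetical layout: 'animal' is deliberately NOT the first member,
   so offsetof(dog_t, animal) is non-zero. */
typedef struct dog {
    const char *name;
    animal_t animal;
} dog_t;

static void dog_bark(const animal_t *animal) {
    (void)animal;
    printf("wuff!\n");
}

static animal_vtable_t dog_table = { .sound = dog_bark };

/* The parentheses matter: (char *)a - offset moves back 'offset' bytes,
   whereas (dog_t *)a - offset would move back offset * sizeof(dog_t). */
dog_t *to_dog(animal_t *a) {
    if (a == NULL || a->vtable != &dog_table)
        return NULL;
    return (dog_t *)((char *)a - offsetof(dog_t, animal));
}
```

Whether stepping back out of the member object like this is strictly conforming is debated, but the idiom is supported by every mainstream compiler in practice.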
But more generally, you are making it harder than it needs to be. Supposing that you implement inheritance as you seem inclined to do, by making the first member of the child type an instance of the parent type, you can perform the kind of pointer conversion you demonstrate via a simple cast: (dog_t *) a. The language specification guarantees that this is valid under the conditions described, supposing that a is in fact a pointer to the animal member of a dog_t. This is specified in C17, paragraph 6.7.2.1/15:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
Substantially the same wording appears in earlier versions of the standard, too.
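Applied to the question's types, where animal is the first member of dog_t, the conversion needs no offset arithmetic at all. A minimal sketch, keeping the question's vtable comparison as the way to recognize a dog (the NULL check is an addition):

```c
#include <stdio.h>
#include <stdlib.h>

struct animal_vtable;

typedef struct animal {
    struct animal_vtable *vtable;
    int size;
} animal_t;

typedef struct animal_vtable {
    void (*sound)(const animal_t *animal);
} animal_vtable_t;

/* animal is the FIRST member, so a pointer to it, suitably converted,
   points to the enclosing dog_t (C17 6.7.2.1p15). */
typedef struct dog {
    animal_t animal;
} dog_t;

static void dog_bark(const animal_t *animal) {
    (void)animal;
    printf("wuff!\n");
}

static animal_vtable_t dog_table = { .sound = dog_bark };

dog_t *make_dog(int size) {
    dog_t *dog = malloc(sizeof *dog);
    if (dog != NULL)
        dog->animal = (animal_t){ .vtable = &dog_table, .size = size };
    return dog;
}

dog_t *to_dog(animal_t *a) {
    if (a == NULL || a->vtable != &dog_table)
        return NULL;
    return (dog_t *)a; /* plain cast, no offset arithmetic */
}
```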
As for your question:
"will this be a problem for optimizers? Currently I have -fno-strict-aliasing enabled and thus I know it works, but is it safe to turn that off?"
It should not be a problem for optimizers, provided that the definition of dog_t is visible in the translation unit. In that case, optimizers that are not deeply broken will know that pointers to dog_t and pointers to animal_t can alias each other.
However, the definition of dog_t being visible is a requirement for use of offsetof, but not a requirement for the pointer cast, so that may be something to watch out for. Also, it's not just this code where you need to watch out for aliasing issues. For safety relative to strict aliasing, every function that accesses pointers to both types will need to have the definition of dog_t visible.
Related
I define relative pointer to mean what Ginger Bill describes as Self-Relative Pointers:
... define the base [to which an offset will be applied] to be the memory address of the offset itself
For example, consider these structs:
struct house {
int32_t weight;
};
struct person {
int32_t age;
struct house* residence;
};
int32_t getPersonsHousesWeight(struct person* p) {
return p->residence->weight;
}
The relative-pointer implementation of the same thing in C that I think might work is:
struct house { ... }; // same as before
struct person {
int32_t age;
int64_t residence; // an offset from the person's address in memory
};
int32_t getPersonsHousesWeight(struct person* p) {
return ((struct house*)((char*)p + (p->residence)))->weight;
}
Assuming that alignment of everything is good (all 8 bytes), is this free of undefined behavior?
EDIT
@tstanisl has provided an excellent answer (which I've accepted) that thoroughly explains UB in the context of stack allocations. I am curious how allocation into a large slab of contiguous heap memory would impact this analysis. For example:
#include <stdint.h>
#include <sys/mman.h>
int foo(void) {
char* base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Omitting mmap error checking
struct person* myPerson = (struct person*)(base + 128);
struct house* myHouse = (struct house*)(base + 256);
int32_t delta = (char*)myHouse - (char*)myPerson;
// Does the computation of delta invoke UB?
return delta;
}
Usually it is going to be UB.
The first case is when person and house belong to separate objects.
In such a case it is UB because the pointer arithmetic is performed outside of the object.
int foo(void) {
struct person p;
struct house h;
p.residence = (char*)&h - (char*)&p; // already UB
getPersonsHousesWeight(&p); // UB again
}
In practice, it means that the compiler is not obligated to notice that objects accessed through pointers constructed from &p can alias object h, because p and h are separate memory regions (aka objects).
When both objects are placed inside a larger object, the situation is a bit better, though it is still technically UB.
int foo(void) {
struct ph {
struct person p;
struct house h;
} ph;
ph.p.residence = (char*)&ph.h - (char*)&ph.p; // still UB
getPersonsHousesWeight(&ph.p); // UB again
}
It is UB because pointer arithmetic is done outside the member object.
(char*)&ph.h - 1 is a pointer outside of ph.h.
Note that this code will likely work pretty much everywhere.
Otherwise, heavily used container_of-like macros would not work, breaking a lot of existing code, including the Linux kernel.
To avoid UB, the pointer must be constructed in a special way so that it never moves outside of the originating object.
Rather than using &ph.h, one should use (char*)&ph + offsetof(struct ph, h).
Similarly, &ph.p should be replaced with (char*)&ph + offsetof(struct ph, p).
Now this code should be portable:
int foo(void) {
struct ph {
struct person p;
struct house h;
} ph;
struct person *p_ptr = (struct person*)((char*)&ph + offsetof(struct ph, p));
struct house *h_ptr = (struct house*) ((char*)&ph + offsetof(struct ph, h));
ph.p.residence = (char*)h_ptr - (char*)p_ptr;
getPersonsHousesWeight(p_ptr);
}
Though it is very obscure.
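For reference, a self-contained version of that portable variant, with the question's struct person and struct house filled in (residence is an int64_t byte offset from the person, and the cast targets struct house). The helper setup_and_read is a hypothetical name used only to exercise the code:

```c
#include <stddef.h>
#include <stdint.h>

struct house {
    int32_t weight;
};

struct person {
    int32_t age;
    int64_t residence; /* byte offset from the person to its house */
};

int32_t getPersonsHousesWeight(struct person *p) {
    return ((struct house *)((char *)p + p->residence))->weight;
}

/* Hypothetical helper: both objects live inside one enclosing object,
   and both pointers are derived from &ph via offsetof, so the
   subtraction and the later addition stay within 'ph'. */
int32_t setup_and_read(int32_t weight) {
    struct ph {
        struct person p;
        struct house h;
    } ph;
    struct person *p_ptr = (struct person *)((char *)&ph + offsetof(struct ph, p));
    struct house *h_ptr = (struct house *)((char *)&ph + offsetof(struct ph, h));

    h_ptr->weight = weight;
    p_ptr->age = 40;
    p_ptr->residence = (char *)h_ptr - (char *)p_ptr;
    return getPersonsHousesWeight(p_ptr);
}
```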
An interesting discussion on this topic can be found at link
I have heard conflicting things about the extent to which the C standards guarantee structure layout consistency. Arguments for a limited extent have mentioned strict aliasing rules. For example, compare these two answers: https://stackoverflow.com/a/3766251/1306666 and https://stackoverflow.com/a/3766967/1306666.
In the following code I assume in all structures foo, bar, and struct { char *id; } that char *id is in the same place, making it safe to cast between them if it is the only member accessed.
Regardless of whether the cast will ever result in an error, does it violate strict aliasing rules?
#include <string.h>
struct foo {
char *id;
int a;
};
struct bar {
char *id;
int x, y, z;
};
struct list {
struct list *next;
union {
struct foo *foop;
struct bar *barp;
void *either;
} ptr;
};
struct list *find_id(struct list *l, char *key)
{
while (l != NULL) {
/* cast to anonymous struct and dereferenced */
if (!strcmp(((struct { char *id; } *)(l->ptr.either))->id, key))
return l;
l = l->next;
}
return NULL;
}
gcc -o /dev/null -Wstrict-aliasing test.c
Note: gcc gives no warnings.
Yes, there are multiple aliasing-related issues in your program. The use of the lvalue with anonymous structure type, which does not match the type of the underlying object, results in undefined behavior. It could be fixed with something like:
*(char**)((char *)either + offsetof(struct { ... char *id; ... }, id))
if you know the id member is at the same offset in all of them (e.g. they all share same prefix). But in your specific case where it's the first member you can just do:
*(char**)either
because it's always valid to convert a pointer to a struct to a pointer to its first member (and back).
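When the shared member is not the first one, the offsetof form applies. A minimal sketch, using hypothetical structs foo2 and bar2 that share a leading int kind; the _Static_assert (C11) turns the "same offset in all of them" assumption into a compile-time check, and the member is read through an lvalue of its real type, char *:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structs sharing a common prefix in which 'id' is NOT
   the first member. */
struct prefix { int kind; char *id; };
struct foo2 { int kind; char *id; int a; };
struct bar2 { int kind; char *id; int x, y, z; };

_Static_assert(offsetof(struct foo2, id) == offsetof(struct prefix, id),
               "foo2 does not share the prefix layout");
_Static_assert(offsetof(struct bar2, id) == offsetof(struct prefix, id),
               "bar2 does not share the prefix layout");

/* Locate 'id' by byte offset and read it as a char *, never through
   an lvalue of a foreign struct type. */
char *get_id(void *either) {
    return *(char **)((char *)either + offsetof(struct prefix, id));
}

int has_id(void *either, const char *key) {
    return strcmp(get_id(either), key) == 0;
}
```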
A separate issue is that your use of the union is wrong. The biggest issue is that it assumes struct foo *, struct bar *, and void * all have the same size and representation, which is not guaranteed. Also, it's arguably undefined to access a member of the union other than the one which was previously stored, but as a result of interpretations in defect reports, it's probably safe to say it's equivalent to a "reinterpret cast". But that gets you back to the issue of wrongly assuming same size/representation.
You should just remove the union, use a void * member, and convert the value (rather than reinterpret the bits) to the right pointer type to access the pointed-to structure (struct foo * or struct bar *) or its initial id field (char *).
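A minimal sketch of that suggestion applied to the question's list: the union is replaced by a plain void * member, and the id is read by converting the pointer value to char ** (valid because id is the first member of both struct foo and struct bar):

```c
#include <stddef.h>
#include <string.h>

struct foo { char *id; int a; };
struct bar { char *id; int x, y, z; };

struct list {
    struct list *next;
    void *ptr; /* points to a struct foo or a struct bar */
};

struct list *find_id(struct list *l, const char *key) {
    for (; l != NULL; l = l->next) {
        /* Convert the value (not the bits): a pointer to a struct,
           suitably converted, points to its initial member. */
        const char *id = *(char **)l->ptr;
        if (strcmp(id, key) == 0)
            return l;
    }
    return NULL;
}
```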
I've been trying to work out how legal the below is and I could really use some help.
#include <stdio.h>
#include <stdlib.h>
typedef struct foo {
int foo;
int bar;
} foo;
void make_foo(void * p)
{
foo * this = (foo *)p;
this->foo = 0;
this->bar = 1;
}
typedef struct more_foo {
int foo;
int bar;
int more;
} more_foo;
void make_more_foo(void * p)
{
make_foo(p);
more_foo * this = (more_foo *)p;
this->more = 2;
}
int main(void)
{
more_foo * mf = malloc(sizeof(more_foo));
make_more_foo(mf);
printf("%d %d %d\n", mf->foo, mf->bar, mf->more);
return 0;
}
As far as I've gathered, doing this is type punning and is supposed to violate the strict aliasing rule. Does it, though? The pointers passed around are void. You are allowed to interpret a void pointer any way you wish, correct?
Also, I read that there may be memory alignment issues. But struct alignment is deterministic. If the initial members are the same, then they'll get aligned the same way, and there should be no problems accessing all foo members from a more_foo pointer. Is that correct?
GCC compiles with -Wall without warnings, the program runs as expected. However, I'm not sure if it's UB or not and why.
I also saw that this:
typedef union baz {
struct foo f;
struct more_foo mf;
} baz;
void some_func(void)
{
baz b;
more_foo * mf = &b.mf; // or more_foo * mf = (more_foo *)&b;
make_more_foo(mf);
printf("%d %d %d\n", mf->foo, mf->bar, mf->more);
}
seems to be allowed. Because of the polymorphic nature of unions, the compiler would be OK with it. Is that correct? Does that mean that by compiling with strict aliasing off you don't have to use a union and can use only structs instead?
Edit: union baz now compiles.
The authors of the Standard didn't think it necessary to specify any means by which an lvalue of a struct or union's member type may be used to access the underlying struct or union. The way N1570 6.5p7 is written doesn't even allow for someStruct.member = 4; unless member is of character type. Being able to apply the & operator to struct and union members wouldn't make any sense, however, unless the authors of the Standard expected that the resulting pointers would be useful for something. Given footnote 88: "The intent of this list is to specify those circumstances in which an object may or may not be aliased", the most logical expectation is that it was only intended to apply in cases where lvalues' useful lifetimes would overlap in ways that would involve aliasing.
Consider the two functions within the code below:
struct s1 {int x;};
struct s2 {int x;};
union {struct s1 v1; struct s2 v2;} arr[10];
int test1(int i, int j)
{
int result;
{ struct s1 *p1 = &arr[i].v1; result = p1->x; }
if (result)
{ struct s2 *p2 = &arr[j].v2; p2->x = 2; }
{ struct s1 *p3 = &arr[i].v1; result = p3->x; }
return result;
}
int test2(int i, int j)
{
int result;
struct s1 *p1 = &arr[i].v1; result = p1->x;
if (result)
{ struct s2 *p2 = &arr[j].v2; p2->x = 2; }
result = p1->x;
return result;
}
In test1, even if i==j, all storage that is accessed during p1's lifetime is accessed through p1, so p1 won't alias anything. Likewise with p2 and p3. Thus, since there is no aliasing, there should be no problem if i==j. In test2, however, if i==j, then the creation of p1 and its last use to access p1->x would be separated by another action which accesses that storage with a pointer not derived from p1. Consequently, if i==j, the access via p2 would alias p1, and per N1570 6.5p7 a compiler would not be required to allow for that possibility.
If the rules of 6.5p7 are applicable even in cases that don't involve actual aliasing, then structures and unions would be pretty useless. If they only apply in cases that do involve actual aliasing, then a lot of needless complexity like the "Effective Type" rules could be done away with. Unfortunately, some compilers like gcc and clang use the rules to justify "optimizing" the first function above and then assuming that they don't have to worry about the resulting alias which is present in their "optimized" version but wasn't in the original.
Your code will work fine in any compiler whose authors make any effort to recognize derived lvalues. Both gcc and clang, however, will botch even the test1() function above unless they are invoked with the -fno-strict-aliasing flag. Given that the Standard doesn't even allow for someStruct.member = 4;, I'd suggest that you refrain from the kind of aliasing seen in test2() above and not bother targeting compilers that can't even handle test1().
I'd say it isn't strict, since if you change the "foo" structure, the "more foo" structure will have to change with it. "foo" must become the base of "more foo"; this is inheritance, not quite polymorphism. But you can use function pointers to introduce polymorphism to help with these structures.
Example
#include <stdio.h>
#include <stdlib.h>
#define NEW(x) ((x*)malloc(sizeof(x)))
typedef struct
{
void(*printme)(void*);
int _foo;
int bar;
} foo;
typedef struct
{
// inherits foo
foo base;
int more;
} more_foo;
void foo_print(void *t)
{
foo *this = (foo*)t;
printf("[foo]\r\n\tfoo=%d\r\n\tbar=%d\r\n[/foo]\r\n", this->_foo, this->bar);
}
void more_foo_print(void *t)
{
more_foo *this = t;
printf("[more foo]\r\n");
foo_print(&this->base);
printf("\tmore=%d\r\n", this->more);
printf("[/more foo]\r\n");
}
void foo_construct( foo *this, int foo, int bar )
{
this->_foo = foo;
this->bar = bar;
this->printme = foo_print;
}
void more_foo_construct(more_foo *t, int _foo, int bar, int more)
{
foo_construct((foo*)t, _foo, bar);
t->more = more;
// Overrides printme
t->base.printme = more_foo_print;
}
more_foo *new_more_foo(int _foo, int bar, int more)
{
more_foo * new_mf = NEW(more_foo);
more_foo_construct(new_mf, _foo, bar, more);
return new_mf;
}
foo *new_foo(int _foo, int bar)
{
foo *new_f = NEW(foo);
foo_construct(new_f, _foo, bar);
return new_f;
}
int main(void)
{
foo * mf = (foo*)new_more_foo(1, 2, 3);
foo * f = new_foo(7,8);
mf->printme(mf);
f->printme(f);
return 0;
}
printme() is overridden when creating "more foo". (polymorphism)
more_foo includes foo as a base structure (inheritance), so when the "foo" structure changes (for example, when new members are added), "more foo" changes with it.
more_foo can be cast as "foo".
The question is based on a design pattern solution easily doable in other languages but difficult to implement in C. The narrowed down code is below.
Building on this answer, I'm trying to find a solution for the dynamically generated values in an anonymous function.
Excerpt from the answer:
int (*max)(int, int) =
({
int __fn__ (int x, int y) { return x > y ? x : y; }
__fn__;
});
Static Library Code
struct Super{
};
void add(struct Super *(*superRef)()) {
// cache the reference (in some linked list)
// later at some point when an event occurs.
struct Super *super = superRef(); // instantiate and use it.
}
Client Code linked: User of the Library Code
struct Sub{
struct Super *super;
};
add(({
struct Sub __fn__() { return malloc(sizeof(struct Sub)); } // error
__fn__;
}));
Error:
error: passing 'void' to parameter of incompatible type 'struct Sub *(*)()'
As per the request for clarification, think of the receiving function in a static library file receiving references to the structure objects (non-instantiated). The lib receives this object from the client code.
Secondly, neither the client nor the static library instantiates the received structure reference right away. Later, when there's a notification in the system, the structure reference is called to instantiate and execute the rest of the stuff.
I repeat, the specific requirement is to hold non-instantiated references to the structures passed by users of the library (client code).
Summary
Basically, a Runner that receives a pointer to a polymorphic factory method, which it caches and later calls to instantiate and execute when an event occurs.
The correct order is:
learn C
do magic
It just will not work in the other way. ({}) does not bend the semantics for you. If your add expects a function which returns struct Super*, it will not work with struct Sub, not even if you put the missing * there.
This just works on TutorialsPoint:
#include <stdio.h>
#include <stdlib.h>
int max(int a,int b){
if(a>b)
return a;
return b;
}
struct Super{};
void add(struct Super *(*superRef)()) {
struct Super *(*secretStorage)()=superRef;
/* ... */
struct Super *super = secretStorage();
/* ... */
free(super);
printf("Stillalive\n");
}
int main()
{
printf("Hello, World!\n");
int (*myMax)(int,int); // <-- that is a function pointer
myMax=max; // <-- set with oldschool function
printf("%d\n",myMax(1,2));
myMax = ({ // <-- set with fancy magic
int __fn__ (int x, int y) { return x < y ? x : y; }
__fn__;
});
printf("%d - intentionally wrong\n",myMax(1,2));
add(
({
struct Super* fn(){
printf("Iamhere\n");
return malloc(sizeof(struct Super));
}
fn;}));
printf("Byfornow\n");
return 0;
}
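Created a small library project with anonymous magic embedded in anonymous magic and heap allocation. It does not make much sense, but it works (only because none of the nested functions refers to variables of its enclosing function; GCC's documentation warns against calling a nested function through its address after the containing function has exited):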
testlib.h
#ifndef TESTLIB_H_
#define TESTLIB_H_
struct Testruct{
const char *message;
void (*printmessage)(const char *message);
};
extern struct Testruct *(*nonsense())();
#endif
testlib.c
#include "testlib.h"
#include <stdio.h>
#include <stdlib.h>
const char *HELLO="Hello World\n";
struct Testruct *(*nonsense())(){
return ({
struct Testruct *magic(){
struct Testruct *retval=malloc(sizeof(struct Testruct));
retval->message=HELLO;
retval->printmessage=({
void magic(const char *message){
fputs(message, stdout); /* not printf(message): message is data, not a format string */
}
magic;
});
return retval;
}
magic;
});
}
test.c
#include "testlib.h"
#include <stdio.h>
#include <stdlib.h>
int main(){
struct Testruct *(*factory)()=nonsense();
printf("Alive\n");
struct Testruct *stuff=factory();
printf("Alive\n");
stuff->printmessage(stuff->message);
printf("Alive\n");
free(stuff);
printf("Alive\n");
return 0;
}
I followed the steps in https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html for building and running it (practically 3 gcc calls: gcc -c -Wall -Werror -fpic testlib.c, gcc -shared -o libtestlib.so testlib.o, gcc -L. -Wall -o test test.c -ltestlib and a bit of fight with LD_LIBRARY_PATH)
The code shown in the question is not standard C, but the GNU C variant that GCC supports. Unfortunately, there does not seem to be a gnu-c tag, to correctly specify the variant of C involved.
Furthermore, the use case seems to rely on shoehorning specific type of object-oriented paradigm into a C library interface. This is horrible, because it involves assumptions and features C simply does not have. There is a reason why C (and GNU-C) and C++ and Objective-C are different programming languages.
The simple answer to "functions returning dynamically allocated values", where the type of the value is opaque to the library, is to use void *, and for function pointers, void *(*)(void). Note that POSIX additionally requires that a function pointer can be converted to void * and back (dlsym() relies on this), which ISO C alone does not guarantee.
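A minimal sketch of that simple approach, with hypothetical names (runner_add, runner_fire, make_sub): the library caches plain function pointers of type void *(*)(void) and calls them later, when the event occurs; the client supplies an ordinary named function, and only the client knows the concrete type behind the returned void *:

```c
#include <stdlib.h>

/* Library side: a tiny registry of factory functions (hypothetical API). */
typedef void *(*factory_fn)(void);

#define MAX_FACTORIES 8
static factory_fn factories[MAX_FACTORIES];
static size_t num_factories;

int runner_add(factory_fn fn) {
    if (fn == NULL || num_factories == MAX_FACTORIES)
        return -1;
    factories[num_factories++] = fn;
    return 0;
}

/* Called later, "when an event occurs": instantiate via factory i. */
void *runner_fire(size_t i) {
    return i < num_factories ? factories[i]() : NULL;
}

/* Client side: an ordinary named function instead of a nested one. */
struct Sub { int value; };

void *make_sub(void) {
    struct Sub *sub = malloc(sizeof *sub);
    if (sub != NULL)
        sub->value = 42;
    return sub;
}
```

A fixed-size array keeps the sketch short; a real library would more likely keep the cached factories in a linked list or a growable array, as the question describes.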
The more complex answer would describe how libraries like GObject support object-oriented paradigms in C.
In practice, especially in POSIX C, using a type tag (usually int, but can be any other type) and a union, one can implement polymorphic structures, based on a union of structures with all having that type tag as the same first element. The most common example of such functionality is struct sockaddr.
Basically, your header file defines one or more structures with the same initial member, for example
enum {
MYOBJECT_TYPE_DOUBLE,
MYOBJECT_TYPE_VOID_FUNCTION,
};
struct myobject_double {
int type; /* MYOBJECT_TYPE_DOUBLE */
double value;
};
struct myobject_void_function {
int type; /* MYOBJECT_TYPE_VOID_FUNCTION */
void (*value)();
};
and at the end, a union type, or a structure type with an anonymous union (as provided by C11 or GNU-C), of all the structure types,
struct myobject {
union {
struct { int type; }; /* for direct 'type' member access */
struct myobject_double as_double;
struct myobject_void_function as_void_function;
};
};
Note that technically, wherever that union is visible, it is valid to cast any pointer of any of those structure types to another of those structure types, and access the type member (see C11 6.5.2.3p6). It is not necessary to use the union at all, it suffices for the union to be defined and visible.
Still, for ease of maintenance (and to avoid arguments with language lawyer wannabes who did not read that paragraph in the C standard), I do recommend using the structure containing the anonymous union as the "base" type in the library interface.
For example, the library might provide a function to return the actual size of some object:
size_t myobject_size(struct myobject *obj)
{
if (obj)
switch (obj->type) {
case MYOBJECT_TYPE_DOUBLE: return sizeof (struct myobject_double);
case MYOBJECT_TYPE_VOID_FUNCTION: return sizeof (struct myobject_void_function);
}
errno = EINVAL;
return 0;
}
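Put together, the pieces above form a single C11 translation unit (anonymous structures and unions need C11 or GNU C). The make_double constructor is a hypothetical addition that fills a variant through its union member, after which the generic code inspects it only through the shared type tag:

```c
#include <errno.h>
#include <stddef.h>

enum {
    MYOBJECT_TYPE_DOUBLE,
    MYOBJECT_TYPE_VOID_FUNCTION,
};

struct myobject_double {
    int type; /* MYOBJECT_TYPE_DOUBLE */
    double value;
};

struct myobject_void_function {
    int type; /* MYOBJECT_TYPE_VOID_FUNCTION */
    void (*value)(void);
};

struct myobject {
    union {
        struct { int type; }; /* direct 'type' member access */
        struct myobject_double as_double;
        struct myobject_void_function as_void_function;
    };
};

size_t myobject_size(struct myobject *obj)
{
    if (obj)
        switch (obj->type) {
        case MYOBJECT_TYPE_DOUBLE: return sizeof (struct myobject_double);
        case MYOBJECT_TYPE_VOID_FUNCTION: return sizeof (struct myobject_void_function);
        }
    errno = EINVAL;
    return 0;
}

/* Hypothetical constructor: write the variant through its union member. */
struct myobject make_double(double v)
{
    struct myobject obj;
    obj.as_double.type = MYOBJECT_TYPE_DOUBLE;
    obj.as_double.value = v;
    return obj;
}
```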
It seems to me OP is trying to implement a factory pattern, where the library function provides the specification (class in OOP) for the object created, and a method to produce those objects later.
The only way in C to implement dynamic typing is via the kind of polymorphism I show above. This means that the specification for the future objects (again, class in OOP) must be an ordinary object itself.
The factory pattern itself is pretty easy to implement in standard C. The library header file contains for example
#include <errno.h>
#include <stdlib.h>
/*
* Generic, application-visible stuff
*/
struct any_factory {
/* Function to create an object */
void *(*produce)(struct any_factory *);
/* Function to discard this factory */
void (*retire)(struct any_factory *);
/* Flexible array member; the actual
size of this structure varies. */
unsigned long payload[];
};
static inline void *factory_produce(struct any_factory *factory)
{
if (factory && factory->produce)
return factory->produce(factory);
/* C has no exceptions, but does have thread-local 'errno'.
The error codes do vary from system to system. */
errno = EINVAL;
return NULL;
}
static inline void factory_retire(struct any_factory *factory)
{
if (factory) {
if (factory->retire) {
factory->retire(factory);
} else {
/* Optional: Poison function pointers, to easily
detect use-after-free bugs. */
factory->produce = NULL;
factory->retire = NULL; /* Already NULL, too. */
/* Free the factory object. */
free(factory);
}
}
}
/*
* Library function.
*
* This one takes a pointer and size in chars, and returns
* a factory object that produces dynamically allocated
* copies of the data.
*/
struct any_factory *mem_factory(const void *, const size_t);
where factory_produce() is a helper function which invokes the factory to produce one object, and factory_retire() retires (discards/frees) the factory itself. Aside from the extra error checking, factory_produce(factory) is equivalent to (factory)->produce(factory), and factory_retire(factory) to (factory)->retire(factory).
The mem_factory(ptr, len) function is an example of a factory function provided by a library. It creates a factory, that produces dynamically allocated copies of the data seen at the time of the mem_factory() call.
The library implementation itself would be something along the lines of
#include <stdlib.h>
#include <string.h>
#include <errno.h>
struct mem_factory {
void *(*produce)(struct any_factory *);
void (*retire)(struct any_factory *);
size_t size;
unsigned char data[];
};
/* The visibility of this union ensures the initial sequences
in the structures are compatible; see C11 6.5.2.3p6.
Essentially, this causes the casts between these structure
types, for accessing their initial common members, valid. */
union factory_union {
struct any_factory any;
struct mem_factory mem;
};
static void *mem_producer(struct any_factory *any)
{
if (any) {
struct mem_factory *mem = (struct mem_factory *)any;
/* We return a dynamically allocated copy of the data,
padded with 9 to 16 zeros to a multiple of 8 chars.. for no reason. */
const size_t size = (mem->size | 7) + 9;
char *result;
result = malloc(size);
if (!result) {
errno = ENOMEM;
return NULL;
}
/* Clear the padding. */
memset(result + size - 16, 0, 16);
/* Copy the data, if any. */
if (mem->size)
memcpy(result, mem->data, mem->size);
/* Done. */
return result;
}
errno = EINVAL;
return NULL;
}
static void mem_retirer(struct any_factory *any)
{
if (any) {
struct mem_factory *mem = (struct mem_factory *)any;
mem->produce = NULL;
mem->retire = NULL;
mem->size = 0;
free(mem);
}
}
/* The only exported function:
*/
struct any_factory *mem_factory(const void *src, const size_t len)
{
struct mem_factory *mem;
if (len && !src) {
errno = EINVAL;
return NULL;
}
mem = malloc(len + sizeof (struct mem_factory));
if (!mem) {
errno = ENOMEM;
return NULL;
}
mem->produce = mem_producer;
mem->retire = mem_retirer;
mem->size = len;
if (len > 0)
memcpy(mem->data, src, len);
return (struct any_factory *)mem;
}
Essentially, the struct any_factory type is actually polymorphic (not in the application, but within the library only). All its variants (struct mem_factory here) have the two initial function pointers in common.
Now, if we examine the code above, and consider the factory pattern, you should realize that the function pointers provide very little of value: you could just use the polymorphic type I showed earlier in this answer, and have the inline producer and consumer functions call subtype-specific internal functions based on the type of the factory. factory.h:
#ifndef FACTORY_H
#define FACTORY_H
#include <stdlib.h>
struct factory {
/* Common member across all factory types */
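int type; /* not const: retire() overwrites it, and it must have the same type as the first member of every variant */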
/* Flexible array member to stop applications
from declaring static factories. */
const unsigned long data[];
};
/* Generic producer function */
void *produce(const struct factory *);
/* Generic factory discard function */
void retire(struct factory *);
/*
* Library functions that return factories.
*/
struct factory *mem_factory(const void *, const size_t);
#endif /* FACTORY_H */
and factory.c:
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include "factory.h"
enum {
INVALID_FACTORY = 0,
/* List of known factory types */
MEM_FACTORY,
/* 1+(the highest known factory type) */
NUM_FACTORY_TYPES
};
struct mem_factory {
int type;
size_t size;
char data[];
};
/* The visibility of this union ensures the initial sequences
in the structures are compatible; see C11 6.5.2.3p6.
Essentially, this causes the casts between these structure
types, for accessing their initial common members, valid. */
union all_factories {
struct factory factory;
struct mem_factory mem_factory;
};
/* All factories thus far implemented
are a single structure dynamically
allocated, which makes retiring simple.
*/
void retire(struct factory *factory)
{
if (factory &&
factory->type > INVALID_FACTORY &&
factory->type < NUM_FACTORY_TYPES) {
/* Poison factory type, to make it easier
to detect use-after-free bugs. */
factory->type = INVALID_FACTORY;
free(factory);
}
}
static char *mem_producer(const struct mem_factory *mem)
{
/* As a courtesy for users, return the memory
padded with zeroes to a length that is a multiple
of 8 chars. No real reason to do this. */
const size_t size = (mem->size | 7) + 9;
char *result;
result = malloc(size);
if (!result) {
errno = ENOMEM;
return NULL;
}
/* Clear padding. */
memset(result + size - 16, 0, 16);
/* Copy data, if any. */
if (mem->size)
memcpy(result, mem->data, mem->size);
return result;
}
/* Generic producer function.
Calls the proper individual producers.
*/
void *produce(const struct factory *factory)
{
if (!factory) {
errno = EINVAL;
return NULL;
}
switch (factory->type) {
case MEM_FACTORY:
return mem_producer((const struct mem_factory *)factory);
default:
errno = EINVAL;
return NULL;
}
}
/* Library functions that return factories.
*/
struct factory *mem_factory(const void *ptr, const size_t len)
{
struct mem_factory *mem;
if (!ptr && len > 0) {
errno = EINVAL;
return NULL;
}
mem = malloc(len + sizeof (struct mem_factory));
if (!mem) {
errno = ENOMEM;
return NULL;
}
mem->type = MEM_FACTORY;
mem->size = len;
if (len > 0)
memcpy(mem->data, ptr, len);
return (struct factory *)mem;
}
If we look at standard C and POSIX C library implementations, we'll see that both of these approaches are used.
The standard I/O FILE structure often contains function pointers, and the fopen(), fread(), fwrite(), etc. functions are just wrappers around these. This is especially the case if the C library supports an interface similar to GNU fopencookie().
POSIX.1 socket, especially the struct sockaddr type, is the original prototype for the polymorphic structure shown first in this answer. Because their interface does not support anything similar to fopencookie() (that is, overriding the implementation of e.g. send(), recv(), read(), write(), close()), there is no need for the function pointers.
So, please do not ask which one is more suitable, as both are very commonly used, and it very much depends on minute details. In general, I prefer the one that yields a simpler implementation providing all the necessary functionality.
I have personally found that it is not that useful to worry about future use cases without practical experience and feedback first. Rather than trying to create the end-all, best-ever framework that solves all future problems, the KISS principle and the Unix philosophy seem to yield much better results.
(Quoting your accepted answer to yourself)
Secondly, a pointer to a parent struct can't receive a pointer to its derived type (embedded parent struct), so I can't do much there. I tried using void *, but perhaps a solution exists that uses the memory address to access some member of the struct without casting to specific types. I'll ask that in another question.
This is yet another point where one should learn the basics first. What you are missing is called a 'forward declaration':
struct chicken; // here we tell the compiler that 'struct chicken' is a thing
struct egg{
struct chicken *laidby; // while the compiler knows no details about 'struct chicken',
// its existence is enough to have pointers for it
};
struct chicken{ // and later it has to be defined properly
struct egg *myeggs;
};
What I'm missing is a way to call the super method from the overridden run method. How can I do that?
These are not methods, and there is no override. No OOP happens in your code; C is a procedural programming language. While there are OOP extensions for C, you really should not go for them without knowing the C basics.
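That said, what the question calls a super call is usually written in C by giving each function an explicit self pointer and calling the base implementation by name. A minimal sketch under those assumptions: super_run(), super_init(), struct Another, another_run() and another_init() are hypothetical names, and unlike the question's run(), the function pointer here takes a self argument so the override can reach its own object.

```c
#include <stdio.h>

/* "Base class": one overridable slot. */
struct Super {
    void (*run)(struct Super *self);
};

/* Base implementation with external linkage, so that "derived"
   implementations can call it explicitly (the C analogue of super.run()). */
void super_run(struct Super *self)
{
    (void)self;
    printf("Running super struct\n");
}

void super_init(struct Super *self)
{
    self->run = super_run;
}

/* "Derived class": embeds the base as its first member. */
struct Another {
    struct Super base;
    int calls;
};

static void another_run(struct Super *self)
{
    /* Valid conversion back: base is the first member (C11 6.7.2.1). */
    struct Another *a = (struct Another *)self;
    a->calls++;
    super_run(self);             /* explicit call to the base version */
    printf("polymorphism working\n");
}

void another_init(struct Another *a)
{
    super_init(&a->base);        /* initialise the base part first */
    a->base.run = another_run;   /* then override the slot */
    a->calls = 0;
}
```

With struct Another a; another_init(&a);, calling a.base.run(&a.base) prints the base message first and then the derived one. Note that &a.base is how a "derived" object is passed wherever a struct Super * is expected.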
First, the community told me that anonymous functions are not part of C, so the alternative suggestion was to use named functions and pointers to them.
Secondly, a pointer to a parent struct can't receive a pointer to its derived type (embedded parent struct), so I can't do much there. I tried using void *, but perhaps a solution exists that uses the memory address to access some member of the struct without casting to specific types. I'll ask that in another question.
What I'm missing is a way to call the super method from the overridden run method. How can I do that?
src/super.h
struct Super {
void (*run)(void);
};
struct Super *newSuper(void);
src/super.c
#include <stdio.h>
#include <stdlib.h>
#include "super.h"
static void run(void) {
printf("Running super struct\n");
}
struct Super *newSuper(void) {
struct Super *super = malloc(sizeof(struct Super));
super->run = run;
return super;
}
src/Runner.h
struct Super; /* forward declaration; the full definition is in super.h */
struct Runner {
void (*addFactoryMethod)(struct Super *(*ref)(void));
void (*execute)(void);
};
struct Runner *newRunner(void);
src/runner.c
#include <stdlib.h>
#include "super.h"
#include "Runner.h"
struct Super *(*superFactory)(void);
void addFactoryMethod(struct Super *(*ref)(void)) {
superFactory = ref;
}
static void execute(void) {
struct Super *sup = superFactory(); // calling cached factory method
sup->run();
}
struct Runner *newRunner(void) {
struct Runner *runner = malloc(sizeof(struct Runner));
runner->addFactoryMethod = addFactoryMethod;
runner->execute = execute;
return runner;
}
test/runner_test.c
#include <stdio.h>
#include <stdlib.h>
#include "super.h"
#include "Runner.h"
void anotherRunMethod(void) {
printf("polymorphism working\n");
// how can I call the overridden super method from here?
}
struct Super *newAnotherSuper(void) {
struct Super *super = malloc(sizeof(struct Super));
super->run = anotherRunMethod;
return super;
}
void testSuper(void) {
struct Runner *runner = newRunner();
runner->addFactoryMethod(&newAnotherSuper);
runner->execute();
}
int main(void) {
testSuper();
return 0;
}
I have C code that works when compiled with GCC, but I am trying to find out whether it works by pure luck or because GCC handles this code as I expect by design.
NOTE
I am not trying to "fix" it. I am trying to understand the compiler.
Here is what I have:
iexample.h
#ifndef IEXAMPLE_H_
#define IEXAMPLE_H_
/* The interface */
struct MyIf
{
int (* init)(struct MyIf* obj);
int (* push)(struct MyIf* obj, int x);
void (* sort)(struct MyIf* obj);
};
/* The object, can be in different header */
struct Obj1
{
struct MyIf myinterface;
int val1;
int val2;
};
struct Obj1* newObj1(void);
#endif
iexample.c
#include <stdio.h>
#include <stdlib.h>
#include "iexample.h"
/* Functions here are "equivalent" to methods on the Obj1 struct */
int Obj1_init(struct Obj1* obj)
{
printf("Obj1_init()\n");
return 0;
}
int Obj1_push(struct Obj1* obj, int x)
{
printf("Obj1_push()\n");
return 0;
}
void Obj1_sort(struct Obj1* obj)
{
printf("Obj1_sort()\n");
}
struct Obj1* newObj1(void)
{
struct Obj1* obj = malloc(sizeof(struct Obj1));
obj->myinterface.init = Obj1_init;
obj->myinterface.push = Obj1_push;
obj->myinterface.sort = Obj1_sort;
return obj;
}
main.c
#include "iexample.h"
int main(int argc, char* argv[])
{
struct MyIf* myIf = (struct MyIf*) newObj1();
myIf->init(myIf);
myIf->push(myIf, 3);
myIf->sort(myIf);
/* ... free, return ... */
}
When I compile, I get the expected warning for the pointer assignments in newObj1():
warning: assignment from incompatible pointer type
The code works as long as "struct MyIf myinterface" is the first member of the struct, which is by design (I like to shoot myself in the foot).
Now, although I am assigning incompatible pointer types and the C standard says the behavior is undefined, do GCC or other compilers make any design guarantee about how this case is handled? I could almost swear that this OUGHT TO WORK due to how struct memory is laid out, but I cannot find the proof.
Thanks
C11 standard 6.7.2.1 Structure and union specifiers:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
So it should work as long as you access only the first structure member. However, I believe you understand that this is a pretty bad idea. Should you port this code to C++ and make some Obj1 member virtual, this will immediately fail.
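Note that the layout rule quoted above only covers converting the struct pointer. The question's code also calls Obj1_init() and friends through pointers of type int (*)(struct MyIf *), which is a call through an incompatible function type, and C11 6.5.2.2 makes that undefined on its own, independent of layout. A minimal sketch that avoids both problems, assuming the function bodies can be changed: every function takes exactly the interface type and converts back to struct Obj1 * internally. The val1/val2 bookkeeping is invented for illustration.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free */

/* The interface: every "method" takes the interface pointer. */
struct MyIf {
    int  (*init)(struct MyIf *obj);
    int  (*push)(struct MyIf *obj, int x);
    void (*sort)(struct MyIf *obj);
};

struct Obj1 {
    struct MyIf myinterface;   /* must stay the first member */
    int val1;
    int val2;
};

/* Compile-time check that the conversions below are valid. */
_Static_assert(offsetof(struct Obj1, myinterface) == 0,
               "myinterface must be the first member of struct Obj1");

/* Each function has exactly the type the interface declares, so the
   assignments in newObj1() need no cast and produce no warning. The
   conversion back to struct Obj1 * is the "vice versa" case of 6.7.2.1. */
static int Obj1_init(struct MyIf *obj)
{
    struct Obj1 *self = (struct Obj1 *)obj;
    self->val1 = 0;   /* hypothetical state: running sum */
    self->val2 = 0;   /* hypothetical state: number of pushes */
    return 0;
}

static int Obj1_push(struct MyIf *obj, int x)
{
    struct Obj1 *self = (struct Obj1 *)obj;
    self->val1 += x;
    self->val2 += 1;
    return 0;
}

static void Obj1_sort(struct MyIf *obj)
{
    (void)obj;   /* nothing to sort in this sketch */
}

struct Obj1 *newObj1(void)
{
    struct Obj1 *obj = malloc(sizeof *obj);
    if (!obj)
        return NULL;
    obj->myinterface.init = Obj1_init;
    obj->myinterface.push = Obj1_push;
    obj->myinterface.sort = Obj1_sort;
    return obj;
}
```

Callers can then write struct MyIf *myIf = &obj->myinterface; instead of casting the result of newObj1(); both give the same address.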