Using different struct definitions to simulate public and private fields in C

I have been writing C for a decent amount of time, and am obviously aware that C does not have any support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without the use of any macros or voodoo, and I am looking to gain more insight into possible issues I may have overlooked.
The folder structure isn't all that important here but I'll list it anyway because it gives clarity as to the import names (and is also what CLion generates for me).
- example-project
- cmake-build-debug
- example-lib-name
- include
- example-lib-name
- example-header-file.h
- src
- example-lib-name
- example-source-file.c
- CMakeLists.txt
- CMakeLists.txt
- main.c
Let's say that example-header-file.h contains:
typedef struct ExampleStruct {
int data;
} ExampleStruct;
ExampleStruct* new_example_struct(int, double);
which just contains a definition for a struct and a function that returns a pointer to an ExampleStruct.
Obviously, now if I import ExampleStruct into another file, such as main.c, I will be able to create and return a pointer to an ExampleStruct by calling
ExampleStruct* new_struct = new_example_struct(<int>, <double>);,
and will be able to access the data property like: new_struct->data.
However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify its internals. I.e., if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.
So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.
In our implementation file (example-source-file.c), let's say we have the following code:
#include <stdlib.h>
#include <stdbool.h>
typedef struct ExampleStruct {
int data;
double val;
} ExampleStruct;
ExampleStruct* new_example_struct(int data, double val) {
ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
example_struct->data=data;
example_struct->val=val;
return example_struct;
}
double get_val(ExampleStruct* e) {
return e->val;
}
This file simply implements that constructor method for getting a new pointer to an ExampleStruct that was defined in the header file. However, this file also defines its own version of ExampleStruct, that has a new member field not present in the header file's definition: double val, as well as a getter which gets that value. Now, if I import the same header file into main.c, which contains:
#include <stdio.h>
#include "example-lib-name/example-header-file.h"
int main() {
printf("Hello, World!\n");
ExampleStruct* test = new_example_struct(6, 7.2);
printf("%d\n", test->data); // <-- THIS WORKS
double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
printf("%f\n", x); //
// printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
return 0;
}
I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.
Some things I see that may be cause for concern:
This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.
Questions I'd like answers on:
Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.

Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Yes: your approach produces undefined behavior.
C requires that
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(C17 6.2.7/2)
and that
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
[...]
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")
Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions that rely on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between TUs using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
Other alternatives:
Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
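As a rough illustration of the opaque-pointer alternative, here is a minimal single-file sketch. The comments mark which part would live in the header and which in the implementation file. The function names (example_new, example_free, example_data, example_val) are made up for this sketch and are not taken from the question.

```c
#include <stdlib.h>

/* --- would live in the public header --- */
/* Incomplete type: callers can hold a pointer but cannot see any member. */
typedef struct ExampleStruct ExampleStruct;

ExampleStruct *example_new(int data, double val);
void example_free(ExampleStruct *e);
int example_data(const ExampleStruct *e);
double example_val(const ExampleStruct *e);

/* --- would live in the implementation file --- */
/* The one and only definition of the struct. */
struct ExampleStruct {
    int data;
    double val;
};

ExampleStruct *example_new(int data, double val) {
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL; /* allocation failed */
    }
    e->data = data;
    e->val = val;
    return e;
}

void example_free(ExampleStruct *e) { free(e); }

/* All member access from outside goes through accessors like these. */
int example_data(const ExampleStruct *e) { return e->data; }
double example_val(const ExampleStruct *e) { return e->val; }
```

With this layout there is only one definition of the struct in the whole program, so the compatibility problem described above does not arise. The price is that callers cannot create instances themselves or read e->data directly.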

As long as the public has a complete definition for ExampleStruct, it can make code like:
ExampleStruct a = *new_example_struct(42, 1.234);
Then the below will certainly fail.
printf("%g\n", get_val(&a));
I recommend instead creating an opaque pointer and providing public access functions for the info in .data and .val.
Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)

Are there significant performance penalties I may suffer as a result of writing code this way?
Probably:
Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.
i.e. is there a simpler way to do this
Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:
#include <stddef.h> /* max_align_t */
#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)
typedef struct ExampleStruct {
int data;
_Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, _Alignas() as well (if you want the private data not to need _Alignas(), you will need to use the real alignment quantum for the type-erased version).
The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
An example of the use of such an approach can be found in CUDA's driver API:
Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_PEER
The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.
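A conservative variation on the slack-array idea is sketched below. Instead of redefining the struct in the implementation file, it keeps a single public definition and copies a private struct in and out of the byte array with memcpy, which sidesteps the type-compatibility question entirely. The names Example, example_private, example_init and example_val are hypothetical, and the sketch assumes a C11 compiler.

```c
#include <stddef.h>
#include <string.h>

/* Public view: callers see `data` plus an opaque, suitably aligned blob. */
#define EXAMPLE_PRIVATE_SIZE sizeof(double)

typedef struct Example {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_PRIVATE_SIZE];
} Example;

/* Implementation side: the meaning of the blob is known only here. */
struct example_private {
    double val;
};

/* Fail the build if the public blob ever becomes too small. */
_Static_assert(sizeof(struct example_private) <= EXAMPLE_PRIVATE_SIZE,
               "private_data is too small for struct example_private");

void example_init(Example *e, int data, double val) {
    struct example_private p;
    p.val = val;
    e->data = data;
    memcpy(e->private_data, &p, sizeof p); /* store private state as bytes */
}

double example_val(const Example *e) {
    struct example_private p;
    memcpy(&p, e->private_data, sizeof p); /* load private state from bytes */
    return p.val;
}
```

Because Example is a complete type, callers can place it on the stack or in an array without any heap allocation. The static assertion is what keeps the public size constant honest when the private struct grows.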

This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.
In example.h, one defines the public-facing elements. struct example is not meant to be instantiated; in a sense, it is abstract. Only pointers that are obtained from one of its constructors (in this case, the only one) are valid.
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)
#include <stddef.h>
#include <stdlib.h>
#include "example.h"
struct private_example {
struct example public;
double val;
};
struct example *new_example(int data, double val) {
struct private_example *const example = malloc(sizeof *example);
if(!example) return 0;
example->public.data = data;
example->val = val;
return &example->public;
}
/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
return (struct private_example *)(void *)
((char *)example - offsetof(struct private_example, public));
}
double example_val(struct example *e) {
return example_upcast(e)->val;
}
Then one can use the object as in main.c. This is used frequently in Linux kernel code for container abstraction. Note that offsetof(struct private_example, public) is zero, ergo example_upcast does nothing and a cast is sufficient: ((struct private_example *)e)->val. If one builds structures in a way that always allows casting, one is limited to single inheritance.
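To show why the offsetof arithmetic matters once the public part is not the first member, here is a small sketch with the members deliberately reordered. The widget names are invented for this illustration only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Public part, as it would appear in the header. */
struct widget { int data; };

/* Private wrapper; the public part is deliberately NOT the first member. */
struct private_widget {
    double val;
    struct widget pub;
};

/* Recover the enclosing struct from a pointer to its `pub` member. */
static struct private_widget *widget_container(struct widget *w) {
    return (struct private_widget *)(void *)
        ((char *)w - offsetof(struct private_widget, pub));
}

struct widget *widget_new(int data, double val) {
    struct private_widget *pw = malloc(sizeof *pw);
    if (!pw) return NULL;
    pw->val = val;
    pw->pub.data = data;
    return &pw->pub; /* callers only ever see the public part */
}

double widget_val(struct widget *w) {
    return widget_container(w)->val;
}

void widget_free(struct widget *w) {
    /* free() must receive the pointer malloc() returned, not &pw->pub. */
    if (w) free(widget_container(w));
}
```

Here a plain cast would point at the wrong address, so the subtraction is required. The same trick lets one object embed several public parts, which is how the single-inheritance limit mentioned above can be worked around.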

Related

C struct information hiding (Opaque pointer)

I'm currently a bit confused regarding the concept of information hiding of C-structs.
The background of this question is an embedded C project with nearly zero knowledge of OOP.
Up until now I always declared my typedef structs inside the header file of the corresponding module.
So every module which wants to use this struct knows the struct type.
But after a MISRA-C check I discovered the medium severity warning: MISRAC2012-Dir-4.8
- The implementation of a structure is unnecessarily exposed to a translation unit.
After a bit of research I discovered the concept of information hiding of C-structs by limiting the visible access of the struct members to private scope.
I promptly tried a simple example which goes like this:
struct_test.h
//struct _structName;
typedef struct _structName structType_t;
struct_test.c
#include "struct_test.h"
typedef struct _structName
{
int varA;
int varB;
char varC;
}structType_t;
main.c
#include "struct_test.h"
structType_t myTest;
myTest.varA = 0;
myTest.varB = 1;
myTest.varC = 'c';
This yields the compiler error, that for main.c the size of myTest is unknown.
And of course it is, main.c has only knowledge that a struct of the type structType_t exists and nothing else.
So I continued my research and stumbled upon the concept of opaque pointers.
So I tried a second attempt:
struct_test.h
typedef struct _structName *myStruct_t;
struct_test.c
#include "struct_test.h"
typedef struct _structName
{
int varA;
int varB;
char varC;
}structType_t;
main.c
#include "struct_test.h"
myStruct_t myTest;
myTest->varA = 1;
And I get the compiler error: dereferencing pointer to incomplete type struct _structName
So obviously I haven't understood the basic concept of this technique.
My main point of confusion is where the data of the struct object will be located.
Up until now I had the understanding that a pointer usually points to a "physical" representation of the datatype and reads/writes the content on the corresponding address.
But with the method above, I declare a pointer myTest but never set an address where it should point to.
I took the idea from this post:
What is an opaque pointer in C?
In the post it is mentioned that the access is handled with set/get interface methods, so I tried adding a similar one like this:
void setVarA ( _structName *ptr, int valueA )
{
ptr->varA = valueA;
}
But this also doesn't work because now the compiler tells me that _structName is unknown...
So can I only access the struct with the help of additional interface methods and, if yes, how can I achieve this in my simple example?
And my bigger question still remains where the object of my struct is located in memory.
I only know the pointer concept:
varA - Address: 10 - Value: 1
ptrA - Address: 22 - Value: 10
But in this example I only have
myTest - Address: xy - Value: ??
I have trouble understanding where the "physical" representation of the corresponding myTest pointer is located?
Furthermore I can not see the benefits of doing it like this in relatively small scope embedded projects where I am the producer and consumer of the modules.
Can someone explain to me if this method is really reasonable for small to mid scale embedded projects with 1-2 developers working with the code?
Currently it seems like more effort to make all this interface pointer methods than just declaring the struct in my header-file.
Thank you in advance
My main point of confusion is where the data of the struct object will be located.
The point is that you do not use the struct representation (i.e. its size, fields, layout, etc.) in other translation units, but rather call functions that do the work for you. You need to use an opaque pointer for that, yes.
how can I achieve this in my simple example?
You have to put all the functions that use the struct fields (the real struct) in one file (the implementation). Then, in a header, expose only the interface (the functions that you want users to call, and those take an opaque pointer). Finally, users will use the header to call only those functions. They won't be able to call any other function and they won't be able to know what is inside the struct, so code trying to do that won't compile (that is the point!).
Furthermore I can not see the benefits of doing it like this in relatively small scope embedded projects where I am the producer and consumer of the modules.
It is a way to force modules to be independent of each other. Sometimes it is used to hide implementations to customers or to be able to guarantee ABI stability.
But yes, for internal usage, it is usually a burden (and hinders optimization, since everything becomes a black box to the compiler except if you use LTO etc.). A syntactic approach like public/private in other languages like C++ is way better for that.
However, if you are bound to follow MISRA to such degree (i.e. if your project has to follow that rule, even if it is only advisory), there is not much you can do.
Can someone explain to me if this method is really reasonable for small to mid scale embedded projects with 1-2 developers working with the code?
That is up to you. There are very big projects that do not follow that advice and are successful. Typically a comment for private fields, or a naming convention, is enough.
As you've deduced, when using an opaque type such as this the main source file can't access the members of the struct, and in fact doesn't know how big the struct is. Because of this, not only do you need accessor functions to read/write the fields of the struct, but you also need a function to allocate memory for the struct, since only the library source knows the definition and size of the struct.
So your header file would contain the following:
typedef struct _structName structType_t;
structType_t *init();
void setVarA(structType_t *ptr, int valueA );
int getVarA(structType_t *ptr);
void cleanup(structType_t *ptr);
This interface allows a user to create an instance of the struct, get and set values, and clean it up. The library source would look like this:
#include <stdlib.h>
#include "struct_test.h"
struct _structName
{
int varA;
int varB;
char varC;
};
structType_t *init()
{
return malloc(sizeof(structType_t ));
}
void setVarA(structType_t *ptr, int valueA )
{
ptr->varA = valueA;
}
int getVarA(structType_t *ptr)
{
return ptr->varA;
}
void cleanup(structType_t *ptr)
{
free(ptr);
}
Note that you only need to define the typedef once. This both defines the type alias and forward declares the struct. Then in the source file the actual struct definition appears without the typedef.
The init function is used by the caller to allocate space for the struct and return a pointer to it. That pointer can then be passed to the getter / setter functions.
So now your main code can use this interface like this:
#include <stdio.h>
#include "struct_test.h"
int main()
{
structType_t *s = init();
setVarA(s, 5);
printf("s->a=%d\n", getVarA(s));
cleanup(s);
}
In the post it is mentioned that the access is handled with set/get interface methods, so I tried adding a similar one like this:
void setVarA ( _structName *ptr, int valueA )
{
ptr->varA = valueA;
}
But this also doesn't work because now the compiler tells me that _structName is unknown...
The type is not _structName, but struct _structName or (as defined) structType_t.
And my bigger question still remains where the object of my struct is located in memory.
With this technique, there would be a method which returns the address of such an opaque object. It could be statically or dynamically allocated. There should of course also be a method to free an object.
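For the statically allocated case, which is often preferred on embedded targets without a heap, one possible shape is a fixed pool inside the implementation file. The sketch below reuses the interface names from the question; POOL_SIZE, the in_use flag and the tag structName are assumptions made for this example.

```c
#include <stddef.h>

/* --- header part: incomplete type plus the interface --- */
typedef struct structName structType_t;

structType_t *init(void);
void cleanup(structType_t *ptr);
void setVarA(structType_t *ptr, int valueA);
int getVarA(const structType_t *ptr);

/* --- implementation part: full definition and a fixed pool, no malloc --- */
#define POOL_SIZE 4

struct structName {
    int varA;
    int in_use; /* bookkeeping flag, invisible to callers */
};

/* The objects physically live here, in static storage of this file. */
static struct structName pool[POOL_SIZE];

structType_t *init(void) {
    size_t i;
    for (i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].varA = 0;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

void cleanup(structType_t *ptr) {
    if (ptr != NULL) {
        ptr->in_use = 0; /* hand the slot back to the pool */
    }
}

void setVarA(structType_t *ptr, int valueA) { ptr->varA = valueA; }
int getVarA(const structType_t *ptr) { return ptr->varA; }
```

This also answers the "where is the object" question concretely: the pointer held by main.c contains the address of one element of pool, which exists only inside the implementation file.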
Furthermore I can not see the benefits of doing it like this in relatively small scope embedded projects where I am the producer and consumer of the modules.
I agree with you.

Opaque types allocatable on stack in C

When designing a C interface, it is common to let into the public interface (.h) only what needs to be known by the user program.
Hence for example, the inner components of structures should remain hidden if the user program does not need to know them. This is indeed good practice, as the content and behavior of the struct could change in the future, without affecting the interface.
A great way to achieve that objective is to use incomplete types.
typedef struct foo opaqueType;
Now an interface using only pointers to opaqueType can be built, without the user program ever needing to know the inner working of struct foo.
But sometimes, it can be required to allocate such structure statically, typically on stack, for performance and memory fragmentation issues. Obviously, with above construction, opaqueType is incomplete, so its size is unknown, so it cannot be statically allocated.
A work around is to allocate a "shell type", such as :
typedef struct { int faketable[8]; } opaqueType;
Above construction enforces a size and an alignment, but doesn't go farther into describing what the structure really contains. So it matches the objective of keeping the type "opaque".
It mostly works. But in one circumstance (GCC 4.4), the compiler complains that it breaks strict aliasing, and it generates a buggy binary.
Now, I've read a ton of things about strict aliasing, so I guess I understand now what it means.
The question is : is there a way to define an opaque type which can nonetheless be allocated on stack, and without breaking strict aliasing rule ?
Note that I've attempted the union method described in this excellent article but it still generates the same warning.
Note also that visual, clang and gcc 4.6 and later don't complain and work fine with this construction.
[Edit] Information complement :
According to tests, the problem only happens in the following circumstances :
Private and public type different. I'm casting the public type to private inside the .c file. It doesn't matter apparently if they are part of the same union. It doesn't matter if the public type contains char.
If all operations on private type are just reads, there's no problem. Only writes cause problems.
I also suspect that only functions which are automatically inlined get into trouble.
Problem only happens on gcc 4.4 at -O3 setting. -O2 is fine.
Finally, my target is C90. Maybe C99 if there really is no choice.
You can force the alignment with max_align_t and you can avoid the strict aliasing issues using an array of char since char is explicitly allowed to alias any other type.
Something along the lines of:
#include <stddef.h>
struct opaque
{
union
{
max_align_t a;
char b[32]; // or whatever size you need.
} u;
};
If you want to support compilers that do not have max_align_t, or if you know the alignment requirements of the real type, then you can use any other type for the a union member.
UPDATE: If you are targeting C11, then you may also use alignas():
#include <stddef.h>
#include <stdalign.h>
struct opaque
{
alignas(max_align_t) char b[32];
};
Of course, you can replace the max_align_t with whatever type you think appropriate. Or even an integer.
UPDATE #2:
Then, the use of this type in the library would be something along the lines of:
void public_function(struct opaque *po)
{
struct private *pp = (struct private *)po->b;
//use pp->...
}
This way, since you are type-punning a pointer to char you are not breaking the strict aliasing rules.
What you desire is some kind of equivalent of the C++ private access control in C. As you know, no such equivalent exists. The approach you give is approximately what I would do. However, I would make the opaqueType opaque to the inner components implementing the type, so I would be forced to cast it to the real type within the inner components. The forced cast should not generate the warning you are mentioning.
Although cumbersome to use, you can define an interface that provides "stack allocated" memory to an opaque type without exposing a sized structure. The idea is that the implementation code is in charge of the stack allocation, and the user passes in a callback function to get a pointer to the allocated type.
typedef struct opaqueType_raii_callback opaqueType_raii_callback;
struct opaqueType_raii_callback {
void (*func)(opaqueType_raii_callback *, opaqueType *);
};
extern void opaqueType_raii (opaqueType_raii_callback *);
extern void opaqueType_raii_v (opaqueType_raii_callback *, size_t);
void opaqueType_raii (opaqueType_raii_callback *cb) {
opaqueType_raii_v(cb, 1);
}
void opaqueType_raii_v (opaqueType_raii_callback *cb, size_t n) {
opaqueType x[n];
cb->func(cb, x);
}
The definitions above look a bit esoteric, but it is the way I normally implement a callback interface.
struct foo_callback_data {
opaqueType_raii_callback cb;
int my_data;
/* other data ... */
};
void foo_callback_function (opaqueType_raii_callback *cb, opaqueType *x) {
struct foo_callback_data *data = (void *)cb;
/* use x ... */
}
void foo () {
struct foo_callback_data data;
data.cb.func = foo_callback_function;
opaqueType_raii(&data.cb);
}
For me this seems to be something which just shouldn't be done.
The point of having an opaque pointer is to hide the implementation details. The type and alignment of memory where the actual structure is allocated, or whether the library manages additional data beyond what's pointed to are also implementation details.
Not that you couldn't document that one thing or another is possible, of course, but the C language uses this approach (strict aliasing), which you can only more or less hack around as in Rodrigo's answer (using max_align_t). Under that rule, the interface alone cannot tell you what constraints a particular compiler would impose on the actual structure within the implementation (on some esoteric microcontrollers, even the type of memory may matter), so I don't think this can be done reliably in a truly cross-platform manner.

Typedef and Struct in C and H files

I've been using the following code to create various struct, but only give people outside of the C file a pointer to it. (Yes, I know that they could potentially mess around with it, so it's not entirely like the private keyword in Java, but that's okay with me).
Anyway, I've been using the following code, and I looked at it today, and I'm really surprised that it's actually working, can anyone explain why this is?
In my C file, I create my struct, but don't give it a tag in the typedef namespace:
struct LABall {
int x;
int y;
int radius;
Vector velocity;
};
And in the H file, I put this:
typedef struct LABall* LABall;
I am obviously using #include "LABall.h" in the c file, but I am NOT using #include "LABall.c" in the header file, as that would defeat the whole purpose of a separate header file. So, why am I able to create a pointer to the LABall* struct in the H file when I haven't actually included it? Does it have something to do with the struct namespace working across files, even when one file is in no way linked to another?
Thank you.
A common pattern for stuff like that is to have a foo.h file defining the API like
typedef struct _Foo Foo;
Foo *foo_new();
void foo_do_something(Foo *foo);
and a foo.c file providing an implementation for that API like
#include <stdlib.h>
#include "foo.h"
struct _Foo {
int bar;
};
Foo *foo_new() {
Foo *foo = malloc(sizeof(Foo));
foo->bar = 0;
return foo;
}
void foo_do_something(Foo *foo) {
foo->bar++;
}
This hides all the memory layout and size of the struct in the implementation in foo.c, and the interface exposed via foo.h is completely independent of those internals: A caller.c which only does #include "foo.h" will only have to store a pointer to something, and pointers are always the same size:
#include "foo.h"
void bleh() {
Foo *f = foo_new();
foo_do_something(f);
}
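Note: The ISO C standard section on reserved identifiers (7.1.3) says that identifiers beginning with an underscore followed by an uppercase letter or another underscore are always reserved, and that all other identifiers beginning with an underscore are reserved at file scope in the ordinary and tag name spaces. So typedef struct Foo Foo; is actually a better way to name things than typedef struct _Foo Foo;.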
Note: I have left freeing the memory as an exercise to the reader. :-)
Of course, this means that the following file broken.c will NOT work:
#include "foo.h"
void broken() {
Foo f;
foo_do_something(&f);
}
as the memory size necessary for actually creating a variable of type Foo is not known in this file.
Since you're asking for a precise reason as to "why" the language works this way, I'm assuming you want some precise references. If you find that pedantic, just skip the notes...
It works because of two things:
All pointer to structure types have the same representation (note that it's not true of all pointer types, as far as standard C is concerned).[1] Hence, the compiler has enough information to generate proper code for all uses of your pointer-to-struct type.
The tag namespace (struct, enum, union) is indeed compatible across all translation units.[2] Thus, the two structures (even though one is not completely defined, i.e. it lacks member declarations) are one and the same.
(BTW, #import is non-standard.)
[1] As per n1256 §6.2.5.27:
All pointers to structure types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
[2] As per n1256 §6.2.7.1:
two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are complete types, then the following additional requirements apply: [does not concern us].
In
typedef struct A* B;
since all pointers' interfaces are the same, knowing that B means a pointer to a struct A contains enough information already. The actual implementation of A is irrelevant (this technique is called "opaque pointer".)
(BTW, better rename one of the LABall's. It's confusing that the same name is used for incompatible types.)

Representing dynamic typing in C

I'm writing a dynamically-typed language. Currently, my objects are represented in this way:
struct Class { struct Class* class; struct Object* (*get)(struct Object*,struct Object*); };
struct Integer { struct Class* class; int value; };
struct Object { struct Class* class; };
struct String { struct Class* class; size_t length; char* characters; };
The goal is that I should be able to pass everything around as a struct Object* and then discover the type of the object by comparing the class attribute. For example, to cast an integer for use I would simply do the following (assume that integer is of type struct Class*):
struct Object* foo = bar();
// increment foo
if(foo->class == integer)
((struct Integer*)foo)->value++;
else
handleTypeError();
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct Integer might store value before class, and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
There are alternatives to this approach:
struct Object
{
struct Class* class;
union Value
{
struct Class c;
int i;
struct String s;
} value;
};
The problem here is that the union uses up as much space as the size of the largest thing that can be stored in the union. Given that some of my types are many times as large as my other types, this would mean that my small types (int) would take up as much space as my large types (map) which is an unacceptable tradeoff.
struct Object
{
struct Class* class;
void* value;
};
This creates a level of redirection that will slow things down. Speed is a goal here.
The final alternative is to pass around void*s and manage the internals of the structure myself. For example, to implement the type test mentioned above:
void* foo = bar();
// increment foo
if(*((struct Class**) foo) == integer)
(*((int*)((char*)foo + sizeof(struct Class*))))++;
else
handleTypeError();
This gives me everything I want (portability, different sizes for different types, etc.) but has at least two downsides:
Hideous, error-prone C. The code above only calculates a single-member offset; it will get much worse with types more complex than integers. I might be able to alleviate this a bit using macros, but this will be painful no matter what.
Since there is no struct that represents the object, I don't have the option of stack allocations (at least without implementing my own stack on the heap).
Basically, my question is, how can I get what I want without paying for it? Is there a way to be portable, have variance in size for different types, not use redirection, and keep my code pretty?
EDIT: This is the best response I've ever received for an SO question. Choosing an answer was hard. SO only allows me to choose one answer so I chose the one that led me to my solution, but you all received upvotes.
See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C. The Python solution can be directly applied to your problem. Essentially you want to do this:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
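A minimal sketch of how the type check and cast could look with this embedded-Object layout follows. The name field in struct Class, the two class instances and the increment function are hypothetical additions for illustration; the struct shapes are the ones given above.

```c
#include <stddef.h>

struct Class {
    const char *name; /* hypothetical field, for illustration only */
};

struct Object  { struct Class *class; };
struct Integer { struct Object object; int value; };
struct String  { struct Object object; size_t length; char *characters; };

/* One Class instance per dynamic type; types are compared by address. */
struct Class integer_class = { "Integer" };
struct Class string_class  = { "String" };

/* Returns 0 on success, -1 if obj is not an Integer. */
int increment(struct Object *obj) {
    if (obj->class != &integer_class) {
        return -1; /* type error */
    }
    /* A pointer to a struct's first member converts back to the struct. */
    ((struct Integer *)obj)->value++;
    return 0;
}
```

The cast is well defined because C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa. Each concrete type keeps its own size, and no extra indirection is needed to reach the value.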
C gives you sufficient guarantees that your first approach will work. The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:
union allow_aliasing {
struct Class class;
struct Object object;
struct Integer integer;
struct String string;
};
(You don't need to ever use the union for anything - it just has to be in scope)
I believe the relevant part of the standard is this:
[#5] With one exception, if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined. One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
There are 3 major approaches for implementing dynamic types and which one is best depends on the situation.
1) C-style inheritance: The first one is shown in Josh Haberman's answer. We create a type-hierarchy using classic C-style inheritance:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
Functions with dynamically typed arguments receive them as Object*, inspect the class member, and cast as appropriate. The cost to check the type is two pointer hops. The cost to get the underlying value is one pointer hop. In approaches like this one, objects are typically allocated on the heap since the size of objects is unknown at compile time. Since many malloc implementations hand out memory in minimum-sized chunks (often 16 or 32 bytes), small objects can waste a significant amount of memory with this approach.
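As a rough sketch of the "inspect the class member, then cast" step described above: struct Class, the two static class descriptors, and object_increment are hypothetical names made up for illustration, and the class pointer is const-qualified here only so the descriptors can be read-only.

```c
#include <stddef.h>

/* Hypothetical class descriptor: one static instance per dynamic type. */
struct Class { const char *name; };

struct Object  { const struct Class *class; };
struct Integer { struct Object object; int value; };
struct String  { struct Object object; size_t length; const char *characters; };

static const struct Class integer_class = { "Integer" };
static const struct Class string_class  = { "String" };

/* Increment an Integer in place.
 * Returns 0 on success, -1 if obj is not an Integer (a "type error"). */
int object_increment(struct Object *obj)
{
    struct Integer *i;

    if (obj->class != &integer_class)
        return -1;

    /* A pointer to a struct, suitably converted, points to its first
     * member and vice versa, so this cast recovers the full object. */
    i = (struct Integer *)obj;
    i->value++;
    return 0;
}
```

A caller would pass &n.object for a struct Integer n, and gets -1 back if it passes the object member of a struct String instead.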
2) Tagged union: We can remove a level of indirection for accessing small objects using the "short string optimization"/"small object optimization":
struct Object {
struct Class* class;
union {
// fundamental C types or other small types of interest
bool as_bool;
int as_int;
// [...]
// object pointer for large types (or actual pointer values)
void* as_ptr;
};
};
Functions with dynamically typed arguments receive them as Object, inspect the class member, and read the union as appropriate. The cost to check the type is one pointer hop. If the type is one of the special small types, it is stored directly in the union, and there is no indirection to retrieve the value. Otherwise, one pointer hop is required to retrieve the value. This approach can sometimes avoid allocating objects on the heap. Although the exact size of an object still isn't known at compile time, we now know the size and alignment (our union) needed to accommodate small objects.
In these first two solutions, if we know all the possible types at compile time, we can encode the type using an integer type instead of a pointer and reduce type check indirection by one pointer hop.
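A minimal sketch of that last variant, assuming the full set of types is known up front: the type is an enum tag rather than a class pointer, so the type check needs no pointer hop at all. The names (enum Type, struct Value, value_from_int, value_increment) are invented for this example, and the union is given a name (u) so the code also compiles as C99, where anonymous unions are not available.

```c
#include <stdbool.h>

/* Closed set of dynamic types, encoded as a small integer tag. */
enum Type { TYPE_BOOL, TYPE_INT, TYPE_PTR };

struct Value {
    enum Type type;
    union {
        bool  as_bool;
        int   as_int;
        void *as_ptr;   /* large objects live elsewhere */
    } u;
};

struct Value value_from_int(int i)
{
    struct Value v;
    v.type = TYPE_INT;
    v.u.as_int = i;
    return v;
}

struct Value value_from_bool(bool b)
{
    struct Value v;
    v.type = TYPE_BOOL;
    v.u.as_bool = b;
    return v;
}

/* Increments v in place if it holds an int.
 * Returns true on success, false on a type mismatch. */
bool value_increment(struct Value *v)
{
    if (v->type != TYPE_INT)
        return false;
    v->u.as_int++;
    return true;
}
```

Because struct Value has a fixed size, it can be passed by value and kept on the stack, which is the point of this approach.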
3) NaN-boxing: Finally, there's NaN-boxing where every object handle is only 64 bits.
double object;
Any value corresponding to a non-NaN double is understood to simply be a double. All other object handles are a NaN. There are actually large swaths of bit values of double precision floats that correspond to NaN in the commonly used IEEE-754 floating point standard. In the space of NaNs, we use a few bits to tag types and the remaining bits for data. By taking advantage of the fact that most 64-bit machines actually only have a 48-bit address space, we can even stash pointers in NaNs. This method incurs no indirection or extra memory use but constrains our small object types, is awkward, and in theory is not portable C.
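Here is a small sketch of the idea, under the assumption that double is a 64-bit IEEE-754 type (true on mainstream platforms, but not promised by the C standard). The tag layout (TAG_MASK, TAG_INT) is one arbitrary choice made for this example, not a standard encoding; real implementations pick their own bit patterns and usually add a pointer tag as well.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE-754 value. */
typedef uint64_t Value;

#define TAG_MASK      UINT64_C(0xFFFF000000000000)
#define TAG_INT       UINT64_C(0xFFF9000000000000) /* a NaN bit pattern used as "int" tag */
#define CANONICAL_NAN UINT64_C(0x7FF8000000000000) /* the only NaN stored as a double */

Value box_double(double d)
{
    Value v;
    if (d != d)                 /* any NaN: collapse to one canonical pattern */
        return CANONICAL_NAN;   /* so it can never look like a tagged value */
    memcpy(&v, &d, sizeof v);   /* copy the bits without aliasing problems */
    return v;
}

Value box_int(int32_t i)
{
    uint32_t bits;
    memcpy(&bits, &i, sizeof bits);
    return TAG_INT | bits;      /* payload lives in the low 32 bits */
}

int is_int(Value v)    { return (v & TAG_MASK) == TAG_INT; }
int is_double(Value v) { return !is_int(v); }

double unbox_double(Value v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

int32_t unbox_int(Value v)
{
    uint32_t bits = (uint32_t)(v & UINT64_C(0xFFFFFFFF));
    int32_t i;
    memcpy(&i, &bits, sizeof i);
    return i;
}
```

Every bit pattern whose top 16 bits equal TAG_INT is a NaN, so no ordinary double can be mistaken for a boxed integer.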
Section 6.2.5 of ISO 9899:1999 (the C99 standard) says:
A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
Section 6.7.2.1 also says:
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[...]
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
This guarantees what you need.
In the question you say:
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works.
This will work on all platforms. It also means that your first alternative - what you are currently using - is safe enough.
But on another platform struct Integer might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
No compliant compiler is allowed to do that. [I replaced String by Integer assuming you were referring to the first set of declarations. On closer examination, you might have been referring to the structure with an embedded union. The compiler still isn't allowed to reorder class and value.]
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
I believe you're wrong here. First, because your struct String doesn't have a value member. Second, because I believe C does guarantee the layout in memory of your struct's members. That's why the following are different sizes:
struct {
short a;
char b;
char c;
};
struct {
char a;
short b;
char c;
};
If C made no guarantees, then compilers would probably optimize both of those to be the same size. But it guarantees the internal layout of your structs, so the natural alignment rules kick in and make the second one larger than the first.
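To see this on your own platform, something like the following can help. The tags A and B and both function names are made up for this sketch. The ordering check relies only on what the standard guarantees (first member at offset 0, later members at increasing addresses); the printed sizes are implementation-specific, so no particular numbers are promised here.

```c
#include <stddef.h>
#include <stdio.h>

struct A { short a; char b; char c; };
struct B { char a; short b; char c; };

/* Returns 1 if struct B's members sit in declaration order,
 * which every conforming compiler must ensure. */
int members_in_declaration_order(void)
{
    return offsetof(struct B, a) == 0
        && offsetof(struct B, a) < offsetof(struct B, b)
        && offsetof(struct B, b) < offsetof(struct B, c);
}

/* Prints sizes and offsets; the exact values depend on the platform's
 * alignment rules. */
void print_layout(void)
{
    printf("sizeof(struct A) = %zu\n", sizeof(struct A));
    printf("sizeof(struct B) = %zu\n", sizeof(struct B));
    printf("B offsets: a=%zu b=%zu c=%zu\n",
           offsetof(struct B, a), offsetof(struct B, b), offsetof(struct B, c));
}
```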
I appreciate the pedantic issues raised by this question and answers, but I just wanted to mention that CPython has used similar tricks "more or less forever" and it's been working for decades across a huge variety of C compilers. Specifically, see object.h, macros like PyObject_HEAD, structs like PyObject: all kinds of Python Objects (down at the C API level) are getting pointers to them forever cast back and forth to/from PyObject* with no harm done. It's been a while since I last played sea lawyer with an ISO C Standard, to the point that I don't have a copy handy (!), but I do believe that there are some constraints there that should make this keep working as it has for nearly 20 years...

How do you implement a class in C? [closed]

Assuming I have to use C (no C++ or object oriented compilers) and I don't have dynamic memory allocation, what are some techniques I can use to implement a class, or a good approximation of a class? Is it always a good idea to isolate the "class" to a separate file? Assume that we can preallocate the memory by assuming a fixed number of instances, or even defining the reference to each object as a constant before compile time. Feel free to make assumptions about which OOP concept I will need to implement (it will vary) and suggest the best method for each.
Restrictions:
I have to use C and not an OOP because I'm writing code for an embedded system, and the compiler and preexisting code base is in C.
There is no dynamic memory allocation because we don't have enough memory to reasonably assume we won't run out if we start dynamically allocating it.
The compilers we work with have no problems with function pointers
That depends on the exact "object-oriented" feature-set you want to have. If you need stuff like overloading and/or virtual methods, you probably need to include function pointers in structures:
typedef struct ShapeClass ShapeClass; /* forward declaration so the member can refer to the type */
struct ShapeClass {
float (*computeArea)(const ShapeClass *shape);
};
float shape_computeArea(const ShapeClass *shape)
{
return shape->computeArea(shape);
}
This would let you implement a class, by "inheriting" the base class, and implementing a suitable function:
typedef struct {
ShapeClass shape;
float width, height;
} RectangleClass;
static float rectangle_computeArea(const ShapeClass *shape)
{
const RectangleClass *rect = (const RectangleClass *) shape;
return rect->width * rect->height;
}
This of course requires you to also implement a constructor, that makes sure the function pointer is properly set up. Normally you'd dynamically allocate memory for the instance, but you can let the caller do that, too:
void rectangle_new(RectangleClass *rect)
{
rect->width = rect->height = 0.f;
rect->shape.computeArea = rectangle_computeArea;
}
If you want several different constructors, you will have to "decorate" the function names, you can't have more than one rectangle_new() function:
void rectangle_new_with_lengths(RectangleClass *rect, float width, float height)
{
rectangle_new(rect);
rect->width = width;
rect->height = height;
}
Here's a basic example showing usage:
#include <stdio.h>
int main(void)
{
RectangleClass r1;
rectangle_new_with_lengths(&r1, 4.f, 5.f);
/* pass the embedded base "class", not the RectangleClass itself */
printf("rectangle r1's area is %f units square\n", shape_computeArea(&r1.shape));
return 0;
}
I hope this gives you some ideas, at least. For a successful and rich object-oriented framework in C, look into glib's GObject library.
Also note that there's no explicit "class" being modelled above, each object has its own method pointers which is a bit more flexible than you'd typically find in C++. Also, it costs memory. You could get away from that by stuffing the method pointers in a class structure, and invent a way for each object instance to reference a class.
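One possible shape for that "class structure" idea, as a sketch only: the method pointers move into a single shared, read-only table per class, and each instance carries just one pointer to it. The names Shape, ShapeVTable, Rectangle, and rectangle_init are invented here and differ from the ShapeClass/RectangleClass code above.

```c
typedef struct Shape Shape;

/* One table per class, shared by all instances of that class. */
typedef struct ShapeVTable {
    float (*computeArea)(const Shape *shape);
} ShapeVTable;

/* Base "class": each instance stores only a pointer to its class table. */
struct Shape {
    const ShapeVTable *vtable;
};

typedef struct Rectangle {
    Shape shape;            /* must be first so Shape* <-> Rectangle* casts work */
    float width, height;
} Rectangle;

static float rectangle_computeArea(const Shape *shape)
{
    const Rectangle *rect = (const Rectangle *)shape;
    return rect->width * rect->height;
}

/* The Rectangle class table lives in read-only memory. */
static const ShapeVTable rectangle_vtable = { rectangle_computeArea };

/* Caller-allocated "constructor": no dynamic memory needed. */
void rectangle_init(Rectangle *rect, float width, float height)
{
    rect->shape.vtable = &rectangle_vtable;
    rect->width = width;
    rect->height = height;
}

/* Virtual dispatch: look up the method through the class table. */
float shape_computeArea(const Shape *shape)
{
    return shape->vtable->computeArea(shape);
}
```

The per-instance cost is now one pointer regardless of how many methods the class has, at the price of one extra pointer hop per call.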
I had to do it once too for a homework. I followed this approach:
Define your data members in a struct.
Define your function members that take a pointer to your struct as first argument.
Do these in one header & one c. Header for struct definition & function declarations, c for implementations.
A simple example would be this:
/// Queue.h
struct Queue
{
/// members
};
typedef struct Queue Queue;
void push(Queue* q, int element);
void pop(Queue* q);
// etc.
///
If you only want one class, use an array of structs as the "objects" data and pass pointers to them to the "member" functions. You can use typedef struct _whatever Whatever before declaring struct _whatever to hide the implementation from client code. There's no difference between such an "object" and the C standard library FILE object.
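A sketch of that arrangement with no dynamic allocation, shown in one listing for brevity (in a real project the first part would be the header and the second part the .c file). Counter, COUNTER_POOL_SIZE, and the counter_* functions are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* ---- counter.h: all that client code gets to see ---- */
typedef struct Counter Counter;      /* incomplete type: fields stay hidden */
Counter *counter_acquire(void);      /* returns NULL when the pool is exhausted */
void     counter_release(Counter *c);
void     counter_increment(Counter *c);
int      counter_get(const Counter *c);

/* ---- counter.c: the implementation ---- */
#define COUNTER_POOL_SIZE 4

struct Counter {
    int value;
    int in_use;
};

/* Fixed array of "objects", preallocated at compile time. */
static Counter pool[COUNTER_POOL_SIZE];

Counter *counter_acquire(void)
{
    size_t i;
    for (i = 0; i < COUNTER_POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].value = 0;
            return &pool[i];
        }
    }
    return NULL;
}

void counter_release(Counter *c)       { c->in_use = 0; }
void counter_increment(Counter *c)     { c->value++; }
int  counter_get(const Counter *c)     { return c->value; }
```

Client code that only includes the header can hold and pass Counter pointers but cannot read or write the fields, much like FILE.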
If you want more than one class with inheritance and virtual functions, then it's common to have pointers to the functions as members of the struct, or a shared pointer to a table of virtual functions. The GObject library uses both this and the typedef trick, and is widely used.
There's also a book on techniques for this available online - Object Oriented Programming with ANSI C.
C Interfaces and Implementations: Techniques for Creating Reusable Software, David R. Hanson
http://www.informit.com/store/product.aspx?isbn=0201498413
This book does an excellent job of covering your question. It's in the Addison Wesley Professional Computing series.
The basic paradigm is something like this:
/* for data structure foo */
FOO *myfoo;
myfoo = foo_create(...);
foo_something(myfoo, ...);
myfoo = foo_append(myfoo, ...);
foo_delete(myfoo);
You can take a look at GObject. It's an open-source library that gives you a verbose way to do objects.
http://library.gnome.org/devel/gobject/stable/
I will give a simple example of how OOP should be done in C. I realize this thread is from 2009 but would like to add this anyway.
/// Object.h
typedef struct Object {
uuid_t uuid;
} Object;
int Object_init(Object *self);
uuid_t Object_get_uuid(Object *self);
int Object_clean(Object *self);
/// Person.h
typedef struct Person {
Object obj;
char *name;
} Person;
int Person_init(Person *self, char *name);
int Person_greet(Person *self);
int Person_clean(Person *self);
/// Object.c
#include "Object.h"
int Object_init(Object *self)
{
self->uuid = uuid_new();
return 0;
}
uuid_t Object_get_uuid(Object *self)
{ // Don't actually create getters in C...
return self->uuid;
}
int Object_clean(Object *self)
{
uuid_free(self->uuid);
return 0;
}
/// Person.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // strdup (POSIX)
#include "Person.h"
int Person_init(Person *self, char *name)
{
Object_init(&self->obj); // Or Object_init((Object *)self), since obj is the first member
self->name = strdup(name);
return 0;
}
int Person_greet(Person *self)
{
printf("Hello, %s", self->name);
return 0;
}
int Person_clean(Person *self)
{
free(self->name);
Object_clean(&self->obj);
return 0;
}
/// main.c
int main(void)
{
Person p;
Person_init(&p, "John");
Person_greet(&p);
Object_get_uuid(&p.obj); // Inherited function
Person_clean(&p);
return 0;
}
The basic concept involves placing the 'inherited class' at the top of the struct. This way, accessing the first 4 bytes in the struct also accesses the first 4 bytes in the 'inherited class' (assuming non-crazy optimizations). Now, when the pointer of the struct is cast to the 'inherited class', the 'inherited class' can access the 'inherited values' in the same way it would access its members normally.
This and some naming conventions for constructors, destructors, allocation, and deallocation functions (I recommend _init, _clean, _new, and _free) will get you a long way.
As for Virtual functions, use function pointers in the struct, possibly with Class_func(...); wrapper too.
As for (simple) templates, add a size_t parameter to determine size, require a void* pointer, or require a 'class' type with just the functionality you care about. (e.g. int GetUUID(Object *self); GetUUID(&p);)
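For the size_t-plus-void* flavour of "template", a minimal sketch could look like this. generic_swap is a made-up name; the idea is simply that the function works on raw bytes and the caller supplies the element size.

```c
#include <stddef.h>

/* Swap two objects of elem_size bytes each, whatever their type.
 * The two objects must not overlap. */
void generic_swap(void *a, void *b, size_t elem_size)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    size_t i;

    for (i = 0; i < elem_size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

It is called as generic_swap(&x, &y, sizeof x). The compiler cannot check that both arguments really have the same type, which is the main trade-off compared with real templates.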
Use a struct to simulate the data members of a class. In terms of method scope you can simulate private methods by placing the private function prototypes in the .c file and the public functions in the .h file.
GTK is built entirely on C and it uses many OOP concepts. I have read through the source code of GTK and it is pretty impressive, and definitely easier to read. The basic concept is that each "class" is simply a struct, and associated static functions. The static functions all accept the "instance" struct as a parameter, do whatever they need, and return results if necessary. For example, you may have a function "GetPosition(CircleStruct obj)". The function would simply dig through the struct, extract the position numbers, probably build a new PositionStruct object, stick the x and y in the new PositionStruct, and return it. GTK even implements inheritance this way by embedding structs inside structs. Pretty clever.
#include <stdio.h>
/**
* Define Shape class
*/
typedef struct Shape Shape;
struct Shape {
/**
* Variables header...
*/
double width, height;
/**
* Functions header...
*/
double (*area)(Shape *shape);
};
/**
* Functions
*/
double calc(Shape *shape) {
return shape->width * shape->height;
}
/**
* Constructor
*/
Shape _Shape() {
Shape s;
s.width = 1;
s.height = 1;
s.area = calc;
return s;
}
/********************************************/
int main() {
Shape s1 = _Shape();
s1.width = 5.35;
s1.height = 12.5462;
printf("Hello World\n\n");
printf("User.width = %f\n", s1.width);
printf("User.height = %f\n", s1.height);
printf("User.area = %f\n\n", s1.area(&s1));
printf("Made with \xe2\x99\xa5 \n");
return 0;
}
In your case a good approximation of a class could be an ADT (abstract data type). But still it won't be the same.
My strategy is:
Define all code for the class in a separate file
Define all interfaces for the class in a separate header file
All member functions take a "ClassHandle" which stands in for the instance name (instead of o.foo(), call foo(oHandle))
The constructor is replaced with a function void ClassInit(ClassHandle h, int x, int y,...) OR ClassHandle ClassInit(int x, int y,...) depending on the memory allocation strategy
All member variables are stored as members of a static struct in the class file, encapsulating them in the file and preventing outside files from accessing them
The objects are stored in an array of the static struct above, with predefined handles (visible in the interface) or a fixed limit of objects that can be instantiated
If useful, the class can contain public functions that will loop through the array and call the functions of all the instantiated objects (RunAll() calls each Run(oHandle))
A Deinit(ClassHandle h) function frees the allocated memory (array index) in the dynamic allocation strategy
Does anyone see any problems, holes, potential pitfalls or hidden benefits/drawbacks to either variation of this approach? If I am reinventing a design method (and I assume I must be), can you point me to the name of it?
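For concreteness, the predefined-handle variant of the strategy above might be sketched like this. The "motor" class and every name in it (MotorHandle, motor_init, motor_run, motor_run_all, motor_position) are invented purely for illustration, and both the interface and the implementation are shown in one listing.

```c
/* ---- motor.h: the interface, with predefined handles ---- */
typedef enum { MOTOR_LEFT, MOTOR_RIGHT, MOTOR_COUNT } MotorHandle;

void motor_init(MotorHandle h, int speed);
void motor_run(MotorHandle h);
void motor_run_all(void);
int  motor_position(MotorHandle h);

/* ---- motor.c: member variables live in a file-private static array ---- */
static struct {
    int speed;
    int position;
} motors[MOTOR_COUNT];

/* "Constructor": initialises the instance behind a predefined handle. */
void motor_init(MotorHandle h, int speed)
{
    motors[h].speed = speed;
    motors[h].position = 0;
}

/* "Member function": the handle stands in for the instance. */
void motor_run(MotorHandle h)
{
    motors[h].position += motors[h].speed;
}

/* Class-level function that loops over every instance. */
void motor_run_all(void)
{
    int h;
    for (h = 0; h < MOTOR_COUNT; h++)
        motor_run((MotorHandle)h);
}

int motor_position(MotorHandle h)
{
    return motors[h].position;
}
```

Since the array is static, nothing outside motor.c can touch the fields. A production version would probably also range-check the handle.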
It is possible. It always seems like a good idea at the time but afterwards it becomes a maintenance nightmare. Your code becomes littered with pieces of code tying everything together. A new programmer will have lots of problems reading and understanding the code if you use function pointers, since it will not be obvious which function is called.
Data hiding with get/set functions is easy to implement in C but stop there. I have seen multiple attempts at this in the embedded environment and in the end it is always a maintenance problem.
Since you already have maintenance issues I would steer clear.
My approach would be to move the struct and all primarily-associated functions to a separate source file(s) so that it can be used "portably".
Depending on your compiler, you might be able to include functions into the struct, but that's a very compiler-specific extension, and has nothing to do with the last version of the standard I routinely used :)
The first C++ compiler actually was a preprocessor which translated the C++ code into C.
So it's very possible to have classes in C.
You might try and dig up an old C++ preprocessor and see what kind of solutions it creates.
Do you want virtual methods?
If not then you just define a set of function pointers in the struct itself. If you assign all the function pointers to standard C functions then you will be able to call functions from C in very similar syntax to how you would under C++.
If you want to have virtual methods it gets more complicated. Basically you will need to implement your own VTable to each struct and assign function pointers to the VTable depending on which function is called. You would then need a set of function pointers in the struct itself that in turn call the function pointer in the VTable. This is, essentially, what C++ does.
TBH though ... if you want the latter then you are probably better off just finding a C++ compiler you can use and re-compiling the project. I have never understood the obsession with C++ not being usable in embedded. I've used it many a time and it works, is fast, and doesn't have memory problems. Sure, you have to be a bit more careful about what you do, but it's really not that complicated.
C isn't an OOP language, as you rightly point out, so there's no built-in way to write a true class. Your best bet is to look at structs and function pointers; these will let you build an approximation of a class. However, as C is procedural you might want to consider writing more C-like code (i.e. without trying to use classes).
Also, if you can use C, you can probably use C++ and get classes.

Resources