I have been writing C for a decent amount of time, and am aware that C does not have any support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without the use of any macros or voodoo, and I am looking to gain more insight into possible issues I may have overlooked.
The folder structure isn't all that important here but I'll list it anyway because it gives clarity as to the import names (and is also what CLion generates for me).
- example-project
- cmake-build-debug
- example-lib-name
- include
- example-lib-name
- example-header-file.h
- src
- example-lib-name
- example-source-file.c
- CMakeLists.txt
- CMakeLists.txt
- main.c
Let's say that example-header-file.h contains:
typedef struct ExampleStruct {
int data;
} ExampleStruct;
ExampleStruct* new_example_struct(int, double);
which just contains a definition for a struct and a function that returns a pointer to an ExampleStruct.
Obviously, now if I import ExampleStruct into another file, such as main.c, I will be able to create and return a pointer to an ExampleStruct by calling
ExampleStruct* new_struct = new_example_struct(<int>, <double>);,
and will be able to access the data property like: new_struct->data.
However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify its internals. That is, if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.
So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.
In our implementation file (example-source-file.c), let's say we have the following code:
#include <stdlib.h>
#include <stdbool.h>
typedef struct ExampleStruct {
int data;
double val;
} ExampleStruct;
ExampleStruct* new_example_struct(int data, double val) {
    ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
    example_struct->data=data;
    example_struct->val=val;
    return example_struct;
}
double get_val(ExampleStruct* e) {
return e->val;
}
This file simply implements that constructor method for getting a new pointer to an ExampleStruct that was defined in the header file. However, this file also defines its own version of ExampleStruct, that has a new member field not present in the header file's definition: double val, as well as a getter which gets that value. Now, if I import the same header file into main.c, which contains:
#include <stdio.h>
#include "example-lib-name/example-header-file.h"
int main() {
printf("Hello, World!\n");
    ExampleStruct* test = new_example_struct(6, 7.2);
printf("%d\n", test->data); // <-- THIS WORKS
double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
printf("%f\n", x); //
// printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
return 0;
}
I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.
Some things I see that may be cause for concern:
This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.
Questions I'd like answers on:
Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Yes: your approach produces undefined behavior.
C requires that
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(C17 6.2.7/2)
and that
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
[...]
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")
Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions that rely on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between translation units (TUs) using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
Other alternatives:
Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
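A minimal sketch of the opaque-pointer alternative follows. It reuses the question's new_example_struct and get_val; the accessor example_data and the destructor free_example_struct are hypothetical names added for illustration. Everything is shown in one listing, but the first part would live in the header and the second part in the .c file.

```c
#include <stdlib.h>

/* --- would live in example-header-file.h --- */
typedef struct ExampleStruct ExampleStruct;   /* incomplete type: no members visible */

ExampleStruct *new_example_struct(int data, double val);
int example_data(const ExampleStruct *e);     /* hypothetical accessor */
double get_val(const ExampleStruct *e);
void free_example_struct(ExampleStruct *e);   /* hypothetical destructor */

/* --- would live in example-source-file.c --- */
struct ExampleStruct {                        /* the one and only definition */
    int data;
    double val;
};

ExampleStruct *new_example_struct(int data, double val) {
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL) return NULL;               /* allocation failed */
    e->data = data;
    e->val = val;
    return e;
}

int example_data(const ExampleStruct *e) { return e->data; }
double get_val(const ExampleStruct *e) { return e->val; }
void free_example_struct(ExampleStruct *e) { free(e); }
```

Client code that includes only the header part can hold and pass an ExampleStruct*, but writing e->val or declaring an ExampleStruct by value there is rejected by the compiler, because the type is incomplete in that translation unit.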
As long as the public has a complete definition for ExampleStruct, it can make code like:
ExampleStruct a = *new_example_struct(42, 1.234);
Then the below will certainly fail.
printf("%g\n", get_val(&a));
I recommend instead creating an opaque pointer and providing public accessor functions for the information in .data and .val.
Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)
Are there significant performance penalties I may suffer as a result of writing code this way?
Probably:
Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.
i.e. is there a simpler way to do this
Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:
#include <stddef.h> /* max_align_t */
#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)
typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, _Alignas() as well (if you want the private data not to need _Alignas(), you will need to use the real alignment quantum for the type-erased version).
The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
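As a hedged variation on this idea (not the exact scheme described above), the implementation side can keep the single public definition and move the private value in and out of the byte array with memcpy, so no second struct definition is needed. The accessor names example_set_val and example_get_val are hypothetical, and the sketch assumes the private payload is a single double.

```c
#include <stddef.h>   /* max_align_t */
#include <string.h>   /* memcpy */

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

/* Implementation side: fail the build if the private payload outgrows the array. */
_Static_assert(sizeof(double) <= EXAMPLE_STRUCT_PRIVATE_DATA_SIZE,
               "private_data is too small for the private fields");

/* Store the private double into the type-erased bytes. */
void example_set_val(ExampleStruct *e, double val) {
    memcpy(e->private_data, &val, sizeof val);
}

/* Read the private double back out of the type-erased bytes. */
double example_get_val(const ExampleStruct *e) {
    double val;
    memcpy(&val, e->private_data, sizeof val);
    return val;
}
```

Because the struct is complete in the header, callers can declare an ExampleStruct on the stack or in an array and pass its address to these functions, with no heap allocation involved.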
An example of the use of such an approach can be found in CUDA's driver API:
Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_PEER
The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.
This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.
In example.h, one defines the public-facing elements. struct example is not meant to be instantiated; in a sense, it is abstract. Only pointers that are obtained from one of its constructors (in this case, the only one) are valid.
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)
#include <stddef.h> /* offsetof */
#include <stdlib.h>
#include "example.h"
struct private_example {
struct example public;
double val;
};
struct example *new_example(int data, double val) {
struct private_example *const example = malloc(sizeof *example);
if(!example) return 0;
example->public.data = data;
example->val = val;
return &example->public;
}
/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
return (struct private_example *)(void *)
((char *)example - offsetof(struct private_example, public));
}
double example_val(struct example *e) {
return example_upcast(e)->val;
}
Then one can use the object as in main.c. This is used frequently in Linux kernel code for container abstraction. Note that offsetof(struct private_example, public) is zero, ergo example_upcast does nothing and a cast is sufficient: ((struct private_example *)e)->val. If one builds structures in a way that always allows casting, one is limited to single inheritance.
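To illustrate when the offsetof arithmetic actually matters, here is a small sketch in which the public part is deliberately not the first member, so a plain cast would be wrong. The helper name example_container is hypothetical; the struct names follow the code above.

```c
#include <stddef.h>

struct example { int data; };

struct private_example {
    double val;              /* private member placed first on purpose */
    struct example public;   /* so the public part sits at a non-zero offset */
};

/* Recover the enclosing struct from a pointer to its `public` member. */
static struct private_example *example_container(struct example *e) {
    return (struct private_example *)(void *)
        ((char *)e - offsetof(struct private_example, public));
}

double example_val(struct example *e) {
    return example_container(e)->val;
}
```

This only works for pointers that really point at the public member of a struct private_example; passing the address of a standalone struct example would step outside the object.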
If you need some constants in your code you can declare them with enums:
enum {
DOG,
CAT,
FISH,
};
enum {
CAR,
BUS,
TRAIN,
};
and then use DOG, BUS, etc. as needed. But enums may be declared in a more verbose style as well:
enum animals {
DOG,
CAT,
FISH,
} pets;
enum transport {
CAR,
BUS,
TRAIN,
} vehicles;
Given that enum constants have global scope and cannot be referred by pets.DOG in the way that structs and unions can, are there any good use cases for the verbose style? To me the type tags and variable names for enums look quite redundant, even offputting as they look like structs but can't be used like structs. I hope I'm missing something and they do have a good use.
There is a related SO Q&A where the overriding assumption is that one would use the type tags and variable names when using enums. So my question can be restated as "In what tasks would I fail if I use anonymous enums only?" Because to me, the whole point of enums are the DOG, CAT, CAR constants and I see no use for assigning one of these to an enum variable. I'm still learning, so I'm sure I must be missing something.
You can give enum types a name in case you want to declare variables of that type:
enum animals a1 = DOG;
enum animals a2 = CAT;
Or have them as function arguments:
void foo(enum animals a);
While enums are considered integer types, and you could also use an int to store one of these values, using a variable of an enum type helps to document your code and make your intent clear to the reader.
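As a small sketch of the function-argument case, the hypothetical function animal_sound below takes an enum animals and switches over it. With GCC or Clang, -Wall enables -Wswitch, which warns when a switch over an enum-typed expression has no default label and misses one of the enumerators; that check is not available when the parameter is a plain int.

```c
enum animals { DOG, CAT, FISH };

/* The parameter type tells the reader which constants are expected. */
const char *animal_sound(enum animals a) {
    switch (a) {
    case DOG:  return "woof";
    case CAT:  return "meow";
    case FISH: return "";
    }
    return "?";   /* reached only for values outside the enumeration */
}
```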
"To me the type tags and variable names for enums look quite redundant..."
The value in using a sequential collection of named integer values in the form of an enumerated list ( enum ) might seem subtle at first glance, but becomes very apparent when used in C projects for a couple of reasons:
The names associated with an enumerated list provide self-documenting code, particularly when the collection of names chosen to represent a set of enumerated values forms a theme related to the task at hand. (Your animal enum is a good example, as would be one used to enumerate e.g. a large list of commands, or position types within a company.)
The default assignment of values in an enumerated list is sequential, starting from 0 and incrementing by one until the end of the list, resulting in a list of unique values, very well suited for use when indexing through an array of strings with particular meaning, or when used in a switch statement as the constant integer value for each of the case statements.
And regarding comment: "...but the advantage of using the ANML type instead of int is minimal,..."
enum lists also provide a documented constraint. For example, using ANML anml; rather than int anml; as a struct member will quickly indicate to those who will maintain or update the source code (in the months or years to come) that there is an associated list of related values that this member is constrained to use, rather than any random integer value. This is important when an enumerated list will be used, e.g. in a switch statement designed only to handle a set of case statements that correspond with the constant integer values in that enum.
These two together are part of the use case I have found particularly useful, i.e. using enumerations in conjunction with string arrays for selecting content for a user interface, for sub-string search, etc.
eg:
#include <stdbool.h>
#include <string.h>
typedef enum {
CAT,
DOG,
FISH,
MAX_ANML
}ANML;//for use in struct
char *strings[MAX_ANML] = {"cat","dog","fish"};
typedef struct {
char content[80];
ANML anml;
}SEARCH;
Where for example, the two constructs then can be used in conjunction with a switch statement:
bool searchBuf(SEARCH *animal)
{
    bool res = false;
    switch (animal->anml) {
    case CAT:
        //use strings[CAT] for a search, or user interface content, etc.
        if(strstr(animal->content, strings[CAT]))
            res = true;
        break;
    case DOG:
        if(strstr(animal->content, strings[DOG]))
            res = true;
        break;
    case FISH:
        if(strstr(animal->content, strings[FISH]))
            res = true;
        break;
    default: //MAX_ANML or an out-of-range value: nothing to search for
        break;
    }
    return res;
}
int main(void)
{
char buffer[] = {"this is a string containing cat."};
SEARCH search;
strcpy(search.content, buffer);
search.anml = CAT;
bool res = searchBuf(&search);
//use res...
return 0;
}
I like enums instead of #defines when I debug. The debugger shows me not only the numeric value, but also the enum name, which is very handy.
I am posting my own answer after a number of others have posted useful answers and comments, and we got as far as establishing that enum typenames and variable names are useful as self-documenting code and making the intent of the code clear (thanks to answers by ryyker and dbush).
As I was experimenting and looking for stronger reasons to use non-anonymous enums, I established that there cannot be any by definition. Enums have no scoping and no bounds checks, neither at compile time (GCC 6) nor at runtime. Here is a snippet demonstrating the weakness:
#include <stdio.h>
enum withType { // enum with type name and variable name
ONE,
TWO,
THREE,
} wtEnum;
enum { // Anonymous enum
TINY,
SMALL,
MID,
LARGE,
BIGGEST,
};
int main(void) {
enum withType wt1 = LARGE; // Overflow!
wtEnum = BIGGEST; // Overflow!
printf("Enum test values: %d, %d, %d\n", THREE, wt1, wtEnum);
return 0;
}
The example makes clear that any "scoping" you may want to do with enum type names and variable names is by convention only, and relies on coder discipline not to cross enum "domains". I would go so far as to claim that given this reality, enum functionality in C is "mis-designed". It creates the impression of the kind of utility we see with structs and unions, but provides nothing of the kind. After this insight, I consider anonymous enums the only safe enums to use!
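Since the language performs no such check, any validation has to be written by hand. One common convention, sketched here under the assumption that a sentinel enumerator (the hypothetical WITH_TYPE_COUNT) is appended to the list, is a small range-check helper:

```c
#include <stdbool.h>

enum withType {
    ONE,
    TWO,
    THREE,
    WITH_TYPE_COUNT   /* sentinel: number of real enumerators, not a value itself */
};

/* Returns true only if v is one of ONE, TWO or THREE. */
bool with_type_is_valid(int v) {
    return v >= ONE && v < WITH_TYPE_COUNT;
}
```

This only helps for enumerations whose values are contiguous and start at a known value, and it still relies on callers remembering to invoke the check.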
All of that said, I have accepted ryyker's answer as it provides a nice demonstration of mainstream usage of enums. But I am leaving my own answer here as well, because the points I have raised are valid.
I was studying disjoint unions in programming. I came across the claim that Pascal, SML and C each have their own version of a union: variant record, construction, and union, respectively. It also said that Pascal has a "tag" that you don't have to use, SML has a tag that you are required to use, and C does not have a tag. Furthermore, SML will throw an exception if the union is used wrongly, Pascal allows checking at runtime, and C has no feature for runtime checking, so the programmer has to add a "tag" field manually.
First of all, I don't understand what a "tag" is. I tried looking at some examples of those unions but didn't understand what the "tag" represents. If "tags" are important, how come C doesn't have one? What is the difference between those unions?
Also, I didn't find any material related to the "tag" of unions.
Furthermore, what does "checking during runtime" mean? Checking what? It would be great to see simple examples that show those features.
One could call such disjoint unions a very early form of polymorphism. You have one type that can have several forms. In some languages, which of these forms is being used (is active) is distinguished by a member of the type, called a tag. This can be a boolean, a byte, an enum, or some other ordinal.
In some (older?) versions of Pascal, the tag is actually required to contain the correct value. A Pascal "union" (or, as they are called in Pascal, variant record) contains a value that distinguishes which of the branches is currently "active".
An example:
type
MyUnion = record // Pascal's version of a struct -- or union
case Tag: Byte of // This doesn't have to be called Tag, it can have any name
0: (B0, B1, B2, B3: Byte); // only one of these branches is present
1: (W0, W1: Word); // they overlap each other in memory
2: (L: Longint);
end;
In such versions of Pascal, if Tag has the value 0, you can only access B0, B1, B2 or B3 and not the other variants. If Tag is 1, you can only access W0 and W1, etc...
In most Pascal versions, there is no such restriction and the tag value is purely informative. In many of those, you don't even need an explicit tag value anymore:
MyUnion = record
case Byte of // no tag, just a type, to keep the syntax similar
etc...
Note that Pascal variant records are not pure unions, where each part is an alternative:
type
MyVariantRec = record
First: Integer; // the non-variant part begins here
Second: Double;
case Byte of // only the following part is a "union", the variant part.
0: ( B0, B1, B2, B3: Byte; );
1: ( W0, W1: Word; );
2: ( L: Longint);
end;
In C, you would have to nest a union in a struct to get something nearly the same:
// The following is more or less the equivalent of the Pascal record above
struct MyVariantRec
{
int first;
double second;
union
{
struct { unsigned char b0, b1, b2, b3; };
        struct { unsigned short w0, w1; };
        struct { long l; };
    };
};
First of all, I don't understand what is "tag".
Wikipedia has a reasonably nice discussion of the overall concept, which starts off with a list of synonyms, including "tagged union". Actually, "tagged union" is the main heading of the article, and disjoint union is one of the synonyms. It starts with a pretty succinct explanation:
a data structure used to hold a value that could take on several
different, but fixed, types. Only one of the types can be in use at
any one time, and a tag field explicitly indicates which one is in
use.
You go on to ask,
If "tags" are important, how come C doesn't have one?
How important tags are in this context is a language-design question on which C, Pascal, and SML take different positions. Inasmuch as C is inclined to take a rather low-level approach to most things, and to allow users a great deal of control, it is not surprising that it does not force tag usage. Users who want tags can implement them themselves with comparative ease, as indeed I have done myself on occasion.
Alternatively, it might be easier to say that C doesn't have tagged unions as a built-in language feature at all, only plain, untagged unions. From that perspective, if you want a tagged union in C then you have to implement it yourself. This is probably the most consistent view, but I gather that it differs from the one presented in the material you have been studying.
what is the difference between those unions.
They are different implementations of a similar concept, provided by different languages. A full analysis would be beyond the reasonable scope of an SO answer. Like many things in Computer Science and elsewhere, the abstract idea of disjoint unions can be realized in a great many different ways.
Also, I didn't find any material related to the "tag" of unions.
See above, and the linked Wikipedia article. I'm sure you could turn up a lot more material, too, especially with WP's synonym list to work with.
Furthermore, what does it mean "checking during runtime", checking what?
I'd have to see the context and exact statement to be sure, but it seems likely that your source was talking about checking one or more of these things:
that the tag of a particular instance of the union is one of those defined for that union type, or
that the contents of the union are of the type indicated by the tag, or
that a list of alternative actions (see below) covers all possible alternatives.
It will be great to see simple examples that show those features.
My Pascal is too rusty to be of any use, and I do not know SML. Even just a C example may be instructive, however:
enum my_tag { INT_TAG, STRING_TAG, DOUBLE_TAG };
union disjoint_union {
struct {
enum my_tag tag;
int an_int;
};
struct {
enum my_tag tag_s;
char *a_string;
};
struct {
enum my_tag tag_d;
double a_double;
};
};
union disjoint_union u = { .tag = INT_TAG, .an_int = 42 };
union disjoint_union u2 = { .tag = STRING_TAG, .a_string = "hello" };
union disjoint_union u3 = { .tag = DOUBLE_TAG, .a_double = 3.14159 };
This being C, the tag is provided manually and explicitly, and the language does not distinguish it specially. Also, it is up to the programmer to ensure that the union's content bears the correct tag.
You might use such a thing with a function like this, which relies on the tag to determine how to handle instances of the union type:
void print_union(union disjoint_union du) {
switch (du.tag) {
case INT_TAG:
printf("%d", du.an_int);
break;
case STRING_TAG:
printf("%s", du.a_string);
break;
case DOUBLE_TAG:
printf("%f", du.a_double);
break;
}
}
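The second kind of runtime check listed earlier (that the contents match the tag) is something C leaves to the programmer. A minimal sketch of a checked accessor follows; it uses a struct-wrapping-a-union layout rather than the layout above, and the names struct tagged_value and tagged_value_get_int are hypothetical.

```c
#include <stdbool.h>

enum value_tag { INT_TAG, STRING_TAG, DOUBLE_TAG };

struct tagged_value {
    enum value_tag tag;      /* records which union member is active */
    union {
        int an_int;
        const char *a_string;
        double a_double;
    } as;
};

/* Checked accessor: succeeds only when the tag says an int is stored. */
bool tagged_value_get_int(const struct tagged_value *v, int *out) {
    if (v->tag != INT_TAG) {
        return false;        /* wrong variant: report it instead of misreading memory */
    }
    *out = v->as.an_int;
    return true;
}
```

A language with built-in tagged unions performs the equivalent of this test for you; here the caller has to look at the return value before trusting the output parameter.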
The tag is anything that tells you what member of the union is currently being used.
It's usually an enum but could be an integer, a boolean, or a bitfield based on one of these.
Example:
union my_union { char *string; void *void_ptr; long integer; };
struct my_tagged_union {
union my_union the_union;
enum { is_string, is_void_ptr, is_integer } the_tag;
};
C not forcing you to use a builtin-tag means you have more control over the layout and size of your data. For example you can use a bitfield tag and place it next to other bitfield information you store in your structure so that the bitfields get merged, yielding you space savings; or sometimes the union member currently in use is implicit from the context your code is in, in which case no tag is needed at all.
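As a rough sketch of the bitfield idea, the tag below is declared next to two other small bitfields so that the compiler can pack them together. The field names is_dirty and refcount and the accessor packed_value_as_integer are hypothetical, and the actual layout and size remain implementation-defined.

```c
enum { IS_STRING, IS_VOID_PTR, IS_INTEGER };

struct packed_value {
    unsigned tag      : 2;   /* which union member is active (values 0..2) */
    unsigned is_dirty : 1;   /* unrelated flag declared adjacent to the tag */
    unsigned refcount : 5;   /* another small field packed alongside */
    union {
        char *string;
        void *void_ptr;
        long integer;
    } u;
};

/* Returns the stored integer, or `fallback` when another member is active. */
long packed_value_as_integer(const struct packed_value *v, long fallback) {
    return v->tag == IS_INTEGER ? v->u.integer : fallback;
}
```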
SML has a tag that you required to use it [...]. furthermore, SML will throw exception if we used it wrong,
Standard ML has algebraic data types which contain sum types and product types. The sum types build on top of unions (and the product types build on top of structs), but handle what you call a tagged or disjoint union automatically in the compiler; you specify the constructors, and the compiled code figures out how to differentiate between the different constructors via pattern matching. For example,
datatype pokemon = Pikachu of int
| Bulbasaur of string
| Charmander of bool * char
| Squirtle of pokemon list
So a sum type can have different constructors with different parameters, and the parameters can themselves be a product of other types, including sum types, and including the type being defined itself, making the data type definition recursive. This is implemented with tagged unions, but the abstractions on top provide for more syntactic convenience.
To clarify, when a sum type is misused, Standard ML does not throw an exception at runtime but reports a type error during compilation. This is because of Standard ML's type system. So you can't accidentally have a (void *)-pointer that you cast to something it isn't, which is possible in C.
I have some code that a program generated for me, and I really do not understand why it does what it does. The language is plain C, and a struct is generated.
.h-file:
struct X_IMPL {
sint32 y;
};
struct X {
struct X_IMPL * IMPL;
};
.c-file:
#define _my_y self->IMPL->y
sint32 do_something(struct X * self)
{
return _my_y*13;
}
I do assume that _my_y now points to a variable inside the struct, and can be used to change the struct's variable. My question is, why would code be generated this way? Is there any advantage compared to just simply using the parameter's reference? When a reference is created with a define like that, do I really need that parameter at all?
It is largely a matter of preference; there are many ways to do this, and this one is not especially clear. The #define on the first line assigns nothing. It only defines a macro that the preprocessor replaces with self->IMPL->y, an access to a member reached through the IMPL pointer stored in the struct that self points to.
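A small sketch makes the text substitution visible. It assumes sint32 is a typedef for a 32-bit signed integer (the generated code does not show its definition), and set_y is a hypothetical second function added for illustration.

```c
#include <stdint.h>

typedef int32_t sint32;   /* assumption: sint32 is a 32-bit signed integer */

struct X_IMPL { sint32 y; };
struct X { struct X_IMPL *IMPL; };

#define _my_y self->IMPL->y   /* pure text substitution; needs `self` in scope */

sint32 do_something(struct X *self) {
    return _my_y * 13;        /* after preprocessing: self->IMPL->y * 13 */
}

void set_y(struct X *self, sint32 value) {
    _my_y = value;            /* expands to an lvalue, so assignment works too */
}
```

Since the macro body mentions self, the parameter is still required: in a function without a variable named self, any use of _my_y would fail to compile.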
I think what you are seeing is "object oriented programming" in C. Note that it's not usually 1:1 equivalent to OOP in C++/Java/C#/whatever, because the OOP mechanisms are not built-in, but implemented explicitly. So different projects and different developers might write quite different code for same thing, while in some other language with built-in OOP features, they'd all just use the built-in features the same way.
The do_something in C++ might look like this:
// do_something is public member function AKA method of class X
sint32 X::do_something()
{
// y is this->y, private member variable of class X
return y * 13;
}
Assuming I have to use C (no C++ or object oriented compilers) and I don't have dynamic memory allocation, what are some techniques I can use to implement a class, or a good approximation of a class? Is it always a good idea to isolate the "class" to a separate file? Assume that we can preallocate the memory by assuming a fixed number of instances, or even defining the reference to each object as a constant before compile time. Feel free to make assumptions about which OOP concept I will need to implement (it will vary) and suggest the best method for each.
Restrictions:
I have to use C and not an OOP language because I'm writing code for an embedded system, and the compiler and preexisting code base are in C.
There is no dynamic memory allocation because we don't have enough memory to reasonably assume we won't run out if we start dynamically allocating it.
The compilers we work with have no problems with function pointers.
That depends on the exact "object-oriented" feature-set you want to have. If you need stuff like overloading and/or virtual methods, you probably need to include function pointers in structures:
typedef struct ShapeClass ShapeClass; /* declared first so the member below can refer to it */
struct ShapeClass {
    float (*computeArea)(const ShapeClass *shape);
};
float shape_computeArea(const ShapeClass *shape)
{
return shape->computeArea(shape);
}
This would let you implement a class, by "inheriting" the base class, and implementing a suitable function:
typedef struct {
ShapeClass shape;
float width, height;
} RectangleClass;
static float rectangle_computeArea(const ShapeClass *shape)
{
const RectangleClass *rect = (const RectangleClass *) shape;
return rect->width * rect->height;
}
This of course requires you to also implement a constructor, that makes sure the function pointer is properly set up. Normally you'd dynamically allocate memory for the instance, but you can let the caller do that, too:
void rectangle_new(RectangleClass *rect)
{
rect->width = rect->height = 0.f;
rect->shape.computeArea = rectangle_computeArea;
}
If you want several different constructors, you will have to "decorate" the function names, since you can't have more than one rectangle_new() function:
void rectangle_new_with_lengths(RectangleClass *rect, float width, float height)
{
rectangle_new(rect);
rect->width = width;
rect->height = height;
}
Here's a basic example showing usage:
int main(void)
{
RectangleClass r1;
rectangle_new_with_lengths(&r1, 4.f, 5.f);
    printf("rectangle r1's area is %f units square\n", shape_computeArea(&r1.shape));
return 0;
}
I hope this gives you some ideas, at least. For a successful and rich object-oriented framework in C, look into glib's GObject library.
Also note that there's no explicit "class" being modelled above, each object has its own method pointers which is a bit more flexible than you'd typically find in C++. Also, it costs memory. You could get away from that by stuffing the method pointers in a class structure, and invent a way for each object instance to reference a class.
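One way to do that, sketched below with hypothetical names (Shape, ShapeVTable, rectangle_init), is to keep a single constant table of function pointers per class and store only a pointer to that table in each instance. No dynamic allocation is involved.

```c
typedef struct Shape Shape;

/* One table of method pointers per "class", shared by all of its instances. */
typedef struct {
    float (*computeArea)(const Shape *shape);
} ShapeVTable;

struct Shape {
    const ShapeVTable *vtable;   /* each instance stores a single pointer */
};

typedef struct {
    Shape shape;                 /* base part first, so a Shape* can be cast back */
    float width, height;
} Rectangle;

static float rectangle_computeArea(const Shape *shape) {
    const Rectangle *rect = (const Rectangle *)shape;
    return rect->width * rect->height;
}

/* The Rectangle "class": lives in read-only storage, one copy in total. */
static const ShapeVTable rectangle_vtable = { rectangle_computeArea };

void rectangle_init(Rectangle *rect, float width, float height) {
    rect->shape.vtable = &rectangle_vtable;
    rect->width = width;
    rect->height = height;
}

float shape_computeArea(const Shape *shape) {
    return shape->vtable->computeArea(shape);
}
```

With many methods per class, the per-instance cost stays at one pointer, at the price of one extra indirection on each call.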
I had to do it once too for a homework. I followed this approach:
Define your data members in a struct.
Define your function members that take a pointer to your struct as first argument.
Do these in one header & one c. Header for struct definition & function declarations, c for implementations.
A simple example would be this:
/// Queue.h
struct Queue
{
/// members
};
typedef struct Queue Queue;
void push(Queue* q, int element);
void pop(Queue* q);
// etc.
///
If you only want one class, use an array of structs as the "objects" data and pass pointers to them to the "member" functions. You can use typedef struct _whatever Whatever before declaring struct _whatever to hide the implementation from client code. There's no difference between such an "object" and the C standard library FILE object.
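A minimal sketch of the array-of-structs idea without any dynamic allocation follows. The names Widget, MAX_WIDGETS, widget_acquire and widget_release are hypothetical, and the struct is left visible here for brevity rather than hidden behind the typedef trick.

```c
#include <stddef.h>

#define MAX_WIDGETS 4            /* hypothetical fixed number of instances */

typedef struct {
    int in_use;                  /* non-zero while the slot is handed out */
    int value;
} Widget;

static Widget widget_pool[MAX_WIDGETS];   /* storage reserved statically, no malloc */

/* Hand out a free slot from the pool, or NULL when all are taken. */
Widget *widget_acquire(int value) {
    for (size_t i = 0; i < MAX_WIDGETS; ++i) {
        if (!widget_pool[i].in_use) {
            widget_pool[i].in_use = 1;
            widget_pool[i].value = value;
            return &widget_pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool so it can be reused. */
void widget_release(Widget *w) {
    if (w != NULL) w->in_use = 0;
}
```

Running out of instances becomes an explicit NULL return that the caller can handle, instead of a heap failure at an unpredictable moment.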
If you want more than one class with inheritance and virtual functions, then it's common to have pointers to the functions as members of the struct, or a shared pointer to a table of virtual functions. The GObject library uses both this and the typedef trick, and is widely used.
There's also a book on techniques for this available online - Object Oriented Programming with ANSI C.
C Interfaces and Implementations: Techniques for Creating Reusable Software, David R. Hanson
http://www.informit.com/store/product.aspx?isbn=0201498413
This book does an excellent job of covering your question. It's in the Addison Wesley Professional Computing series.
The basic paradigm is something like this:
/* for data structure foo */
FOO *myfoo;
myfoo = foo_create(...);
foo_something(myfoo, ...);
myfoo = foo_append(myfoo, ...);
foo_delete(myfoo);
You can take a look at GObject. It's an open-source library that gives you a verbose way to define objects.
http://library.gnome.org/devel/gobject/stable/
I will give a simple example of how OOP should be done in C. I realize this thread is from 2009 but would like to add this anyway.
/// Object.h
typedef struct Object {
uuid_t uuid;
} Object;
int Object_init(Object *self);
uuid_t Object_get_uuid(Object *self);
int Object_clean(Object *self);
/// Person.h
typedef struct Person {
Object obj;
char *name;
} Person;
int Person_init(Person *self, char *name);
int Person_greet(Person *self);
int Person_clean(Person *self);
/// Object.c
#include "Object.h"
int Object_init(Object *self)
{
self->uuid = uuid_new();
return 0;
}
uuid_t Object_get_uuid(Object *self)
{ // Don't actually create getters in C...
return self->uuid;
}
int Object_clean(Object *self)
{
uuid_free(self->uuid);
return 0;
}
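/// Person.c
#include <stdio.h>  // printf
#include <stdlib.h> // free
#include <string.h> // strdup (POSIX; standard C since C23)
#include "Person.h"
int Person_init(Person *self, char *name)
{
    Object_init(&self->obj); // Or equivalently Object_init((Object *)self);
    self->name = strdup(name);
    return self->name != NULL ? 0 : -1;
}
int Person_greet(Person *self)
{
    printf("Hello, %s\n", self->name);
    return 0;
}
int Person_clean(Person *self)
{
    free(self->name);
    Object_clean(&self->obj); // Pass the embedded Object, not the Person itself
    return 0;
}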
/// main.c
#include "Person.h"
int main(void)
{
    Person p;
    Person_init(&p, "John");
    Person_greet(&p);
    Object_get_uuid(&p.obj); // Inherited function; (Object *)&p is equivalent
    Person_clean(&p);
    return 0;
}
The basic concept involves placing the 'inherited class' as the first member of the struct. C guarantees that a pointer to a struct, suitably converted, points to its first member, so a pointer to the struct can safely be cast to a pointer to the 'inherited class'. The 'inherited class' functions can then access the 'inherited values' in the same way they would access their own members.
This and some naming conventions for constructors, destructors, allocation, and deallocation functions (I recommend _init, _clean, _new, and _free) will get you a long way.
As for virtual functions, use function pointers in the struct, possibly with a Class_func(...) wrapper too.
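For instance, a minimal sketch with a hypothetical Animal 'class' (names invented here): the struct carries the function pointer, each init function fills it in, and the Animal_speak wrapper hides the indirection from callers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Animal Animal;
struct Animal {
    const char *(*speak)(const Animal *self); /* the "virtual" slot */
};

/* Class_func-style wrapper: callers write Animal_speak(a)
 * instead of a->speak(a). */
const char *Animal_speak(const Animal *self)
{
    assert(self != NULL && self->speak != NULL);
    return self->speak(self);
}

/* Two implementations of the same slot. */
static const char *dog_speak(const Animal *self) { (void)self; return "Woof"; }
static const char *cat_speak(const Animal *self) { (void)self; return "Meow"; }

/* _init functions pick the implementation. */
void Dog_init(Animal *self) { self->speak = dog_speak; }
void Cat_init(Animal *self) { self->speak = cat_speak; }
```

The wrapper also gives you one place to add NULL checks or a default implementation later.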
As for (simple) templates, add a size_t parameter to determine size, require a void* pointer, or require a 'class' type with just the functionality you care about. (e.g. int GetUUID(Object *self); GetUUID(&p);)
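A sketch of the size_t plus void* approach, using a hypothetical fixed-capacity Stack (my own example, not from any library): the element size is passed once at init time and plays the role of the template parameter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-capacity stack of elements of any one type (hypothetical names). */
typedef struct Stack {
    unsigned char *data; /* raw storage: cap * elem_size bytes */
    size_t elem_size;    /* the "template parameter" */
    size_t len, cap;
} Stack;

int Stack_init(Stack *self, size_t elem_size, size_t cap)
{
    assert(self != NULL && elem_size > 0 && cap > 0);
    self->data = malloc(elem_size * cap); /* overflow check omitted for brevity */
    if (self->data == NULL)
        return -1;
    self->elem_size = elem_size;
    self->len = 0;
    self->cap = cap;
    return 0;
}

/* Copies elem_size bytes from *elem onto the stack; -1 if full. */
int Stack_push(Stack *self, const void *elem)
{
    if (self->len == self->cap)
        return -1;
    memcpy(self->data + self->len * self->elem_size, elem, self->elem_size);
    self->len++;
    return 0;
}

/* Copies the top element into *out and removes it; -1 if empty. */
int Stack_pop(Stack *self, void *out)
{
    if (self->len == 0)
        return -1;
    self->len--;
    memcpy(out, self->data + self->len * self->elem_size, self->elem_size);
    return 0;
}

void Stack_clean(Stack *self)
{
    free(self->data);
    self->data = NULL;
    self->len = self->cap = 0;
}
```

The trade-off is that the compiler cannot check element types for you: pushing an int onto a stack initialised with sizeof(double) compiles fine and silently misbehaves.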
Use a struct to simulate the data members of a class. In terms of method scope you can simulate private methods by placing the private function prototypes in the .c file and the public functions in the .h file.
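A small sketch of that split, using a hypothetical Temperature type. Marking the private helpers static is my addition: it gives them internal linkage, so other files cannot call them even if they guess the names.

```c
#include <assert.h>
#include <stddef.h>

/* --- temperature.h (hypothetical): the public interface --- */
typedef struct Temperature {
    double celsius;
} Temperature;

void Temperature_set_fahrenheit(Temperature *self, double f);
double Temperature_get_fahrenheit(const Temperature *self);

/* --- temperature.c (hypothetical): private helpers stay in this file --- */

/* "Private methods": static gives them internal linkage, so other
 * translation units cannot call them by name. */
static double to_celsius(double f) { return (f - 32.0) * 5.0 / 9.0; }
static double to_fahrenheit(double c) { return c * 9.0 / 5.0 + 32.0; }

void Temperature_set_fahrenheit(Temperature *self, double f)
{
    assert(self != NULL);
    self->celsius = to_celsius(f);
}

double Temperature_get_fahrenheit(const Temperature *self)
{
    assert(self != NULL);
    return to_fahrenheit(self->celsius);
}
```

Note that the data member celsius is still public here; hiding data as well needs an incomplete struct type in the header.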
GTK is built entirely on C and it uses many OOP concepts. I have read through the source code of GTK and it is pretty impressive, and definitely easier to read. The basic concept is that each "class" is simply a struct, and associated static functions. The static functions all accept the "instance" struct as a parameter, do whatever they need, and return results if necessary. For example, you may have a function "GetPosition(CircleStruct obj)". The function would simply dig through the struct, extract the position numbers, probably build a new PositionStruct object, stick the x and y in the new PositionStruct, and return it. GTK even implements inheritance this way by embedding structs inside structs. Pretty clever.
#include <stdio.h>
/**
 * Define Shape class
 */
typedef struct Shape Shape;
struct Shape {
    /**
     * Variables header...
     */
    double width, height;
    /**
     * Functions header...
     */
    double (*area)(Shape *shape);
};
/**
 * Functions
 */
double calc(Shape *shape) {
    return shape->width * shape->height;
}
/**
 * Constructor
 * (Names starting with an underscore and an uppercase letter are reserved in C.)
 */
Shape Shape_create(void) {
    Shape s;
    s.width = 1;
    s.height = 1;
    s.area = calc;
    return s;
}
/********************************************/
int main(void) {
    Shape s1 = Shape_create();
    s1.width = 5.35;
    s1.height = 12.5462;
    printf("Hello World\n\n");
    printf("Shape.width = %f\n", s1.width);
    printf("Shape.height = %f\n", s1.height);
    printf("Shape.area = %f\n\n", s1.area(&s1));
    printf("Made with \xe2\x99\xa5 \n");
    return 0;
}
In your case, a good approximation of a class could be an ADT (abstract data type). But it still won't be the same.
My strategy is:
Define all code for the class in a separate file
Define all interfaces for the class in a separate header file
All member functions take a "ClassHandle" which stands in for the instance name (instead of o.foo(), call foo(oHandle))
The constructor is replaced with a function void ClassInit(ClassHandle h, int x, int y,...) OR ClassHandle ClassInit(int x, int y,...) depending on the memory allocation strategy
All member variables are stored as members of a static struct in the class file, which encapsulates them in that file and prevents outside files from accessing them
The objects are stored in an array of the static struct above, with predefined handles (visible in the interface) or a fixed limit of objects that can be instantiated
If useful, the class can contain public functions that will loop through the array and call the functions of all the instantiated objects (RunAll() calls each Run(oHandle))
A Deinit(ClassHandle h) function frees the allocated memory (array index) in the dynamic allocation strategy
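To make the dynamic-allocation variant of the strategy above concrete, here is a rough sketch with a hypothetical Motor class (MotorHandle, MOTOR_MAX and the member names are all invented for illustration). The handle is simply an index into a static array that lives in the class file.

```c
#include <assert.h>
#include <stdbool.h>

/* --- motor.h (hypothetical interface) --- */
typedef int MotorHandle; /* index into the private pool */
#define MOTOR_INVALID (-1)
#define MOTOR_MAX 4      /* fixed limit of instances */

MotorHandle MotorInit(int speed);
void MotorRun(MotorHandle h);
int MotorPosition(MotorHandle h);
void MotorRunAll(void);
void MotorDeinit(MotorHandle h);

/* --- motor.c (hypothetical): member variables are invisible outside --- */
typedef struct {
    bool in_use;
    int speed;
    int position;
} Motor;

static Motor pool[MOTOR_MAX]; /* zero-initialised, so every slot starts free */

static Motor *lookup(MotorHandle h)
{
    assert(h >= 0 && h < MOTOR_MAX && pool[h].in_use);
    return &pool[h];
}

/* "Constructor": claims the first free slot, or reports that the pool is full. */
MotorHandle MotorInit(int speed)
{
    for (int i = 0; i < MOTOR_MAX; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].speed = speed;
            pool[i].position = 0;
            return i;
        }
    }
    return MOTOR_INVALID;
}

void MotorRun(MotorHandle h)
{
    Motor *m = lookup(h);
    m->position += m->speed;
}

int MotorPosition(MotorHandle h)
{
    return lookup(h)->position;
}

/* Calls MotorRun on every instantiated object. */
void MotorRunAll(void)
{
    for (int i = 0; i < MOTOR_MAX; i++)
        if (pool[i].in_use)
            MotorRun(i);
}

/* "Destructor": releases the slot for reuse. */
void MotorDeinit(MotorHandle h)
{
    lookup(h)->in_use = false;
}
```

One pitfall this sketch does not address is stale handles: after MotorDeinit, a later MotorInit can hand out the same index again, so an old copy of the handle would silently refer to the new object.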
Does anyone see any problems, holes, potential pitfalls or hidden benefits/drawbacks to either variation of this approach? If I am reinventing a design method (and I assume I must be), can you point me to the name of it?
Also see this answer and this one
It is possible. It always seems like a good idea at the time, but afterwards it becomes a maintenance nightmare. Your code becomes littered with pieces of code tying everything together. A new programmer will have lots of problems reading and understanding the code if you use function pointers, since it will not be obvious which function is called.
Data hiding with get/set functions is easy to implement in C but stop there. I have seen multiple attempts at this in the embedded environment and in the end it is always a maintenance problem.
Since you already have maintenance issues, I would steer clear.
My approach would be to move the struct and all primarily-associated functions to a separate source file(s) so that it can be used "portably".
Depending on your compiler, you might be able to include functions into the struct, but that's a very compiler-specific extension, and has nothing to do with the last version of the standard I routinely used :)
The first C++ compiler (Cfront) was actually a preprocessor which translated the C++ code into C.
So it's very possible to have classes in C.
You might try and dig up an old C++ preprocessor and see what kind of solutions it creates.
Do you want virtual methods?
If not then you just define a set of function pointers in the struct itself. If you assign all the function pointers to standard C functions then you will be able to call functions from C in very similar syntax to how you would under C++.
If you want to have virtual methods it gets more complicated. Basically you will need to implement your own VTable for each struct and assign function pointers to the VTable depending on which function should be called. You would then need a set of function pointers in the struct itself that in turn call the function pointer in the VTable. This is, essentially, what C++ does.
TBH though ... if you want the latter then you are probably better off just finding a C++ compiler you can use and re-compiling the project. I have never understood the obsession with C++ not being usable in embedded. I've used it many a time and it works, is fast, and doesn't have memory problems. Sure, you have to be a bit more careful about what you do, but it's really not that complicated.
C isn't an OOP language, as you rightly point out, so there's no built-in way to write a true class. Your best bet is to look at structs and function pointers; these will let you build an approximation of a class. However, as C is procedural you might want to consider writing more C-like code (i.e. without trying to use classes).
Also, if you can use C, you can probably use C++ and get classes.