May I know the usage and logic behind the opaque pointer concept in C?
An opaque pointer is one in which no details are revealed of the underlying data (from a dictionary definition: opaque: adjective; not able to be seen through; not transparent).
For example, you may declare in a header file (this is from some of my actual code):
typedef struct pmpi_s *pmpi;
which declares a type pmpi which is a pointer to the opaque structure struct pmpi_s, hence anything you declare as pmpi will be an opaque pointer.
Users of that declaration can freely write code like:
pmpi xyzzy = NULL;
without knowing the actual "definition" of the structure.
Then, in the code that knows about the definition (i.e., the code providing the functionality for pmpi handling), you can "define" the structure:
struct pmpi_s {
uint16_t *data; // a pointer to the actual data array of uint16_t.
size_t sz; // the allocated size of data.
size_t used; // number of segments of data in use.
int sign; // the sign of the number (-1, 0, 1).
};
and easily access the individual fields of it, something that users of the header file cannot do.
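To make the split concrete, here is a minimal single-file sketch of how the header side and the implementation side could fit together. The functions pmpi_new, pmpi_sign and pmpi_free are hypothetical names invented for this illustration (they are not from the original code), and the way the value is stored is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- what would go in the public header ---- */
typedef struct pmpi_s *pmpi;       /* opaque: no members visible to users */

pmpi pmpi_new(int value);          /* hypothetical constructor */
int  pmpi_sign(pmpi p);            /* hypothetical accessor */
void pmpi_free(pmpi p);            /* hypothetical destructor */

/* ---- what would go in the implementation file ---- */
struct pmpi_s {
    uint16_t *data;   /* a pointer to the actual data array of uint16_t */
    size_t sz;        /* the allocated size of data */
    size_t used;      /* number of segments of data in use */
    int sign;         /* the sign of the number (-1, 0, 1) */
};

pmpi pmpi_new(int value)
{
    pmpi p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->data = malloc(sizeof *p->data);
    if (p->data == NULL) {
        free(p);
        return NULL;
    }
    /* Simplification: keep only the low 16 bits of the magnitude. */
    unsigned int mag = value < 0 ? 0u - (unsigned int)value
                                 : (unsigned int)value;
    p->data[0] = (uint16_t)(mag & 0xFFFFu);
    p->sz = 1;
    p->used = (value != 0);
    p->sign = (value > 0) - (value < 0);
    return p;
}

int pmpi_sign(pmpi p)
{
    return p->sign;   /* only the implementation can touch the fields */
}

void pmpi_free(pmpi p)
{
    if (p != NULL) {
        free(p->data);
        free(p);
    }
}
```

A user who only sees the header part can create, query and free a pmpi, but an expression such as xyzzy->sign would not compile there because struct pmpi_s is incomplete.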
More information can be found on the Wikipedia page for opaque pointers.
The main use of it is to hide implementation details from users of your library. Encapsulation (despite what the C++ crowd will tell you) has been around for a long time :-)
You want to publish just enough details on your library for users to effectively make use of it, and no more. Publishing more gives users details that they may come to rely upon (such as the fact that the size variable sz is at a specific location in the structure), which may lead them to bypass your controls and manipulate it directly.
Then you'll find your customers complaining bitterly when you change the internals. Without that structure information, your API is limited only to what you provide and your freedom of action regarding the internals is maintained.
Opaque pointers are used in the definitions of programming interfaces (APIs).
Typically they are pointers to incomplete structure types, declared like:
typedef struct widget *widget_handle_t;
Their purpose is to provide the client program a way to hold a reference to an object managed by the API, without revealing anything about the implementation of that object, other than its address in memory (the pointer itself).
The client can pass the object around, store it in its own data structures, and compare two such pointers to see whether they are the same or different, but it cannot dereference the pointers to peek at what is in the object.
The reason this is done is to prevent the client program from becoming dependent on those details, so that the implementation can be upgraded without having to recompile client programs.
Because the opaque pointers are typed, there is a good measure of type safety. If we have:
typedef struct widget *widget_handle_t;
typedef struct gadget *gadget_handle_t;
int api_function(widget_handle_t, gadget_handle_t);
if the client program mixes up the order of the arguments, there will be a diagnostic from the compiler, because a struct gadget * is being converted to a struct widget * without a cast.
That is the reason why we declare distinct struct types even though their members are never revealed to the client; each struct declaration with a different new tag introduces a new type that is not compatible with previously declared struct types.
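The type checking described above can be tried with a small sketch like the following. The struct bodies and the demo_typed_handles function are assumptions added so that the example is self-contained; in a real library the struct definitions would stay private to the implementation.

```c
#include <assert.h>

typedef struct widget *widget_handle_t;
typedef struct gadget *gadget_handle_t;

/* Assumed definitions, only here so that the sketch can run on its own. */
struct widget { int id; };
struct gadget { int id; };

int api_function(widget_handle_t w, gadget_handle_t g)
{
    return w->id + g->id;
}

int demo_typed_handles(void)
{
    struct widget w = { 1 };
    struct gadget g = { 2 };
    widget_handle_t wh = &w;
    gadget_handle_t gh = &g;

    /* api_function(gh, wh);
       Swapping the arguments like this is diagnosed by the compiler,
       because struct gadget * and struct widget * are incompatible. */
    return api_function(wh, gh);
}
```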
What does it mean for a client to become dependent? Suppose that a widget_t has width and height properties. If it isn't opaque and looks like this:
typedef struct widget {
short width;
short height;
} widget_t;
then the client can just do this to get the width and height:
int widget_area = whandle->width * whandle->height;
whereas under the opaque paradigm, it would have to use access functions (which are not inlined):
// in the header file
int widget_getwidth(widget_handle_t);
int widget_getheight(widget_handle_t);
// client code
int widget_area = widget_getwidth(whandle) * widget_getheight(whandle);
Notice how the widget authors used the short type to save space in the structure, and that has been exposed to the client of the non-opaque interface. Suppose that widgets can now have sizes that don't fit into short and the structure has to change:
typedef struct widget {
int width;
int height;
} widget_t;
Client code must be re-compiled now to pick up this new definition. Depending on the tooling and deployment workflow, there may even be a risk that this isn't done: old client code tries to use the new library and misbehaves by accessing the new structure using the old layout. That can easily happen with dynamic libraries. The library is updated, but the dependent programs are not.
The client which uses the opaque interface continues to work unmodified and so doesn't require recompiling. It just calls the new definition of the accessor functions. Those are in the widget library and correctly retrieve the new int typed values from the structure.
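A minimal single-file sketch of the opaque variant might look like this. The constructor and destructor names widget_create and widget_destroy are assumptions for the illustration; the point is that the client-side function at the bottom never sees the field types, so switching them from short to int only touches the implementation part.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public header: nothing about the layout is visible ---- */
typedef struct widget *widget_handle_t;

widget_handle_t widget_create(int width, int height);   /* hypothetical */
void widget_destroy(widget_handle_t w);                 /* hypothetical */
int widget_getwidth(widget_handle_t w);
int widget_getheight(widget_handle_t w);

/* ---- library implementation: free to change short to int ---- */
struct widget {
    int width;
    int height;
};

widget_handle_t widget_create(int width, int height)
{
    widget_handle_t w = malloc(sizeof *w);
    if (w != NULL) {
        w->width = width;
        w->height = height;
    }
    return w;
}

void widget_destroy(widget_handle_t w)
{
    free(w);
}

int widget_getwidth(widget_handle_t w)  { return w->width; }
int widget_getheight(widget_handle_t w) { return w->height; }

/* ---- client code: identical before and after the change ---- */
int widget_area(widget_handle_t w)
{
    return widget_getwidth(w) * widget_getheight(w);
}
```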
Note that, historically (and still currently here and there) there has also been a lackluster practice of using the void * type as an opaque handle type:
typedef void *widget_handle_t;
typedef void *gadget_handle_t;
int api_function(widget_handle_t, gadget_handle_t);
Under this scheme, you can do this, without any diagnostic:
api_function("hello", stdout);
The Microsoft Windows API is an example of a system in which you can have it both ways. By default, various handle types like HWND (window handle) and HDC (device context) are all void *. So there is no type safety; an HWND could be passed where an HDC is expected, by mistake. If you do this:
#define STRICT
#include <windows.h>
then these handles are mapped to mutually incompatible types to catch those errors.
Opaque, as the name suggests, is something we can’t see through. E.g. wood is opaque. An opaque pointer is a pointer which points to a data structure whose contents are not exposed at the time of its definition.
Example:
struct STest* pSTest;
It is safe to assign NULL to an opaque pointer.
pSTest = NULL;
I want to create an API in C. My goal is to implement abstractions to access and mutate struct variables that are defined in the API.
API's header file:
#ifndef API_H
#define API_H
struct s_accessor {
struct s* s_ptr;
};
void api_init_func(struct s_accessor *foo);
void api_mutate_func(struct s_accessor *foo, int x);
void api_print_func(struct s_accessor *foo);
#endif
API's implementation file:
#include <stdio.h>
#include "api.h"
struct s {
int internal;
int other_stuff;
};
void api_init_func(struct s_accessor* foo) {
foo->s_ptr = NULL;
}
void api_print_func(struct s_accessor *foo)
{
printf("Value of member 'internal' = %d\n", foo->s_ptr->internal);
}
void api_mutate_func(struct s_accessor *foo, int x)
{
static struct s bar; // static storage, so the pointer stays valid after this function returns
foo->s_ptr = &bar;
foo->s_ptr->internal = x;
}
Client-side program that uses the API:
#include <stdio.h>
#include "api.h"
int main()
{
struct s_accessor foo;
api_init_func(&foo); // set s_ptr to NULL
api_mutate_func(&foo, 123); // change value of member 'internal' of an instance of struct s
api_print_func(&foo); // print member of struct s
}
I have the following questions regarding my code:
Is there a direct (non-hackish) way to hide the implementation of my API?
Is this the proper way to create abstractions for the client-side to use my API? If not, how can I improve this to make it better?
"Accessor" isn't a good terminology. This term is used in object oriented programming to denote a kind of method.
The structure type struct s_accessor is in fact something called a handle. It contains a pointer to the real object. A handle is a doubly indirect pointer: the application passes around pointers to handles, and the handles contain pointers to the objects.
An old adage says that "any problem in computer science can be solved by adding another layer of indirection", of which handles are a prime example. Handles allow objects to be moved from one address to another or to be replaced. Yet, to the application, the handle address represents the object, and so when the implementation object is relocated or replaced, as far as the application is concerned, it is still the same object.
With a handle we can do things like:
have a vector object that can grow
have OOP objects that can apparently change their class
relocate variable-length objects such as buffers and strings to compact their memory footprint
all without the object changing its memory address and thus identity. Because the handle stays the same when these changes occur, the application does not have to hunt down every copy of the object pointer and replace it with a new one; the handle effectively takes care of that in one place.
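The growable vector case can be sketched as follows. The names vec_handle, vec_init, vec_push, vec_get and vec_fini are invented for this illustration. The storage may be moved by realloc when it grows, yet every pointer to the handle stays valid because only the pointer inside the handle is updated.

```c
#include <assert.h>
#include <stdlib.h>

/* The handle: its address is the object's identity for the application. */
struct vec_handle {
    int *items;     /* the real storage; may move when it grows */
    size_t len;     /* number of elements in use */
    size_t cap;     /* number of elements allocated */
};

void vec_init(struct vec_handle *h)
{
    h->items = NULL;
    h->len = 0;
    h->cap = 0;
}

/* Appends a value; returns 0 on success, -1 on allocation failure. */
int vec_push(struct vec_handle *h, int value)
{
    if (h->len == h->cap) {
        size_t new_cap = h->cap ? h->cap * 2 : 4;
        int *moved = realloc(h->items, new_cap * sizeof *moved);
        if (moved == NULL)
            return -1;
        h->items = moved;   /* only the handle needs updating */
        h->cap = new_cap;
    }
    h->items[h->len++] = value;
    return 0;
}

int vec_get(const struct vec_handle *h, size_t i)
{
    return h->items[i];
}

void vec_fini(struct vec_handle *h)
{
    free(h->items);
    vec_init(h);
}
```

Application code keeps using the same struct vec_handle * before and after any number of vec_push calls, without ever learning that the underlying array was relocated.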
In spite of all of that, handles tend to be unusual in C APIs, in particular lower-level ones. Given an API that does not use handles, you can whip up handles around it. Even if you think that the users of your object will benefit from handles, it may be good to split up the API into two: an internal one which only deals with s, and the external one with the struct s_handle.
If you're using threads, then handles require careful concurrent programming. That is to say, even though being able to change the handle-referenced object is convenient from the application's point of view, it requires synchronization. Say we have a vector object referenced by a handle. Application code is working with it, so we can't just suddenly replace the vector with a pointer to a different one (in order to resize it). Another thread may be just in the middle of working with the original pointer. The operations that access the vector or store values into it through the handle must be synchronized with the replacement operation. Even if all of that is done right, it's going to add a lot of overhead, and so application people may notice some performance problems and ask for escape hatches in the API, such as a function to "pin" down a handle so that the object cannot move while an efficient operation works directly with the s object inside it.
For that reason, I would tend to stay away from designing a handle API, and make that sort of thing the application's problem. It may well be easier for a multi-threaded application to just use a well-designed "just the s please" API correctly, than to write a completely thread-safe, robust, efficient struct s_handle layer.
Is there a direct (non-hackish) way to hide the implementation of my API?
Basically the "rule #1" of hiding the implementation of an API in C is not to allow an init operation whereby the client application declares some memory and your API initializes it. That said, it is possible like this:
typedef struct opaque opaque_t;
#ifndef OPAQUE_IMPL
struct opaque {
int dummy[42]; // big enough for all future extension
};
#endif
void opaque_init(opaque_t *o);
In this declaration, we have revealed nothing to the client, other than that our objects are buffers of memory that require int alignment, and are at least 42 int wide.
In actual fact, the objects are smaller; we have just added a reserve amount for future growth. We can make our actual object larger without having to re-compile the clients, as long as our object does not require more than sizeof (int [42]) bytes.
Why we have that #ifndef is that the implementation code will do something like this:
#define OPAQUE_IMPL // suppress the fake definition in the header
#include "opaque.h"
// actual definition
struct opaque {
int whatever;
char *name;
};
This kind of thing plays it loose with the "law" of ISO C, because effectively the client and implementation are using a different definition of the struct opaque type.
Allowing clients to allocate the objects themselves yields certain efficiencies, because allocating objects in automatic storage (i.e. declaring them as local variables) can place them on the stack with very little overhead compared to dynamic memory allocation.
The more common approach for opaqueness is not to provide an init operation at all, only an operation for allocating a new object and destroying it:
typedef struct opaque opaque_t; // incomplete struct
opaque_t *opaque_create(/* args .... */);
void opaque_destroy(opaque_t *o);
Now the caller knows nothing, other than that an "opaque" object is represented as a pointer, the same pointer over its entire lifetime.
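As a rough single-file sketch of that create/destroy style: the parameters of opaque_create and the two accessors opaque_whatever and opaque_name are assumptions chosen for the illustration (the declaration above leaves the arguments unspecified), and the fields reuse the whatever and name members from the earlier definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- public header ---- */
typedef struct opaque opaque_t;   /* incomplete struct */

opaque_t *opaque_create(int whatever, const char *name);  /* assumed args */
void opaque_destroy(opaque_t *o);
int opaque_whatever(const opaque_t *o);       /* hypothetical accessor */
const char *opaque_name(const opaque_t *o);   /* hypothetical accessor */

/* ---- implementation ---- */
struct opaque {
    int whatever;
    char *name;    /* owned copy, released in opaque_destroy */
};

opaque_t *opaque_create(int whatever, const char *name)
{
    if (name == NULL)
        name = "";
    opaque_t *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    size_t n = strlen(name) + 1;
    o->name = malloc(n);
    if (o->name == NULL) {
        free(o);
        return NULL;
    }
    memcpy(o->name, name, n);
    o->whatever = whatever;
    return o;
}

void opaque_destroy(opaque_t *o)
{
    if (o != NULL) {
        free(o->name);
        free(o);
    }
}

int opaque_whatever(const opaque_t *o)     { return o->whatever; }
const char *opaque_name(const opaque_t *o) { return o->name; }
```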
Total opaqueness may not be worth it for an API which is internal to an application or application framework. It's useful for an API that has external clients, like application developers in a different team or organization.
Ask yourself the question: would the client of this API, and its implementation, ever be shipped and upgraded separately? If the answer is no, then that diminishes the need for total opaqueness.
This is the right way to do abstraction and encapsulation in C applications.
Use incomplete types in C to hide structure details. You can define structures, unions, and enumerations without listing their members (or values, in the case of enumerations). Doing so results in an incomplete type. You can't declare variables of incomplete types, but you can work with pointers to those types.
Constness in C
In every function, especially those that you expose in the API, any pointer parameter whose pointed-to structure data the function does not modify should be a pointer to const. This assures the API user (somewhat :-) since you can still cast the const away in C) that you are not changing the structure data. You can also protect both the data and the address by making the pointer doubly const; see below:
#ifndef API_H
#define API_H
typedef struct s_accessor s_accessor, *p_s_accessor;
void api_init_func(p_s_accessor p_foo);
void api_mutate_func(p_s_accessor p_foo, int x);
void api_print_func(const s_accessor *const p_foo); // const data and const pointer
#endif
In api.c you can complete the structure type:
struct s_accessor {
int internal;
int other_stuff;
};
All auxiliary functions should be static in api.c (limit the functions' scope to api.c only).
Minimise the includes in api.h.
Regarding question 1, I don't think there is a way that you can hide the implementation details!
I need to create a library in C and I am wondering how to manage objects: returning allocated (ex: fopen, opendir) or in-place initialization (ex: GNU hcreate_r).
I understand that it is mostly a question of taste, and I'm inclined to choose the allocating API because of the convenience when doing lazy initialization (by testing if the object pointer is NULL).
However, after reading Ulrich's paper (PDF), I'm wondering if this design will cause locality of reference problems, especially if I compose objects from others:
struct opaque_composite {
struct objectx *member1;
struct objecty *member2;
struct objectz *member3;
/* ... */
};
Allocation of such an object will make a cascade of other sub-allocations. Is this a problem in practice? And are there other issues that I should be aware of?
The thing to consider is whether the type of the object the function constructs is opaque. An opaque type is only forward-declared in the header file and the only thing you can do with it is having a pointer to it and passing that pointer to separately compiled API functions. FILE in the standard library is such an opaque type. For an opaque type, you have no option but to provide an allocation and a deallocation function as the user has no other way to obtain a reference to an object of that type.
If the type is not opaque – that is, the definition of the struct is in the header file – it is more versatile to have a function that does only initialization – and, if required, another that does finalization – but no allocation and deallocation. The reason is that with this interface, the user can decide whether to put the objects on the stack…
struct widget w;
widget_init(&w, 42, "lorem ipsum");
// use widget…
widget_fini(&w);
…or on the heap.
struct widget * wp = malloc(sizeof(struct widget));
if (wp == NULL)
exit(1); // or do whatever
widget_init(wp, 42, "lorem ipsum");
// use widget…
widget_fini(wp);
free(wp);
If you think that this is too much typing, you – or your users themselves – can easily provide convenience functions.
static inline struct widget *
new_widget(const int n, const char *const s)
{
struct widget *wp = malloc(sizeof(struct widget));
if (wp != NULL)
widget_init(wp, n, s);
return wp;
}
static inline void
del_widget(struct widget *wp)
{
widget_fini(wp);
free(wp);
}
Going the other way round is not possible.
Interfaces should always provide the essential building blocks to compose higher-level abstractions but not make legitimate uses impossible by being overly restrictive.
Of course, this leaves us with the question when to make a type opaque. A good rule of thumb – that I have first seen in the coding standards for the Linux kernel – might be to make types opaque only if there are no data members your users could meaningfully access. I think this rule should be refined a little to take into account that non-opaque types allow for “member” functions to be provided as inline versions in the header files which might be desirable from a performance point of view. On the other hand, opaque types provide better encapsulation (especially since C has no way to restrict access to a struct's members). I would also lean towards an opaque type more easily if making it not opaque would force me to #include headers into the header file of my library because they provide definitions of the types used as members in my type. (I'm okay with #includeing <stdint.h> for uint32_t. I'm a little less easy about #includeing a large header such as <unistd.h> and I'd certainly try to avoid having to #include a header from a third-party library such as <curses.h>.)
IMO the "cascade of sub-allocations" is not a problem if you keep the object opaque so you can keep it in a consistent state. The creation and destruction routines will have some added complexity dealing with an allocation failure part way through creation, but nothing too onerous.
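A minimal sketch of such creation and destruction routines, assuming trivial placeholder definitions for the three sub-object types and hypothetical function names composite_create and composite_destroy. The idea is to set every member pointer to NULL first, so that one destroy routine can clean up after a failure at any stage.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder sub-objects standing in for the real member types. */
struct objectx { int x; };
struct objecty { int y; };
struct objectz { int z; };

struct opaque_composite {
    struct objectx *member1;
    struct objecty *member2;
    struct objectz *member3;
};

/* Safe on a partially constructed object, because free(NULL) is a no-op. */
void composite_destroy(struct opaque_composite *c)
{
    if (c == NULL)
        return;
    free(c->member3);
    free(c->member2);
    free(c->member1);
    free(c);
}

struct opaque_composite *composite_create(void)
{
    struct opaque_composite *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;

    /* Start from a consistent state before any sub-allocation. */
    c->member1 = NULL;
    c->member2 = NULL;
    c->member3 = NULL;

    c->member1 = malloc(sizeof *c->member1);
    if (c->member1 != NULL)
        c->member2 = malloc(sizeof *c->member2);
    if (c->member2 != NULL)
        c->member3 = malloc(sizeof *c->member3);

    if (c->member3 == NULL) {      /* some allocation in the chain failed */
        composite_destroy(c);
        return NULL;
    }
    c->member1->x = 0;
    c->member2->y = 0;
    c->member3->z = 0;
    return c;
}
```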
Besides the option to have a static/stack-allocated copy (which I'm generally not fond of anyway), in my mind, the main advantage of a scheme like:
x = initThang(thangPtr);
is the ease of returning a variety of more specific error codes.
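For illustration, an in-place initializer that reports specific errors could look roughly like this. The struct thang layout, the extra level parameter and the thang_status codes are all invented for the sketch; only the name initThang comes from the line above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the initializer. */
enum thang_status {
    THANG_OK = 0,
    THANG_ERR_NULL,     /* no object was supplied */
    THANG_ERR_RANGE     /* level outside 0..10 */
};

struct thang {
    int level;
};

/* Initializes caller-provided storage and says exactly what went wrong. */
enum thang_status initThang(struct thang *t, int level)
{
    if (t == NULL)
        return THANG_ERR_NULL;
    if (level < 0 || level > 10)
        return THANG_ERR_RANGE;
    t->level = level;
    return THANG_OK;
}
```

An allocating API that returns a pointer can normally only signal failure with NULL, unless it adds an out-parameter or a separate error query.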
If you want to cut to the chase, please skip down to the last two paragraphs. If you're interested in my predicament and the steps I've taken to solve it, continue reading directly below.
I am currently developing portions of a C library as part of my internship. So naturally, there are some parts of code which should not be accessible to the user while others should be. I am basically developing several architecture-optimized random number generators (RNGs) (uniform, Gaussian, and exponentially distributed numbers). The latter two RNGs depend on the uniform generator, which is in a different kernel (project). So, in the case that the user wants to use more than one RNG, I want to make sure I'm not duplicating code needlessly since we are constrained with memory (no point in having the same function defined multiple times at different addresses in the code segment).
Now here's where the problem arises. The convention for all other kernels in the library is that we have two header files and two C files (one each for the natural C implementation and the optimized C version, which may use some intrinsic functions and assembly and/or have some restrictions to make it faster and better for our architecture). This is followed by another C file (a testbench) where our main function is located; it tests both implementations and compares the results. With that said, we cannot really add an additional header file for private or protected items, nor can we add a global header file for all these generators.
To combat this restriction, I used extern functions and extern const int's in the C files which depend on the uniform RNG rather than #define's at the top of each C file in order to make the code more portable and easily modified in one place. This worked for the most part.
However, the tricky bit is that we are using an internal type within these kernels (which should not be seen by the user and should not be placed in the header file). Again, for portability, I would like to be able to change the definition of this typedef in one place rather than in multiple places in multiple kernels since the library may be used for another platform later on and for the algorithms to work it is critical that I use 32-bit types.
So basically I'm wondering if there's any way I can make a typedef "protected" in C. That is, I need it to be visible among all C files which need it, but invisible to the user. It can be in one of the header files, but must not be visible to the user who will be including that header file in his/her project, whatever that may be.
============================Edit================================
I should also note that the typedef I am using is an unsigned int, so:
typedef unsigned int myType;
No structures involved.
============================Super Edit==========================
The use of stdint.h is also forbidden :(
I am expanding on Jens Gustedt’s answer since the OP still has questions.
First, it is unclear why you have separate header files for the two implementations (“natural C” and “optimized C”). If they implement the same API, one header should serve for either.
Jens Gustedt’s recommendation is that you declare a struct foo in the header but define it only in the C source file for the implementation and not in the header. A struct declared in this way is an incomplete type, and source code that can only see the declaration, and not the definition, cannot see what is in the type. It can, however, use pointers to the type.
The declaration of an incomplete struct may be as simple as struct foo. You can also define a type, such as typedef struct foo foo; or typedef struct foo Mytype;, and you can define a type that is a pointer to the struct, such as typedef struct foo *FooPointer;. However, these are merely for convenience. They do not alter the basic notion, that there is a struct foo that API users cannot see into but that they can have pointers to.
Inside the implementation, you would fully define the struct. If you want an unsigned int in the struct, you would use:
struct foo
{
unsigned int x;
};
In general, you define the struct foo to contain whatever data you like.
Since the API user cannot define struct foo, you must provide functions to create and destroy objects of this type as necessary. Thus, you would likely have a function declared as extern struct foo *FooAlloc(some parameters);. The function creates a struct foo object (likely by calling malloc or a related function), initializes it with data from the parameters, and returns a pointer to the object (or NULL if the creation or initialization fails). You would also have a function extern void FooFree(struct foo *p); that frees a struct foo object. You might also have functions to reset, set, or alter the state of a foo object, functions to copy foo objects, and functions to report about foo objects.
Your implementations could also define some global struct foo objects that could be visible (essentially by address only) to API users. As a matter of good design, this should be done only for certain special purposes, such as to provide instances of struct foo objects with special meanings, such as a constant object with a permanent “initial state” for copying.
Your two implementations, the “natural C” and the “optimized C” implementations may have different definitions for the struct foo, provided they are not both used in a program together. (That is, each entire program is compiled with one implementation or the other, not both. If necessary, you could mangle both into a program by using a union, but it is preferable to avoid that.)
This is not a singleton approach.
Just do
typedef struct foo foo;
These are two declarations, a forward declaration of a struct and a type alias with the same name. A forward-declared struct can be used for nothing other than defining pointers to it. This should give you enough abstraction and type safety.
In all your interfaces you'd have
extern void proc(foo* a);
and you'd have to provide functions
extern foo* foo_alloc(size_t n);
extern void foo_free(foo* a);
This would bind your users as well as your library to always use the same struct. Thereby the implementation of foo is completely hidden from the API users. You could even decide one day to use something other than a struct, since users should use foo without the struct keyword.
Edit: Just a typedef to some kind of integer wouldn't help you much, because these are only aliases for types. All your types aliased to unsigned could be used interchangeably. One way around this would be to encapsulate them inside a struct. This would make your internal code a bit ugly, but the generated object code should be exactly the same with a good modern compiler.
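A small sketch of the struct-wrapping idea from the edit above. The type names rng_seed and rng_state, the helper functions and the constants in the update step are assumptions made up for the illustration; the mask keeps the arithmetic to 32 bits without needing stdint.h.

```c
#include <assert.h>

/* Two distinct types, even though both just wrap an unsigned int. */
typedef struct { unsigned int v; } rng_seed;    /* hypothetical */
typedef struct { unsigned int v; } rng_state;   /* hypothetical */

rng_state rng_from_seed(rng_seed s)
{
    rng_state st = { s.v };
    return st;
}

/* One linear congruential step, masked to 32 bits (illustrative constants). */
rng_state rng_step(rng_state st)
{
    st.v = (st.v * 1664525u + 1013904223u) & 0xFFFFFFFFu;
    return st;
}

unsigned int demo_first_output(unsigned int seed_value)
{
    rng_seed seed = { seed_value };
    /* rng_step(seed); would not compile: rng_seed is not rng_state. */
    return rng_step(rng_from_seed(seed)).v;
}
```

With plain typedefs to unsigned int, the commented-out call would have compiled silently.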
I have a header and a sample application using this header, all in C. I get almost all the logic of this software except for this; this is the interesting part of the header:
struct A;
typedef struct A A;
in the C application this A is only used when declaring a pointer like this
A* aName;
I'm quite sure that this is a way of just bringing A into the scope/namespace and giving a name to what is basically a void pointer, because this kind of pointer is only used to handle some kind of data; it is more like some namespace sugar.
What could this be for?
You're correct that it's like a void pointer, in that void is an incomplete type, and in this file A is also an incomplete type. About all you can do with incomplete types is pass around pointers to them.
It has one advantage over void* in this file, that it's a different and incompatible type from some other bit of code that has done the same thing with B. So you get a bit of type safety. If A is windowHandle and B is jpgHandle, then you can't pass the wrong one to a function.
It has an advantage over void* in the .c file that defines the functions that accept an A* -- that file can contain a definition of struct A, and give A whatever members it wants, that the first file doesn't need to know about.
However, you say there are no other mentions of A in any header file, which means there are no functions that accept or return it. You also say that the only use of A in your source file is to declare pointers -- I wonder where the values of those pointers come from, if any.
If all that happens is that someone defines an uninitialized A* and never uses it, then clearly this is a remnant of some old code, or the start of some code that never got written, and it shouldn't be in the file at all.
Finally, if the real type is called something a bit less stupid than A, then the name might give a clue to its use.
I assume struct A is a forward declaration. It most likely is defined in one of the .c-files.
By doing so, struct A's members are private to the module defining it.
This is an example of an opaque pointer, which is useful for passing handles. See http://en.wikipedia.org/wiki/Opaque_pointer for some further info. What may be interesting here from a C++ perspective, is the notion that you can define a class with a member that is a pointer to an (as yet) undefined struct. Although this struct is thus not yet defined in the header, in some later cpp implementation this struct is given body, and the compiler does the rest. This strategy is also called the Pimpl idiom (more of which you will find LOTS on the internet). Microsoft discusses it briefly at http://msdn.microsoft.com/en-us/library/hh438477.aspx.
Can a typedef struct be used without knowing its type?
E.g. there is a module on another embedded microcontroller that expects a struct, and the struct is sent from another board and is a typedef struct. Can the expected struct be accessed? Can its data be read?
Another question that arises, is how are structs usually sent around systems and the developer using them needs to know the structs fields.
Are the modules that declared them just included and the developer needs to find out the fields?
Can structs data be accessed without knowing its fields?
If you have an incomplete struct type, you should not be accessing its data. However, you can pass around pointers to that type just fine, and code that knows the complete type of the struct can access the data the pointer points to.
If you want to manipulate the data of the struct in two different modules, you will need to have the complete type declaration in both of them. This is usually put into a header file.
mystruct.h
#ifndef MYSTRUCT_H
#define MYSTRUCT_H
typedef struct mystruct{
int a;
int b;
} mystruct;
#endif
foo.c
#include "mystruct.h"
int foo(mystruct m){
return m.a;
}
bar.c
#include "mystruct.h"
int bar(mystruct m){
return m.b;
}
To access any of the fields of a struct (whether or not it is typedefed) a complete declaration of the struct must be visible at the point where the code attempts to access the field. Which physical board produces the data is entirely irrelevant.
"Complete declaration" and "visible" are technical terms with precise definitions that are too lengthy to get into here. For what you're asking, this approximation should be good enough: a struct declaration is complete if and only if it has this form
struct foo {
/* list of fields */
};
And it's visible if it appears at top level, textually above the function(s) that attempt to access fields of the struct. Usually, the declaration would come from a header file, but there's no requirement that it do so (remember that #include operates on text, not on the symbol table, unlike say Java import).
By contrast, if all you have is a declaration like this
struct foo;
then the type is incomplete and the only thing you can do with the struct (to first order) is pass around pointers to it.
Can a typedef struct be used without knowing its type?
Well, both yes and no. You only need to include a header file declaring the struct if you need to access the fields, but you don't need to include it if you are just passing a pointer forward, i.e. relaying some parameter as you are moving between the abstraction layers.
Another question that arises, is how are structs usually sent around systems and the developer using them needs to know the structs fields.
When the struct has been declared and seen by the compiler, the compiler knows that a struct of type X takes up so and so many bytes in memory and how the data is ordered. If there are four 32-bit integers declared one after another, they will be laid out next to each other in memory, for 128 bits or 16 bytes. The header file defines this like a contract. "If you include me, here's how many bytes I take up in memory and here are the different types that belong to me".
Are the modules that declared them just included and the developer needs to find out the fields?
I'm not really sure of what you mean here. The developer can also take a look at the header file (just like the compiler does) to see the SAME contract, but obviously explained through a higher abstraction layer, i.e. the human readable code. So he/she can know that the first field in the struct is called fooField. The developer then knows that he can access that field through that name or identifier, e.g.
NumberStruct someNumberStruct;
getSomeNumbers(&someNumberStruct);
int number = someNumberStruct.fooField;
Can structs data be accessed without knowing its fields?
Here's the yes from the first question. A pointer just points to some address in memory; as long as you have access to read and write that memory, you can do anything. You could in fact pass around stuff as a void* (i.e. a typeless pointer) and manually read bytes according to that same contract: you "know" that the struct is so and so large in memory and the order of the fields, because you have taken a look at the code :) It's obviously a bit dangerous, since you must be sure that the other side of that contract hasn't changed; otherwise fun stuff could happen :) So as soon as ANYTHING in some struct has changed, you must update all code that utilizes that contract without including the header file.
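Reading a field through an untyped pointer can be sketched like this. The barField member and the functions read_int_at and demo_read_second_field are assumptions added for the illustration. The struct definition appears here only so that the example is self-contained; the reader function itself never uses the type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The "contract". In real code this would live in a header that the
   reading side deliberately does not include. */
typedef struct {
    int fooField;
    int barField;    /* hypothetical second field */
} NumberStruct;

/* Reads an int located at a byte offset from an untyped pointer. */
int read_int_at(const void *base, size_t offset)
{
    int value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

int demo_read_second_field(void)
{
    NumberStruct n = { 7, 9 };
    /* A reader without the header would hard-code this offset from its
       knowledge of the layout; offsetof is used here only so that the
       sketch stays correct on every platform. */
    return read_int_at(&n, offsetof(NumberStruct, barField));
}
```

If someone later inserts a field before barField, a hard-coded offset silently reads the wrong bytes, which is exactly the danger described above.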
Hope this could shed some light onto your structs :)