What are the reasons for casting a void pointer? - c

I'm learning C++ from scratch, and as such I don't have an expert understanding of C. In C++, you can't cast a void pointer to whatever, and I understand the reasons behind that. However, I know that in C, you can. What are the possible reasons for this? It just seems like it'd be a huge hole in type safety, which (to me) seems like a bad thing.

You can cast a void* to another pointer in both languages. Perhaps you meant implicitly.
It's very convenient in C to not have to be explicit about it. In C++ we have templates, so to write generic code there's no need for void* casting and whatnot. In C there is no choice. A generic container has to hold void* to objects, and it's much easier to repeatedly say mydata* d = node; than it is mydata* d = (mydata*)node;.
So it's pretty much like you said. In C, type safety in general didn't receive as much emphasis as it did in C++, especially when it came to void*, because it was supposed to be a simple generic pointer to whatever. There's no need for that in C++, so better make it explicit when you're dealing with it.
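To make the implicit conversion concrete, here is a minimal sketch of a generic list node. The names struct node, struct mydata, push and first_value are made up for illustration and are not from any library. Both assignments marked below compile in C without a cast, while a C++ compiler would reject the conversions from void*.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical generic node: the container only ever sees void *. */
struct node {
    void *data;
    struct node *next;
};

struct mydata {
    int value;
};

/* Push a payload on the front of the list; returns the new head or NULL. */
struct node *push(struct node *head, void *data)
{
    struct node *n = malloc(sizeof *n);   /* void * to struct node *, no cast in C */
    if (n == NULL)
        return NULL;
    n->data = data;
    n->next = head;
    return n;
}

/* Read the payload back as the type the caller knows it to be. */
int first_value(const struct node *head)
{
    const struct mydata *d;
    assert(head != NULL);
    d = head->data;                       /* implicit in C; C++ needs a cast here */
    return d->value;
}
```

Nothing checks that the payload really is a struct mydata, which is exactly the hole in type safety the question is about.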

What are the possible reasons for [casting a void * pointer in C]? Isn't this a giant hole in type safety?
It's the only possible way to support polymorphism, aka generic programming. There's no other way to make, e.g., a generic hash table. Polymorphism in C is wildly unsafe, but it's the only polymorphism there is.
Be glad that C++ has parametric polymorphism (one of the many functions of templates).

One reason: if you use qsort to sort an array of structs, and you have a comparison function for two structs, you'll need to convert the void pointers to pointers to the structs to access members of the struct.
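A small sketch of that qsort case, assuming a made-up struct employee that is sorted by age. The comparison function receives const void * and has to convert back to the real element type before it can touch any member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
struct employee {
    const char *name;
    int age;
};

/* qsort hands the comparator pointers to two elements as const void *. */
static int compare_by_age(const void *lhs, const void *rhs)
{
    const struct employee *a = lhs;   /* conversion back to the element type */
    const struct employee *b = rhs;
    return (a->age > b->age) - (a->age < b->age);
}

void sort_by_age(struct employee *staff, size_t count)
{
    assert(staff != NULL);
    qsort(staff, count, sizeof staff[0], compare_by_age);
}
```

The (a > b) - (a < b) form avoids the overflow that a plain subtraction could cause for extreme values.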

Related

In C, is it a fair coding practice to typedef an array type?

I guess there must be a duplicated question here but I couldn't find it. I'm recently working on a C project and, while trying to leave the code as concise as possible, I considered typedef-ing a consistently-used array with a certain type.
As an example, suppose the array of a structure type entry has always the fixed length of MAX_N_ENTRIES. I'd like to reduce the redundancy by rewriting the code;
struct entry ents[MAX_N_ENTRIES];
to this code;
typedef struct entry entry_arr_t[MAX_N_ENTRIES];
entry_arr_t ents;
What concerns me is that, since an array type obviously has to be handled differently from any primitive type in C, this kind of typedef can cause confusion in the future by making the array look like an alias of a primitive type.
Yes, it's possible to create a typedef for an array type -- and there's even an example in the Standard C library, namely the jmp_buf type that's used with setjmp and longjmp.
It's usually considered poor style, however, because type names are usually assumed to refer to first-class types that you can do every ordinary first-class-type thing with, and in particular: assign them. But of course you can't assign arrays in C, because they're not first-class types.
In other words, given the typedef in your question, a later programmer might assume that it would be possible to write
entry_arr_t ents1, ents2;
...
ents1 = ents2;
But of course that assignment would fail.
The fact that you've included "arr" in the typedef name does indeed mitigate this concern, making it less likely that the hypothetical later programmer would make the bad assumption.
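For completeness, a sketch of how such a typedef'd array would have to be copied instead. The value of MAX_N_ENTRIES, the id member and the helper copy_entries are assumptions made up for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_N_ENTRIES 4   /* assumed value, for illustration only */

struct entry {
    int id;
};

typedef struct entry entry_arr_t[MAX_N_ENTRIES];

/* The parameters look like arrays but are adjusted to pointers to struct entry. */
void copy_entries(entry_arr_t dst, const entry_arr_t src)
{
    assert(dst != NULL && src != NULL);
    /* Array objects cannot be assigned with =, so copy the elements explicitly. */
    memcpy(dst, src, sizeof(entry_arr_t));
}
```

Note that sizeof(entry_arr_t) is used on purpose: inside the function, sizeof dst would only give the size of a pointer, which is another way the typedef can mislead.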

Extending a struct in C

I recently came across a colleague's code that looked like this:
typedef struct A {
int x;
}A;
typedef struct B {
A a;
int d;
}B;
void fn(){
B *b;
((A*)b)->x = 10;
}
His explanation was that since struct A is the first member of struct B, ((A*)b)->x is the same as b->a.x, and that the cast provides better readability.
This makes sense, but is this considered good practice? And will this work across platforms? Currently this runs fine on GCC.
Yes, it will work cross-platform(a), but that doesn't necessarily make it a good idea.
As per the ISO C standard (all citations below are from C11), 6.7.2.1 Structure and union specifiers /15, there may be no padding before the first member of a structure.
In addition, 6.2.7 Compatible type and composite type states that:
Two types have compatible type if their types are the same
and it is undisputed that the A and A-within-B types are identical.
This means that the memory accesses to the A fields will be the same in both A and B types, as would the more sensible b->a.x which is probably what you should be using if you have any concerns about maintainability in future.
And, though you would normally have to worry about strict type aliasing, I don't believe that applies here. It is illegal to alias pointers but the standard has specific exceptions.
6.5 Expressions /7 states some of those exceptions, with the footnote:
The intent of this list is to specify those circumstances in which an object may or may not be aliased.
The exceptions listed are:
a type compatible with the effective type of the object;
some other exceptions which need not concern us here; and
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union).
That, combined with the struct padding rules mentioned above, including the phrase:
A pointer to a structure object, suitably converted, points to its initial member
seems to indicate this example is specifically allowed for. The core point we have to remember here is that the type of the expression ((A*)b) is A*, not B*. That makes the variables compatible for the purposes of unrestricted aliasing.
That's my reading of the relevant portions of the standard, I've been wrong before (b), but I doubt it in this case.
So, if you have a genuine need for this, it will work okay but I'd be documenting any constraints in the code very close to the structures so as to not get bitten in future.
(a) In the general sense. Of course, the code snippet:
B *b;
((A*)b)->x = 10;
will be undefined behaviour because b is not initialised to something sensible. But I'm going to assume this is just example code meant to illustrate your question. If anyone's concerned about it, think of it instead as:
B b, *pb = &b;
((A*)pb)->x = 10;
(b) As my wife will tell you, frequently and with little prompting :-)
I'll go out on a limb and oppose @paxdiablo on this one: I think it's a fine idea, and it's very common in large, production-quality code.
It's basically the most obvious and nice way to implement inheritance-based object oriented data structures in C. Starting the declaration of struct B with an instance of struct A means "B is a sub-class of A". The fact that the first structure member is guaranteed to be 0 bytes from the start of the structure is what makes it work safely, and it's borderline beautiful in my opinion.
It's widely used and deployed in code based on the GObject library, such as the GTK+ user interface toolkit and the GNOME desktop environment.
Of course, it requires you to "know what you're doing", but that is generally always the case when implementing complicated type relationships in C. :)
In the case of GObject and GTK+, there's plenty of support infrastructure and documentation to help with this: it's quite hard to forget about it. It might mean that creating a new class isn't something you do just as quickly as in C++, but that's perhaps to be expected since there's no native support in C for classes.
That's a horrible idea. As soon as someone comes along and inserts another field at the front of struct B your program blows up. And what is so wrong with b.a.x?
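If the cast is used anyway, one possible guard against that failure mode, assuming a C11 compiler, is a compile-time check on the member offset. The helper set_x is a made-up name; A and B are the structs from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct A {
    int x;
} A;

typedef struct B {
    A a;      /* must stay first for the (A *) cast to be valid */
    int d;
} B;

/* Compilation fails if someone inserts a member in front of a. */
_Static_assert(offsetof(B, a) == 0, "A must be the first member of B");

void set_x(B *b, int value)
{
    assert(b != NULL);
    ((A *)b)->x = value;
}
```

This only protects the layout assumption; it does not stop someone from casting an unrelated pointer to A *.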
Anything that circumvents type checking should generally be avoided.
This hack relies on the order of the declarations, and neither the cast nor this order can be enforced by the compiler.
It should work cross-platform, but I don't think it is a good practice.
If you really have deeply nested structures (you might have to wonder why, however), then you should use a temporary local variable to access the fields:
A deep_a = e->d.c.b.a;
deep_a.x = 10;
deep_a.y = deep_a.x + 72;
e->d.c.b.a = deep_a;
Or, if you don't want to copy a along:
A* deep_a = &(e->d.c.b.a);
deep_a->x = 10;
deep_a->y = deep_a->x + 72;
This shows from where a comes and it doesn't require a cast.
Java and C# also regularly expose constructs like "c.b.a", I don't see what the problem is. If what you want to simulate is object-oriented behaviour, then you should consider using an object-oriented language (like C++), since "extending structs" in the way you propose doesn't provide encapsulation nor runtime polymorphism (although one may argue that ((A*)b) is akin to a "dynamic cast").
I am sorry to disagree with all the other answers here, but this system is not compliant with standard C. It is not acceptable to have two pointers with different types which point to the same location at the same time; this is called aliasing and is not allowed by the strict aliasing rules in C99 and many other standards. A less ugly way of doing this would be to use inline getter functions, which then do not have to look neat in that way. Or perhaps this is a job for a union? A union is specifically allowed to hold one of several types, but there are a myriad of other drawbacks there too.
In short, this kind of dirty casting to create polymorphism is not allowed by most C standards. Just because it seems to work on your compiler does not mean it is acceptable. See here for an explanation of why it is not allowed, and why compilers at high optimization levels can break code which does not follow these rules: http://en.wikipedia.org/wiki/Aliasing_%28computing%29#Conflicts_with_optimization
Yes, it will work. And it is one of the core principles of object-oriented programming in C. See this answer 'Object-orientation in C' for more examples about extending (i.e. inheritance).
This is perfectly legal, and, in my opinion, pretty elegant. For an example of this in production code, see the GObject docs:
Thanks to these simple conditions, it is possible to detect the type
of every object instance by doing:
B *b;
b->parent.parent.g_class->g_type
or, more quickly:
B *b;
((GTypeInstance*)b)->g_class->g_type
Personally, I think that unions are ugly and tend to lead towards huge switch statements, which is a big part of what you've worked to avoid by writing OO code. I write a significant amount of code myself in this style --- typically, the first member of the struct contains function pointers that can be made to work like a vtable for the type in question.
I can see how this works but I would not call this good practice. It depends on how the bytes of each data structure are placed in memory. Any time you are casting one complicated data structure to another (i.e. structs), it's not a very good idea, especially when the two structures are not the same size.
I think the OP and many commenters have latched onto the idea that the code is extending a struct.
It is not.
This is an example of composition. Very useful. (Getting rid of the typedefs, here is a more descriptive example):
struct person {
char name[MAX_STRING + 1];
char address[MAX_STRING + 1];
};
struct item {
int x;
};
struct accessory {
int y;
};
/* fixed size memory buffer.
The Linux kernel is full of embedded structs like this
*/
struct order {
struct person customer;
struct item items[MAX_ITEMS];
struct accessory accessories[MAX_ACCESSORIES];
};
void fn(struct order *the_order){
memcpy(the_order->customer.name, DEFAULT_NAME, sizeof(DEFAULT_NAME));
}
You have a fixed size buffer that is nicely compartmentalized. It sure beats a giant single tier struct.
struct double_order {
struct order order;
struct item extra_items[MAX_ITEMS];
struct accessory extra_accessories[MAX_ACCESSORIES];
};
So now you have a second struct that can be treated (a la inheritance) exactly like the first with an explicit cast.
struct double_order d;
fn((struct order *)&d);
This preserves compatibility with code that was written to work with the smaller struct. Both the Linux kernel (http://lxr.free-electrons.com/source/include/linux/spi/spi.h (look at struct spi_device)) and bsd sockets library (http://beej.us/guide/bgnet/output/html/multipage/sockaddr_inman.html) use this approach. In the kernel and sockets cases you have a struct that is run through both generic and differentiated sections of code. Not all that different than the use case for inheritance.
I would NOT suggest writing structs like that just for readability.
I think Postgres does this in some of their code as well. Not that it makes it a good idea, but it does say something about how widely accepted it seems to be.
Perhaps you could use macros to implement this feature: put the functions or fields that need to be reused into a macro.

Private Keyword in a C Struct

Today, I have just noticed a statement in a C struct, and to be honest I was like WTF at first. It is like;
struct foo {
void *private;
//Some other members
};
Believe it or not, this struct compiles without any error. So what is the purpose of adding such a line (void *private)?
In pure C there's no private keyword, so the above is perfectly legal, albeit a very bad idea.
This would be invalid C++ though, and a C++ compiler would surely yield an error.
In C, void* is often used to hide the actual data type used, effectively hiding some implementation details from the interface.
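A sketch of what such a member is typically for. Everything except the struct foo layout from the question is hypothetical: the id member, struct foo_private and the three functions are invented to show the idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Public part: what a header would expose. */
struct foo {
    void *private;   /* ordinary identifier in C, reserved word in C++ */
    int id;
};

/* Implementation detail, known only to this file (made-up layout). */
struct foo_private {
    int use_count;
};

int foo_init(struct foo *f, int id)
{
    struct foo_private *p = malloc(sizeof *p);
    assert(f != NULL);
    if (p == NULL)
        return -1;
    p->use_count = 0;
    f->private = p;
    f->id = id;
    return 0;
}

int foo_use(struct foo *f)
{
    struct foo_private *p = f->private;   /* implicit conversion from void * */
    return ++p->use_count;
}

void foo_destroy(struct foo *f)
{
    free(f->private);
    f->private = NULL;
}
```

Users of struct foo can see that a private pointer exists but cannot do anything meaningful with it without the definition of struct foo_private.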
Actually you have stumbled upon an important difference between C and C++, the way structures are implemented.
In C, structures can contain only primitive and composite data types, whereas C++ structures are closer to classes than to C structures and so provide additional functionality, such as:
The ability to mark members as private, public or protected.
Can contain member functions.
Structures in C++ can be used as a tool for object-oriented programming, since all OO functionality that applies to classes, such as inheritance, holds for structures as well.
So in short, the above code is valid C, but invalid C++.

Template in ansi C?

How can I create a function that works with other types of data (some struct or something)? In C++ there are templates, but what about C?
I've heard about void *, but I don't know if it works.
Any ideas?
Well, the way to do it is with void *. You might also need to use function pointers, for example if you need to compare generic values.
The other way to do it is to use xmacros, but that's generally more for reducing code duplication for very similar structures.
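For readers who have not met X-macros, a minimal sketch follows. The colour list and the names COLOR_LIST, color_names and color_name are invented for the example. One list of entries is expanded twice with different definitions of X, so the enum and its string table cannot drift apart.

```c
#include <assert.h>

/* Hypothetical list of colours: one X(...) entry per item. */
#define COLOR_LIST \
    X(RED)   \
    X(GREEN) \
    X(BLUE)

/* First expansion: enumerators COLOR_RED, COLOR_GREEN, COLOR_BLUE. */
#define X(name) COLOR_##name,
enum color { COLOR_LIST COLOR_COUNT };
#undef X

/* Second expansion: matching strings, in the same order. */
#define X(name) #name,
static const char *const color_names[] = { COLOR_LIST };
#undef X

const char *color_name(enum color c)
{
    assert((unsigned)c < COLOR_COUNT);
    return color_names[c];
}
```

Adding a colour means adding one X(...) line; both the enum and the table pick it up.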
void * is the solution in C, since any object pointer can be converted to void * and back without loss. Of course, you get no type safety, but it's as good an abstraction as you can get with C. Furthermore, you could look at stdarg.h and variadic functions, but again, you have to keep track yourself of what you're doing, since the compiler won't aid you one bit.
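A tiny sketch of the stdarg.h route, using a made-up function sum_ints. It shows the bookkeeping burden mentioned above: the function has no way to verify how many arguments were passed or what their types are, so it trusts the count it is given.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum count int arguments. The caller must pass exactly count ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* wrong type or count here is undefined behaviour */
    va_end(args);
    return total;
}
```

Calling sum_ints(3, 1, 2) or passing a double where an int is expected compiles without complaint and misbehaves at run time.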

Genericity vs type-safety? Using void* in C

Coming from OO (C#, Java, Scala) I value very highly the principles of both code reuse and type-safety. Type arguments in the above languages do the job and enable generic data structures which are both type-safe and don't 'waste' code.
As I get stuck into C, I'm aware that I have to make a compromise and I'd like it to be the right one. Either my data structures have a void * in each node / element and I lose type safety or I have to re-write my structures and code for each type I want to use them with.
The complexity of the code is an obvious factor: iterating through an array or a linked-list is trivial and adding a *next to a struct is no extra effort; in these cases it makes sense not to try and re-use structures and code. But for more complicated structures the answer isn't so obvious.
There's also modularity and testability: separating out the type and its operations from the code that uses the structure makes testing it easier. The inverse is also true: testing the iteration of some code over a structure whilst it's trying to do other things gets messy.
So what's your advice? void * and reuse or type-safety and duplicated code? Are there any general principles? Am I trying to force OO onto procedural when it won't fit?
Edit: Please don't recommend C++, my question is about C!
I would say use void * so you can re-use the code. It's more work to re-implement e.g. a linked list, than to make sure you get/set the data in the list properly.
Take as many hints from glib as possible, I find their data structures very nice and easy to use, and have had little trouble because of the loss of type safety.
I think you'll have to strike a balance between the two, just as you suggest. If the code is only a few lines and trivial I would duplicate it but if it's more complex, I would consider working with void* to avoid having to do any potential bug fixing and maintenance in several places and also to reduce the code size.
If you look at the C runtime library, there's several "generic" functions that work with void*, one common example is sorting with qsort. It would be madness to duplicate this code for every type you'd like to sort.
There's nothing wrong with using void pointers. You don't even have to cast them when assigning them to a variable of another pointer type, since the conversion is done implicitly. It might be worth having a look at this: http://www.cpax.org.uk/prg/writings/casting.php
The answer to this question is the same as for getting efficient templates for a linked list in C++.
a) Create an abstract version of the algorithm that uses void* or some Abstracted Type
b) Create a lightweight public interface to call the Abstracted Type algorithms and cast between them.
For example.
typedef struct simple_list
{
struct simple_list* next;
} SimpleList;
void add_to_list( SimpleList* listTop, SimpleList* element );
SimpleList* get_from_top( SimpleList* listTop );
// the rest
#define ListType(x) \
void add_ ## x ( x* l, x* e ) \
{ add_to_list( (SimpleList*)l, (SimpleList*)e ); } \
x* get_ ## x ( x* l ) \
{ return (x*) get_from_top( (SimpleList*)l ); }
/* the rest */
typedef struct my_struct
{
struct my_struct* next;
/* rest of my stuff */
} MyStruct;
ListType(MyStruct)
MyStruct a;
MyStruct b;
add_MyStruct( &a, &b );
MyStruct* c = get_MyStruct(&a);
etc etc.
We use OO in C a lot here, but only for encapsulation and abstraction, no polymorphism or so.
Which means we have specific types, like FooBar(Foo a, ...) but, for our collection "classes", we use void *. Just use void * where multiple types could be used, BUT, by doing so, ensure you don't need the argument to be of a specific type. As per collection, having void * is alright, because the collection doesn't care about the type. But if your function can accept type a and type b but none other, make two variants, one for a and one for b.
The main point is to use a void * only when you don't care about the type.
Now, if you have 50 types with the same base structure (let's say, int a; int b; as first members of all types), and want a function to act upon those types, just make the common first members a type by itself, then make the function accept this, and pass object->ab, or (AB*)object if your type is opaque. Both will work if ab is the first field in your struct.
You can use macros, they will work with any type and the compiler will check statically the expanded code. The downside is that the code density (in the binary) will worsen and they are more difficult to debug.
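A short sketch of the macro approach, with DEFINE_ARRAY_MAX, max_int and max_double as invented names. Each use of the macro stamps out a separate, fully typed function, which is where both the static checking and the larger binary come from.

```c
#include <assert.h>
#include <stddef.h>

/* Generates a typed "maximum of an array" function for element type T. */
#define DEFINE_ARRAY_MAX(T, name)                 \
    T name(const T *items, size_t count)          \
    {                                             \
        T best;                                   \
        assert(items != NULL && count > 0);       \
        best = items[0];                          \
        for (size_t i = 1; i < count; i++)        \
            if (items[i] > best)                  \
                best = items[i];                  \
        return best;                              \
    }

/* One expansion per element type. */
DEFINE_ARRAY_MAX(int, max_int)
DEFINE_ARRAY_MAX(double, max_double)
```

Passing a double array to max_int is now a compile-time diagnostic rather than a silent run-time error, at the cost of one copy of the loop per type.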
I asked this question about generic functions some time ago and the answers could help you.
You can efficiently add type information, inheritance and polymorphism to C data structures, that's what C++ does. (http://www.embedded.com/97/fe29712.htm)
Definitely generic void*, never duplicate code!
Take into account that this dilemma was considered by many a C programmer, and many major C projects. All serious C projects I've ever encountered, whether open-source or commercial, picked the generic void*. When used carefully and wrapped into a good API, it is barely a burden on the user of the library. Moreover, void* is idiomatic C, recommended directly in K&R2. It is the way people expect code to be written, and anything else would be surprising and badly accepted.
You can build a (sort of) OO framework using C, but you miss out on a lot of the benefits ... like an OO type system that the compiler understands. If you insist on doing OO in a C-like language, C++ is a better choice. It is more complicated than vanilla C, but at least you get proper linguistic support for OO.
EDIT: Ok ... if you insist that we don't recommend C++, I recommend that you don't do OO in C. Happy? As far as your OO habits are concerned, you should probably think in terms of "objects", but leave inheritance and polymorphism out of your implementation strategy. Genericity (using function pointers) should be used sparingly.
EDIT 2: Actually, I think that use of void * in a generic C list is reasonable. It is just trying to build a mock OO framework using macros, function pointers, dispatching and that kind of nonsense that I think is a bad idea.
In Java all collections from java.util package in effect hold equivalent of void* pointer ( the Object ).
Yes, generics ( introduced in 1.5 ) add syntactic sugar and prevent you from coding unsafe assignments, however the storage type remains Object.
So, I think there is no OO crime committed when you use void* for a generic framework type.
I would also add type-specific inlines or macro wrappers that assign/retrieve data from the generic structures if you do this often in your code.
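Such wrappers might look roughly like this. The fixed-capacity stack and all names in it are hypothetical and only stand in for whatever generic structure is in use.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 16   /* arbitrary size for the sketch */

/* Hypothetical generic stack holding void * elements. */
struct stack {
    void *items[STACK_CAPACITY];
    size_t count;
};

void stack_push(struct stack *s, void *item)
{
    assert(s != NULL && s->count < STACK_CAPACITY);
    s->items[s->count++] = item;
}

void *stack_pop(struct stack *s)
{
    assert(s != NULL && s->count > 0);
    return s->items[--s->count];
}

/* Type-specific wrappers: the compiler now checks the element type. */
struct point {
    int x, y;
};

static inline void point_stack_push(struct stack *s, struct point *p)
{
    stack_push(s, p);
}

static inline struct point *point_stack_pop(struct stack *s)
{
    return stack_pop(s);
}
```

Code that sticks to the point_stack_ functions gets a diagnostic if it pushes the wrong pointer type, while the generic implementation exists only once.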
P.S. The one thing that you should NOT do is to use void** to return allocated/reallocated generic types. If you check the signatures of malloc/realloc you will see that you can achieve correct memory allocations without the dreaded void** pointer. I only mention this because I've seen it in some open-source project that I do not wish to name here.
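To illustrate the P.S., here is a sketch of a resize helper that follows realloc's shape and returns the new block as void *. The name grow_array and the overflow check are my own additions, not something taken from the projects alluded to above.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the (possibly moved) block, as realloc does; no void ** needed. */
void *grow_array(void *items, size_t elem_size, size_t new_count)
{
    assert(elem_size > 0 && new_count > 0);
    if (new_count > (size_t)-1 / elem_size)
        return NULL;                      /* requested size would overflow */
    return realloc(items, elem_size * new_count);
}
```

A caller would write int *tmp = grow_array(xs, sizeof *xs, 8); and then xs = tmp; after checking for NULL. The void** alternative forces callers to write something like (void **)&xs, which treats an int * object as if it were a void * object and is not guaranteed to work.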
A generic container can be wrapped with a little work so that it can be instantiated in type-safe versions. Here is an example, full headers linked below:
/* generic implementation */
struct deque *deque_next(struct deque *dq);
void *deque_value(const struct deque *dq);
/* Prepend a node carrying `value` to the deque `dq` which may
* be NULL, in which case a new deque is created.
* O(1)
*/
void deque_prepend(struct deque **dq, void *value);
From the header that can be used to instantiate specific wrapped types of deque:
#include "deque.h"
#ifndef DEQUE_TAG
#error "Must define DEQUE_TAG to use this header file"
#endif
#ifndef DEQUE_VALUE_TYPE
#error "Must define DEQUE_VALUE_TYPE to use this header file"
#endif
#define DEQUE_GEN_PASTE_(x,y) x ## y
#define DEQUE_GEN_PASTE(x,y) DEQUE_GEN_PASTE_(x,y)
#define DQTAG(suffix) DEQUE_GEN_PASTE(DEQUE_TAG,suffix)
#define DQVALUE DEQUE_VALUE_TYPE
#define DQREF DQTAG(_ref_t)
typedef struct {
deque_t *dq;
} DQREF;
static inline DQREF DQTAG(_next) (DQREF ref) {
return (DQREF){deque_next(ref.dq)};
}
static inline DQVALUE DQTAG(_value) (DQREF ref) {
return deque_value(ref.dq);
}
static inline void DQTAG(_prepend) (DQREF *ref, DQVALUE val) {
deque_prepend(&ref->dq, val);
}
deque.h: http://ideone.com/eDNBN
deque_gen.h: http://ideone.com/IkJRq

Resources