I recently came across a colleague's code that looked like this:
typedef struct A {
int x;
}A;
typedef struct B {
A a;
int d;
}B;
void fn(){
B *b;
((A*)b)->x = 10;
}
His explanation was that, since struct A is the first member of struct B, ((A*)b)->x is the same as b->a.x and provides better readability.
This makes sense, but is this considered good practice? And will this work across platforms? Currently this runs fine on GCC.
Yes, it will work cross-platform(a), but that doesn't necessarily make it a good idea.
As per the ISO C standard (all citations below are from C11), 6.7.2.1 Structure and union specifiers /15, padding is not allowed before the first element of a structure.
In addition, 6.2.7 Compatible type and composite type states that:
Two types have compatible type if their types are the same
and it is undisputed that the A and A-within-B types are identical.
This means that the memory accesses to the A fields will be the same in both A and B types, as would the more sensible b->a.x which is probably what you should be using if you have any concerns about maintainability in future.
And, though you would normally have to worry about strict type aliasing, I don't believe that applies here. It is illegal to alias pointers but the standard has specific exceptions.
6.5 Expressions /7 states some of those exceptions, with the footnote:
The intent of this list is to specify those circumstances in which an object may or may not be aliased.
The exceptions listed are:
a type compatible with the effective type of the object;
some other exceptions which need not concern us here; and
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union).
That, combined with the struct padding rules mentioned above, including the phrase:
A pointer to a structure object, suitably converted, points to its initial member
seems to indicate this example is specifically allowed for. The core point we have to remember here is that the type of the expression ((A*)b) is A*, not B*. That makes the variables compatible for the purposes of unrestricted aliasing.
That's my reading of the relevant portions of the standard. I've been wrong before (b), but I doubt it in this case.
So, if you have a genuine need for this, it will work okay but I'd be documenting any constraints in the code very close to the structures so as to not get bitten in future.
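To make the claim above concrete, here is a minimal sketch that writes through the cast pointer and reads the value back through the explicit member path. The function name write_via_cast is hypothetical and only there for illustration; the two struct definitions are the ones from the question.

```c
#include <assert.h>

typedef struct A { int x; } A;
typedef struct B { A a; int d; } B;

/* Writes through the cast pointer, then reads back through b->a.x. */
int write_via_cast(int value)
{
    B b = { { 0 }, 0 };
    B *pb = &b;

    ((A *)pb)->x = value;                   /* the colleague's form */

    /* A pointer to B, suitably converted, points to its first member. */
    assert((void *)pb == (void *)&pb->a);

    return pb->a.x;                         /* the explicit form */
}
```

If the reading of the standard above is right, write_via_cast(10) returns 10 on any conforming implementation.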
(a) In the general sense. Of course, the code snippet:
B *b;
((A*)b)->x = 10;
will be undefined behaviour because b is not initialised to something sensible. But I'm going to assume this is just example code meant to illustrate your question. If anyone's concerned about it, think of it instead as:
B b, *pb = &b;
((A*)pb)->x = 10;
(b) As my wife will tell you, frequently and with little prompting :-)
I'll go out on a limb and oppose @paxdiablo on this one: I think it's a fine idea, and it's very common in large, production-quality code.
It's basically the most obvious and nice way to implement inheritance-based object oriented data structures in C. Starting the declaration of struct B with an instance of struct A means "B is a sub-class of A". The fact that the first structure member is guaranteed to be 0 bytes from the start of the structure is what makes it work safely, and it's borderline beautiful in my opinion.
It's widely used and deployed in code based on the GObject library, such as the GTK+ user interface toolkit and the GNOME desktop environment.
Of course, it requires you to "know what you're doing", but that is always the case when implementing complicated type relationships in C. :)
In the case of GObject and GTK+, there's plenty of support infrastructure and documentation to help with this: it's quite hard to forget about it. It might mean that creating a new class isn't something you do just as quickly as in C++, but that's perhaps to be expected since there's no native support in C for classes.
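As a rough sketch of the "B is a sub-class of A" pattern described above (this is not GObject code; Animal, Dog and all function names are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "base class". */
typedef struct Animal {
    char name[16];
    int legs;
} Animal;

/* Hypothetical "sub-class": the base must be the FIRST member. */
typedef struct Dog {
    Animal base;
    int tricks_known;
} Dog;

/* Code written against the base type only. */
int animal_legs(const Animal *a)
{
    return a->legs;
}

void dog_init(Dog *d, const char *name, int tricks)
{
    assert(d != NULL && name != NULL);
    strncpy(d->base.name, name, sizeof d->base.name - 1);
    d->base.name[sizeof d->base.name - 1] = '\0';
    d->base.legs = 4;
    d->tricks_known = tricks;
}

/* "Upcast": a Dog* converted to Animal* points at d->base. */
int dog_legs(const Dog *d)
{
    return animal_legs((const Animal *)d);
}
```

The cast in dog_legs could equally be written as &d->base, which keeps the type checker involved; the cast form is what generic code tends to use when it only has a pointer to the derived type.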
That's a horrible idea. As soon as someone comes along and inserts another field at the front of struct B your program blows up. And what is so wrong with b.a.x?
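For what it's worth, if the cast is kept anyway, the "someone inserts a field at the front" failure can be turned into a compile error. A minimal sketch, assuming a C11 compiler (static_assert from <assert.h>); the accessor b_as_a is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

typedef struct A { int x; } A;

typedef struct B {
    A a;      /* must stay first: code elsewhere casts B* to A* */
    int d;
} B;

/* Compilation fails if a field is ever inserted in front of `a`. */
static_assert(offsetof(B, a) == 0, "struct B must begin with its A member");

/* Cast-free alternative: take the member's address instead of casting. */
A *b_as_a(B *b)
{
    return &b->a;
}
```

The accessor gives the same pointer as the cast while staying correct even if the member is moved, at the cost of one extra function.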
Anything that circumvents type checking should generally be avoided.
This hack relies on the order of the declarations, and neither the cast nor this order can be enforced by the compiler.
It should work cross-platform, but I don't think it is a good practice.
If you really have deeply nested structures (you might have to wonder why, however), then you should use a temporary local variable to access the fields:
A deep_a = e->d.c.b.a;
deep_a.x = 10;
deep_a.y = deep_a.x + 72;
e->d.c.b.a = deep_a;
Or, if you don't want to copy a along:
A* deep_a = &(e->d.c.b.a);
deep_a->x = 10;
deep_a->y = deep_a->x + 72;
This shows from where a comes and it doesn't require a cast.
Java and C# also regularly expose constructs like "c.b.a", so I don't see what the problem is. If what you want to simulate is object-oriented behaviour, then you should consider using an object-oriented language (like C++), since "extending structs" in the way you propose provides neither encapsulation nor runtime polymorphism (although one may argue that ((A*)b) is akin to a "dynamic cast").
I am sorry to disagree with all the other answers here, but this system is not compliant with standard C. It is not acceptable to have two pointers with different types pointing to the same location at the same time; this is called aliasing and is not allowed by the strict aliasing rules in C99 and many other standards. A less ugly way of doing this would be to use inline getter functions, which then do not have to look neat in that way. Or perhaps this is a job for a union? A union is specifically allowed to hold one of several types; however, there are a myriad of other drawbacks there too.
In short, this kind of dirty casting to create polymorphism is not allowed by most C standards. Just because it seems to work on your compiler does not mean it is acceptable. See here for an explanation of why it is not allowed, and why compilers at high optimization levels can break code which does not follow these rules: http://en.wikipedia.org/wiki/Aliasing_%28computing%29#Conflicts_with_optimization
Yes, it will work. It is one of the core principles of object-oriented programming in C. See this answer, 'Object-orientation in C', for more examples of extending structs (i.e. inheritance).
This is perfectly legal, and, in my opinion, pretty elegant. For an example of this in production code, see the GObject docs:
Thanks to these simple conditions, it is possible to detect the type
of every object instance by doing:
B *b;
b->parent.parent.g_class->g_type
or, more quickly:
B *b;
((GTypeInstance*)b)->g_class->g_type
Personally, I think that unions are ugly and tend to lead towards huge switch statements, which is a big part of what you've worked to avoid by writing OO code. I write a significant amount of code myself in this style --- typically, the first member of the struct contains function pointers that can be made to work like a vtable for the type in question.
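Here is a small sketch of that vtable style, under the assumption that every object starts with a pointer to a per-type table of function pointers. All names (ShapeOps, Shape, Rect, Square, and the functions) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vtable: one function pointer per "virtual method". */
typedef struct ShapeOps {
    int (*area)(const void *self);
} ShapeOps;

/* Every "object" starts with a pointer to its type's vtable. */
typedef struct Shape {
    const ShapeOps *ops;
} Shape;

typedef struct Rect   { Shape base; int w, h; } Rect;
typedef struct Square { Shape base; int side; } Square;

static int rect_area(const void *self)
{
    const Rect *r = self;
    return r->w * r->h;
}

static int square_area(const void *self)
{
    const Square *s = self;
    return s->side * s->side;
}

static const ShapeOps rect_ops   = { rect_area };
static const ShapeOps square_ops = { square_area };

/* Dispatch through the vtable: no switch on a type tag is needed. */
int shape_area(const Shape *s)
{
    assert(s != NULL && s->ops != NULL);
    return s->ops->area(s);
}

Rect make_rect(int w, int h)
{
    Rect r = { { &rect_ops }, w, h };
    return r;
}

Square make_square(int side)
{
    Square s = { { &square_ops }, side };
    return s;
}
```

Calling shape_area(&r.base) on a Rect made with make_rect(3, 4) gives 12, and shape_area((const Shape *)&q) on a Square made with make_square(5) gives 25, with the same call site serving both types.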
I can see how this works, but I would not call this good practice. It depends on how the bytes of each data structure are laid out in memory. Any time you are casting one complicated data structure to another (i.e. structs), it's not a very good idea, especially when the two structures are not the same size.
I think the OP and many commenters have latched onto the idea that the code is extending a struct.
It is not.
This is an example of composition. Very useful. (Getting rid of the typedefs, here is a more descriptive example):
struct person {
char name[MAX_STRING + 1];
char address[MAX_STRING + 1];
};
struct item {
int x;
};
struct accessory {
int y;
};
/* fixed size memory buffer.
The Linux kernel is full of embedded structs like this
*/
struct order {
struct person customer;
struct item items[MAX_ITEMS];
struct accessory accessories[MAX_ACCESSORIES];
};
void fn(struct order *the_order){
memcpy(the_order->customer.name, DEFAULT_NAME, sizeof(DEFAULT_NAME));
}
You have a fixed size buffer that is nicely compartmentalized. It sure beats a giant single tier struct.
struct double_order {
struct order order;
struct item extra_items[MAX_ITEMS];
struct accessory extra_accessories[MAX_ACCESSORIES];
};
So now you have a second struct that can be treated (a la inheritance) exactly like the first with an explicit cast.
struct double_order d;
fn((struct order *)&d);
This preserves compatibility with code that was written to work with the smaller struct. Both the Linux kernel (http://lxr.free-electrons.com/source/include/linux/spi/spi.h (look at struct spi_device)) and bsd sockets library (http://beej.us/guide/bgnet/output/html/multipage/sockaddr_inman.html) use this approach. In the kernel and sockets cases you have a struct that is run through both generic and differentiated sections of code. Not all that different than the use case for inheritance.
I would NOT suggest writing structs like that just for readability.
I think Postgres does this in some of their code as well. Not that it makes it a good idea, but it does say something about how widely accepted it seems to be.
Perhaps you could consider using macros to implement this feature, putting the functions or fields that need to be reused into the macro.
I guess there must be a duplicate question here, but I couldn't find it. I've recently been working on a C project and, while trying to keep the code as concise as possible, I considered typedef-ing a consistently-used array with a certain type.
As an example, suppose the array of a structure type entry always has the fixed length of MAX_N_ENTRIES. I'd like to reduce the redundancy by rewriting this code:
struct entry ents[MAX_N_ENTRIES];
to this code:
typedef struct entry entry_arr_t[MAX_N_ENTRIES];
entry_arr_t ents;
What I'm concerned about is that, since an array type obviously has to be handled differently from any primitive type in C, this kind of typedef can cause confusion in the future by making the type look like an alias of a primitive.
Yes, it's possible to create a typedef for an array type -- and there's even an example in the Standard C library, namely the jmp_buf type that's used with setjmp and longjmp.
It's usually considered poor style, however, because type names are usually assumed to refer to first-class types that you can do every ordinary first-class-type thing with, and in particular: assign them. But of course you can't assign arrays in C, because they're not first-class types.
In other words, given the typedef in your question, a later programmer might assume that it would be possible to write
entry_arr_t ents1, ents2;
...
ents1 = ents2;
But of course that assignment would fail.
The fact that you've included "arr" in the typedef name does indeed mitigate this concern, making it less likely that the hypothetical later programmer would make the bad assumption.
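If assignability matters, one common workaround (not something the question asked for, so take it as a sketch) is to wrap the array in a struct, since structs are first-class types. The size, the entry layout, and the names entry_table and copy_table below are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N_ENTRIES 4            /* hypothetical size */

struct entry { int id; };          /* hypothetical layout */

/* Bare array typedef, as in the question: objects cannot be assigned. */
typedef struct entry entry_arr_t[MAX_N_ENTRIES];

/* Struct wrapper: can be assigned, passed by value, and returned. */
typedef struct entry_table {
    struct entry ents[MAX_N_ENTRIES];
} entry_table;

entry_table copy_table(const entry_table *src)
{
    entry_table dst;

    assert(src != NULL);
    dst = *src;                    /* copies the whole array */
    return dst;
}
```

The wrapper costs a ".ents" at every access, but "t1 = t2;" then does what a later programmer would expect.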
How to literally translate the following empty C struct inside struct to Delphi (from winnt.h):
typedef struct _TP_CALLBACK_ENVIRON_V3 {
...
struct _ACTIVATION_CONTEXT *ActivationContext;
...
} TP_CALLBACK_ENVIRON_V3;
I'm inclined to use just Pointer since this structure must not be manipulated and it's a pointer anyway. I'm just curious how would one translate it literally (if possible). I was thinking about something like this:
type
PActivationContext = ^TActivationContext;
TActivationContext = record
end;
TTPCallbackEnvironV3 = record
...
ActivationContext: PActivationContext;
...
end;
But, you know, an empty record... So, how would you literally translate the above structure to Delphi ?
The C struct is what is known as an incomplete type. The C code is a common technique used to implement an opaque pointer. By implementing it this way in C you have type safety in the sense that variables of type struct _ACTIVATION_CONTEXT* are not assignment compatible with other pointers. Well, apart from void* pointers which are assignment compatible with all pointer types.
In Delphi there is no such thing as an incomplete type. So I think that the best solution is exactly what you have proposed. It's not particularly important to mimic the C code exactly. What you are aiming for is to have the benefits, specifically type safety. And what you propose is probably as good as you will get.
On the other hand, it depends how visible this type is. If it is very private, perhaps declared only in the implementation section of a unit, and used sparingly, then you may take the stance that declaring an empty record is a little over the top. You may conclude that PActivationContext = Pointer is reasonable.
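For readers less familiar with the C side, here is a minimal sketch of the incomplete-type (opaque pointer) technique described above. It is not the winnt.h code; the type name activation_context and the three functions are hypothetical, and in real code the "private" half would live in a separate .c file:

```c
#include <assert.h>
#include <stdlib.h>

/* "Public" view: only the tag is declared, so the type is incomplete.
   Callers can hold an activation_context* but cannot look inside it. */
struct activation_context;
typedef struct activation_context activation_context;

activation_context *context_create(int id);
int context_id(const activation_context *ctx);
void context_destroy(activation_context *ctx);

/* "Private" view: the full definition, normally hidden in a .c file. */
struct activation_context {
    int id;
};

activation_context *context_create(int id)
{
    activation_context *ctx = malloc(sizeof *ctx);
    if (ctx != NULL)
        ctx->id = id;
    return ctx;
}

int context_id(const activation_context *ctx)
{
    assert(ctx != NULL);
    return ctx->id;
}

void context_destroy(activation_context *ctx)
{
    free(ctx);
}
```

The empty Delphi record plays the same role as the bare "struct activation_context;" declaration: it gives the pointer a distinct type without exposing any fields.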
Today, I noticed a statement in a C struct, and to be honest I was like WTF at first. It looks like this:
struct foo {
void *private;
//Some other members
};
Believe it or not, this struct compiles without any error. So what is the purpose of adding such a line (void *private)?
In pure C there's no private keyword, so the above is perfectly legal, albeit a very bad idea.
This would be invalid C++ though, and a C++ compiler would surely yield an error.
void* pointers are often used in C to hide the actual data type used, effectively hiding some implementation details from the interface.
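As a rough illustration of that use, here is a sketch in which the private member points at data only the implementation knows about. It compiles as C only (private is a keyword in C++), and struct foo_impl plus the foo_* functions are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* `private` is an ordinary identifier in C. */
struct foo {
    void *private;   /* implementation data, opaque to users of foo */
    int id;
};

/* Hypothetical implementation-side data that callers never see. */
struct foo_impl {
    int counter;
};

/* Returns 0 on success, -1 if allocation fails. */
int foo_init(struct foo *f, int id)
{
    struct foo_impl *impl = malloc(sizeof *impl);
    if (impl == NULL)
        return -1;
    impl->counter = 0;
    f->private = impl;
    f->id = id;
    return 0;
}

/* Increments the hidden counter and returns its new value. */
int foo_bump(struct foo *f)
{
    struct foo_impl *impl = f->private;  /* void* converts implicitly in C */
    assert(impl != NULL);
    return ++impl->counter;
}

void foo_cleanup(struct foo *f)
{
    free(f->private);
    f->private = NULL;
}
```

Unlike an incomplete struct type, a void* gives no type checking at all, so this style relies on the implementation always casting back to the right type.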
Actually you have stumbled upon an important difference between C and C++, the way structures are implemented.
In C, structures can contain only primitive and composite data types, whereas C++ structures offer more functionality: they are closer to classes than to C structures, and so provide additional features such as
Ability to classify members as private, public or protected.
Can contain member functions.
Structures in C++, can be used as a tool to enforce object oriented methods, since all OO functionality like inheritance, which is applicable to classes , holds good for structures as well.
So in short, the above code is valid C, but invalid C++.
everyone. I actually have two questions, somewhat related.
Question #1: Why is gcc letting me declare variables after action statements? I thought the C89 standard did not allow this. (GCC Version: 4.4.3) It even happens when I explicitly use --std=c89 on the compile line. I know that most compilers implement things that are non-standard, i.e. C compilers allowing // comments, when the standard does not specify that. I'd like to learn just the standard, so that if I ever need to use just the standard, I don't snag on things like this.
Question #2: How do you cope without objects in C? I program as a hobby, and I have not yet used a language that does not have Objects (a.k.a. OO concepts?) -- I already know some C++, and I'd like to learn how to use C on its own. Supposedly, one way is to make a POD struct and make functions similar to StructName_constructor(), StructName_doSomething(), etc. and pass the struct instance to each function - is this the 'proper' way, or am I totally off?
EDIT: Due to some minor confusion, I am defining what my second question is more clearly: I am not asking How do I use Objects in C? I am asking How do you manage without objects in C?, a.k.a. how do you accomplish things without objects, where you'd normally use objects?
In advance, thanks a lot. I've never used a language without OOP! :)
EDIT: As per request, here is an example of the variable declaration issue:
/* includes, or whatever */
int main(int argc, char *argv[]) {
int myInt = 5;
printf("myInt is %d\n", myInt);
int test = 4; /* This does not result in a compile error */
printf("Test is %d\n", test);
return 0;
}
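c89 doesn't allow this, but c99 does. Although it's taken a long time to catch on, some compilers (including gcc) are finally starting to implement c99 features. GCC also accepts mixed declarations and code as an extension in C89 mode; adding -pedantic (or -pedantic-errors) makes it diagnose them.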
IMO, if you want to use OOP, you should probably stick to C++ or try out Objective C. Trying to reinvent OOP built on top of C again just doesn't make much sense.
If you insist on doing it anyway, yes, you can pass a pointer to a struct as an imitation of this -- but it's still not a good idea.
It does often make sense to pass (pointers to) structs around when you need to operate on a data structure. I would not, however, advise working very hard at grouping functions together and having them all take a pointer to a struct as their first parameter, just because that's how other languages happen to implement things.
If you happen to have a number of functions that all operate on/with a particular struct, and it really makes sense for them to all receive a pointer to that struct as their first parameter, that's great -- but don't feel obliged to force it just because C++ happens to do things that way.
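For the case where it really does make sense, the style the question describes might look roughly like the sketch below. IntStack and its functions are hypothetical, and the fixed capacity is just to keep the example free of allocation:

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 8           /* hypothetical fixed capacity */

typedef struct IntStack {
    int items[STACK_CAPACITY];
    size_t count;
} IntStack;

/* Plays the role of a constructor. */
void IntStack_init(IntStack *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Returns 0 on success, -1 if the stack is full. */
int IntStack_push(IntStack *s, int value)
{
    if (s->count == STACK_CAPACITY)
        return -1;
    s->items[s->count++] = value;
    return 0;
}

/* Returns 0 and stores the top item in *out, or -1 if the stack is empty. */
int IntStack_pop(IntStack *s, int *out)
{
    if (s->count == 0)
        return -1;
    *out = s->items[--s->count];
    return 0;
}
```

The struct pointer comes first here only because every one of these functions genuinely operates on the stack, not to imitate a this pointer.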
Edit: As far as how you manage without objects: well, at least when I'm writing C, I tend to operate on individual characters more often. For what it's worth, in C++ I typically end up with a few relatively long lines of code; in C, I tend toward a lot of short lines instead.
There is more separation between the code and data, but to some extent they're still coupled anyway -- a binary tree (for example) still needs code to insert nodes, delete nodes, walk the tree, etc. Likewise, the code for those operations needs to know about the layout of the structure, and the names given to the pointers and such.
Personally, I tend more toward using a common naming convention in my C code, so (for a few examples) the pointers to subtrees in a binary tree are always just named left and right. If I use a linked list (rare) the pointer to the next node is always named next (and if it's doubly-linked, the other is prev). This helps a lot with being able to write code without having to spend a lot of time looking up a structure definition to figure out what name I used for something this time.
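A small sketch of what that convention looks like in practice, assuming a plain unbalanced binary search tree of ints (Node and the tree_* functions are hypothetical names):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *left;    /* always `left` and `right`, by convention */
    struct Node *right;
} Node;

/* Inserts key and returns the root. Duplicates are ignored; if
   allocation fails the tree is left unchanged. */
Node *tree_insert(Node *root, int key)
{
    Node **link = &root;

    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }

    assert(*link == NULL);         /* found the empty slot for the key */
    Node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = n->right = NULL;
        *link = n;
    }
    return root;
}

/* Walks the whole tree and counts its nodes. */
int tree_count(const Node *root)
{
    if (root == NULL)
        return 0;
    return 1 + tree_count(root->left) + tree_count(root->right);
}

void tree_free(Node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

The data (Node) and the code are separate, but as noted above they are still coupled: every function relies on the field names and layout.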
@Question #1: I don't know why there is no error, but you are right, variables have to be declared at the beginning of a block. The good thing is that you can open a block anywhere you like :). E.g.:
{
int some_local_var;
}
@Question #2: Actually, programming in C without inheritance is sometimes quite annoying, but there are ways to get OOP to some degree. For example, look at the GTK source code and you will find some examples.
You are right, functions like the ones you have shown are common, but the constructor is commonly divided into an allocation function and an initialization function. E.g.:
someStruct* someStruct_alloc() { return (someStruct*)malloc(sizeof(someStruct)); }
void someStruct_init(someStruct* this, int arg1, int arg2) {...}
In some libraries, I have even seen some sort of polymorphism, where function pointers are stored within the struct (which have to be set in the initializing function, of course). This results in a C++ like API:
someStruct* str = someStruct_alloc();
someStruct_init(str);
str->someFunc(10, 20, 30);
Regarding OOP in C, have you looked at some of the topics on SO? For instance, Can you write object oriented code in C?.
I can't put my finger on an example, but I think they enforce an OO like discipline in Linux kernel programming as well.
In terms of learning how C works, as opposed to OO in C++, you might find it easier to take a short course in some other language that doesn't have an OO derivative -- say, Modula-2 (one of my favorites) or even BASIC (if you can still find a real BASIC implementation -- last time I wrote BASIC code it was with the QBASIC that came with DOS 5.0, later compiled in full Quick BASIC).
The methods you use to get things done in Modula-2 or Pascal (barring the strong typing, which protects against certain types of errors but makes it more complicated to do certain things) are exactly those used in non-OO C, and working in a language with different syntax might (probably will, IMO) make it easier to learn the concepts without your "programming reflexes" kicking in and trying to do OO operations in a nearly-familiar language.
In C if I declare a struct/union/enum:
struct Foo { int i ... }
when I want to use my structure I need to specify the tag:
struct Foo foo;
To lose this requirement, I have to alias my structure using typedef:
typedef struct Foo Foo;
Why not have all types/structs/whatever in the same "namespace" by default? What is the rationale behind the decision of requiring the declaration tag at each variable declaration (unless typedef'd)?
Many other languages do not make this distinction, and it seems that it's only bringing an extra level of complexity IMHO.
Structures/records were a very early pre-C addition to B, just after Dennis Ritchie added the basic 'typed' structure. I believe that the original struct syntax did not have a tag at all; for every variable you made an anonymous struct:
struct {
int i;
char a[5];
} s;
Later, the tag was added to enable reuse of a structure layout, but it wasn't really regarded as a real 'type'. Also, removing the struct/union keyword would make parsing impossible:
/* is Foo a union or a struct? */
Foo { int i; double x; };
Foo s;
or break the 'declaration syntax mimics expression syntax' paradigm that is so fundamental to C.
I suspect that typedef was added much later, possibly a few years after the 'birth' of C.
The argument "C was the highest level language at the time." does not seem true. Algol-68 predates it and has records as proper types. The same holds for Pascal.
If you'd like to know more about the history of C, you might find Ritchie's "The Development of the C Language" an interesting read.
Well, other languages also usually support namespaces. C doesn't.
It probably isn't the reason, but it makes sense to have at least this natural namespace.
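To show what that "natural namespace" buys you: struct tags and ordinary identifiers live in separate name spaces, so the same spelling can be used for both (the standard library does this with struct stat and stat()). A minimal sketch with hypothetical names:

```c
#include <assert.h>

/* The tag `point` lives in the tag name space... */
struct point {
    int x, y;
};

/* ...so an ordinary identifier (here a function) may reuse the spelling. */
struct point point(int x, int y)
{
    struct point p = { x, y };
    return p;
}

/* `struct point` names the type; plain `point` names the function. */
int point_demo(void)
{
    struct point p = point(3, 4);
    assert(p.x == 3 && p.y == 4);
    return p.x + p.y;
}
```

Adding "typedef struct point point;" to this file would clash with the function, which is exactly the separation the mandatory struct keyword provides.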
Interesting question! Here are my thoughts.
When C was created, little abstraction existed over assembly language. There was FORTRAN, B, and others, but when C came to be it was arguably the highest level language in existence. Its goal was to provide functionality and syntax powerful enough to create and maintain an operating system, and it succeeded remarkably.
Consider that, at the time, porting a system to a new platform meant rewriting and adapting components to the platform's assembly language. With the release of C, it eventually came down to porting the C compiler and recompiling existing code.
It was probably an asset back then that the very syntax of the language forced you to differentiate between types that could fit in a register, and types that couldn't.
Language syntax has evolved a lot since then, and most of the things we're used to seeing in modern languages are missing in C. User-defined namespaces are only one of them, and I don't think the concept of "syntax sugar" even existed back then. Or rather, C was the peak of syntax sugar.
We're surrounded with things like this. I mean, take a look at your keyboard: why do we have a PAUSE/BREAK key? I don't think I've pressed that key for years.
It's an inheritance from a time in which it made sense.