I am trying to declare a generic variable type in C (I can't use C++), and I have the following options in mind.
Option 1
typedef struct
{
void *value;
ElementType_e type;
} Data_t;
Option 2
typedef struct {
ElementType_e type;
union {
int a;
float b;
char c;
} my_union;
} my_struct;
where ElementType_e is an enum that holds all the possible types of variables (e.g. int, char, unsigned int, etc.). I am kinda leaning toward option 1, because I don't believe casting will add extra computational time compared to a switch, right?
I am just wondering which type is more useful? I know option 1 will require casting every time the value is used/accessed. Are there any possible issues that could happen with casting (especially when compiling/running the code on different platforms, e.g. 32-bit and 16-bit micros)?
Option 2, meanwhile, requires a switch () to do any operation (e.g. addition, ...).
The following link explains that option 2 is better (from a readability point of view), but I am mainly concerned about code size and computational cost.
Generic data type in C [ void * ]
is there any possible issues that could happen with casting
No. You don't want the cast anyway: in C there is no need to cast when assigning to or from a void pointer.
I am just wondering which type is more useful?
Both are, so it depends:
1 is for the lazy (less typing, and fewer variable names to remember).
2 is for the cautious (it is type-safe, as opposed to option 1, where the "real" type information is lost, so you can even assign the address of a variable whose type is not in ElementType_e).
Referring a comment:
Regarding performance, I expect no major difference between the two approaches (if implemented sanely), as both need conditional statements when assigning to or from the value (the exception is pointer variables, which under option 1 can be assigned without conditional statements).
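To make the switch in option 2 concrete, here is a minimal sketch of an addition over the tagged union from the question. The enumerator names (TYPE_INT and so on) and the function my_struct_add are my own assumptions; the question does not show them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enumerators; the question does not list them. */
typedef enum { TYPE_INT, TYPE_FLOAT, TYPE_CHAR } ElementType_e;

typedef struct {
    ElementType_e type;
    union {
        int a;
        float b;
        char c;
    } my_union;
} my_struct;

/* Adds two values carrying the same tag; returns false on a tag mismatch. */
bool my_struct_add(const my_struct *x, const my_struct *y, my_struct *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    if (x->type != y->type)
        return false;
    out->type = x->type;
    switch (x->type) {
    case TYPE_INT:
        out->my_union.a = x->my_union.a + y->my_union.a;
        return true;
    case TYPE_FLOAT:
        out->my_union.b = x->my_union.b + y->my_union.b;
        return true;
    case TYPE_CHAR:
        out->my_union.c = (char)(x->my_union.c + y->my_union.c);
        return true;
    }
    return false; /* unknown tag */
}
```

With option 1 you would need the same switch on the tag before you know which pointer type to convert the void * to, which is why I don't expect a real cost difference.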
I'd recommend using a union. In fact, I've used one myself in a similar situation:
union sockaddr_u {
struct sockaddr_storage ss;
struct sockaddr_in sin;
struct sockaddr_in6 sin6;
};
I use this union in socket code where I could be working with either IPv4 or IPv6 addresses. In this particular case, the "type" field is actually the first field in each of the inner structs (ss.ss_family, sin.sin_family, and sin6.sin6_family).
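As a rough sketch of how the family field can act as the tag, the hypothetical helper below (sockaddr_u_len is my name, not part of any API) picks the size of the active address. It assumes a POSIX system that provides <sys/socket.h> and <netinet/in.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

union sockaddr_u {
    struct sockaddr_storage ss;
    struct sockaddr_in sin;
    struct sockaddr_in6 sin6;
};

/* Returns the size of the active address, or 0 for an unhandled family. */
size_t sockaddr_u_len(const union sockaddr_u *addr)
{
    assert(addr != NULL);
    switch (addr->ss.ss_family) {   /* the family field is the "type" tag */
    case AF_INET:
        return sizeof addr->sin;
    case AF_INET6:
        return sizeof addr->sin6;
    default:
        return 0;
    }
}
```

The returned length is the kind of value you would pass as the address-length argument of calls such as bind() or connect().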
I think the problem is not well posed, since the programmer can define infinitely many data types. Consider for example the following sequence:
typedef char S0_t;
typedef struct { S0_t x; } S1_t;
typedef struct { S1_t x; } S2_t;
typedef struct { S2_t x; } S3_t;
It's pretty clear that we can continue indefinitely, defining as many new types as we want.
So there is no straightforward way to handle all these possibilities.
On the other hand, since pointers are more flexible, you can decide to define a generic type concerned only with pointer types.
Thus, the types used in your project would have to be pointers only.
In that case, something as simple as the following definition could work:
typedef void* generic_t;
For two (or more) structs: Base and Sub with a common first (unnamed) struct, is it safe to convert/cast from Base to Sub and vice versa?
struct Base{
struct{
int id;
// ...
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
// same '...'
};
// actual data
};
Are these functions guaranteed to be safe and technically correct? (Also: is the zero-length char data[] member necessary and useful?)
struct Base * subToBase(struct Sub * s){
return (struct Base*)s;
}
struct Sub * baseToSub(struct Base * b){
if(b->id != SUB_ID){
return NULL;
}
return (struct Sub*)b;
}
Edit
I have no plans to nest any further than Base within Sub, but rather leave the possibility to add other sub-types (directly under Base) later without needing to change Base. My main concern is whether pointers to the structs can be safely converted back and forth between Base and any sub. References to the (C11) standard would be most appreciated.
Edit v2
Changed the wording slightly to discourage OOP/inheritance discussions. What I want is a tagged-union, without the union so it can be extended later. I have no plans for doing additional nesting. Sub-types that need other sub-types' functionality can do so explicitly, without doing any further nesting.
Context
For a script interpreter I have made a pseudo object-oriented tagged-union type system, without the union. It has an (abstract) generic base type Object with several (specific) sub-types, such as String, Number, List etc. Every type-struct has the following unnamed struct as the first member:
#define OBJHEAD struct{ \
int id; \
int line; \
int column; \
}
The id identifies the type of object, line and column should (also) be self-explanatory. A simplified implementation of various objects:
typedef struct Object{
OBJHEAD;
char data[]; // necessary?
} Object;
typedef struct Number{
OBJHEAD;
int value; // only int for simplicity
} Number;
typedef struct String{
OBJHEAD;
size_t length;
char * string;
} String;
typedef struct List{
OBJHEAD;
size_t size;
Object * elements; // may be any kind and mix of objects
} List;
Object * Number_toObject(Number * num){
return (Object*)num;
}
Number * Number_fromObject(Object * obj){
if(obj->id != TYPE_NUMBER){
return NULL;
}
return (Number*)obj;
}
I know that the most elegant and technically correct way to do this would be to use an enum for the id and a union for the various sub-types. But I want the type-system to be extensible (through some form of type-registry) so that types can be added later without changing all the Object-related code.
A later/external addition could be:
typedef struct File{
OBJHEAD;
FILE * fp;
} File;
without needing to change Object.
Are these conversions guaranteed to be safe?
(As for the small macro-abuse: the OBJHEAD will of course be extensively documented so additional implementers will know what member-names not to use. The idea is not to hide the header, but to save pasting it every time.)
Converting a pointer to one object type to a pointer to a different object type (via a cast, for instance) is permitted, but if the resulting pointer is not correctly aligned then behavior is undefined (C11 6.3.2.3/7). Depending on the members of Base and Sub and on implementation-dependent behavior, it is not necessarily the case that a Base * converted to a Sub * is correctly aligned. For example, given ...
struct Base{
struct{
int id;
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
};
long long int value;
};
... it may be that the implementation permits Base objects to be aligned on 32-bit boundaries but requires Sub objects to be aligned on 64-bit boundaries, or even on stricter ones.
None of this is affected by whether Base has a flexible array member.
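If you want to see what your own implementation requires, a small sketch like the following can help. It uses simplified versions of the structs above (no anonymous struct, no flexible array member), and aligned_for_sub is a hypothetical helper. Testing alignment through uintptr_t is conventional but strictly implementation-defined.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct Base { int id; };
struct Sub  { int id; long long int value; };

/* Returns nonzero if p is aligned strictly enough to be used as a struct Sub *. */
int aligned_for_sub(const void *p)
{
    assert(p != NULL);
    return (uintptr_t)p % alignof(struct Sub) == 0;
}
```

Comparing alignof(struct Base) with alignof(struct Sub) on your target shows whether a Base object could legitimately sit at an address that is unsuitable for a Sub.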
It is a different question whether it is safe to dereference a pointer value of one type that was obtained by casting a pointer value of a different type. For one thing, C places rather few restrictions on how implementations choose to lay out structures: members must be laid out in the order they are declared, and there must not be any padding before the first one, but otherwise, implementations have free rein. To the best of my knowledge, in your case there is no requirement that the anonymous struct members of your two structures must be laid out the same way as each other if they have more than one member. (And if they have only one member then why use an anonymous struct?) It is also not safe to assume that Base.data starts at the same offset as the first element following the anonymous struct in Sub.
In practice, dereferencing the result of your subToBase() is probably ok, and you can certainly implement tests to verify that. Also, if you have a Base * that was obtained by conversion from a Sub *, then the result of converting it back, for instance via baseToSub(), is guaranteed to be the same as the original Sub * (C11 6.3.2.3/7 again). In that case, the conversion to Base * and back has no effect on the safety of dereferencing the pointer as a Sub *.
On the other hand, though I'm having trouble finding a reference for it in the standard, I have to say that baseToSub() is very dangerous in the general context. If a Base * that does not actually point to a Sub is converted to Sub * (which in itself is permitted), then it is not safe to dereference that pointer to access members not shared with Base. In particular, given my declarations above, if the referenced object is in fact a Base, then Base.data being declared in no way prevents ((Sub *)really_a_Base_ptr)->value from producing undefined behavior.
To avoid all undefined and implementation-defined behavior, you want an approach that avoids casting and ensures consistent layout. LoPiTaL's suggestion to embed a typed Base structure inside your Sub structures is a good approach in that regard.
No it is not safe, at least not under all circumstances. If your compiler sees two pointers p and q that have different base type, it may always assume that they don't alias, or stated in other words it may always assume that *p and *q are different objects.
Your cast punches a hole in that assumption. That is if you have a function
double foo(struct A* p, struct B* q) {
double b = q->field0;
*p = (struct A){ 0 };
return b + q->field0; // compiler may return 2*b
}
the optimizer is allowed to avoid the additional read from memory.
If you'd know that no function will ever see the same object through differently typed pointers, you would be safe. But such an assertion is not made easily, so you'd better avoid such hackery.
It is correct, since it is guaranteed to have the same alignment on the first member of the struct, so you can cast from one struct to another.
Nevertheless, the common way to implement your behaviour is to "inherit" the base class:
//Base struct definition
typedef struct Base_{
int id;
// ...
//char data[]; //This is not needed.
}Base;
//Subclass definition
typedef struct Sub_{
Base base; //Note: this is NOT a pointer
// actual data
}Sub;
So now, you can cast a Sub struct into a Base struct, or just return the first member, which already is of type Base, so there is no need of casting anymore.
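A minimal sketch of both directions with the embedded base follows. The tag values BASE_ID and SUB_ID and the payload member are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { BASE_ID = 0, SUB_ID = 1 };   /* hypothetical tag values */

typedef struct Base_ { int id; } Base;
typedef struct Sub_  { Base base; int payload; } Sub;

/* Upcast without any cast: take the address of the embedded member. */
Base *subToBase(Sub *s)
{
    assert(s != NULL);
    return &s->base;
}

/* Downcast only after checking the tag. A pointer to the first member,
   suitably converted, points to the enclosing struct (C11 6.7.2.1/15). */
Sub *baseToSub(Base *b)
{
    assert(b != NULL);
    return b->id == SUB_ID ? (Sub *)b : NULL;
}
```

The downcast is only valid when the Base really is the first member of a Sub, which is exactly what the id check is there to establish.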
One word of caution: do not abuse MACROS. MACROS are nice and good for a lot of things, but abusing them may lead to difficult to read and maintain code.
In this case, the macro is easily replaced with the base member.
One final word: your macro is error prone, since the member names are now hidden. In the end, you may be adding new members with the same name, and getting weird errors without knowing why.
When you further expand your hierarchy into sub-subclasses, you will end up having to write ALL the base classes' MACROs, while if you use the "inherit" approach, you will have to write only the direct base.
Neither of these solutions actually solves your problem: inheritance. The only real solution you would have (the preferred one) would be to change to a truly OO language. Due to its similarity to C, the best match would be C++, but any other language could do.
I have two physical entities which have the same structure. What is the ideal way to handle this scenario?
For example:
struct graphlet_vector {
int a;
int* data;
};
struct group_vector {
int a;
int *data;
};
Group vectors are defined as aggregated graphlet vectors over a range, so the structure is bound to remain the same over time. Having two structures for the same thing therefore seems redundant.
I cannot use something like this:
typedef struct graphlet_vec struct group_vec (although the following works fine: typedef struct graphlet_vec group_vec).
Any suggestions?
If you really want them to be the same type, you can use two typedefs:
struct g_vector { /* ... */ };
typedef struct g_vector graph_vector;
typedef struct g_vector group_vector;
If you prefer to refer to the types as struct graph_vector and struct group_vector rather than just by a single identifier for each, there's no really clean way to do it (struct foo and struct bar are always distinct types) -- but you could use the preprocessor:
struct g_vector { /* ... */ };
#define graph_vector g_vector
#define group_vector g_vector
On the other hand, you should consider whether you really want them to be the same type. They have different names, so they obviously have different meanings to you; otherwise you would have defined a single type. Is there going to be code that can operate on an object of either type? If not, keep them distinct.
If you want to treat them as distinct types in some cases, but as the same type in others, that's a bit trickier. You can either define them as the same type and be careful that you don't mix up the operations in cases where you want them to be distinct (you lose some compile-time type checking that way). Or you can define them as distinct types and implement the common functions using void*. (Or you can switch from C to C++ and derive them from a common parent type using class inheritance.)
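Here is a small sketch of the two-typedef variant, showing that one function then serves both names. I am assuming that the member a holds the number of elements in data; the question does not say so, and g_vector_sum is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct g_vector { int a; int *data; };
typedef struct g_vector graph_vector;
typedef struct g_vector group_vector;

/* Sums the first v->a elements (assumes `a` is the element count). */
long g_vector_sum(const struct g_vector *v)
{
    assert(v != NULL);
    long total = 0;
    for (int i = 0; i < v->a; i++)
        total += v->data[i];
    return total;
}
```

Because both typedefs name the same type, a graph_vector can be assigned to a group_vector without any diagnostic, which is the lost compile-time checking mentioned above.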
I saw some C code like this:
// A:
typedef uint32_t in_addr_t;
struct in_addr { in_addr_t s_addr; };
And I always prefer like this:
// B:
typedef uint32_t in_addr;
So my question is: what's the difference / benefit of doing it in A from B?
It's a layer to introduce type safety, and it can be helpful 'for future expansion'.
One problem with the former is that it's easy to 'convert' a value of a type represented by a typedefed builtin to any of several other types or typedefed builtins.
consider:
typedef int t_millisecond;
typedef int t_second;
typedef int t_degrees;
versus:
// field notation could vary greatly here:
struct t_millisecond { int ms; };
struct t_second { int s; };
struct t_degrees { int f; };
In some cases, it makes it a little clearer to use a notation, and the compiler will also forbid erroneous conversions. Consider:
int a = millisecond * second - degree;
This is a suspicious program. Using typedefed ints, that's a valid program. Using structs, it's ill-formed -- compiler errors will require your corrections, and you can make your intent explicit.
Using typedefs, arbitrary arithmetic and conversions may be applied, and they may be assigned to each other without warning, which can become a burden to maintain.
Consider also:
t_second s = millisecond;
that would also be a fatal conversion.
It's just another tool in the toolbox -- use at your discretion.
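With the struct versions, conversions have to go through a function you write on purpose. A minimal sketch, where to_milliseconds is a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>

struct t_millisecond { int ms; };
struct t_second      { int s; };

/* The one explicit route from seconds to milliseconds. */
struct t_millisecond to_milliseconds(struct t_second sec)
{
    /* Guard against int overflow in the multiplication below. */
    assert(sec.s <= INT_MAX / 1000 && sec.s >= INT_MIN / 1000);
    struct t_millisecond out = { sec.s * 1000 };
    return out;
}
```

Given a struct t_second s, writing struct t_millisecond m = s; is rejected by the compiler, whereas to_milliseconds(s) states the intent and does the scaling.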
Justin's answer is essentially correct, but I think some expansion is needed:
EDIT: Justin expanded his answer significantly, which makes this one somewhat redundant.
Type safety - you want to provide your users with API functions which manipulate the data, not let it just treat it as an integer. Hiding the field in a structure makes it harder to use it the wrong way, and pushes the user towards the proper API.
For future expansion - perhaps a future implementation would like to change things. Maybe add a field, or break the existing field into 4 chars. With a struct, this can be done without changing APIs.
What's your benefit? That your code won't break if implementation changes.
typedef intptr_t ngx_int_t;
typedef uintptr_t ngx_uint_t;
typedef intptr_t ngx_flag_t;
What can we benefit from this? I can't think of anything, to be honest...
The above code is from the famous nginx project; check it out if interested.
One of the typedef purposes is portability. E.g. different compilers and platforms have various type sizes, e.g. sizeof(int) on x86, Linux, gcc is not the same as on Texas Instrument's processors :) But it's still int.
So,
typedef int INT32;
saves one when porting the code.
Another purpose of typedef, is to declare types in order to make shorter declarations.
typedef shared_ptr<MyClass> MyClassPtr;
And now, you can use MyClassPtr as a type, instead of writing the whole shared_ptr... string.
And the very common usage of typedef with structures:
typedef struct {
int x;
int y;
} Point;
or
struct point {
int x;
int y;
};
typedef struct point Point;
Both typedefs let you avoid typing struct keyword every time.
It's often done for code portability, and is particularly relevant for embedded systems.
Suppose you have some integer values that absolutely MUST be 32 bits long. Maybe they need to map to network/disk structures, maybe they need to hold values of that magnitude, or whatever.
Now, suppose you develop your code on a compiler where 'int' is 32 bits. You write...
struct s {
int a,b,c,d;
};
...and it works fine. But, if you need to switch to a compiler where int is only 16-bits, but long is 32, you would need to change all those declarations to
struct s {
long a,b,c,d;
};
Worse yet, you can't just do a search/replace, because for some of the ints you probably don't care about the size. So, the best approach is to do this:
typedef long INT32; // change this typedef according to compiler
struct s {
INT32 a,b,c,d;
};
Then, all you need to do is change the typedefs.
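On a compiler with C99's <stdint.h> you can build the same typedef on top of int32_t, and a compile-time check catches a wrong definition when porting. This is only a sketch, and it assumes the target actually provides an exact 32-bit type (int32_t is optional in the standard).

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

typedef int32_t INT32;   /* exact-width type, where the platform has one */

/* Fails at compile time if the typedef is ever pointed at the wrong type. */
static_assert(sizeof(INT32) * CHAR_BIT == 32, "INT32 must be exactly 32 bits");

struct s {
    INT32 a, b, c, d;
};
```

On an older compiler without <stdint.h>, the hand-maintained typedef from above plus the same size check gives a similar safety net.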
I know two reasons:
Aliasing: turning a complex declaration into something simpler.
Portability: on different architectures a type may be defined differently. A very simple example is u32, which might be defined as unsigned int in some places and as unsigned long in others.
The reason could be that they wish to change the pointer type when porting the code. On another system, there might be different addressing modes ("banking" etc), and then they might need to use non-standard syntax, like
typedef far intptr_t ngx_int_t;
If they never port the code to any system with more than one addressing mode on the same system, portability of pointers would never be an issue and the typedef would be redundant.
One of the reasons I have seen people do this is that they think that if there is ever a need to change the actual type, they just have to change the typedef. I am not too convinced by that argument, though.
I'm writing a dynamically-typed language. Currently, my objects are represented in this way:
struct Class { struct Class* class; struct Object* (*get)(struct Object*,struct Object*); };
struct Integer { struct Class* class; int value; };
struct Object { struct Class* class; };
struct String { struct Class* class; size_t length; char* characters; };
The goal is that I should be able to pass everything around as a struct Object* and then discover the type of the object by comparing the class attribute. For example, to cast an integer for use I would simply do the following (assume that integer is of type struct Class*):
struct Object* foo = bar();
// increment foo
if(foo->class == integer)
((struct Integer*)foo)->value++;
else
handleTypeError();
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
There are alternatives to this approach:
struct Object
{
struct Class* class;
union Value
{
struct Class c;
int i;
struct String s;
} value;
};
The problem here is that the union uses up as much space as the size of the largest thing that can be stored in the union. Given that some of my types are many times as large as my other types, this would mean that my small types (int) would take up as much space as my large types (map) which is an unacceptable tradeoff.
struct Object
{
struct Class* class;
void* value;
};
This creates a level of redirection that will slow things down. Speed is a goal here.
The final alternative is to pass around void*s and manage the internals of the structure myself. For example, to implement the type test mentioned above:
void* foo = bar();
// increment foo
if(*((struct Class**) foo) == integer)
(*((int*)((char*)foo + sizeof(struct Class*))))++;
else
handleTypeError();
This gives me everything I want (portability, different sizes for different types, etc.) but has at least two downsides:
Hideous, error-prone C. The code above only calculates a single-member offset; it will get much worse with types more complex than integers. I might be able to alleviate this a bit using macros, but this will be painful no matter what.
Since there is no struct that represents the object, I don't have the option of stack allocations (at least without implementing my own stack on the heap).
Basically, my question is, how can I get what I want without paying for it? Is there a way to be portable, have variance in size for different types, not use redirection, and keep my code pretty?
EDIT: This is the best response I've ever received for an SO question. Choosing an answer was hard. SO only allows me to choose one answer so I chose the one that led me to my solution, but you all received upvotes.
See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C. The Python solution can be directly applied to your problem. Essentially you want to do this:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
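Applied to the increment example from the question, a minimal sketch could look like this. struct Class is cut down to a single hypothetical name member, and integer_class and increment are names I made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Class   { const char *name; };   /* simplified; `name` is hypothetical */
struct Object  { struct Class *class; };
struct Integer { struct Object object; int value; };

struct Class integer_class = { "Integer" };

/* Increments obj if it is an Integer; returns false for any other class. */
bool increment(struct Object *obj)
{
    assert(obj != NULL);
    if (obj->class != &integer_class)
        return false;
    /* obj is the first member of a struct Integer, so this conversion is valid. */
    ((struct Integer *)obj)->value++;
    return true;
}
```

Going the other way needs no cast at all: given a struct Integer i, pass &i.object wherever a struct Object * is expected.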
C gives you sufficient guarantees that your first approach will work. The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:
union allow_aliasing {
struct Class class;
struct Object object;
struct Integer integer;
struct String string;
};
(You don't need to ever use the union for anything - it just has to be in scope)
I believe the relevant part of the standard is this:
[#5] With one exception, if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined. One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
There are 3 major approaches for implementing dynamic types and which one is best depends on the situation.
1) C-style inheritance: The first one is shown in Josh Haberman's answer. We create a type-hierarchy using classic C-style inheritance:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
Functions with dynamically typed arguments receive them as Object*, inspect the class member, and cast as appropriate. The cost to check the type is two pointer hops. The cost to get the underlying value is one pointer hop. In approaches like this one, objects are typically allocated on the heap since the size of objects is unknown at compile time. Since most malloc implementations allocate a minimum of 32 bytes at a time, small objects can waste a significant amount of memory with this approach.
2) Tagged union: We can remove a level of indirection for accessing small objects using the "short string optimization"/"small object optimization":
struct Object {
struct Class* class;
union {
// fundamental C types or other small types of interest
bool as_bool;
int as_int;
// [...]
// object pointer for large types (or actual pointer values)
void* as_ptr;
};
};
Functions with dynamically typed arguments receive them as Object, inspect the class member, and read the union as appropriate. The cost to check the type is one pointer hop. If the type is one of the special small types, it is stored directly in the union, and there is no indirection to retrieve the value. Otherwise, one pointer hop is required to retrieve the value. This approach can sometimes avoid allocating objects on the heap. Although the exact size of an object still isn't known at compile time, we now know the size and alignment (our union) needed to accommodate small objects.
In these first two solutions, if we know all the possible types at compile time, we can encode the type using an integer type instead of a pointer and reduce type check indirection by one pointer hop.
3) NaN-boxing: Finally, there's NaN-boxing, where every object handle is only 64 bits.
double object;
Any value corresponding to a non-NaN double is understood to simply be a double. All other object handles are a NaN. There are actually large swaths of bit values of double precision floats that correspond to NaN in the commonly used IEEE-754 floating point standard. In the space of NaNs, we use a few bits to tag types and the remaining bits for data. By taking advantage of the fact that most 64-bit machines actually only have a 48-bit address space, we can even stash pointers in NaNs. This method incurs no indirection or extra memory use but constrains our small object types, is awkward, and in theory is not portable C.
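A very small sketch of the idea, boxing either a double or a 32-bit int. It assumes IEEE-754 binary64 doubles, uses a uint64_t handle rather than a double so the bit manipulation stays explicit, and the tag layout (TAG_INT, TAG_MASK) is invented for illustration; real implementations differ in the details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

#define QNAN_MASK UINT64_C(0x7FF8000000000000) /* exponent all ones + quiet bit */
#define TAG_MASK  UINT64_C(0x0007000000000000) /* hypothetical 3-bit type tag */
#define TAG_INT   UINT64_C(0x0001000000000000) /* hypothetical tag for int32 */

typedef uint64_t Handle;

/* Doubles are stored as their own bit pattern. */
Handle box_double(double d)
{
    Handle h;
    memcpy(&h, &d, sizeof h);
    return h;
}

double unbox_double(Handle h)
{
    double d;
    memcpy(&d, &h, sizeof d);
    return d;
}

/* Ints live in the low 32 payload bits of a tagged quiet NaN. */
Handle box_int(int32_t v)
{
    return QNAN_MASK | TAG_INT | (uint32_t)v;
}

bool is_int(Handle h)
{
    return (h & (QNAN_MASK | TAG_MASK)) == (QNAN_MASK | TAG_INT);
}

int32_t unbox_int(Handle h)
{
    assert(is_int(h));
    return (int32_t)(uint32_t)h; /* implementation-defined for negatives, wraps in practice */
}
```

Tag value 0 is left for ordinary NaNs produced by arithmetic, so those are still read as doubles.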
Section 6.2.5 of ISO 9899:1999 (the C99 standard) says:
A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
Section 6.7.2.1 also says:
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[...]
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
This guarantees what you need.
In the question you say:
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works.
This will work on all platforms. It also means that your first alternative - what you are currently using - is safe enough.
But on another platform struct Integer might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
No compliant compiler is allowed to do that. [I replaced String by Integer assuming you were referring to the first set of declarations. On closer examination, you might have been referring to the structure with an embedded union. The compiler still isn't allowed to reorder class and value.]
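If you would rather have the compiler confirm this for your own structs, a sketch along these lines should do. struct Class is simplified to a single member here, and class_of is a hypothetical helper.

```c
#include <assert.h>   /* assert and static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t, NULL */

struct Class   { struct Class *class; };   /* simplified */
struct Object  { struct Class *class; };
struct Integer { struct Class *class; int value; };
struct String  { struct Class *class; size_t length; char *characters; };

/* No padding before the first member, and members keep declaration order. */
static_assert(offsetof(struct Integer, class) == 0, "class must be first");
static_assert(offsetof(struct String, class) == 0, "class must be first");
static_assert(offsetof(struct Integer, value) > offsetof(struct Integer, class),
              "value must follow class");

/* Reads the class pointer from any object whose first member is the class. */
struct Class *class_of(const void *obj)
{
    assert(obj != NULL);
    return *(struct Class *const *)obj;
}
```

These assertions can never fire on a conforming compiler; they mostly document the layout the code relies on.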
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
I believe you're wrong here. First, because your struct String doesn't have a value member. Second, because I believe C does guarantee the layout in memory of your struct's members. That's why the following are different sizes:
struct {
short a;
char b;
char c;
}
struct {
char a;
short b;
char c;
}
If C made no guarantees, then compilers would probably optimize both of those to be the same size. But it guarantees the internal layout of your structs, so the natural alignment rules kick in and make the second one larger than the first.
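You can measure the effect on your own platform with something like the sketch below. The struct names and helper functions are mine, and the exact amount of padding depends on the implementation (commonly 0 bytes for the first layout and 2 for the second).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct first  { short a; char b; char c; };
struct second { char a; short b; char c; };

/* Declaration order is preserved, so the offsets must increase. */
static_assert(offsetof(struct second, a) < offsetof(struct second, b), "order");
static_assert(offsetof(struct second, b) < offsetof(struct second, c), "order");

/* Bytes of padding the compiler added to each layout. */
size_t padding_in_first(void)
{
    return sizeof(struct first) - (sizeof(short) + 2 * sizeof(char));
}

size_t padding_in_second(void)
{
    return sizeof(struct second) - (sizeof(short) + 2 * sizeof(char));
}
```

Printing the two results (or the two sizeof values) shows how much the member order costs with your compiler's alignment rules.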
I appreciate the pedantic issues raised by this question and answers, but I just wanted to mention that CPython has used similar tricks "more or less forever" and it's been working for decades across a huge variety of C compilers. Specifically, see object.h, macros like PyObject_HEAD, structs like PyObject: all kinds of Python Objects (down at the C API level) are getting pointers to them forever cast back and forth to/from PyObject* with no harm done. It's been a while since I last played sea lawyer with an ISO C Standard, to the point that I don't have a copy handy (!), but I do believe that there are some constraints there that should make this keep working as it has for nearly 20 years...