C: Why is casting from void pointer to function pointer undefined? [duplicate]

C: Why is casting from void pointer to function pointer undefined? [duplicate] - c

I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?

An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely different memory. Most architectures are Von Neumann architectures with code and data in the same memory but C doesn't limit itself to only certain types of architectures if at all possible.

Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.
The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.
It seems like the C language committee never intended void* to be a pointer to function, they just wanted a generic pointer to objects.
The C99 Rationale says:
6.3.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of these
architectures feature uniform pointers which are the size of some integer type, maximally
portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.
The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.
Note Nothing is said about pointers to functions in the last paragraph. They might be different from other pointers, and the committee is aware of that.

For those who remember MS-DOS, Windows 3.1 and older the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.
So for instance for the Compact model (small code, large data):
sizeof(void *) > sizeof(void(*)())
and conversely in the Medium model (large code, small data):
sizeof(void *) < sizeof(void(*)())
In this case you didn't have separate storage for code and date but still couldn't convert between the two pointers (short of using non-standard __near and __far modifiers).
Additionally there's no guarantee that even if the pointers are the same size, that they point to the same thing - in the DOS Small memory model, both code and data used near pointers, but they pointed to different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.

Pointers to void are supposed to be able to accommodate a pointer to any kind of data -- but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than pointers to data (e.g, there are DSPs with different addressing for data vs. code, medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).

In addition to what is already said here, it is interesting to look at POSIX dlsym():
The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:
fptr = (int (*)(int))dlsym(handle, "my_function");
Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.

C++11 has a solution to the long-standing mismatch between C/C++ and POSIX with regard to dlsym(). One can use reinterpret_cast to convert a function pointer to/from a data pointer so long as the implementation supports this feature.
From the standard, 5.2.10 para. 8, "converting a function pointer to an object pointer type or vice versa is conditionally-supported." 1.3.5 defines "conditionally-supported" as a "program construct that an implementation is not required to support".

Depending on the target architecture, code and data may be stored in fundamentally incompatible, physically distinct areas of memory.

undefined doesn't necessarily mean not allowed, it can mean that the compiler implementor has more freedom to do it how they want.
For instance it may not be possible on some architectures - undefined allows them to still have a conforming 'C' library even if you can't do this.

Another solution:
Assuming POSIX guarantees function and data pointers to have the same size and representation (I can't find the text for this, but the example OP cited suggests they at least intended to make this requirement), the following should work:
double (*cosine)(double);
void *tmp;
handle = dlopen("libm.so", RTLD_LAZY);
tmp = dlsym(handle, "cos");
memcpy(&cosine, &tmp, sizeof cosine);
This avoids violating the aliasing rules by going through the char [] representation, which is allowed to alias all types.
Yet another approach:
union {
double (*fptr)(double);
void *dptr;
} u;
u.dptr = dlsym(handle, "cos");
cosine = u.fptr;
But I would recommend the memcpy approach if you want absolutely 100% correct C.

They can be different types with different space requirements. Assigning to one can irreversibly slice the value of the pointer so that assigning back results in something different.
I believe they can be different types because the standard doesn't want to limit possible implementations that save space when it's not needed or when the size could cause the CPU to have to do extra crap to use it, etc...

The only truly portable solution is not to use dlsym for functions, and instead use dlsym to obtain a pointer to data that contains function pointers. For example, in your library:
struct module foo_module = {
.create = create_func,
.destroy = destroy_func,
.write = write_func,
/* ... */
};
and then in your application:
struct module *foo = dlsym(handle, "foo_module");
foo->create(/*...*/);
/* ... */
Incidentally, this is good design practice anyway, and makes it easy to support both dynamic loading via dlopen and static linking all modules on systems that don't support dynamic linking, or where the user/system integrator does not want to use dynamic linking.

A modern example of where function pointers can differ in size from data pointers: C++ class member function pointers
Directly quoted from https://blogs.msdn.microsoft.com/oldnewthing/20040209-00/?p=40713/
class Base1 { int b1; void Base1Method(); };
class Base2 { int b2; void Base2Method(); };
class Derived : public Base1, Base2 { int d; void DerivedMethod(); };
There are now two possible this pointers.
A pointer to a member function of Base1 can be used as a pointer to a
member function of Derived, since they both use the same this
pointer. But a pointer to a member function of Base2 cannot be used
as-is as a pointer to a member function of Derived, since the this
pointer needs to be adjusted.
There are many ways of solving this. Here's how the Visual Studio
compiler decides to handle it:
A pointer to a member function of a multiply-inherited class is really
a structure.
[Address of function]
[Adjustor]
The size of a pointer-to-member-function of a class that uses multiple inheritance is the size of a pointer plus the size of a size_t.
tl;dr: When using multiple inheritance, a pointer to a member function may (depending on compiler, version, architecture, etc) actually be stored as
struct {
void * func;
size_t offset;
}
which is obviously larger than a void *.

On most architectures, pointers to all normal data types have the same representation, so casting between data pointer types is a no-op.
However, it's conceivable that function pointers might require a different representation, perhaps they're larger than other pointers. If void* could hold function pointers, this would mean that void*'s representation would have to be the larger size. And all casts of data pointers to/from void* would have to perform this extra copy.
As someone mentioned, if you need this you can achieve it using a union. But most uses of void* are just for data, so it would be onerous to increase all their memory use just in case a function pointer needs to be stored.

I know that this hasn't been commented on since 2012, but I thought it would be useful to add that I do know an architecture that has very incompatible pointers for data and functions since a call on that architecture checks privilege and carries extra information. No amount of casting will help. It's The Mill.

Related

Why exactly it is invalid to convert a function pointer to a pointer to void, or vice versa? [duplicate]

I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?

An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely different memory. Most architectures are Von Neumann architectures with code and data in the same memory but C doesn't limit itself to only certain types of architectures if at all possible.

Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.
The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.
It seems like the C language committee never intended void* to be a pointer to function, they just wanted a generic pointer to objects.
The C99 Rationale says:
6.3.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of these
architectures feature uniform pointers which are the size of some integer type, maximally
portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.
The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.
Note Nothing is said about pointers to functions in the last paragraph. They might be different from other pointers, and the committee is aware of that.

For those who remember MS-DOS, Windows 3.1 and older the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.
So for instance for the Compact model (small code, large data):
sizeof(void *) > sizeof(void(*)())
and conversely in the Medium model (large code, small data):
sizeof(void *) < sizeof(void(*)())
In this case you didn't have separate storage for code and date but still couldn't convert between the two pointers (short of using non-standard __near and __far modifiers).
Additionally there's no guarantee that even if the pointers are the same size, that they point to the same thing - in the DOS Small memory model, both code and data used near pointers, but they pointed to different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.

Pointers to void are supposed to be able to accommodate a pointer to any kind of data -- but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than pointers to data (e.g, there are DSPs with different addressing for data vs. code, medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).

In addition to what is already said here, it is interesting to look at POSIX dlsym():
The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:
fptr = (int (*)(int))dlsym(handle, "my_function");
Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.

C++11 has a solution to the long-standing mismatch between C/C++ and POSIX with regard to dlsym(). One can use reinterpret_cast to convert a function pointer to/from a data pointer so long as the implementation supports this feature.
From the standard, 5.2.10 para. 8, "converting a function pointer to an object pointer type or vice versa is conditionally-supported." 1.3.5 defines "conditionally-supported" as a "program construct that an implementation is not required to support".

Depending on the target architecture, code and data may be stored in fundamentally incompatible, physically distinct areas of memory.

undefined doesn't necessarily mean not allowed, it can mean that the compiler implementor has more freedom to do it how they want.
For instance it may not be possible on some architectures - undefined allows them to still have a conforming 'C' library even if you can't do this.

Another solution:
Assuming POSIX guarantees function and data pointers to have the same size and representation (I can't find the text for this, but the example OP cited suggests they at least intended to make this requirement), the following should work:
double (*cosine)(double);
void *tmp;
handle = dlopen("libm.so", RTLD_LAZY);
tmp = dlsym(handle, "cos");
memcpy(&cosine, &tmp, sizeof cosine);
This avoids violating the aliasing rules by going through the char [] representation, which is allowed to alias all types.
Yet another approach:
union {
double (*fptr)(double);
void *dptr;
} u;
u.dptr = dlsym(handle, "cos");
cosine = u.fptr;
But I would recommend the memcpy approach if you want absolutely 100% correct C.

They can be different types with different space requirements. Assigning to one can irreversibly slice the value of the pointer so that assigning back results in something different.
I believe they can be different types because the standard doesn't want to limit possible implementations that save space when it's not needed or when the size could cause the CPU to have to do extra crap to use it, etc...

The only truly portable solution is not to use dlsym for functions, and instead use dlsym to obtain a pointer to data that contains function pointers. For example, in your library:
struct module foo_module = {
.create = create_func,
.destroy = destroy_func,
.write = write_func,
/* ... */
};
and then in your application:
struct module *foo = dlsym(handle, "foo_module");
foo->create(/*...*/);
/* ... */
Incidentally, this is good design practice anyway, and makes it easy to support both dynamic loading via dlopen and static linking all modules on systems that don't support dynamic linking, or where the user/system integrator does not want to use dynamic linking.

A modern example of where function pointers can differ in size from data pointers: C++ class member function pointers
Directly quoted from https://blogs.msdn.microsoft.com/oldnewthing/20040209-00/?p=40713/
class Base1 { int b1; void Base1Method(); };
class Base2 { int b2; void Base2Method(); };
class Derived : public Base1, Base2 { int d; void DerivedMethod(); };
There are now two possible this pointers.
A pointer to a member function of Base1 can be used as a pointer to a
member function of Derived, since they both use the same this
pointer. But a pointer to a member function of Base2 cannot be used
as-is as a pointer to a member function of Derived, since the this
pointer needs to be adjusted.
There are many ways of solving this. Here's how the Visual Studio
compiler decides to handle it:
A pointer to a member function of a multiply-inherited class is really
a structure.
[Address of function]
[Adjustor]
The size of a pointer-to-member-function of a class that uses multiple inheritance is the size of a pointer plus the size of a size_t.
tl;dr: When using multiple inheritance, a pointer to a member function may (depending on compiler, version, architecture, etc) actually be stored as
struct {
void * func;
size_t offset;
}
which is obviously larger than a void *.

On most architectures, pointers to all normal data types have the same representation, so casting between data pointer types is a no-op.
However, it's conceivable that function pointers might require a different representation, perhaps they're larger than other pointers. If void* could hold function pointers, this would mean that void*'s representation would have to be the larger size. And all casts of data pointers to/from void* would have to perform this extra copy.
As someone mentioned, if you need this you can achieve it using a union. But most uses of void* are just for data, so it would be onerous to increase all their memory use just in case a function pointer needs to be stored.

I know that this hasn't been commented on since 2012, but I thought it would be useful to add that I do know an architecture that has very incompatible pointers for data and functions since a call on that architecture checks privilege and carries extra information. No amount of casting will help. It's The Mill.

Does a union of a pointer and an array of pointers guarantee that the single pointer has the same address as the first element of the array?

With the struct:
struct stree {
union {
void *ob;
struct stree *strees[256];
};
};
Do the C standards (particularly C11) guarantee that ob is an alias of stree[0]?

Maybe. The C standard only guarantees the following, 6.3.2.3:
A pointer to void may be converted to or from a pointer to any object type. A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
In theory/formally, a void* doesn't need to have the same memory representation as a pointer to object. Whenever it encounters such a conversion, it could in theory do some tricks like keeping track of the original value, to restore it later.
In practice, void* and object pointers have the same size and format on any sane system, because that's by far the easiest way to sate the above requirement. Systems with extended addressing modes for objects exist, but they don't work as the C standard intended. Instead they use non-standard extensions by inventing near and far pointer qualifiers, which could then be applied either to void* or an object pointer. (Some examples of this are low-end microcontrollers and old MS DOS.)
Bottom line: designing for portability to insane or fictional systems is a huge waste of time. Toss in a static_assert(sizeof(void*) == sizeof(int*), "...") and that should be enough. Because in practice you're not going to find a system with magical or mysterious void* format. If you do, deal with it then.
What you have to keep in mind though, is that void* in itself is a different type, so you can't generally access the memory where the void* is stored through another pointer type (strict pointer aliasing). But there are a few exceptions to the strict aliasing rule, type punning through unions is one of them.
So regarding your union, the void* and the first object pointer are guaranteed to be allocated starting at the same address, but again their respective sizes an internal formats could in theory be different. In practice you can assume that they are the same, and because you use a union you aren't breaking strict aliasing.
As for best practice, it is to avoid void* in the first place, as there's very few places where you actually need them. For the few cases where you need to pass some generic pointer around, it is often better to use uintptr_t and store the result as an integer.

How to make two otherwise identical pointer types incompatible

On certain architectures it may be necessary to have different pointer types for otherwise identical objects. Particularly for a Harvard architecture CPU, you may need something like:
uint8_t const ram* data1;
uint8_t const rom* data2;
Particularly this is how the definition of pointers to ROM / RAM looked like in MPLAB C18 (now discontinued) for PICs. It could define even things like:
char const rom* ram* ram strdptr;
Which means a pointer in RAM to pointers in RAM pointing to strings in ROM (using ram is not necessary as by default things are in RAM by this compiler, just added all for clarity).
The good thing in this syntax is that the compiler is capable to alert you when you try to assign in an incompatible manner, like the address of a ROM location to a pointer to RAM (so something like data1 = data2;, or passing a ROM pointer to a function using a RAM pointer would generate an error).
Contrary to this, in avr-gcc for the AVR-8, there is no such type safety as it rather provides functions to access ROM data. There is no way to distinguish a pointer to RAM from a pointer to ROM.
There are situations where this kind of type safety would be very beneficial to catch programming errors.
Is there some way to add similar modifiers to pointers in some manner (such as by preprocessor, expanding to something which could mimic this behavior) to serve this purpose? Or even something which warns on improper access? (in case of avr-gcc, trying to fetch values without using the ROM access functions)

One trick is to wrap the pointers in a struct. Pointers to struct have better type safety than pointers to the primitive data types.
typedef struct
{
uint8_t ptr;
} a_t;
typedef struct
{
uint8_t ptr;
} b_t;
const volatile a_t* a = (const volatile a_t*)0x1234;
const volatile b_t* b = (const volatile b_t*)0x5678;
a = b; // compiler error
b = a; // compiler error

You could encapsulate the pointer in different struct for RAM and ROM, making the type incompatible, but containing the same type of values.
struct romPtr {
void *addr;
};
struct ramPtr {
void *addr;
};
int main(int argc, char **argv) {
struct romPtr data1 = {NULL};
struct romPtr data3 = data1;
struct ramPtr data2 = data1; // <-- gcc would throw a compilation error here
}
During compilation :
$ cc struct_test.c
struct_test.c: In function ‘main’:
struct_test.c:12:24: error: invalid initializer
struct ramPtr data2 = data1;
^~~~~
You could of course typedefs the struct for brevity

Since I received several answers which offer different compromises on providing a solution, I decided to merge them in one, outlining the benefits and drawbacks of each. So you can choose the most appropriate for your particular situation
Named Address Spaces
For the particular problem of solving this, and only this case of ROM and RAM pointers on an AVR-8 micro, the most appropriate solution is this.
This was a proposal for C11 which didn't make it into the final standard, however there are C compilers which support it, including avr-gcc used for 8 bit AVRs.
The related documentation can be accessed here (part of the online GCC manual, also including other architectures using this extension). It is recommendable over other solutions (such as function-like macros in pgmspace.h for the AVR-8) as with this, the compiler can make the appropriate checks, while otherwise accessing the data pointed by remains clear and simple.
In particular, if you have a similar problem of porting something from a compiler which offered some sort of named address spaces, like MPLAB C18, this is likely the fastest and cleanest way to do it.
The ported pointers from above would look like as follows:
uint8_t const* data1;
uint8_t const __flash* data2;
char const __flash** strdptr;
(If possible, one could simplify the process using appropriate preprocessor definitions)
(Original answer by Olaf)
Struct encapsulation, pointer inside
This method aims to strenghten typing of pointers by wrapping them in structures. The intended usage is that you pass the structures themselves across interfaces, by which the compiler can perform type checks on them.
A "pointer" type to byte data could look like this:
typedef struct{
uint8_t* ptr;
}bytebuffer_ptr;
The pointed data can be accessed as follows:
bytebuffer_ptr bbuf;
(...)
bbuf.ptr = allocate_bbuf();
(...)
bbuf.ptr[index] = value;
A function prototype accepting such a type and returning one could look like as follows:
bytebuffer_ptr encode_buffer(bytebuffer_ptr inbuf, size_t len);
(Original answer by dvhh)
Struct encapsulation, pointer outside
Similar to the method above, it aims to strenghten typing of pointers by wrapping them in structures, but in a different manner, providing a more robust constraint. The data type to be pointed to is which is encapsulated.
A "pointer" type to byte data could look like this:
typedef struct{
uint8_t val;
}byte_data;
The pointed data can be accessed as follows:
byte_data* bbuf;
(...)
bbuf = allocate_bbuf();
(...)
bbuf[index].val = value;
A function prototype accepting such a type and returning one could look like as follows:
byte_data* encode_buffer(byte_data* inbuf, size_t len);
(Original answer by Lundin)
Which should I use?
Named Address Spaces in this regard don't need much discussion: They are the most appropriate solution if you only want to deal with a pecularity of your target handling address spaces. The compiler will provide you the compile-time checks you need, and you don't have to try to invent anything further.
If, however for other reasons you are interested in structure wrapping, these are matters which you may want to consider:
Both methods can be optimized just fine: at least GCC will generate identical code from either to using plain pointers. So you don't really have to consider performance: they should work.
Pointer inside is useful if you have either third-party interfaces to serve which demand pointers, or maybe if you are refactoring something so large which you can't do in one pass.
Pointer outside provides more robust type safety as you reinforce the pointed type itself with it: you have a true distinct type which you can't easily (accidentally) convert (implicit cast).
Pointer outside allows you to use modifiers on the pointer, such as adding const, which is important for creating robust interfaces (you can make data intended to be read only by a function const).
Keep in mind that some people might not like either of these, so if you are working in a group, or are creating code which might be reused by known parties, discuss the matter with them first.
Should be obvious, but keep in mind that encapsulating doesn't solve the problem of requiring special access code (such as by the pgmspace.h macros on an AVR-8), assuming no Named Address Spaces are used alongside with the method. It only provides a method to produce a compile error if you try to use a pointer by functions operating on a different address space than what it intends to point into.
Thank you for all the answers!

True harvard architectures use different instructions to access different types of memory like code (Flash on AVR), data (RAM), hardware peripheral registers (IO) and possibly others. The values of addresses in the ranges typically overlap, i.e. the same value accesses different internal devices, depending on the instruction.
Comming to C, if you want to use a unified pointer, this means you not only have to encode the address (value), but also the access type ("address space" in the following) in the pointer value. This can either be done using additional bits in a pointer's value, but also select the appropriate instruction at run-time for every access. This constitutes a significant overhead to the generated code. Additionally, often there are no spare bits in the "natural" value for at least some address spaces (e.g. all 16 bits of the pointer are used already for the address). So additional bits are required, at least a byte worth. This blows up memory usage (mostly RAM), too.
Both are typically unacceptable on typical MCUs using this architecture, because they are already quite limited. Fortunately, for most applications, it is absolutely unnecessary (or easily avoidable at least) to determine the address space at run-time.
To solve this problem all compilers for such a platform support some way to tell the compiler in which address space and object resides. Standard draft N1275 for the then-upcoming C11 proposed a standard way using "named address spaces". Unfortunately it did not make it into the final version, so we are left with compiler-extensions.
For gcc (see the documentation for other compilers), the developers implemented the original standard proposal. As the address spaces are target-specific, the code is not portable between different archittectures, but that is normally true for bare-metal embedded code anyway, nothing really lost.
Reading the documentation for AVR, an address space is simply used similar to a standard qualifier. The compiler will automatically emit the correct instructions to access the correct space. Also there is a unified address space which determines the area at run-time as explained above.
Address spaces work similar to qualifiers, there are stronger constraints to determine compatibility, i.e. when assigning pointers of different address spaces to each other. For a detailed description, see the proposal, chapter 5.
Conclusion:
named address spaces is what you want. They solve two problems:
Ensure pointers to incompatible address spaces can't be assigned to each other unnoticed.
Tell the compiler how to access the object, i.e. which instructions to use.
With regard to the other answers proposing structs, you have to specify the address space (and the type for void *) anyway once you acces the data. Usign the address space in the declaration keeps the rest of the code clean and even allows to change it lateron at a single location in the source code.
If you are after portability betwen tool-chains, rread their documentation and use macros. It is most likely you just will have to adopt the actual names of the address spaces.
Sidenote: The PIC18 example you cite actually uses the syntax for named address spaces. Just the names are deprecated, because an implementation should leave all non-standard names free for the application code. Hence the underscore-qualified names in gcc.
Disclaimer: I did not test the features, but relied on the documentation. Helpful feedback in comments appreciated.

Type punning and Unions in C

I'm currently working on a project to build a small compiler just for the heck of it.
I've decided to take the approach of building an extremely simple virtual machine to target so I don't have to worry about learning the ins and outs of elf, intel assembly, etc.
My question is about type punning in C using unions. I've decided to only support 32 bit integers and 32 bit float values in the vm's memory. To facilitate this, the "main memory" of the vm is set up like this:
typedef union
{
int i;
float f;
}word;
memory = (word *)malloc(mem_size * sizeof(word));
So I can then just treat the memory section as either an int or a float depending on the instruction.
Is this technically type punning? It certainly would be if I were to use ints as the words of memory and then use a float* to treat them like floats. My current approach, while syntactically different, I don't think is semantically different. In the end I'm still treating 32 bits in memory as either an int or a float.
The only information I could come up with online suggests that this is implementation dependent. Is there a more portable way to acheive this without wasting a bunch of space?
I could do the following, but then I would be taking up more than 2 times as much memory and "reinventing the wheel" with respect to unions.
typedef struct
{
int i;
float f;
char is_int;
}
Edit
I perhaps didn't make my exact question clear. I am aware that I can use either a float or an int from a union without undefined behavior. What I'm after is specifically a way to have a 32 bit memory location that I can safely use as an int or float without knowing what the last value set was. I want to account for the situation where the other type is used.

Yes, storing one member of union and reading another is type punning (assuming the types are sufficiently different). Moreover, this is the only kind of universal (any type to any type) type punning that is officially supported by C language. It is supported in a sense that the language promises that in this case the type punning will actually occur, i.e. that a physical attempt to read an object of one type as an object of another type will take place. Among other things it means that writing one member of the union and reading another member implies a data dependency between the write and the read. This, however, still leaves you with the burden of ensuring that the type punning does not produce a trap representation.
When you use casted pointers for type punning (what is usually understood as "classic" type punning), the language explicitly states that in general case the behavior is undefined (aside from reinterpreting object's value as an array of chars and other restricted cases). Compilers like GCC implement so called "strict aliasing semantics", which basically means that the pointer-based type punning might not work as you expect it to work. For example, the compiler might (and will) ignore the data dependency between type-punned reads and writes and rearrange them arbitrarily, thus completely ruining your intent. This
int i;
float f;
i = 5;
f = *(float *) &i;
can be easily rearranged into actual
f = *(float *) &i;
i = 5;
specifically because a strict-aliased compiler deliberately ignores the possibility of data dependency between the write and the read in the example.
In a modern C compiler, when you really need to perform physical reinterpretation of one objects value as value of another type, you are restricted to either memcpy-ing bytes from one object to another or to union-based type punning. There are no other ways. Casting pointers is no longer a viable option.

As long as you only access the member (int or float) which was most recently stored, there's no problem and no real implementation dependency. It's perfectly safe and well-defined to store a value in a union member and then read that same member.
(Note that there's no guarantee that int and float are the same size, though they are on every system I've seen.)
If you store a value in one member and then read the other, that's type punning. Quoting a footnote in the latest C11 draft:
If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). This might be a
trap representation.

What is guaranteed about the size of a function pointer?

In C, I need to know the size of a struct, which has function pointers in it. Can I be guaranteed that on all platforms and architectures:
the size of a void* is the same size as a function pointer?
the size of the function pointer does not differ due to its return type?
the size of the function pointer does not differ due to its parameter types?
I assume the answer is yes to all of these, but I want to be sure. For context, I'm calling sizeof(struct mystruct) and nothing more.

From C99 spec, section 6.2.5, paragraph 27:
A pointer to void shall have the same
representation and alignment
requirements as a pointer to a
character type. Similarly, pointers
to qualiﬁed or unqualiﬁed versions of
compatible types shall have the same
representation and alignment
requirements. All pointers to
structure types shall have the same
representation and alignment
requirements as each other. All
pointers to union types shall have the
same representation and alignment
requirements as each other. Pointers
to other types need not have the same
representation or alignment
requirements.
So no; no guarantee that a void * can hold a function pointer.
And section 6.3.2.3, paragraph 8:
A pointer to a function of one type
may be converted to a pointer to a
function of another type and back
again; the result shall compare equal
to the original pointer.
implying that one function pointer type can hold any other function pointer value. Technically, that's not the same as guaranteeing that function-pointer types can't vary in size, merely that their values occupy the same range as each other.

No, no, no.
C doesn't favour Harvard architectures with different code and data pointer sizes, because ideally when programming for such an architecture you want to store data in program memory (string literals and the like), and to do that you'd need object pointers into the code space. But it doesn't forbid them, so as far as the standard is concerned function pointers can refer to an address space which has a different size from the data address space.
However, any function pointer can be cast to another function pointer type[*] and back without trashing the value, in the same way that any object pointer can be cast to void* and back. So it would be rather surprising for function pointers to vary in size according to their signature. There's no obvious "use" for the extra space, if you have to be able to somehow store the same value in less space and then retrieve it when cast back.
[*] Thanks, schot.

In addition to the other answers, Wikipedia says this:
http://en.wikipedia.org/wiki/Function_pointer
Although function pointers in C and
C++ can be implemented as simple
addresses, so that typically
sizeof(Fx)==sizeof(void *), member
pointers in C++ are often implemented
as "fat pointers", typically two or
three times the size of a simple
function pointer, in order to deal
with virtual inheritance.
A function pointer is an abstraction. As long as the requirements of the standard are fulfilled, anything is possible. I.e. if you have less than 256 functions in your program, function pointers could be implemented by using a single byte with the value 0 for NULL and the values 1 to 255 as the index into a table with the physical addresses. If you exceed 255 functions, it could be extended to use 2 bytes.

There is a practical example of differing sizes that used to be common. In the MS-DOS & early Windows C programming, in the "medium" memory model you had 16-bit data pointers but 32-bit function pointers, and the "compact" memory model was the other way round.

Categories

HOME

reactjs

cxf

snowflake-cloud-data-p...

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight