Can I assign any integer value to a pointer variable directly? - c

Since addresses are numbers and can be assigned to a pointer variable, can I assign any integer value to a pointer variable directly, like this:
int *pPtr = 60000;

You can, but unless you're developing for an embedded device with known memory addresses with a compiler that explicitly allows it, attempting to dereference such a pointer will invoke undefined behavior.
You should only assign the address of a variable or the result of a memory allocation function such as malloc, or NULL.

Yes you can.
You should only assign the address of a variable or the result of a memory allocation function such as malloc, or NULL.

According to the pointer conversion rules, e.g. as described in this online c++ standard draft, any integer may be converted to a pointer value:
6.3.2.3 Pointers
(5) An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not
be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.
Allowing the conversion does, however, not mean that you are allowed to dereference the pointer then.

You can but there's a lot of considerations.
1) What does that mean?
The only really useful abstraction when this actually gets used is that you need to access a specific memory location because something is mapped to a specific point, generally hardware control registers (less often: a specific area in flash or from the linker table). The fact that you are assigning 60000 (a decimal number rather than a hexadecimal address or a symbolic mnemonic) makes me quite worried.
2) Do you have "odd" pointers?
Some microcontrollers have pointers with strange semantics (near vs far, tied to a specific memory page, etc.) You may have to do odd things to make the pointer make sense. In addition, some pointers can do strange things depending upon where they point. For example, the PIC32 series can point to the exact same data but with different upper bits that will retrieve a cached copy or an uncached copy.
3) Is that value the correct size for the pointer?
Different architectures need different sizes. The newer data types like intptr_t are meant to paper over this.

Related

What's behind NULL?

In many C implementations (but not all), NULL corresponds to a process's logical address 0. When we try to dereference NULL, we get a segmentation fault because NULL "maps" to a memory segment that is neither readable nor writable.
Are there systems with real, physical addresses that correspond to the virtual NULL address? If so, what might be stored there? Or is NULL purely a virtual address with no physical counterpart?
NULL is implementation defined.
It is a macro which expands to an implementation defined null pointer constant.
"it depends" #I3x
Is there a real, physical address that corresponds to the virtual NULL address?
On some platforms: yes, on others: no. It is implementation defined.
If so, what might be stored there?
On some platforms: a value, maybe 0. This value may/may not be accessible. On others: there is simple no access. It is implementation defined.
Or is NULL purely a virtual address with no physical counterpart?
On some platforms: it is a virtual address with no counterpart. On others it may be a real address. It is implementation defined.
From C-standard paper ISO/IEC 9899:201x (C11) §6.3.2.3 Pointers:
An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer
constant. If a null pointer constant is converted to a pointer
type, the resulting pointer, called a null pointer, is guaranteed to
compare unequal to a pointer to any object or function
This means that a NULL pointer should give a result TRUE if compared to a 0 value, and also that it will resolve to a value=0 if converted to an arithmetic type.
But it also mean that the compiler must generate a value for the null pointer that is guaranteed not to be an address existing (compare equal) in the code running environment (this should be more standardish).
The scope is to make the code portable imposing that each operation on NULL pointer must give same result on any CPU or architecture.
Note: on practical side to guarantee that the NULL pointer doesn't equals any address in the running environment there are two ways. The first will apply to those platforms where invalid address exists, getting advantage of such invalid addresses. The second is to conventionally reserve an address for the scope, typically 0. The latter means that the platform, the OS mainly, have to take care to make that address invalid in the sense that it isn't usable and then can never equal any other pointer, else there will always be an object that can legally use that address and consequently make valid its address comparison to the NULL pointer. This point is important because missing any infrastructure, as an OS could be, that takes care of the invalid value we can have a conflicting condition. An example is when using C compiler for base programming, OS and kernel, if the processor use the memory from address 0 to hold interrupt service addresses a pointer to the object will have a value of 0. In this case the compiler should use a different value of NULL pointer, but if the OS can access the whole addressing space. to allow full Memory Management, there will be no value applicable for NULL pointer.
On user or kernel HL code this is not a problem because there arealways some addresses that are normally not allowed and can then be used for the NULL pointer value.
Yes, most (all?) systems have a real physical address zero, it is just inaccessible from a process running behind a virtual address-space which remaps virtual addresses to (different) physical addresses. In those processes the address zero is simply left unmapped to anything, just as most of the rest of their address space.
However, as a matter of fact, all the x86 based processors till this day boot into the 'real mode' at first. This is a 8086 compatibility mode. Under this processor mode, not only there is a physical addressable address zero, but this address is actually where the interrupt vector table located at, and this is where the processor reads from when it handles interrupts.
Certainly on some computers, there is actual memory at address 0. On others, there may not be. There are even some hardware designs for which the answer may depend on the temporary state of a memory-mapping chip or chipset.
But in all cases, dereferencing a NULL pointer is undefined behavior.
In many C implementations (but not all), NULL corresponds to a process's logical address 0.
No, that is the wrong way to look at it. Null pointers do not correspond to anything other than themselves, by definition, and C in no way ties null pointers to any particular address, in any sense of the term.
At best, you could say that some implementations' representations of null pointers have the form that a representation of a pointer to an object residing at machine address 0 would have. Since null pointers compare unequal to every pointer to an object or function, however, such a situation requires that it be impossible for any object or function that the C program can reference to reside at that address.
When we try to dereference NULL, we get a segmentation fault
That is one of the multitude of observed behaviors that are possible when you invoke undefined behavior. Empirically, it is one of the more likely to be observed when undefined behavior is invoked in the particular way you describe. It is by no means guaranteed by C, and indeed, there are widely-used implementations which, empirically, exhibit different behavior.
because NULL "maps" to a memory segment that is neither readable nor writable.
But there you're delving deep into implementation details.
Is there a real, physical address that corresponds to the virtual NULL address?
No, because a null pointer is not an address. It is a pointer value that is explicitly invalid to dereference. Technically, although C refers to some pointer values as "addresses", it neither requires nor depends on any pointer value to be a machine address, neither a physical nor a virtual one. The value of a null pointer, however, is not an address even in C's sense of the term. The & operator never produces this value, and it is invalid to dereference -- it explicitly fails to correspond to any object or function.
If so, what might be stored there? Or is NULL purely a virtual address with no physical counterpart?
Although it is indeed common for null pointer values to be represented in the way that a pointer to an object / function residing at machine address 0 would be, this is irrelevant, because in that case a C program can never access anything residing at that address. Thus, as far as the program is concerned, nothing resides there. As far as the hardware and OS kernel are concerned, on the other hand, external programs' view of memory is largely irrelevant. And if one nevertheless insists on considering the kernel's view of what is stored in physical memory at an address derived from an external program's null pointer representation, it is highly system-dependent.

C standard compliant way to access null pointer address?

In C, deferencing the null pointer is Undefined Behavior, however the null pointer value has a bit representation that in some architectures make it points to a valid address (e.g the address 0).
Let's call this address the null pointer address, for the sake of clarity.
Suppose I want to write a piece of software in C, in an environment with unrestrained access to memory. Suppose further I want to write some data at the null pointer address: how would I achieve that in a standard compliant way?
Example case (IA32e):
#include <stdint.h>
int main()
{
uintptr_t zero = 0;
char* p = (char*)zero;
return *p;
}
This code when compiled with gcc with -O3 for IA32e gets transformed into
movzx eax, BYTE PTR [0]
ud2
due to UB (0 is the bit representation of the null pointer).
Since C is close to low level programming, I believe there must be a way to access the null pointer address and avoid UB.
Just to be clear
I'm asking about what the standard has to say about this, NOT how to achieve this in a implementation defined way.
I know the answer for the latter.
I read (part of) the C99 standard to clear my mind. I found the sections that are of interest for my own question and I'm writing this as a reference.
DISCLAIMER
I'm an absolute beginner, 90% or more of what I have written is wrong, makes no sense, or may break you toaster. I also try to make a rationale out of the standard, often with disastrous and naive results (as stated in the comment).
Don't read.
Consult #Olaf, for a formal and professional answer.
For the following, the term architectural address designed a memory address as seen by the processor (logical, virtual, linear, physical or bus address). In other word the addresses that you would use in assembly.
In section 6.3.2.3. it reads
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
and regarding integer to pointer conversion
An integer may be converted to any pointer type. Except as previously specified [i.e. for the case of null pointer constant], the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation†.
These imply that the compiler, to be compliant, need only to implement a function int2ptr from integer to pointers that
int2ptr(0) is, by definition, the null pointer.
Note that int2ptr(0) is not mandated to be 0. It can be any bit representation.
*int2ptr(n != 0) has no constraints.
Note that this means that int2ptr needs not to be the identity function, nor a function that return valid pointers!
Given the code below
char* p = (char*)241;
The standard makes absolute no guarantee that the expression *p = 56; will write to the architectural address 241.
And so it gives no direct way to access any other architectural address (including int2ptr(0), the address designed by a null pointer, if valid).
Simply put the standard does not deal with architectural addresses, but with pointers, their comparison, conversions and their operations‡.
When we write code like char* p = (char*)K we are not telling the compiler to make p point to the architectural address K, we are telling it to make a pointer out of the integer K, or in other term to make p point to the (C abstract) address K.
Null pointer and the (architectural) address 0x0 are not the same (cit.) and so is true for any other pointer made from the integer K and the (architectural) address K.
For some reasons, childhood heritages, I thought that integer literals in C could be used to express architectural addresses, instead I was wrong and that only happen to be (sort of) correct in the compilers I was using.
The answer to my own question is simply: There is no standard way because there are no (architectural) address in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one1.
Note about return *(volatile char*)0;
The standard says that
If an
invalid value [a null pointer value is an invalid value] has been assigned to the pointer, the behavior of the unary * operator is undefined.
and that
Therefore any expression referring
to such an [volatile] object shall be evaluated strictly according to the rules of the abstract machine.
The abstract machine says that * is undefined for null pointer values, so that code shouldn't differ from this one
return *(char*)0;
which is also undefined.
Indeed they don't differ, at least with GCC 4.9, both compile to the instructions stated in my question.
The implementation defined way to access the 0 architectural address is, for GCC, the use of the -fno-isolate-erroneous-paths-dereference flag which produces the "expected" assembly code.
†The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
‡Unfortunately it says that the & yields the address of its operand, I believe this is a bit improper, I would say that it yields a pointer to its operand. Consider a variable a that is known to resides at address 0xf1 in a 16 bit address space and consider a compiler that implements int2ptr(n) = 0x8000 | n. &a would yield a pointer whose bit representation is 0x80f1 which is not the address of a.
1Which was special to me because it was the only one, in my implementations, that couldn't be accessed.
As OP has correctly concluded in her answer to her own question:
There is no standard way because there are no (architectural) address in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one.
However, a situation where one would want to access memory directly is likely one where a custom linker script is employed. (I.e. some kind of embedded systems stuff.) So I would say, the standard compliant way of doing what OP asks would be to export a symbol for the (architectural) address in the linker script, and not bother with the exact address in the C code itself.
A variation of that scheme would be to define a symbol at address zero and simply use that to derive any other required address. To do that add something like the following to the SECTIONS portion of the linker script (assuming GNU ld syntax):
_memory = 0;
And then in your C code:
extern char _memory[];
Now it is possible to e.g. create a pointer to the zero address using for example char *p = &_memory[0]; (or simply char *p = _memory;), without ever converting an int to a pointer. Similarly, int addr = ...; char *p_addr = &_memory[addr]; will create a pointer to the address addr without technically casting an int to a pointer.
(This of course avoids the original question, because the linker is independent from the C standard and C compiler, and every linker might have a different syntax for their linker script. Also, the generated code might be less efficient, because the compiler is not aware of the address being accessed. But I think this still adds an interesting perspective to the question, so please forgive the slightly off-topic answer..)
Whatever solution is going to be implementation-dependent. Needfully. ISO C does not describe the environment a C programs runs on; rather, what a conforming C program looks like among a variety of environments («data-processing systems»). The Standard cannot indeed guarantee what you would get by accessing an address that is not an array of objects, i.e. something you visibly allocated, not the environment.
Therefore, I would use something the standard leaves as implementation-defined (and even as conditionally-supported) rather than undefined behavior*: Inline assembly. For GCC/clang:
asm volatile("movzx 0, %%eax;") // *(int*)0;
It also worth mentioning freestanding environments, the one you seem to be in. The standard says about this execution model (emphasis mine):
§ 5.1.2
Two execution environments are defined: freestanding and hosted. [...]
§ 5.1.2.1, comma 1
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined. [...]
Notice it doesn't say you can access any address at will.
Whatever that could mean. Things are a bit different when you are the implementation the standard delegates control to.
All quotes are from the draft N. 1570.
The C Standard does not require that implementations have addresses that resemble integers in any way shape or form; all it requires is that if types uintptr_t and intptr_t exist, the act of converting a pointer to uintptr_t or intptr_t will yield a number, and converting that number directly back to the same type as the original pointer will yield a pointer equal to the original.
While it is recommended that platforms which use addresses that resemble integers should define conversions between integers and addresses in a fashion that would be unsurprising to someone familiar with such mapping, that is not a requirement, and code relying upon such a recommendation would not be strictly conforming.
Nonetheless, I would suggest that if a quality implementation specifies that it performs integer-to-pointer conversion by a simple bitwise mapping, and if there may be plausible reasons why code would want to access address zero, a it should regard statements like:
*((uint32_t volatile*)0) = 0x12345678;
*((uint32_t volatile*)x) = 0x12345678;
as a request to write to address zero and address x, in that order even if
x happens to be zero, and even if the implementation would normally trap on
null pointer accesses. Such behavior isn't "standard", insofar as the
Standard says nothing about the mapping between pointers and integers, but
a good quality implementation should nonetheless behave sensibly.
I'm assuming the question you are asking is:
How do I access memory such that a pointer to that memory has the same representation as the null pointer?
According to a literal reading of the Standard, this is not possible. 6.3.2.3/3 says that any pointer to an object must compare unequal to the null pointer.
Therefore this pointer we are talking about must not point to an object. But the deference operator *, applied to an object pointer, only specifies the behaviour in the case that it points to an object.
Having said that, the object model in C has never been specified rigorously, so I would not put too much weight into the above interpretation. Nevertheless, it seems to me that whatever solution you come up with is going to have to rely on non-standard behaviour from whichever compiler is in use.
We see an example of this in the other answers in which gcc's optimizer detects an all-bits-zero pointer at a late stage of processing and flags it as UB.

Pointers in C: Data type or operation

Would we call the pointers in C a data type (of type void*) or just an operation done on an unsigned integer which makes its value get interpreted as a virtual memory address ? If it is a data type, which hardware and software factors impact its range ?
I think you are mixing two things together - Indirection operator and pointer variables.
Indirection operator (*) - The indirection operator is a unary operator that can be used to obtain the value stored at the memory location referenced by a pointer variable.
pointer variables - In C pointers are variables of specific type (e.g. int,char, void) that store addresses and can be null.
The size of pointers depends upon different factors. You can go to below link. They have explained it well.
What is the size of a pointer? What exactly does it depend on?
C thinks your whole computer is one massive array of bytes. Obviously this isn't very useful, but then C layers on top of this massive array of bytes the concept of types and sizes of those types.
Creating a block of memory inside your computer.
"Pointing" the name ptr at the beginning of that block().
This indirectly means it needs a type specifier which will be same as the data type it is pointing to.
Yes they are data types. Like void *.
intptr_t integers can store any pointers except function pointers. So you can think of such pointers to data objects as special integers. (But remember integer and pointer are different types.)
Your compiler generates code for your target environment. E.g. if you compile with the option -m32 then pointer size is probably 32 bits. It can be run on 32-bit architecture where the CPU register can store 32-bit memory address. Although 64-bit architecture may still support 32-bit application, you can use the option -m64 for compilation to benefit from bigger pointers (probably 64 bits).
You're asking about a lot of ideas. I'm going to try and cite the C standard as much as possible to give the official idea, then break it down into something digestible.
Would we call the pointers in C a data type (of type void *) or just an operation done on an unsigned integer which makes its value get interpreted as a virtual memory address?
From the C standard:
6.2.5 Types
A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type. A pointer type derived from the referenced type T is sometimes called ‘‘pointer to T’’. The construction of a pointer type from a referenced type is called ‘‘pointer type derivation’’.
6.3.2.3 Pointers
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
A pointer isn't a data type (at least in the traditional sense of the phrase) or an integer - it's a C programmer's way of denoting the location of a certain object. While we tend to think of that as being a number - i.e., 0xf000b3a8 - there's no reason it has to be a number. It just has to mark the location of a particular object. Furthermore, the type of the object it points to dictates its own pointer type. So a pointer with type of "float *" is equivalent to "pointer to float". It is not a float type - it's a pointer type.
For a concrete example of this, think about how you organize your bedroom - perhaps you have a bed, a desk, a set of drawers, a closet, and a bedside table. Before you go to bed, you pull out a sticky note and write "Remember headphones on desk". You then put that sticky note on your backpack. That sticky note is a pointer to your headphones! It represents a reminder to you of where those headphones are located - your desk, not a number.
The reason we tend to think of pointers as integers is from sections 6.3.2.3.5 and 6.3.2.3.6. Computers - as they're currently designed - don't understand the concept of "a desk" or "a closet". They only understand numbers - so we organize them accordingly. As a result, the C standard dictates that integers must be able to be converted into pointers, and vice-versa.
If it is a data type, which hardware and software factors impact its range?
Again, a pointer isn't really a data type - it denotes the location of a certain object (that in turn has a certain type that dictates the pointer's type). So we can't really talk about the range of a pointer, per se. What we can talk about is the range of possible memory addresses (and thus the range of possible locations that a pointer could point to!).
Back in the day, when dinosaurs roamed the earth and Nixon was in office (I'm a young'un, what can I say), Intel came out with the Intel 8008 microprocessor1. This puppy was the world's first 8-bit CPU - that is, it could perform mathematical operations on 8-bit values. Assuming 2's complement notation, this allows unsigned values of 0 to 28-1 (0 to 255) and signed values of -27 to 27-1 (-128 to 127). Furthermore, it had an external 14-bit address bus that could address up to 16KB of memory. This equates to 214 (16384) memory locations, each containing 8 bits. The full range is thus from byte 0 to byte 16383.
Fast-forward to today. Most modern desktop and laptop PCs run on 64-bit processors. These support (theoretically) 264 bytes of memory - a whopping 18,446,744,073,709,551,616 locations. This equates to approximately 16 exabytes - where one exabyte is 1,000,000 terabytes. (It's speculated that Google stores approximately this much data in their datacenters). Currently, however, most architectures only use the lower 48-bits2.
These represent the total number of physical locations that a pointer in C could theoretically point to. In reality, because of how virtual memory works, the range that you as a programmer will see is far more limited. That's a story for another day, however.
A pointer is a variable whose value is the address of another variable. Generally speaking, a pointer in C can point to any address within the virtual memory, and NULL.
A pointer is generally a variable whose value is the address of another variable, e.g. direct address of the memory location. Like any variable or constant, you must declare a pointer before you can use it to store any variable address.
The actual data type of the value of all pointers, whether integer, float, character, or otherwise, is the same, a long hexadecimal number that represents a memory address.
e.g: int *ip; /* pointer to an integer */
The run-time representation of a pointer value is typically a raw memory address (perhaps augmented by an offset-within-word field), but since a pointer's type includes the type of the thing pointed to, expressions including pointers can be type-checked at compile time. Pointer arithmetic is automatically scaled by the size of the pointed-to data type.
AS for void pointers (void *) point to objects of unspecified type, and can therefore be used as "generic" data pointers. Since the size and type of the pointed-to object is not known, void pointers cannot be dereferenced, nor is pointer arithmetic on them allowed, although they can easily be (and in many contexts implicitly are) converted to and from any other object pointer type.
See wiki

What is the result of NULL + int?

I have seen the following macro being used in OpenGL VBO implementations:
#define BUFFER_OFFSET(i) ((char *)NULL + (i))
//...
glNormalPointer(GL_FLOAT, 32, BUFFER_OFFSET(x));
Could you provide a little detail on how this macro works? Can it be replaced with a function?
More exactly, what is the result of incrementing a NULL pointer?
Let's take a trip back through the sordid history of OpenGL. Once upon a time, there was OpenGL 1.0. You used glBegin and glEnd to do drawing, and that was all. If you wanted fast drawing, you stuck things in a display list.
Then, somebody had the bright idea to be able to just take arrays of objects to render with. And thus was born OpenGL 1.1, which brought us such functions as glVertexPointer. You might notice that this function ends in the word "Pointer". That's because it takes pointers to actual memory, which will be accessed when one of the glDraw* suite of functions is called.
Fast-forward a few more years. Now, graphics cards have the ability to perform vertex T&L on their own (up until this point, fixed-function T&L was done by the CPU). The most efficient way to do that would be to put vertex data in GPU memory, but display lists are not ideal for that. Those are too hidden, and there's no way to know whether you'll get good performance with them. Enter buffer objects.
However, because the ARB had an absolute policy of making everything as backwards compatible as possible (no matter how silly it made the API look), they decided that the best way to implement this was to just use the same functions again. Only now, there's a global switch that changes glVertexPointer's behavior from "takes a pointer" to "takes a byte offset from a buffer object." That switch being whether or not a buffer object is bound to GL_ARRAY_BUFFER.
Of course, as far as C/C++ is concerned, the function still takes a pointer. And the rules of C/C++ do not allow you to pass an integer as a pointer. Not without a cast. Which is why macros like BUFFER_OBJECT exist. It's one way to convert your integer byte offset into a pointer.
The (char *)NULL part simply takes the NULL pointer (which is usually a void* in C and the literal 0 in C++) and turns it into a char*. The + i just does pointer arithmetic on the char*. Because the null pointer usually has a zero address, adding i to it will increment the byte offset by i, thus generating a pointer who's value is the byte offset you passed in.
Of course, the C++ specification lists the results of BUFFER_OBJECT as undefined behavior. By using it, you're really relying on the compiler to do something reasonable. After all, NULL does not have to be zero; all the specification says is that it is an implementation-defined null pointer constant. It doesn't have to have the value of zero at all. On most real systems, it will. But it doesn't have to.
That's why I just use a cast.
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 0, (void*)48);
It's not guaranteed behavior either way (int->ptr->int conversions are conditionally supported, not required). But it's also shorter than typing "BUFFER_OFFSET". GCC and Visual Studio seem to find it reasonable. And it doesn't rely on the value of the NULL macro.
Personally, if I were more C++ pedantic, I'd use a reinterpret_cast<void*> on it. But I'm not.
Or you can ditch the old API and use glVertexAttribFormat et. al., which is better in every way.
#define BUFFER_OFFSET(i) ((char *)NULL + (i))
Technically the result of this operation is undefined, and the macro actually wrong. Let me explain:
C defines (and C++ follows it), that pointers can be casted to integers, namely of type uintptr_t, and that if the integer obtained that way, casted back into the original pointer type it came from, would yield the original pointer.
Then there's pointer arithmetic, which means if I have two pointers pointing so the same object I can take the difference of them, resulting in a integer (of type ptrdiff_t), and that integer added or subtracted to either of the original pointers, will yield the other. It is also defines, that by adding 1 to a pointer, the pointer to the next element of an indexed object is yielded. Also the difference of two uintptr_t, divided by sizeof(type pointed to) of pointers of the same object must be equal to the pointers themself being subtracted. And last but not least, the uintptr_t values may be anything. They could be opaque handles as well. They're not required to be the addresses (though most implementations do it that way, because it makes sense).
Now we can look at the infamous null pointer. C defines the pointer which is casted to for from type uintptr_u value 0 as the invalid pointer. Note that this is always 0 in your source code. On the backend side, in the compiled program, the binary value used for actually representing it to the machine may be something entirely different! Usually it is not, but it may be. C++ is the same, but C++ doesn't allow for as much implicit casting than C, so one must cast 0 explicitly to void*. Also because the null pointer does not refer to an object and therefore has no dereferenced size pointer arithmetic is undefined for the null pointer. The null pointer referring to no object also means, there is no definition for sensibly casting it to a typed pointer.
So if this is all undefined, why does this macro work after all? Because most implementations (means compilers) are extremely gullible and compiler coders lazy to the highest degree. The integer value of a pointer in the majority of implementations is just the value of the pointer itself on the backend side. So the null pointer is actually 0. And although pointer arithmetic on the null pointer is not checked for, most compilers will silently accept it, if the pointer got some type assigned, even if it makes no sense. char is the "unit sized" type of C if you want to say so. So then pointer arithmetic on cast is like artihmetic on the addresses on the backend side.
To make a long story short, it simply makes no sense to try doing pointer magic with the intended result to be a offset on the C language side, it just doesn't work that way.
Let's step back for a moment and remember, what we're actually trying to do: The original problem was, that the gl…Pointer functions take a pointer as their data parameter, but for Vertex Buffer Objects we actually want to specify a byte based offset into our data, which is a number. To the C compiler the function takes a pointer (a opaque thing as we learned). The correct solution would have been the introduction of new functions especially for the use with VBOs (say gl…Offset – I think I'm going to ralley for their introduction). Instead what was defined by OpenGL is a exploit of how compilers work. Pointers and their integer equivalent are implemented as the same binary representation by most compilers. So what we have to do, it making the compiler call those gl…Pointer functions with our number instead of a pointer.
So technically the only thing we need to do is telling to compiler "yes, I know you think this variable a is a integer, and you are right, and that function glVertexPointer only takes a void* for it's data parameter. But guess what: That integer was yielded from a void*", by casting it to (void*) and then holding thumbs, that the compiler is actually so stupid to pass the integer value as it is to glVertexPointer.
So this all comes down to somehow circumventing the old function signature. Casting the pointer is the IMHO dirty method. I'd do it a bit different: I'd mess with the function signature:
typedef void (*TFPTR_VertexOffset)(GLint, GLenum, GLsizei, uintptr_t);
TFPTR_VertexOffset myglVertexOffset = (TFPTR_VertexOffset)glVertexPointer;
Now you can use myglVertexOffset without doing any silly casts, and the offset parameter will be passed to the function, without any danger, that the compiler may mess with it. This is also the very method I use in my programs.
openGL vertex attribute data is assigned through the same function (glVertexAttribPointer) as either pointers in memory or offsets located within a Vertex Buffer Object depending on context.
the BUFFER_OFFSET() macro appears to convert an integer byte offset into a pointer simply to allow the compiler to pass it as a pointer argument safely.
The "(char*)NULL+i" expresses this conversion through pointer-arithmetic; the result should be the same bit-pattern assuming sizeof(char)==1, without which, this macro would fail.
it would also be possible through simple re-casting, but the macro might make it stylistically clearer what is being passed; it would also be a convenient place to trap overflows for 32/64bit safety/futureproofing
struct MyVertex { float pos[3]; u8 color[4]; }
// general purpose Macro to find the byte offset of a structure member as an 'int'
#define OFFSET(TYPE, MEMBER) ( (int)&((TYPE*)0)->MEMBER)
// assuming a VBO holding an array of 'MyVertex',
// specify that color data is held at an offset 12 bytes from the VBO start, for every vertex.
glVertexAttribPointer(
colorIndex,4, GL_UNSIGNED_BYTE, GL_TRUE,
sizeof(MyVertex),
(GLvoid*) OFFSET(MyVertex, color) // recast offset as pointer
);
That's not "NULL+int", that's a "NULL cast to the type 'pointer to char'", and then increments that pointer by i.
And yes, that could be replaced by a function - but if you don't know what it does, then why do you care about that? First understand what it does, then consider if it would be better as a function.

Is casting an integer value to a void* an often used paradigm in callbacks?

Rather than sending an actual pointer to a value, the value is cast to a pointer. I found these examples in the GUI interface code of a GTK program.
g_signal_connect (pastebutton[pane],
"clicked",
G_CALLBACK(on_paste_button_pressed),
(void*)((long)pane<<4));
In the above example, I am referring to the last parameter of g_signal_connect. When on_paste_button_pressed is called by GTK2, on_paste_button_pressed casts the user_data void pointer back like so:
int pane = ((long)user_data) >> 4;
Actually, I added this particular example to the code, but I based it upon what was already there. I added the bit-shifting so as to avoid warnings about casting. The program itself has four panes containing quite a number of widgets, the copy and paste buttons allow you to copy all the values from one pane to another.
Is this way of casting a value to a pointer address often used, and are there reasons why this should not be used?
edit:
The cast from an integer to a void pointer can also be achieved like so:
void* void_ptr = some_int - NULL;
It is used. It is used quite commonly when it does what's required.
One reason not to use it is that theoretically pointer size might be smaller than the source integer size.
Another reason not to use it is that it allows you to pass only one piece of integer data to the callback. If in the future you'll need to add another piece of data (or switch to a non-integer), you'll have to locate and rewrite every single place where the callback makes access to the passed data. So, if there's a chance that you'd have to extend the data in the future, it is better to create a struct (even if it holds just a single int at this time) and pass a pointer to that struct.
But if you are sure that you'll never have to pass anything other than that single integer and that your integer fits into a void *, then this technique is not in any way broken.
P.S. Pedantically speaking, neither C nor C++ appear to have the roundtrip guarantee for integer-to-void *-to-integer conversion, i.e. they don't guarantee that it will work and restore the original integral value.
You should use macros GINT_TO_POINTER() and GPOINTER_TO_INT() to cast between pointers and integers.
glib: Type Conversion Macros
Casting an integer to a pointer is used to pass a value by-value. This is the preferred way if you do need a by-reference parameter because the compiler then does not need to dereference the pointer.
The bit-shifting is a bad idea because it can cause overflows.
For really portable code, you should use intptr_t as your integer type cause it will nicely fit into a pointer.
It does see use in these kinds of cases, yes. It works on many platforms, but might fail because an arbitrary integer is not always a valid pointer value, though the shifting you do should get around that. It is also possible that a pointer cannot hold all the values that an integer can; this would be the case if you're using a 64-bit long on a platform where pointers are 32 bits, and of course since you are shifting the value, it could also fail even if pointers and integers are the same size. If you want to use this trick, you should probably check sizeof (void*) against the size of the integer type you use, and at runtime check against the actual value if the pointer isn't big enough. It's probably not worth it to make this fully portable so that you would use an actual pointer on platforms where that's needed, so either limit yourself to platforms where this trick works or abandon the trick altogether.
I find it just fine. Don't have statistics about use frequency, but do you really care?
I'm not sure I understand how bit-shifting helps, though.
It is used but it's by no means portable so you have to be careful.
The C standard does not mandate that pointer types have at least as many bits as integer types, so you may not always be able to do it.
But I can't recall any platform in which pointers have actually been smaller than integers so you're probably safe (despite not being technically safe).
The only reason I can think that the casting may be there for is to remove the possibility of alignment warnings. Section 6.3.2.3 of the C1x draft states:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
This is from the C-FAQ:
Q: How are integers converted to and from pointers? Can I temporarily stuff an integer into a pointer, or vice versa?
A: Once upon a time, it was guaranteed that a pointer could be converted to an integer (though one never knew whether an int or a long might be required), and that an integer could be converted to a pointer, and that a pointer remained unchanged when converted to a (large enough) integer and back again, and that the conversions (and any mapping) were intended to be ``unsurprising to those who know the addressing structure of the machine.'' In other words, there is some precedent and support for integer/pointer conversions, but they have always been machine dependent, and hence nonportable. Explicit casts have always been required (though early compilers rarely complained if you left them out).
The ANSI/ISO C Standard, in order to ensure that C is widely implementable, has weakened those earlier guarantees. Pointer-to-integer and integer-to-pointer conversions are implementation-defined (see question 11.33), and there is no longer any guarantee that pointers can be converted to integers and back, without change.
Forcing pointers into integers, or integers into pointers, has never been good practice. When you need a generic slot that can hold either kind of data, a union is a much better idea.
See also questions 4.15, 5.18, and 19.25.
References: K&R1 Sec. A14.4 p. 210
K&R2 Sec. A6.6 p. 199
ISO Sec. 6.3.4
Rationale Sec. 3.3.4
H&S Sec. 6.2.3 p. 170, Sec. 6.2.7 pp. 171-2

Resources