Is there a way to subtract one pointer from another in C11 and have the result be always defined?
The standard says the behavior is undefined if the result is not representable as type ptrdiff_t.
I am open to a solution relying on static assertions that are expected to pass on a reasonable implementation in a modern general purpose 32 or 64 bit environment. I would like to avoid solutions that rely on any sort of runtime checks.
If the pointed-to type has size greater than 1, I can static assert that size_t and ptrdiff_t have the same number of nonpadding bits. This partial solution relies on two things I am not sure about, so any feedback on these would provide a partial answer:
It can be expected that ptrdiff_t has at most one fewer value bit than size_t in a reasonable implementation in a modern general purpose 32 or 64 bit environment.
I am correct in my understanding of the standard, in that the difference between two pointers to objects of size greater than 1 is defined, even when the same difference would be undefined if the pointers were cast to character pointers. This understanding seems inconsistent with footnote 106 in the committee draft, but it is my understanding that footnotes are not normative.
According to the Standard
You can only subtract pointers if both point into (or one past the end of) the same array object.
Subtracting uintptr_t or intptr_t is not necessarily meaningful, because, again, according to the standard, there is no particular way that the conversion from pointer to integer has to be defined. In particular,
Consider far pointers in a segmented memory model, where there may be more than one way to represent a given address (segment + offset, for example, on x86).
Consider pointers with bits that are ignored by the processor. (For example, the Motorola 68000, which has 32-bit pointers but ignores the top 8 bits.)
So, unfortunately, there is no way to do this portably, according to the standard.
Remember: size_t describes the maximum size of an object, not the size of your address space. It is entirely legal for size_t to have less range than uintptr_t and friends. The same goes for ptrdiff_t: it is entirely legal for ptrdiff_t to have less range than uintptr_t. Imagine, for example, a segmented memory model where you cannot allocate anything larger than a segment; in that case, size_t and ptrdiff_t might be able to represent the size of a segment but not the size of your address space.
According to Practice
On the computers which you use (modern 32-bit and 64-bit computers), a uintptr_t will just contain the pointer address. Subtract away. This is implementation-defined but not undefined behavior.
Do not subtract the original pointers without casting unless they point to the same object, or to the address past that object. Compilers can and will make aliasing assumptions when you use pointer arithmetic. Not only is your program "technically" wrong, but there is a long history of compilers producing bad code here.
There is a bit of an argument going on right now about what, exactly, it means for a pointer to point to the same object, but this argument was unresolved last time I checked.
Related
From my understanding of the C standard, casting an int pointer to int is inadvisable in portable code. A simple example would be such a cast on a 64-bit architecture where pointers are 64 bits but int is 32 bits: the cast truncates the address, an actual physical example of how things can go wrong.
The same is true for casting an integer to an int pointer. However, I cannot find an example as to why exactly this is considered to be UB/implementation specific. I get that C standard advises against it, but what exactly can go wrong? The only vague example I found was somebody mentioning possible alignment issues, how exactly would those arise?
The C standard is fairly detailed in listing possible problems, C17 6.3.2.3/5:
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
So the various potential problems are:
Different sizes. This is the most obvious issue. A pointer address might not fit inside an integer, or the other way around.
Alignment. An integer containing some random number might cause a misaligned address when converted to a pointer.
Incorrect addresses, such as misaligned ones or addresses pointing into executable code rather than data, may cause implementation defined "traps"/hardware exceptions.
Far less likely, integers can in theory contain trap representations too, but that's likely only relevant for exotic/fictional one's complement and signed magnitude systems. The C standard allows for such systems, but very few such systems have actually existed in the history of computers.
Wrong type. If we lie to the compiler when converting to/from pointers and tell it that a different type is stored at a location than what's actually stored there, we can get into all manner of problems. We may confuse the compiler's internal tracking of what types are stored where, causing so-called "strict aliasing violations". This in turn can cause optimization-related bugs.
What is the strict aliasing rule?
We may also, once again, cause problems with misalignment and traps, or just by the program not making any sense of what's stored at a certain location.
Pointer arithmetic on unknown addresses. There may be issues with casting to a physical address where the compiler doesn't know what's stored (no known type) and then performing pointer arithmetic from there, because pointer arithmetic is only well-defined when pointing at an array of a known type. Strictly speaking, doing so is undefined behavior, so it might cause some poor compiler implementations to produce randomly behaving code. Hosted-system compilers are known to do this; it's a quality-of-implementation problem. In particular, be very afraid of such bugs when using the gcc compiler for embedded systems programming.
Exotic pointer formats. Some systems use extended addressing modes that go beyond the default address bus width. This is very common on low-end embedded systems with 8/16-bit addresses, but it also existed in the PC world back in the MS-DOS days. Typically such extended addresses use a non-standard pointer type (a common non-standard extension is the far keyword). Converting between these pointer types and integers is very system-specific.
The most correct type to use for converting to/from pointer types is uintptr_t. This is defined to be "large enough" and suitable to hold the representation of a pointer.
On some exotic systems we may also use intptr_t which is the signed equivalent. That one only makes sense if the OS has some weird internal virtual addressing, such as placing kernel space at negative addresses.
Not all platforms have the same size int and int*. This can lead to truncation and alignment problems among others. It can also seemingly work without problems.
For portable behavior, it is advisable to use the fixed-width integers defined by the C99 standard in <stdint.h>.
You would use an uintptr_t as the variable to hold a pointer-address.
See this answer as well: When is casting between pointer types not undefined behavior in C?
I have been going through some posts and noticed that pointers can be different sizes according to sizeof depending on the architecture the code is compiled for and running on. Seems reasonable enough to me (ie: 4-byte pointers on 32-bit architectures, 8-byte on 64-bit, makes total sense).
One thing that surprises me is that the size of a pointer can differ based on the data type it points to. I would have assumed that, on a 32-bit architecture, all pointers would be 4 bytes in size, but it turns out that function pointers can be a different size (i.e., larger than what I would have expected). Why is this, in the C programming language? I found an article that explains this for C++ in terms of virtual functions, but that doesn't seem to apply in pure C. Also, it seems the use of "far" and "near" pointers is no longer necessary, so I don't see those entering the equation.
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
Thanks!
The C standard lays down the law on what's required:
All data pointers can be converted to void* and back without loss of information.
All struct-pointers have the same representation+alignment and can thus be converted to each other.
All union-pointers have the same representation+alignment and can thus be converted to each other.
All character pointers and void pointers have the same representation+alignment.
All pointers to qualified and unqualified compatible types shall have the same representation+alignment. (For example unsigned / signed versions of the same type are compatible)
All function pointers have the same representation+alignment and can be converted to any other function pointer type and back again.
Nothing more is required.
The committee arrived at these guarantees by examining all current implementations and machines and codifying as many guarantees as they could.
On architectures where pointers are naturally word pointers instead of character pointers, you get data pointers of different sizes.
On architectures with different size code / data spaces (many micro-processors), or where additional info is needed for properly invoking functions (like itanium, though they often hide that behind a data-pointer), you get code pointers of different size from data pointers.
So, in C, what justification, standard, or documentation describes why not all pointers are the same size on the same architecture?
C11 : 6.2.5 p(28):
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
6.3.2.3 Pointers p(8):
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
This clarifies that pointers to data and pointers to functions need not be the same size.
One additional point:
Q: So, is it safe to say that, while I don't have to explicitly use the far/near keywords when defining a pointer, this is handled automatically "under the hood" by the compiler?
A: http://www.unix.com/programming/45002-far-pointer.html
It's a historical anachronism from segmented architectures such as the
8086.
Back in the days of yore there was the 8080, this was an 8 bit
processor with 16 bit address bus, hence 16 bit pointers.
Along came the 8086; in order to support some level of backward
compatibility it adopted a segmented architecture which let you use
either 16 bit, 20 bit or 32 bit pointers depending on the day of the
week. Where a pointer was a combination of 16 bit segment register and
16 bit near offset. This led to the rise of tiny, small, medium,
large and huge memory models with near, far and huge pointers.
Other architectures such as 68000 did not adopt this scheme and had
what is called a flat memory model.
With the 80386 and true 32 bit mode, all pointers are 32 bit, but
ironically are now really near pointers but 32 bit wide, the operating
system hides the segments from you.
I compiled this on three different platforms; the char * pointer was identical to the function pointer in every case:
CODE:
#include <stdio.h>
int main (int argc, char *argv[]) {
char * cptr = NULL;
void (*fnptr)() = NULL;
printf ("sizeof cptr=%zu, sizeof fnptr=%zu\n",
sizeof (cptr), sizeof (fnptr));
return 0;
}
RESULTS:
                   char ptr   fn ptr
                   --------   ------
Win8/MSVS 2013         4         4
Debian7/i686/GCC       4         4
Centos/amd64/GCC       8         8
Some architectures support multiple kinds of address spaces. While nothing in the Standard would require that implementations provide access to all address spaces supported by the underlying platform, and indeed the Standard offers no guidance as to how such support should be provided, the ability to support multiple address spaces may make it possible for a programmer who is aware of them to write code that works much better than would otherwise be possible.
On some platforms, one address space will contain all the others, but accessing things in that address space will be slower (sometimes by 2x or more) than accessing things which are known to be in a particular part of it. On other platforms, there won't be any "master" address space, so different kinds of pointers will be needed to access things in different spaces.
I disagree with the claim that the existence of multiple address spaces should be viewed as a relic. On a number of ARM processors, it would be possible for a program to have up to 1K-4K (depending upon the exact chip) of globals which could be accessed twice as quickly as--and with less code than--"normal" global variables. I don't know of any ARM compilers that would exploit that, but there's no reason a compiler for the ARM couldn't do so.
This might be a very basic question that is already asked but I was not quite sure if the answer here Casting an int pointer to a char ptr and vice versa is applicable in my case.
So essentially I have something as follows:
void* head = sbrk(1024); //allocate 1024 bytes in heap
*((int*)(head+size)) = value; //value and size are int with values between 1 and 1023
I would like to know if for an arbitrary value of size the above does not work then what are the restrictions on the value of size? Does it have to be divisible by 4?
First of all, you can't do pointer arithmetic on void pointers. That code should not even compile.
For the sake of discussion, let us assume that you have a char pointer instead. Then formally, such casts followed by an access is undefined behavior. In the real world however, your code will always work if you can manually ensure alignment. You will have to ensure that the address where you write is at an aligned memory position, or there are no guarantees that the code will work.
EDIT with relevant quotes from the ISO 9899:2011 standard why pointer arithmetic on a void pointer is undefined behavior:
6.3.2.2 void
The (nonexistent) value of a void expression (an expression that has
type void) shall not be used in any way, and implicit or explicit
conversions (except to void) shall not be applied to such an
expression.
6.5.6 Additive operators
/--/
For addition, either both operands shall have arithmetic type, or one
operand shall be a pointer to a complete object type and the other
shall have integer type. (Incrementing is equivalent to adding 1.)
4 Conformance
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a
constraint or runtime constraint is violated, the behavior is
undefined. Undefined behavior is otherwise indicated in this
International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe ‘‘behavior
that is undefined’’.
Whether code violating normative text in the standard "should compile" or not can certainly be debated, but I don't think that discussion is of benefit to the OP. Simply don't write code relying on undefined behavior, ever.
Use memcpy():
memcpy((char*)head + size, &value, sizeof(value));
On many systems, in this circumstance, it is required that size be a multiple of four (subject to additional conditions detailed below, including that the size of int be four bytes on your system). On systems that do not require this, it is usually preferred.
First, the type of head is void *, and the C standard does not define what happens when you do pointer arithmetic with void *.
Some compilers, notably GCC and its heirs, will treat this arithmetic as if the type were char *. I will proceed on this basis.
Second, I am not aware of a guarantee that sbrk returns an address with any particular alignment.
Let us suppose that sbrk does return a well-aligned address, and that your C implementation does the plain thing to evaluate * (int *) (head + size) = value, which is to issue a store instruction to write the value of value (converted to an int) to the address head + size.
Then your question becomes: What does my computing platform do with an int store to this address?
As long as head + size is an address suitably aligned for int on your platform, the store will execute as expected. On most platforms, four-byte integers prefer four-byte alignment, and eight-byte integers prefer eight-byte alignment. As long as head is aligned to a multiple of this preference and size is a multiple of this preference, then the store will execute normally.
Otherwise, what happens depends on your platform. On some platforms, the hardware executes the store but may do it more slowly than normal store instructions, because it breaks it into two separate writes to memory. (This also means that other processes sharing the same memory might be able to read memory while one part of the value has been stored but the other part has not. Again, this depends on the characteristics of your computing platform.)
On some platforms, the hardware signals an exception that interrupts program execution and transfers control to the operating system. Some operating systems fix up misaligned stores by analyzing the failing instruction and executing alternate instructions that perform the intended store (or the operating system relays the exception to special code in your program, possibly in automatically included libraries, that do this fix-up work). On these platforms, misaligned stores will be very slow; they can hugely degrade the performance of a program.
On some platforms, the hardware signals an exception, and the operating system does not fix up the misaligned store. Instead, the operating system either terminates your process or sends it a signal about the problem, which often results in your process terminating. (Other possibilities include triggering a debugger or entering special code you have included in your program to handle signals.)
This question is almost a duplicate of some others I've found, but this specifically concerns POSIX, and a very common example in pthreads that I've encountered several times. I'm mostly concerned with the current state of affairs (i.e., C99 and POSIX.1-2008 or later), but any interesting historical information is of course interesting as well.
The question basically boils down to whether b will always take the same value as a in the following code:
long int a = /* some valid value */;
void *ptr = (void *)a;
long int b = (long int)ptr;
I am aware that this usually works, but the question is whether it is a proper thing to do (i.e., does the C99 and/or POSIX standards guarantee that it will work).
When it comes to C99 it seems it does not, we have 6.3.2.3:
5 An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.56)
6 Any pointer type may be
converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented
in the integer type, the behavior is undefined. The result need not be
in the range of values of any integer type.
Even using intptr_t the standard seems to only guarantee that any valid void* can be converted to intptr_t and back again, but it does not guarantee that any intptr_t can be converted to void* and back again.
However it is still possible that the POSIX standard allows this.
I have no great desire to use a void* as a storage space for any variable (I find it pretty ugly even if POSIX should allow it), but I feel I have to ask because of the common example use of the pthread_create function, where the argument to start_routine is an integer that is passed in as void* and converted back to int or long int inside the start_routine function. For example, this manpage has such an example (see link for full code):
// Last argument casts int to void *
pthread_create(&tid[i], NULL, sleeping, (void *)SLEEP_TIME);

/* ... */

void *sleeping(void *arg) {
    // Casting void * back to int
    int sleep_time = (int)arg;
    /* ... */
}
I've also seen a similar example in a textbook (An Introduction to Parallel Programming by Peter S. Pacheco). Considering that it seems to be a common example used by people who should know this stuff much better than me, I'm wondering if I'm wrong and this is actually a safe and portable thing to be doing.
As you say, C99 doesn't guarantee that any integer type may be converted to void* and back again without loss of information. It does make a similar guarantee for intptr_t and uintptr_t defined in <stdint.h>, but those types are optional. (The guarantee is that a void* may be converted to {u,}intptr_t and back without loss of information; there's no such guarantee for arbitrary integer values.)
POSIX doesn't appear to make any such guarantee either.
The POSIX description of <limits.h> requires int and unsigned int to be at least 32 bits. This exceeds the C99 requirement that they be at least 16 bits. (Actually, the requirements are in terms of ranges, not sizes, but the effect is that int and unsigned int must be at least 32 (under POSIX) or 16 (under C99) bits, since C99 requires a binary representation.)
The POSIX description of <stdint.h> says that intptr_t and uintptr_t must be at least 16 bits, the same requirement imposed by the C standard. Since void* can be converted to intptr_t and back again without loss of information, this implies that void* may be as small as 16 bits. Combine that with the POSIX requirement that int is at least 32 bits (and the POSIX and C requirement that long is at least 32 bits), and it's possible that a void* just isn't big enough to hold an int or long value without loss of information.
The POSIX description of pthread_create() doesn't contradict this. It merely says that arg (the void* 4th argument to pthread_create()) is passed to start_routine(). Presumably the intent is that arg points to some data that start_routine() can use. POSIX has no examples showing the usage of arg.
You can see the POSIX standard here; you have to create a free account to access it.
The focus in answers so far seems to be on the width of a pointer, and indeed as #Nico points out (and #Quantumboredom also points out in a comment), there is a possibility that intptr_t may be wider than a pointer. #Kevin's answer hints at the other important issue, but doesn't completely describe it.
Also, though I'm not sure of the exact paragraph in the standard, Harbison & Steele point out that intptr_t and uintptr_t are optional types and may not even exist in a valid C99 implementation. OpenGroup says that XSI-conformant systems must support both types, but that means plain POSIX does not require them (at least as of the 2003 edition).
The part that's really been missed here though is that pointers need not always have a simple numerical representation that matches the internal representation of an integer. This has always been so (since K&R 1978), and I'm pretty sure POSIX is careful not to overrule this possibility either.
So, C99 does require that it be possible to convert a pointer to an intptr_t if and only if that type exists, and then back to a pointer again such that the new pointer will still point at the same object in memory as the old pointer; and indeed, if pointers have a non-integer representation, this implies that an algorithm exists which can convert a specific set of integer values into valid pointers. However, this also means that not all integers between INTPTR_MIN and INTPTR_MAX are necessarily valid pointer values, even if the width of intptr_t (and/or uintptr_t) is exactly the same as the width of a pointer.
So, the standards cannot guarantee that any intptr_t or uintptr_t can be converted to a pointer and back to the same integer value, or even which set of integer values can survive such conversion, because they cannot possibly define all of the possible rules and algorithms for converting integer values into pointer values. Doing so even for all known architectures could still prevent the applicability of the standard to novel types of architectures yet to be invented.
(u)intptr_t is only guaranteed to be large enough to hold a pointer, but it may also be larger, which is why the C99 standard only guarantees the (void*) -> (u)intptr_t -> (void*) round trip; in the other direction data may be lost (and the behavior is undefined if the value cannot be represented).
Not sure what you mean by "always". It's not written anywhere in the standard that this is okay, but there are no systems it fails on.
If your integers are really small (say limited to 16bit) you can make it strictly conforming by declaring:
static const char dummy_base[65535];
and then passing dummy_base + i as the argument and recovering it as i = (char *)start_arg - dummy_base;
I think your answer is in the text you quoted:
If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
So, not necessarily. Say you had a 64-bit long and cast it to a void* on a 32-bit machine. The pointer is likely 32 bits, so either you lose the top 32 bits or get INT_MAX back. Or, potentially, something else entirely (undefined, as the standard says).
Rather than sending an actual pointer to a value, the value is cast to a pointer. I found these examples in the GUI interface code of a GTK program.
g_signal_connect (pastebutton[pane],
"clicked",
G_CALLBACK(on_paste_button_pressed),
(void*)((long)pane<<4));
In the above example, I am referring to the last parameter of g_signal_connect. When on_paste_button_pressed is called by GTK2, on_paste_button_pressed casts the user_data void pointer back like so:
int pane = ((long)user_data) >> 4;
Actually, I added this particular example to the code, but I based it upon what was already there. I added the bit-shifting so as to avoid warnings about casting. The program itself has four panes containing quite a number of widgets, the copy and paste buttons allow you to copy all the values from one pane to another.
Is this way of casting a value to a pointer address often used, and are there reasons why this should not be used?
edit:
The cast from an integer to a void pointer can also be achieved via pointer arithmetic on a null character pointer (equally non-portable):
void *void_ptr = (char *)NULL + some_int;
It is used. It is used quite commonly when it does what's required.
One reason not to use it is that theoretically pointer size might be smaller than the source integer size.
Another reason not to use it is that it allows you to pass only one piece of integer data to the callback. If in the future you'll need to add another piece of data (or switch to a non-integer), you'll have to locate and rewrite every single place where the callback makes access to the passed data. So, if there's a chance that you'd have to extend the data in the future, it is better to create a struct (even if it holds just a single int at this time) and pass a pointer to that struct.
But if you are sure that you'll never have to pass anything other than that single integer and that your integer fits into a void *, then this technique is not in any way broken.
P.S. Pedantically speaking, neither C nor C++ appear to have the roundtrip guarantee for integer-to-void *-to-integer conversion, i.e. they don't guarantee that it will work and restore the original integral value.
You should use macros GINT_TO_POINTER() and GPOINTER_TO_INT() to cast between pointers and integers.
glib: Type Conversion Macros
Casting an integer to a pointer is used to pass a value by value. This is the preferred way if you do not need a by-reference parameter, because the callback then does not need to dereference a pointer.
The bit-shifting is a bad idea because it can cause overflows.
For really portable code, you should use intptr_t as your integer type, because a pointer value is guaranteed to fit into it.
It does see use in these kinds of cases, yes. It works on many platforms, but might fail because an arbitrary integer is not always a valid pointer value, though the shifting you do should get around that. It is also possible that a pointer cannot hold all the values that an integer can; this would be the case if you're using a 64-bit long on a platform where pointers are 32 bits, and of course since you are shifting the value, it could also fail even if pointers and integers are the same size. If you want to use this trick, you should probably check sizeof (void*) against the size of the integer type you use, and at runtime check against the actual value if the pointer isn't big enough. It's probably not worth it to make this fully portable so that you would use an actual pointer on platforms where that's needed, so either limit yourself to platforms where this trick works or abandon the trick altogether.
I find it just fine. Don't have statistics about use frequency, but do you really care?
I'm not sure I understand how bit-shifting helps, though.
It is used but it's by no means portable so you have to be careful.
The C standard does not mandate that pointer types have at least as many bits as integer types, so you may not always be able to do it.
But I can't recall any platform in which pointers have actually been smaller than integers so you're probably safe (despite not being technically safe).
The only reason I can think of for the casting is to remove the possibility of alignment warnings. Section 6.3.2.3 of the C1x draft states:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
This is from the C-FAQ:
Q: How are integers converted to and from pointers? Can I temporarily stuff an integer into a pointer, or vice versa?
A: Once upon a time, it was guaranteed that a pointer could be converted to an integer (though one never knew whether an int or a long might be required), and that an integer could be converted to a pointer, and that a pointer remained unchanged when converted to a (large enough) integer and back again, and that the conversions (and any mapping) were intended to be "unsurprising to those who know the addressing structure of the machine." In other words, there is some precedent and support for integer/pointer conversions, but they have always been machine dependent, and hence nonportable. Explicit casts have always been required (though early compilers rarely complained if you left them out).
The ANSI/ISO C Standard, in order to ensure that C is widely implementable, has weakened those earlier guarantees. Pointer-to-integer and integer-to-pointer conversions are implementation-defined (see question 11.33), and there is no longer any guarantee that pointers can be converted to integers and back, without change.
Forcing pointers into integers, or integers into pointers, has never been good practice. When you need a generic slot that can hold either kind of data, a union is a much better idea.
See also questions 4.15, 5.18, and 19.25.
References: K&R1 Sec. A14.4 p. 210
K&R2 Sec. A6.6 p. 199
ISO Sec. 6.3.4
Rationale Sec. 3.3.4
H&S Sec. 6.2.3 p. 170, Sec. 6.2.7 pp. 171-2