Does C have more macros like NULL?

Background:
When deleting a cell in a hash table that uses linear probing, you have to indicate that a value once existed at that cell but can now be skipped during a search. The easiest way to solve this is to add another variable to store this information, but this extra variable can be avoided if a guaranteed-invalid memory address is known and used to represent this state.
Question:
I assume that since 0 is (more often than not) an invalid memory address, there must be other guaranteed-invalid addresses besides the one NULL represents. So my question is: does C provide a standard macro for any other guaranteed-invalid memory address?

Technically, NULL is not guaranteed to be invalid. It is only guaranteed not to be the address of any object (C11 6.3.2.3:3):
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant(66). If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
(66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant
Your usage does not require the special address value to be invalid either: obviously, you are not accessing it, unless segfaulting is part of the normal behavior of your program.
So you could use the addresses of as many objects as you like, as long as the addresses of these objects are not intended to be part of the normal contents of a cell.
For instance, for an architecture where converting between pointers to objects preserves the representation, you could use:
char a, b, …;
#define NULL1 (&a)
#define NULL2 (&b)
…

Strictly speaking, NULL is not required to be numerically zero at runtime. In a pointer context, the compiler translates the constant 0 (and therefore NULL) into an implementation-defined null pointer value. That value is often numerically zero, but the C standard does not guarantee it. To the best of my knowledge, C itself does not provide any invalid addresses guaranteed to be distinct from NULL.

You can also create your own 'invalid' address pointer:
const void* const SOME_MARKER = (void*) &x;
If you make sure that x (or its address) can never be actually used where you want to use SOME_MARKER you should be safe and 100% portable.
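Tying this back to the hash-table background in the question, here is a minimal sketch of how such a sentinel address can serve as a "deleted" marker (tombstone) for linear probing. All names (TOMBSTONE, table, find, remove_key) are illustrative, not from the answers above:

#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64

/* A private object whose address can never be a legitimate table entry;
   its address serves as the "deleted" marker (tombstone). */
static const char tombstone_obj;
#define TOMBSTONE ((const char *)&tombstone_obj)

static const char *table[TABLE_SIZE]; /* NULL = never used, TOMBSTONE = deleted */

static size_t hash(const char *key) {
    size_t h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % TABLE_SIZE;
}

/* Lookup with linear probing: tombstones are skipped, a truly empty slot stops the search. */
const char *find(const char *key) {
    size_t i = hash(key);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        const char *slot = table[(i + n) % TABLE_SIZE];
        if (slot == NULL) return NULL;                   /* nothing was ever here */
        if (slot != TOMBSTONE && strcmp(slot, key) == 0)
            return slot;                                 /* found it */
    }
    return NULL;
}

/* Deletion: replace the entry with the tombstone so later probes keep going. */
void remove_key(const char *key) {
    size_t i = hash(key);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        const char **slot = &table[(i + n) % TABLE_SIZE];
        if (*slot == NULL) return;
        if (*slot != TOMBSTONE && strcmp(*slot, key) == 0) { *slot = TOMBSTONE; return; }
    }
}

Because TOMBSTONE is the address of a real static object, it can never collide with NULL or with any pointer the table legitimately stores.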


Does this offsetof implementation avoid UB?

This question is more of an academic one, seeing as there is no valid reason to write your own offsetof macro anymore. Nevertheless, I've seen this home-grown implementation pop up here and there:
#define offsetof(s, m) ((size_t) &(((s *)0)->m))
Which is, technically speaking, dereferencing a NULL pointer (AFAICT):
C11(ISO/IEC 9899:201x) §6.3.2.3 Pointers Section 3
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant
So the above implementation is, according to how I read the standard, the same as writing:
#define offsetof(s, m) ((size_t) &(((s *)NULL)->m))
It does make me wonder whether, by changing one tiny detail, the following definition of offsetof would be completely legal and reliable:
#define offsetof(s, m) (((size_t)&(((s *) 1)->m)) - 1)
Seeing as 1 is used as the pointer instead of 0, and I subtract 1 at the end, the result should be the same, and I'm no longer using a NULL pointer. As far as I can tell the results are the same.
So: is there any reason why using 1 instead of 0 in this offsetof definition might not work? Can it still cause UB in certain cases, and if so, when and how? In short: am I missing anything?
Both definitions are undefined behavior: in the first definition a null pointer is dereferenced, and in your second definition you are dereferencing an invalid pointer (a pointer that does not point to a valid object). It is not possible in C to write a portable version of the offsetof macro.
Defect Report #44 says:
"In particular, this is why the offsetof macro exists: there was otherwise no portable means to compute such translation-time constants."
(DR#44 is for C89 but nothing has changed in the language in C99 and C11 that would allow a portable implementation.)
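For reference, the portable route is simply to use the offsetof that <stddef.h> already provides; a small usage sketch with an illustrative struct:

#include <stddef.h>
#include <stdio.h>

struct packet {
    char type;
    int  length;          /* any padding before this member is up to the implementation */
    char payload[16];
};

int main(void) {
    /* offsetof from <stddef.h> is the portable way to get these translation-time constants. */
    printf("offset of length:  %zu\n", offsetof(struct packet, length));
    printf("offset of payload: %zu\n", offsetof(struct packet, payload));
    return 0;
}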
I believe the behaviour is implementation-defined. In 6.3.2.3 of n1256:
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
One problem is that your created pointer does not point to an object.
6.2.4 Storage durations of objects
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, 33) and retains
its last-stored value throughout its lifetime. 34) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
and
J.2 Undefined behaviour
- The value of a pointer to an object whose lifetime has ended is used (6.2.4).
3.19.2 indeterminate value: either an unspecified value or a trap representation
When you convert 1 to a pointer, and the created pointer does not point to an object, the value of the pointer becomes indeterminate. You then use the pointer. Both of those cause undefined behavior.
The conversion of an integer to a pointer is also problematic:
6.3.2.3 Pointers
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation. 67)
The implementation of offsetof that dereferences a NULL pointer invokes undefined behavior. In this implementation it is assumed that the hypothetical structure begins at address 0. You may assume it begins at 1 instead, and yes, it will invoke UB too: not because you are dereferencing a null pointer, but because you are dereferencing an invalid pointer that does not point to any object.
Nothing in any version of the C standard would forbid a compiler from doing anything it wanted with any macro that would attempt to achieve the effect without defining a storage location to hold the indicated object. Nonetheless, a form like:
#define offsetof(s, m) ((char*)&(((s *)0)->m) - (char*)0)
would probably be pretty safe for pre-C99 compilers. Note that it generates an integer by subtracting one char* from another. That is specified to work and yield a constant value when the pointers access parts of the same valid object, and it will in practice work on any compiler which doesn't notice that a null pointer isn't a valid object. By contrast, the effect of casting a pointer to an integer or vice versa varies between platforms, and there are many platforms where (int)(((char*)&foo)+1) - (int)(char*)&foo may not yield 1.
Note also that the meaning of "Undefined Behavior" has changed recently. It used to be that Undefined Behavior meant the specification didn't say what compilers had to do, but most compilers would generally choose (sometimes arbitrarily) behavior that was mathematically correct or made sense on the underlying platform. For example, given int32_t foo=2147483647; foo+=(unsigned char)x; if (foo > 100) ... on a 32-bit processor, a compiler might determine that for any possible value of x the mathematically-correct value assigned to foo would be in the range 2147483647 to 2147483902, and thus greater than 100 in any case. Or it might perform the operation using two's-complement arithmetic and perform the comparison on a possibly-wrapped-around value. Newer compilers, however, may do something even more interesting.
A new compiler may look at an expression like the example with foo and infer that if (unsigned char)x is zero then foo must remain 2147483647, and that if it is non-zero the compiler is allowed to do whatever it likes; it may therefore conclude that (unsigned char)x must equal zero whenever the statement is executed, so if the code is preceded by a test for (unsigned char)x==0, that test may be treated as always true. Given code like the offsetof macro, which would generate Undefined Behavior regardless of the values of any variables, a compiler would be entitled to eliminate not just any code using it, but also any preceding code which could not by any defined means cause program execution to terminate.
Note that casting a non-zero integer literal to a pointer is only Undefined Behavior if there does not exist any object whose address has been taken and cast to an integer so as to yield that same value. Thus, a compiler would not be able to recognize a variant of the pointer-difference-based offsetof macro which casts some non-zero value to a pointer as exhibiting Undefined Behavior unless it could determine that the number in question did not correspond to any pointer. On the other hand, an attempt to cast a non-zero integer to a pointer may on some systems perform a validation check to ensure that the pointer is valid; such a system may then trap if it isn't.
You're not actually dereferencing the pointer; what you're doing is more akin to pointer addition, so using zero should be fine.
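If you want the pointer-subtraction idea without the null-pointer controversy, you can pay the price of a real object; a hedged sketch (the struct and names are illustrative):

#include <stddef.h>
#include <stdio.h>

struct s { char c; double m; };

int main(void) {
    struct s obj;   /* a real object, so the member/base pointer arithmetic is well defined */
    size_t off = (size_t)((char *)&obj.m - (char *)&obj);
    printf("computed: %zu, offsetof: %zu\n", off, offsetof(struct s, m));
    return 0;
}

The trade-off is that this is no longer a translation-time constant, which is exactly what the standard offsetof exists to provide.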

If pointers are just addresses, why can't I assign a specific int address to a pointer?

As it has been explained to me, a pointer is a variable that holds an address. Why, then, am I unable to do int *p = 5;? Wouldn't that correctly point to the memory address 5?
Pointers are data types, not just addresses, and int is a different, incompatible data type, so you cannot assign an int to a pointer directly.
If you really need that, and you know what you're doing, you can cast an int to the specific pointer type (e.g. char *p = (char*)5;), but you need such things only in rare cases.
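One of those rare cases is talking to memory-mapped hardware registers on an embedded target, where the address comes from the hardware manual rather than from the allocator. A sketch under that assumption; the address 0x40021000 and the name STATUS_REG_ADDR are made up for illustration:

#include <stdint.h>

/* Hypothetical register address taken from a (fictional) hardware manual. */
#define STATUS_REG_ADDR 0x40021000u

int main(void) {
    /* volatile: the hardware can change the value behind the compiler's back.
       The integer-to-pointer conversion is implementation-defined (C11 6.3.2.3p5),
       so this only makes sense on a target that documents the mapping. */
    volatile uint32_t *status = (volatile uint32_t *)(uintptr_t)STATUS_REG_ADDR;
    (void)status;   /* on a real target you would read or poll *status here */
    return 0;
}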
a pointer is a variable that holds an address
No no no no no.
So many questions in the C tag on StackOverflow could be avoided if people were actually taught the basics. The rules are:
A variable is a storage location.
A storage location holds a value.
A pointer is a particular kind of value.
A pointer may be dereferenced. Dereferencing a pointer produces a storage location.
A storage location may be addressed. Addressing a storage location produces a pointer.
So let's review your sentence:
a pointer is a variable ...
No, a pointer is a value. There might be a variable that holds a pointer, just like there might be a variable that holds an int. But an int isn't a variable, and neither is a pointer. Similarly, a pointer might be dereferenced to produce the storage location associated with a variable, but that is not saying that the pointer is the variable.
... that holds an address
A pointer by definition is an address, but what an address actually consists of is an implementation detail. A pointer holds something that when dereferenced produces a storage location. A common choice for implementers of C compilers is to make the value of a pointer be a numeric offset into a large block of virtual memory, but there is no requirement that an author of a C compiler use integer offsets into virtual memory as pointers.
There is a requirement that addresses be convertible to integers and that integers be convertible back to addresses, and that these mappings obey certain restrictions on their behaviour. But for all you know the address could be the string "5" and the conversion to integer could be to call atoi on that thing. That would be silly, but it would be legal.
Why am I unable to do int *p = 5;
Who says that you are unable?
The C11 specification says:
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So there you go. If the compiler you are using gives meaning to mapping 5 to a particular address, you go be awesome. If it does not, then you're in for the worst sort of undefined behaviour. Basically, there is no requirement on the developer of a C compiler to make any particular conversion between pointers and integers work, except that zero is always a null pointer and no valid object has the address zero.
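Those implementation-provided mappings are what uintptr_t from <stdint.h> exposes; a small sketch of the round trip the standard does guarantee, assuming uintptr_t is provided (it is optional in the standard but ubiquitous in practice):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    void *vp = &x;
    uintptr_t bits = (uintptr_t)vp;   /* pointer -> integer */
    void *back = (void *)bits;        /* integer -> pointer; compares equal to vp (C11 7.20.1.4) */

    printf("round trip ok: %d\n", back == vp);   /* prints 1 */
    printf("value: %d\n", *(int *)back);         /* prints 42 */
    return 0;
}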
Yes, theoretically that would be correct (except that address 5 is probably nothing you should use). You may get a warning or error from the compiler, because assigning a plain number to a pointer is unusual, but with a cast it will accept it:
int *p = (int*)5;
You can't assign an arbitrary address to a pointer like that, because you don't have the right to use that memory. The address might already be in use by some other program or by the OS. You have to allocate memory instead:
#include <stdlib.h> /* for malloc */

int *n;
n = malloc(sizeof(int));
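A slightly fuller sketch of that approach, with the error check and cleanup the two-line snippet leaves out:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *n = malloc(sizeof *n);   /* ask the allocator for a usable address */
    if (n == NULL) {              /* malloc signals failure with a null pointer */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    *n = 42;
    printf("%d\n", *n);
    free(n);                      /* hand the memory back */
    return 0;
}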

Why this redefinition of sizeof works

I'm redefining sizeof as:
#undef sizeof
#define sizeof(type) ((char*)((type*)(0) + 1) - (char*)((type*)(0)))
For this to work, the two 0s in the definition need to be the same entity in memory, or in other words, need to have the same address. Is this always guaranteed, or is it compiler/architecture/run-time dependent?
The 0 here is not an object – it is an address. So the question you ask is something of a non-sequitur.
You are thinking that the zeros are discrete pieces of data that need to be stored somewhere. They aren't; they are being cast to pointers to memory location zero.
When you increment a pointer to a type, it is actually incremented by the size of the type it points to. This is how C array arithmetic works.
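You can watch that scaling happen without involving the null pointer at all by doing the same subtraction on a real array; a minimal sketch:

#include <stdio.h>
#include <stddef.h>

int main(void) {
    double arr[2];
    /* &arr[1] is exactly one element past &arr[0]; viewing both through char*
       before subtracting exposes the element size in bytes. */
    size_t step = (size_t)((char *)&arr[1] - (char *)&arr[0]);
    printf("step: %zu, sizeof(double): %zu\n", step, sizeof(double));
    return 0;
}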
In practice, a null pointer of a certain type always refers to the same location in memory (especially when constructed the same way, as you do above), simply because any other implementation would be senseless.
However, The standard actually does not guarantee a lot about this:
"[...] is guaranteed to compare unequal to a pointer to any object or function." 6.3.2.3§3
"[...] Any two null pointers shall compare equal." 6.3.2.3§4
This leaves a lot of leeway. Assume a memory model with two distinct regions. Each region could have a range of null pointers (say, its first 128 bytes). It is easy to see that, even in that weird case, the basic assumptions about null pointers can indeed hold! Well, given a proper compiler that generates the appropriately weird null tests...
So, what else do we know about pointers in general...
What you are trying to do is first, increment a pointer
"one operand shall be a pointer to a complete object type and the other shall have integer type. (Incrementing is equivalent to adding 1.)" [6.5.6§2]
and then a pointer difference
"both operands are pointers to qualified or unqualified versions of compatible complete object types" [6.5.6§3]
OK, they are (well, assuming type is a complete object type). But what about semantics?
"For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type." [6.5.6§7]
This is actually a bit of a problem: The null pointer need not point to an actual object! (Otherwise you could dereference it safely...) Therefore, incrementing it or subtracting it from another pointer is UB!
To conclude: 0 does not point to an object, and therefore the answer to your question is No.
A strictly standards-conforming compiler could reject this, or return some nonsense. On "typical" machines, integers and pointers have the same size, and casting an integer to a pointer just takes that bit pattern and looks at it as a pointer. But there are machines where words contain extra data (type bits, perhaps, or permission bits), some addresses might be forbidden for certain objects (e.g., nothing can have address 0), and so on. And while it is guaranteed that sizeof(char) == 1, on e.g. Crays a character is actually 32 bits.
Besides, the C standard guarantees that the expression in sizeof(expression) is not evaluated at all; only its type is taken. That is, sizeof(x++) does not increment x.
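For what it's worth, here is the question's definition placed next to the real operator, renamed MY_SIZEOF so the two can coexist. On mainstream compilers they agree, but as the answers explain, only sizeof itself is guaranteed, since the macro's arithmetic on a null pointer is not blessed by the standard:

#include <stdio.h>
#include <stddef.h>

/* The question's definition, renamed so it can sit next to the real operator.
   Its null-pointer arithmetic is not guaranteed by the standard, even though
   mainstream compilers accept it. */
#define MY_SIZEOF(type) ((char*)((type*)(0) + 1) - (char*)((type*)(0)))

struct pair { char tag; double value; };

int main(void) {
    printf("MY_SIZEOF: %zu\n", (size_t)MY_SIZEOF(struct pair));
    printf("sizeof:    %zu\n", sizeof(struct pair));
    return 0;
}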

Is NULL in C required/defined to be zero?

NULL appears to be zero in my GCC test programs, but Wikipedia says that NULL is only required to point to unaddressable memory.
Do any compilers make NULL non-zero? I'm curious whether if (ptr == NULL) is better practice than if (!ptr).
NULL is guaranteed to be zero, perhaps cast to (void *).
C99, §6.3.2.3, ¶3
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.(55) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
And note 55 says:
55) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant.
Notice that, because of how the rules for null pointers are formulated, the value you use to assign/compare null pointers is guaranteed to be zero, but the bit pattern actually stored inside the pointer can be something else entirely (though AFAIK only a few very esoteric platforms exploited this fact, and it should not be a problem anyway, since to "see" the underlying bit pattern you would have to go into UB-land anyway).
So, as far as the standard is concerned, the two forms are equivalent (!ptr is equivalent to ptr==0 due to §6.5.3.3 ¶5, and ptr==0 is equivalent to ptr==NULL); if(!ptr) is also quite idiomatic.
That being said, I usually write explicitly if(ptr==NULL) instead of if(!ptr) to make it extra clear that I'm checking a pointer for nullity instead of some boolean value.
Notice that in C++ the void * cast cannot be present due to the stricter implicit casting rules that would make the usage of such NULL cumbersome (you would have to explicitly convert it to the compared pointer's type every time).
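To make the equivalence of the two spellings concrete, a tiny sketch:

#include <stddef.h>
#include <stdio.h>

static void report(const int *p) {
    if (p == NULL)   /* explicit: reads as "is this pointer null?" */
        puts("null (explicit check)");
    if (!p)          /* idiomatic shorthand; equivalent per 6.5.3.3p5 */
        puts("null (idiomatic check)");
}

int main(void) {
    int x = 7;
    report(&x);      /* prints nothing */
    report(NULL);    /* prints both lines */
    return 0;
}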
From the language standard:
6.3.2.3 Pointers
...
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
...
55) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.17.
Given that language, the macro NULL should evaluate to a zero-valued expression (either an undecorated literal 0, an expression like (void *) 0, or another macro or expression that ultimately evaluates to 0). The expressions ptr == NULL and !ptr should be equivalent. The second form tends to be more idiomatic C code.
Note that the null pointer value doesn't have to be 0. The underlying implementation may use any value it wants to represent a null pointer. As far as your source code is concerned, however, a zero-valued pointer expression represents a null pointer.
In practice it is the same, but NULL is semantically different from zero: zero means there is a value, while NULL means there is none. So theoretically they are different, NULL having a different meaning, and in some cases that distinction can be useful.
In practice, no; !ptr is correct.

How to set a pointer to zero'th location?

As per my knowledge, all occurrences of NULL in code are replaced by a 0 during the preprocessing stage. Then during compilation, all occurrences of 0 in pointer context are replaced by an appropriate value which represents NULL on that machine. Thus the compiler has to know that NULL value for that particular machine.
Now, this means that whenever I use 0 in pointer context, it is replaced by the appropriate value representing NULL on that machine, which may or may not be 0. So, how can I tell the compiler that I actually mean 0 and not NULL, when I use 0 in pointer context?
Sorry for the long description; correct me if I am wrong.
One way is to store all-bits-zero into your pointer:
void* zero;
memset(&zero, 0, sizeof(zero));
Well, there's no portable way to achieve that in C. The C language provides no portable features specifically intended for aiming a pointer at a specific numerical address. However, the "secret" intent of the explicit integer-to-pointer conversion is actually just that: to implement the "natural" mapping between integers and pointers, where "natural" normally means that the numerical value of the integer is preserved intact (if possible) when converting it to pointer type. In other words, all you need is an integer with value 0. You just need to make sure that this integer does not qualify as an integer constant expression, since otherwise it would be recognized as a null pointer constant. (In simple terms, you need a run-time integral zero, not a compile-time integral zero.)
For example, this
uintptr_t i = 0;
void *p = (void *) i;
will normally produce a pointer pointing to address 0. Moreover, in C language (as opposed to C++) this
const uintptr_t i = 0;
void *p = (void *) i;
will also normally produce a pointer pointing to address 0. While this
void *p = 0; /* compile-time 0 will not do */
will produce the implementation-specific null-pointer value.
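Putting the two variants side by side in a complete (and deliberately implementation-specific) sketch; what %p prints for p1 depends entirely on the platform's integer-to-pointer mapping:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uintptr_t i = 0;           /* run-time zero: not a null pointer constant */
    void *p1 = (void *)i;      /* implementation-defined; typically address 0 */
    void *p2 = 0;              /* compile-time 0: yields the null pointer     */

    printf("p1 = %p\n", p1);
    printf("p2 = %p\n", p2);
    printf("p2 is null: %d\n", p2 == NULL);   /* always prints 1 */
    return 0;
}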
You can also take a look at the C FAQ, which covers this very issue.
