It may sound like a silly question, but since in C, NULL is literally defined as
#define NULL 0
why can't it be a valid memory address? Why can't I dereference it, and why would it be impossible for any data to be at the memory address 0?
I'm sure the answer to this is something like "the first n bytes of memory are always reserved by the kernel", or something like that, but I can't find anything like this on the internet.
Another part of my reasoning is: wouldn't this be platform-dependent? Couldn't I invent a new architecture where the memory address 0 is accessible to processes?
Dereferencing NULL is undefined behavior. Anything could happen, and most of the time bad things happen. So be scared.
Some old architectures (VAX ...) permitted you to dereference NULL.
The C11 standard specification (read n1570) does not require the null pointer to be all zero bits (see C FAQ Q5.17); it could be something else, but it must be an address that is never valid, so it cannot be obtained from a successful malloc or from the address-of operator (unary &), in the sense of C11. But it is more convenient to have it be all zero bits, and in practice most (but not all) C implementations do so.
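To illustrate that point, here is a minimal sketch (mine, not from the original answer); on mainstream ABIs the two pointers do compare equal, but the standard does not promise it:

#include <string.h>

void sketch(void)
{
    int *p, *q = NULL;
    memset(&p, 0, sizeof p);  /* gives p an all-bits-zero object representation */
    /* q is a genuine null pointer. Whether p == q holds (or whether
       reading p is even defined) is not guaranteed by the standard. */
    (void)q;
}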
IIRC, on Linux, you might mmap(2) the page containing (void*)0 with MAP_FIXED, but it is not wise to do so (e.g. because a conforming optimizing compiler is allowed to optimize dereference of NULL).
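For the curious, here is a minimal Linux-specific sketch of that mmap trick. It assumes vm.mmap_min_addr has been lowered (by default it is not, and the call fails with EPERM), and it remains unwise for the reason just given:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");   /* usually EPERM because of vm.mmap_min_addr */
        return 1;
    }
    /* Even if the mapping succeeds, *(char *)0 is still undefined
       behavior as far as the C compiler is concerned. */
    return 0;
}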
So (void*)0 is not a valid address in practice (on common processors with an MMU and virtual memory, running a good enough operating system!), because it is convenient to decide that it is NULL, and it is convenient to be sure that dereferencing it gives a segmentation fault. But that is not required by the C standard (and would be false on cheap microcontrollers today).
A C implementation has to provide some way to represent the NULL pointer (and guarantee that it is never the address of some valid location). That might even be done by convention: e.g. provide a full 2^32-byte address space, but promise never to use address 0 (or whatever address you assigned to NULL, perhaps 42!)
When NULL happens to be dereferenceable, subtle bugs are not caught by a segmentation fault (so C programs are harder to debug).
Couldn't I invent a new architecture where the memory address 0 is accessible to processes?
You could, but you don't want to do that (if you care about providing a standard-conforming C implementation). You would prefer to make address 0 be NULL. Doing otherwise makes it harder to write C compilers (and standard C libraries). And making that address invalid, to the point of giving a segmentation fault on dereference, makes debugging (and the life of your users coding in C) easier.
If you dream of weird architectures, read about Lisp machines (and the Rekursiv, and the iAPX 432) and see the talk "The circuit less traveled" by Liam Proven at FOSDEM 2018. It really is instructive, and it is a nice talk.
Making address zero unmapped so that a trap occurs if your program tries to access it is a convenience provided by many operating systems. It is not required by the C standard.
According to the C standard:
NULL is not the address of any object or function. (Specifically, the standard requires that NULL compare unequal to a pointer to any object or function.)
If you do apply * to NULL, the resulting behavior is not defined by the standard.
What this means for you is that you can use NULL as an indicator that a pointer is not pointing to any object or function. That is the only purpose the C standard provides for NULL: to use in tests such as if (p != NULL)…. The C standard does not guarantee that a trap will occur if you use *p when p is NULL.
In other words, the C standard does not require NULL to provide any trapping capability. It is just a value that is different from any actual pointer, provided just so you have one pointer value that means “not pointing to anything.”
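For example (a sketch; find_first is a hypothetical function, used only to show NULL as a "not found" sentinel):

#include <stddef.h>

int *find_first(int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return &a[i];     /* a pointer to an actual object */
    return NULL;              /* distinct from every valid &a[i] */
}

A caller then tests the result with if (p != NULL) before using it.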
General-purpose operating systems typically arrange for the memory at address zero to be unmapped (and their C implementations define NULL to be (void *) 0 or something similar) specifically so that a trap will occur if you dereference a null pointer. When they do this, they are extending the C language beyond what the specification requires. They deliberately exclude address zero from the memory map of your process to make these traps work.
However, the C standard does not require this. A C implementation is free to leave the memory at address zero mapped, and, when you apply * to a null pointer, there might be data there, and your program could read and/or write that data, if the operating system has allowed it. When this is done, it is most often in code intended to run inside the operating system kernel (such as device drivers, kernel extensions, or the kernel itself) or embedded systems or other special-purpose systems with simple operating systems.
The null pointer constant (NULL) is 0-valued. The null pointer value may be something other than 0. During translation, the compiler will replace occurrences of the null pointer constant with the actual null pointer value.
NULL does not represent “address 0”; rather, it represents a well-defined invalid pointer value that is guaranteed not to point to any object or function, and attempts to dereference invalid pointers lead to undefined behavior.
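A short sketch of that distinction (mine, for illustration): the literal 0 in pointer context is the null pointer constant, while a runtime integer zero merely undergoes an implementation-defined conversion:

#include <stdint.h>

char *a = 0;              /* null pointer constant: a is a null pointer */

void demo(void)
{
    uintptr_t z = 0;      /* an ordinary runtime integer */
    char *b = (char *)z;  /* implementation-defined conversion; commonly a
                             null pointer in practice, but not required */
    (void)a; (void)b;
}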
Comparing pointers with a relational operator (e.g. <, <=, >= or >) is only defined by the C standard when the pointers both point within the same aggregate object (struct, array or union). In practice this means that a comparison of the form
if (start_object <= my_pointer && my_pointer < end_object+1) {
can be turned into
if (1) {
by an optimising compiler. Despite this, in K&R, section 8.7 "Example—A Storage Allocator", the authors make comparisons similar to the one above. They excuse this by saying
There is still one assumption, however, that pointers to different blocks returned by sbrk can be meaningfully compared. This is not guaranteed by the standard, which permits pointer comparisons only within an array. Thus this version of malloc is portable only among machines for which general pointer comparison is meaningful.
Furthermore, it appears the implementation of malloc used in glibc does the same thing!
What's worse (and the reason I stumbled across this to begin with): for a school assignment I'm supposed to implement a rudimentary malloc-like function, and the instructions for the assignment require us to use the K&R code, but we have to replace the sbrk call with a call to mmap!
While comparing pointers from different sbrk calls is probably undefined, it is also only slightly dubious, since you have some sort of mental intuition that the returned pointers should come from roughly the same region of memory. Pointers returned by different mmap calls have, as far as I understand, no guarantee to be even remotely similar to each other, and consolidating/merging memory blocks across mmap calls should be highly illegal (and it appears glibc avoids this, merging only memory returned by sbrk, or internally inside mmap pages, never across them), yet the assignment requires this.
Question: could someone shine some light on
Whether or not comparing pointers from different calls to sbrk may be optimised away, and
If so, what glibc does that lets them get away with it.
The language-lawyer answer is (I believe) to be found in §6.5.8.5 of the C99 standard (or more precisely in ISO/IEC 9899:TC3 Committee Draft — September 7, 2007 WG14/N1256, which is nearly identical, but I don't have the original to hand), which says the following about the relational operators (i.e. <, <=, >, >=):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
(the C11 text is identical or near identical)
This at first seems unhelpful, or at least suggests that these implementations each exploit undefined behaviour. I think, however, you can either rationalise the behaviour or use a workaround.
The C pointers specified are either going to be NULL, or derived from taking the address of an object with &, or from pointer arithmetic, or from the result of some function. In the case concerned, they are derived from the result of the sbrk or mmap system calls. What do these system calls really return? At a register level, they return an integer the size of uintptr_t (or intptr_t). It is in fact the system call interface which casts them to a pointer. Since casts between pointers and uintptr_t (or intptr_t) are, by the definition of those types, bijective, we know we can cast the pointers to uintptr_t (for instance) and compare them, which imposes a well-order relation on the pointers. The Wikipedia link gives more information, but this will in essence ensure that every comparison is well defined, as well as giving other useful properties, such as a<b and b<c implying a<c. (I also can't choose an entirely arbitrary order, as it would need to satisfy the other requirements of C99 §6.5.8.5, which pretty much leaves me with intptr_t and uintptr_t as candidates.)
We can exploit this and write the (arguably better):
if ((uintptr_t)start_object <= (uintptr_t)my_pointer && (uintptr_t)my_pointer < (uintptr_t)(end_object+1)) {
There is a nit here. You'll note I cast to uintptr_t and not intptr_t. Why was that the right choice? In fact, why did I not choose a rather bizarre ordering such as reversing the bits and comparing? The assumption here is that I'm choosing the same ordering as the kernel, specifically that my definition of < (given by the ordering) is such that the start and end of any allocated memory block always satisfy start < end. On all modern platforms I know of, there is no 'wrap around' (e.g. the kernel will not allocate 32-bit memory starting at 0xffff8000 and ending at 0x00007fff); note, though, that similar wrap-around has been exploited in the past.
The C standard specifies that pointer comparisons give undefined results under many circumstances. However, here you are building your own pointers out of integers returned by system calls. You can therefore either compare the integers, or compare the pointers by casting them back to integers (exploiting the bijective nature of the cast). If you merely compare the pointers, you rely on the C compiler's implementation of pointer comparison being sane, which it almost certainly is, but it is not guaranteed.
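Packaged as a helper, that looks like the following (a sketch under the assumption that the implementation's pointer-to-integer mapping is address-like, per footnote 67 quoted elsewhere on this page; ptr_in_range is my name, not the poster's):

#include <stdbool.h>
#include <stdint.h>

static bool ptr_in_range(const void *start, const void *p, const void *end)
{
    uintptr_t s = (uintptr_t)start;
    uintptr_t q = (uintptr_t)p;
    uintptr_t e = (uintptr_t)end;
    return s <= q && q < e;   /* total order, defined for any pointers */
}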
Are the possibilities I mention so obscure that they can be discounted? Nope; let's find a platform example where they might be important: the 8086. It's possible to imagine an 8086 compilation model where every pointer is a 'far' pointer (i.e. contains a segment register). Pointer comparison could do a < or > on the segment register and, only if they are equal, do a < or > on the offset. This would be entirely legitimate so long as all the structures in C99 §6.5.8.5 are in the same segment. But it won't work as one might expect between segments, as 1000:1234 (which is equal to 1010:1134 as a memory address) will appear smaller than 1010:0123. mmap here might well return results in different segments. Similarly, one could think of another memory model where the segment register is actually a selector, and a pointer comparison uses a processor comparison instruction which aborts if an invalid selector or an offset outside a segment is used.
You ask two specific questions:
Whether or not comparing pointers from different calls to sbrk may be optimised away, and
If so, what glibc does that lets them get away with it.
In the formulation given above, where start_object etc. are actually void *, the comparison may be optimised away; it might do what you want, but that is not guaranteed, as the behaviour is undefined. A cast to uintptr_t would guarantee that it works, provided the kernel uses the same well-ordering as implied by the cast.
In answer to the second question, glibc is relying on a behaviour of the C compiler which is technically not required, but is very likely (per the above).
Note also (at least in the K&R in front of me) that the line you quote doesn't exist in the code. The caveat is in relation to the comparison of header * pointers with < (as far as I can see, comparison of void * pointers with < is always UB) which may derive from separate sbrk() calls.
The answer is simple enough. The C library implementation is written with some knowledge of (or perhaps expectations for) how the C compiler will handle certain code that has undefined behaviour according to the official specification.
There are many examples I could give; but the fact that pointers actually refer to an address in the process's address space, and can be compared freely, is relied on by the C library implementation (at least by glibc) and also by many "real world" programs. While this is not guaranteed by the standard for strictly conforming programs, it is true for the vast majority of real-world architectures/compilers. Also, note footnote 67, regarding conversion of pointers to integers and back:
The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
While this doesn't strictly give license to compare arbitrary pointers, it helps to understand how the rules are supposed to work: as a set of specific behaviours that are certain to be consistent across all platforms, rather than as a limit to what is permissible when the representation of a pointer is fully known and understood.
You've remarked that:
if (start_object <= my_pointer && my_pointer < end_object+1) {
Can be turned into:
if (1) {
With the assumption (which you didn't state) that my_pointer is derived in some way from the value of start_object, or from the address of the object which it delimits, then this is strictly true; but it's not an optimisation that compilers make in practice, except in the case of static/automatic storage duration objects (i.e. objects which the compiler knows weren't dynamically allocated).
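For instance, with an object of automatic storage duration the compiler can see every pointer that legally points into it, so a check like the following may be folded to a constant (a sketch; actual behaviour varies by compiler and optimisation level):

int contains(int *p)
{
    int arr[4];
    /* Unless p was derived from arr, this comparison is undefined,
       so an optimiser is entitled to fold it to a constant. */
    return arr <= p && p < arr + 4;
}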
Consider the fact that calls to sbrk are defined to increment or decrement the number of bytes allocated in some region (the heap) for a process, by the given incr parameter, relative to some brk address. It is really just a wrapper around brk, which allows you to adjust the current top of the heap. When you call brk(addr), you're telling the kernel to allocate space for your process all the way up to addr (or possibly to free the space between the previous, higher-address top of the heap and the new address). sbrk(incr) would be exactly equivalent if incr == new_top - original_top. Thus, to answer your question:
Because sbrk just adjusts the size of the heap (i.e. some contiguous region of memory) by incr bytes, comparing values returned by sbrk is just a comparison of points within some contiguous region of memory. That is exactly equivalent to comparing pointers into an array, and so it is a well-defined operation according to the C standard. Therefore, pointer comparisons around sbrk calls are well defined and will not be optimised away.
glibc doesn't do anything special to "get away with it"; it just assumes that the assumption mentioned above holds (which it does). In fact, when checking the state of a chunk of memory that was allocated with mmap, glibc explicitly verifies that the mmap'd memory is outside the range allocated with sbrk.
Edit: Something I want to make clearer about my answer: the key here is that there is no undefined behavior! sbrk is defined to allocate bytes in some contiguous region of memory, which is itself an 'object' as specified by the C standard. Therefore, comparison of pointers within that 'object' is a completely sane and well-defined operation. The assumption is not that glibc is taking advantage of undefined pointer comparison; it's that glibc assumes sbrk grows/shrinks memory in some contiguous region.
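A small demo of that contiguity assumption (Unix-specific and illustrative only; sbrk is deprecated on some systems):

#define _DEFAULT_SOURCE   /* expose sbrk in glibc */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *a = sbrk(4096);   /* previous program break */
    char *b = sbrk(4096);
    if (a == (void *)-1 || b == (void *)-1)
        return 1;
    /* On a classic heap, b == a + 4096: both pointers lie in one
       contiguous region, so comparing them behaves like comparing
       pointers into a single array. */
    printf("a=%p b=%p diff=%td\n", (void *)a, (void *)b, b - a);
    return 0;
}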
The authors of the C Standard recognized that there are some segmented-memory hardware platforms where an attempt to perform a relational comparison between objects in different segments might behave strangely. Rather than say that such platforms could not accommodate efficient C implementations, the authors of the Standard allow such implementations to do anything they see fit if an attempt is made to compare pointers to objects that might be in different segments.
Had the Standard said that comparisons between disjoint objects should exhibit strange behavior only on segmented-memory systems that can't efficiently yield consistent behavior, that would have been seen as implying that such systems were inferior to platforms where relational comparisons between arbitrary pointers yield a consistent ranking, and the authors of the Standard went out of their way to avoid such implications. Instead, they figured that since there was no reason for implementations targeting commonplace platforms to do anything weird with such comparisons, such implementations would handle them sensibly whether the Standard mandated it or not.
Unfortunately, some people who are more interested in making a compiler that conforms to the Standard than in making one that's useful have decided that any code which isn't written to accommodate the limitations of hardware that has been obsolete for decades should be considered "broken". They claim that their "optimizations" allow programs to be more efficient than would otherwise be possible, but in many cases the "efficiency" gains are only significant in cases where a compiler omits code which is actually necessary. If a programmer works around the compiler's limitations, the resulting code may end up being less efficient than if the compiler hadn't bothered with the "optimization" in the first place.
In many C implementations (but not all), NULL corresponds to a process's logical address 0. When we try to dereference NULL, we get a segmentation fault because NULL "maps" to a memory segment that is neither readable nor writable.
Are there systems with real, physical addresses that correspond to the virtual NULL address? If so, what might be stored there? Or is NULL purely a virtual address with no physical counterpart?
NULL is implementation-defined.
It is a macro which expands to an implementation-defined null pointer constant.
"it depends" #I3x
Is there a real, physical address that corresponds to the virtual NULL address?
On some platforms: yes, on others: no. It is implementation defined.
If so, what might be stored there?
On some platforms: a value, maybe 0. This value may or may not be accessible. On others: there is simply no access. It is implementation defined.
Or is NULL purely a virtual address with no physical counterpart?
On some platforms: it is a virtual address with no counterpart. On others it may be a real address. It is implementation defined.
From the C standard paper ISO/IEC 9899:201x (C11), §6.3.2.3 Pointers:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function
This means that a null pointer gives the result TRUE if compared to the constant 0. (Converting it to an arithmetic type, however, yields the value 0 only on typical implementations; the standard does not guarantee it.)
But it also means that the compiler must generate a value for the null pointer that is guaranteed never to compare equal to the address of anything existing in the code's running environment (to put it more in the standard's terms).
The goal is to make code portable, by requiring that every operation on a null pointer gives the same result on any CPU or architecture.
Note: on the practical side, there are two ways to guarantee that the null pointer doesn't equal any address in the running environment. The first applies to platforms where invalid addresses exist: take advantage of one of those invalid addresses. The second is to reserve an address for the purpose by convention, typically 0. The latter means that the platform, mainly the OS, has to take care to make that address invalid, in the sense that it is never usable and so can never compare equal to any other pointer; otherwise there could be an object legally residing at that address, making its address compare equal to the null pointer. This point is important because, absent any infrastructure (such as an OS) taking care of the invalid value, we can get a conflicting situation. An example is using a C compiler for bare-metal programming (an OS or kernel): if the processor uses the memory starting at address 0 to hold interrupt service addresses, a pointer to that table will have the value 0. In that case the compiler should use a different value for the null pointer; but if the OS can access the whole address space, to allow full memory management, there is no value left over for the null pointer.
In user code, or high-level kernel code, this is not a problem, because there are always some addresses that are not normally accessible and can therefore be used as the null pointer value.
Yes, most (all?) systems have a real physical address zero; it is just inaccessible from a process running in a virtual address space, which remaps virtual addresses to (different) physical addresses. In such processes the address zero is simply left unmapped, just like most of the rest of their address space.
However, as a matter of fact, all x86-based processors to this day initially boot into 'real mode', an 8086 compatibility mode. In this processor mode, not only is there an addressable physical address zero, but that address is where the interrupt vector table is located, and it is what the processor reads when it handles interrupts.
Certainly on some computers, there is actual memory at address 0. On others, there may not be. There are even some hardware designs for which the answer may depend on the temporary state of a memory-mapping chip or chipset.
But in all cases, dereferencing a NULL pointer is undefined behavior.
In many C implementations (but not all), NULL corresponds to a process's logical address 0.
No, that is the wrong way to look at it. Null pointers do not correspond to anything other than themselves, by definition, and C in no way ties null pointers to any particular address, in any sense of the term.
At best, you could say that some implementations' representations of null pointers have the form that a representation of a pointer to an object residing at machine address 0 would have. Since null pointers compare unequal to every pointer to an object or function, however, such a situation requires that it be impossible for any object or function that the C program can reference to reside at that address.
When we try to dereference NULL, we get a segmentation fault
That is one of the multitude of observed behaviors that are possible when you invoke undefined behavior. Empirically, it is one of the more likely to be observed when undefined behavior is invoked in the particular way you describe. It is by no means guaranteed by C, and indeed, there are widely-used implementations which, empirically, exhibit different behavior.
because NULL "maps" to a memory segment that is neither readable nor writable.
But there you're delving deep into implementation details.
Is there a real, physical address that corresponds to the virtual NULL address?
No, because a null pointer is not an address. It is a pointer value that is explicitly invalid to dereference. Technically, although C refers to some pointer values as "addresses", it neither requires nor depends on any pointer value to be a machine address, neither a physical nor a virtual one. The value of a null pointer, however, is not an address even in C's sense of the term. The & operator never produces this value, and it is invalid to dereference -- it explicitly fails to correspond to any object or function.
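One small consequence, as a sketch: the assertion below can never fire in a conforming implementation, because & never yields a null pointer.

#include <assert.h>

int main(void)
{
    int x = 0;
    assert(&x != (int *)0);   /* &x can never compare equal to NULL */
    return 0;
}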
If so, what might be stored there? Or is NULL purely a virtual address with no physical counterpart?
Although it is indeed common for null pointer values to be represented in the way that a pointer to an object / function residing at machine address 0 would be, this is irrelevant, because in that case a C program can never access anything residing at that address. Thus, as far as the program is concerned, nothing resides there. As far as the hardware and OS kernel are concerned, on the other hand, external programs' view of memory is largely irrelevant. And if one nevertheless insists on considering the kernel's view of what is stored in physical memory at an address derived from an external program's null pointer representation, it is highly system-dependent.
In C, dereferencing the null pointer is undefined behavior; however, the null pointer value has a bit representation that on some architectures makes it point to a valid address (e.g. the address 0).
Let's call this address the null pointer address, for the sake of clarity.
Suppose I want to write a piece of software in C, in an environment with unrestrained access to memory. Suppose further I want to write some data at the null pointer address: how would I achieve that in a standard compliant way?
Example case (IA32e):
#include <stdint.h>

int main()
{
    uintptr_t zero = 0;
    char* p = (char*)zero;
    return *p;
}
This code, when compiled with gcc at -O3 for IA32e, gets transformed into
movzx eax, BYTE PTR [0]
ud2
due to UB (0 is the bit representation of the null pointer).
Since C is close to low level programming, I believe there must be a way to access the null pointer address and avoid UB.
Just to be clear
I'm asking about what the standard has to say about this, NOT how to achieve this in an implementation-defined way.
I know the answer for the latter.
I read (part of) the C99 standard to clear my mind. I found the sections that are of interest for my own question and I'm writing this as a reference.
DISCLAIMER
I'm an absolute beginner; 90% or more of what I have written is wrong, makes no sense, or may break your toaster. I also try to make a rationale out of the standard, often with disastrous and naive results (as stated in the comments).
Don't read.
Consult #Olaf, for a formal and professional answer.
For the following, the term architectural address designates a memory address as seen by the processor (logical, virtual, linear, physical or bus address). In other words, the addresses that you would use in assembly.
In section 6.3.2.3. it reads
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
and regarding integer to pointer conversion
An integer may be converted to any pointer type. Except as previously specified [i.e. for the case of the null pointer constant], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation†.
These imply that, to be compliant, the compiler need only implement a function int2ptr from integers to pointers such that:
int2ptr(0) is, by definition, the null pointer.
Note that int2ptr(0) is not mandated to be 0. It can be any bit representation.
*int2ptr(n), for n != 0, has no constraints.
Note that this means int2ptr need not be the identity function, nor a function that returns valid pointers!
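To make that concrete, here is the hypothetical mapping from the ‡ footnote below spelled out as code (purely illustrative; int2ptr_repr is my name for it, and no real compiler is claimed to work this way):

#include <stdint.h>

/* An imagined 16-bit platform where int2ptr(n) = 0x8000 | n:
   int2ptr(0) == 0x8000 serves as the null pointer, and
   int2ptr(0xf1) == 0x80f1, which is not architectural address 0xf1. */
uint16_t int2ptr_repr(uint16_t n)
{
    return (uint16_t)(0x8000u | n);
}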
Given the code below
char* p = (char*)241;
The standard makes absolutely no guarantee that the expression *p = 56; will write to the architectural address 241.
And so it gives no direct way to access any other architectural address (including int2ptr(0), the address designated by a null pointer, if valid).
Simply put the standard does not deal with architectural addresses, but with pointers, their comparison, conversions and their operations‡.
When we write code like char* p = (char*)K we are not telling the compiler to make p point to the architectural address K; we are telling it to make a pointer out of the integer K, or in other terms to make p point to the (C abstract) address K.
The null pointer and the (architectural) address 0x0 are not the same (cit.), and the same is true for any other pointer made from the integer K and the (architectural) address K.
For some reason (childhood heritage), I thought that integer literals in C could be used to express architectural addresses; instead I was wrong, and that only happens to be (sort of) correct in the compilers I was using.
The answer to my own question is simply: there is no standard way, because there are no (architectural) addresses in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one1.
Note about return *(volatile char*)0;
The standard says that
If an invalid value [a null pointer value is an invalid value] has been assigned to the pointer, the behavior of the unary * operator is undefined.
and that
Therefore any expression referring to such an [volatile] object shall be evaluated strictly according to the rules of the abstract machine.
The abstract machine says that * is undefined for null pointer values, so that code shouldn't differ from this one
return *(char*)0;
which is also undefined.
Indeed they don't differ, at least with GCC 4.9, both compile to the instructions stated in my question.
The implementation-defined way to access the 0 architectural address is, for GCC, to use the -fno-isolate-erroneous-paths-dereference flag, which produces the "expected" assembly code.
†The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
‡Unfortunately it says that & yields the address of its operand; I believe this is a bit improper, and I would say that it yields a pointer to its operand. Consider a variable a that is known to reside at address 0xf1 in a 16-bit address space, and consider a compiler that implements int2ptr(n) = 0x8000 | n. &a would yield a pointer whose bit representation is 0x80f1, which is not the address of a.
1Which was special to me because it was the only one, in my implementations, that couldn't be accessed.
As OP has correctly concluded in her answer to her own question:
There is no standard way, because there are no (architectural) addresses in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one.
However, a situation where one would want to access memory directly is likely one where a custom linker script is employed. (I.e. some kind of embedded systems stuff.) So I would say, the standard compliant way of doing what OP asks would be to export a symbol for the (architectural) address in the linker script, and not bother with the exact address in the C code itself.
A variation of that scheme would be to define a symbol at address zero and simply use that to derive any other required address. To do that add something like the following to the SECTIONS portion of the linker script (assuming GNU ld syntax):
_memory = 0;
And then in your C code:
extern char _memory[];
Now it is possible to e.g. create a pointer to the zero address using for example char *p = &_memory[0]; (or simply char *p = _memory;), without ever converting an int to a pointer. Similarly, int addr = ...; char *p_addr = &_memory[addr]; will create a pointer to the address addr without technically casting an int to a pointer.
(This of course avoids the original question, because the linker is independent from the C standard and C compiler, and every linker might have a different syntax for its linker script. Also, the generated code might be less efficient, because the compiler is not aware of the address being accessed. But I think this still adds an interesting perspective to the question, so please forgive the slightly off-topic answer.)
Any solution is going to be implementation-dependent, necessarily. ISO C does not describe the environment a C program runs on; rather, it describes what a conforming C program looks like across a variety of environments («data-processing systems»). The Standard indeed cannot guarantee what you would get by accessing an address that is not part of an object you visibly allocated; that is up to the environment.
Therefore, I would use something the standard leaves as implementation-defined (and even as conditionally-supported) rather than undefined behavior*: inline assembly. For GCC/clang:
asm volatile("movzbl 0, %%eax" : : : "eax", "memory"); /* roughly *(char*)0 */
It is also worth mentioning freestanding environments, the kind you seem to be in. The standard says about this execution model (emphasis mine):
§ 5.1.2
Two execution environments are defined: freestanding and hosted. [...]
§ 5.1.2.1, paragraph 1
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined. [...]
Notice it doesn't say you can access any address at will.
*Whatever that could mean. Things are a bit different when you are the implementation the standard delegates control to.
All quotes are from draft N1570.
The C Standard does not require that implementations have addresses that resemble integers in any way, shape, or form; all it requires is that, if the types uintptr_t and intptr_t exist, the act of converting a pointer to uintptr_t or intptr_t will yield a number, and converting that number directly back to the same type as the original pointer will yield a pointer equal to the original.
While it is recommended that platforms which use addresses that resemble integers should define conversions between integers and addresses in a fashion that would be unsurprising to someone familiar with such mapping, that is not a requirement, and code relying upon such a recommendation would not be strictly conforming.
Nonetheless, I would suggest that if a quality implementation specifies that it performs integer-to-pointer conversion by a simple bitwise mapping, and if there may be plausible reasons why code would want to access address zero, it should regard statements like:
*((uint32_t volatile*)0) = 0x12345678;
*((uint32_t volatile*)x) = 0x12345678;
as a request to write to address zero and address x, in that order, even if x happens to be zero, and even if the implementation would normally trap on null pointer accesses. Such behavior isn't "standard", insofar as the Standard says nothing about the mapping between pointers and integers, but a good quality implementation should nonetheless behave sensibly.
I'm assuming the question you are asking is:
How do I access memory such that a pointer to that memory has the same representation as the null pointer?
According to a literal reading of the Standard, this is not possible. 6.3.2.3/3 says that any pointer to an object must compare unequal to the null pointer.
Therefore the pointer we are talking about must not point to an object. But the dereference operator *, applied to an object pointer, has specified behaviour only in the case that the pointer points to an object.
Having said that, the object model in C has never been specified rigorously, so I would not put too much weight into the above interpretation. Nevertheless, it seems to me that whatever solution you come up with is going to have to rely on non-standard behaviour from whichever compiler is in use.
We see an example of this in the other answers in which gcc's optimizer detects an all-bits-zero pointer at a late stage of processing and flags it as UB.
Sometimes the data at memory address 0x0 is quite valuable. Take the x86 real-mode IVT as a well-known example: it starts at 0x0 and contains pointers to interrupt handlers; the dword at 0x00 is a pointer to the division-by-zero error handler.
However, C11 language standard prohibits dereferencing null pointers [WG14 N1570 6.5.3.2], which are defined as pointers initialized with 0 or pointers initialized with a null pointer [WG14 N1570 6.3.2.3], effectively banning the very first byte.
How do people actually use 0x0 when it's needed?
C does not prohibit dereferencing the null pointer, it merely makes it undefined behavior.
If your environment is such that you're able to dereference a pointer containing the address 0x0, then you should be able to do so. The C language standard says nothing about what will happen when you do so. (In most environments, the result will be a program crash.)
A concrete example (if I'm remembering this correctly): On the 68k-based Sun 3 computers, dereferencing a null pointer did not cause a trap; instead, the OS stored a zero value at memory address zero, and dereferencing a null pointer (which pointed to address zero) would yield that zero value. That meant, for example, that a C program could treat a null pointer as if it were a valid pointer to an empty string. Some software, intentionally or not, depended on this behavior. This required a great deal of cleanup when porting software to the SPARC-based Sun 4, which trapped on null pointer dereferences. (I distinctly remember reading about this, but I was unable to find a reference; I'll update this if I can find it.)
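The portable replacement for the habit that broke is a sketch like this (safe_len is my name for it):

#include <string.h>

/* On the Sun 3, strlen(NULL) happened to "work" (returning 0) because
   address zero held a zero byte; on the Sun 4 it trapped. Portable
   code must supply the check itself: */
size_t safe_len(const char *s)
{
    return s ? strlen(s) : 0;
}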
Note that the null pointer is not necessarily address zero. More precisely, the representation of a null may or may not be all-bits-zero. It very commonly is, but it's not guaranteed. (If it's not, then the integer-to-pointer conversion of (void*)0 is non-trivial.)
Section 5 of the comp.lang.c FAQ discusses null pointers.
How do people actually use 0x0 when it's needed?
By either:
writing the required code in assembly language, or
writing the code in C and verifying that their compiler generates correct assembly language for the desired operation
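As a sketch of the second approach (still undefined behaviour in ISO C, so the generated assembly must be inspected; some compilers also need flags such as GCC's -fno-isolate-erroneous-paths-dereference, mentioned elsewhere on this page):

#include <stdint.h>

/* Hypothetical helper: read the first IVT entry in x86 real mode,
   where the 4-byte divide-error handler vector lives at address 0. */
static inline uint32_t read_ivt_entry0(void)
{
    const volatile uint32_t *p = (const volatile uint32_t *)0;
    return *p;   /* verify in the disassembly that this load survives */
}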
The statement:
char * x = 0;
does not necessarily put 0x0 into x. It puts the defined null pointer value for the current architecture and compiler into x.
Now, in practical terms, all compilers/processors observed in common use do end up putting 32 (or 64) zero bits in a row in a register or storage location in response to that statement. So if memory address 0 is useful, then, as others have indicated, you are stuck using formally undefined behavior. However, once upon a time there was hardware out there for which a 'null pointer' was some bit pattern that was not all zeros, and, who knows, there may be again.
Per Annex J, it is undefined behavior when...
The operand of the unary * operator has an invalid value (6.5.3.2).
In that same footnote you mentioned, it says a null pointer is an invalid value. Therefore, it is not prohibited, but undefined behavior. As for the distinction between address 0x0 and a null pointer, see Is memory address 0x0 usable?.
The null pointer is not necessarily address 0x0, so potentially an architecture could choose another address to represent the null pointer, and you could get 0x0 from new as a valid address. Whether the null pointer is reserved by the operating system or the C++ implementation is unspecified, but plain new will never return a null pointer, whatever its address is (nothrow new is a different beast). So, to answer your question:
Is memory address 0x0 usable?
Maybe, it depends on the particular implementation/architecture.
In other words, feel free to use 0x0 if you're sure on your system that it won't cause a crash.
The operating system uses a table of pointers to interrupt routines to dispatch to the appropriate interrupt handler. In most operating systems this table is stored in low memory (the first few hundred or so locations); these locations hold the addresses of the interrupt service routines for the various devices.
So when you do
char *ptr = 0x0;
then most likely you are initializing your pointer with the address of an interrupt service routine. Dereferencing (or modifying) a memory location which belongs to the operating system will most likely cause the program to crash.
So, better not to initialize a pointer to 0x0 and dereference it unless you have confirmation that the address doesn't belong to the OS.
I have assigned some random address to a pointer of a particular data type. Then I stored a value in that particular address. When I run the program, it terminates abruptly.
char *c=2000;
*c='A';
printf("%u",c);
printf("%d",*c);
I was able to print the value of c with the first printf statement, but I couldn't fetch the value stored at that address with the second one. I have executed this with the Cygwin GCC compiler and also on the online ideone.com compiler; on ideone.com it shows a runtime error. What's the reason behind this?
When you assign the address 2000 to the pointer c, you are assuming that will be a valid address. Generally, though, it is not a valid address. You can't choose addresses at random and expect the compiler (and operating system) to have allocated that memory for you to use. In particular, the first page of memory (often 4 KiB, usually at least 1 KiB) is completely off limits; all attempts to read or write there are more usually indicative of bugs than intentional behaviour, and the MMU (memory management unit) is configured to reject attempts to access that memory.
If you're using an embedded microprocessor, the rules might well be different, but on a general purpose o/s like Windows with Cygwin, addresses under 0x1000 (4 KiB) are usually verboten.
You can print the address (you did it unreliably, but presumably your compiler didn't warn you; mine would have warned me about using a format for a 4-byte integer quantity to print an 8-byte address). But you can't reliably read or write the data at the address. There could be machines (usually mainframes) where simply reading an invalid address (even without accessing the memory it points at) generates a memory fault.
So, as Acme said in their answer, you've invoked undefined behaviour. You took over from the compiler the responsibility for assigning a valid address to your pointer, but you chose an invalid value. A crash is the consequence of your bad decision.
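By contrast, a corrected sketch lets the implementation hand out a valid address, and prints the pointer portably with %p:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *c = malloc(1);   /* the implementation chooses a valid address */
    if (c == NULL)
        return 1;
    *c = 'A';
    printf("%p\n", (void *)c);   /* %p is the portable pointer format */
    printf("%d\n", *c);
    free(c);
    return 0;
}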
char *c=2000;
Assignment (and initialization) of integer values to pointers is implementation-defined behavior.
Implementation-defined behavior is defined by the ISO C Standard in section 3.4.1 as: unspecified behavior where each implementation documents how the choice is made.
EXAMPLE: An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
Any code that relies on implementation-defined behaviour is only guaranteed to work under a specific platform and/or compiler. Portable programs should try to avoid such behaviour.