Is it undefined behavior to print null pointers with the %p conversion specifier?
#include <stdio.h>
int main(void) {
void *p = NULL;
printf("%p", p);
return 0;
}
The question applies to the C standard, and not to C implementations.
This is one of those weird corner cases where we're subject to the limitations of the English language and inconsistent structure in the standard. So at best, I can make a compelling counter-argument, as it's impossible to prove it :)1
The code in the question exhibits well-defined behaviour.
As [7.1.4] is the basis of the question, let's start there:
Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, [... other examples ...]) [...] the behavior is undefined. [... other statements ...]
This is clumsy language. One interpretation is that the items in the list are UB for all library functions, unless overridden by the individual descriptions. But the list starts with "such as", indicating that it's illustrative, not exhaustive. For example, it does not mention correct null-termination of strings (critical for the behaviour of e.g. strcpy).
Thus it's clear the intent/scope of 7.1.4 is simply that an "invalid value" leads to UB (unless stated otherwise). We have to look to each function's description to determine what counts as an "invalid value".
Example 1 - strcpy
[7.21.2.3] says only this:
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
It makes no explicit mention of null pointers, yet it makes no mention of null terminators either. Instead, one infers from "string pointed to by s2" that the only valid values are strings (i.e. pointers to null-terminated character arrays).
Indeed, this pattern can be seen throughout the individual descriptions. Some other examples:
[7.6.4.1 (fenv)] store the current floating-point environment in the object pointed to by envp
[7.12.6.4 (frexp)] store the integer in the int object pointed to by exp
[7.19.5.1 (fclose)] the stream pointed to by stream
Example 2 - printf
[7.19.6.1] says this about %p:
p - The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.
Null is a valid pointer value, and this section makes no explicit mention that null is a special case, nor that the pointer has to point at an object. Thus it is defined behaviour.
1. Unless a standards author comes forward, or unless we can find something similar to a rationale document that clarifies things.
The Short Answer
Yes. Printing null pointers with the %p conversion specifier has undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.
The answer applies to any of the C standards (C89/C99/C11).
The Long Answer
The %p conversion specifier expects an argument of type pointer to void, the conversion of the pointer to printable characters is implementation-defined. It doesn't state that a null pointer is expected.
The introduction to the standard library functions states that null pointers as arguments to (standard library) functions are considered to be invalid values, unless it is explicitly stated otherwise.
C99 / C11 §7.1.4 p1
[...] If an argument to a function has an invalid value (such as [...] a null pointer, [...] the behavior is undefined.
Examples for (standard library) functions that expect null pointers as valid arguments:
fflush() uses a null pointer for flushing "all streams" (that apply).
freopen() uses a null pointer for indicating the file "currently associated" with the stream.
snprintf() allows to pass a null pointer when 'n' is zero.
realloc() uses a null pointer for allocating a new object.
free() allows to pass a null pointer.
strtok() uses a null pointer for subsequent calls.
If we take the case for snprintf(), it makes sense to allow passing a null pointer when 'n' is zero, but this is not the case for other (standard library) functions that allow a similar zero 'n'. For example: memcpy(), memmove(), strncpy(), memset(), memcmp().
It is not only specified in the introduction to the standard library, but also once again in the introduction to these functions:
C99 §7.21.1 p2 / C11 §7.24.1 p2
Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values as described in 7.1.4.
Is it intentional?
I don't know whether the UB of %p with a null pointer is in fact intentional, but since the standard explicitly states that null pointers are considered invalid values as arguments to standard library functions, and then it goes and explicitly specifies the cases where a null pointer is a valid argument (snprintf, free, etc), and then it goes and once again repeats the requirement for the arguments to be valid even in zero 'n' cases (memcpy, memmove, memset), then I think it's reasonable to assume that the C standards committee isn't too concerned with having such things undefined.
The authors of the C Standard made no effort to exhaustively list all of the behavioral requirements an implementation must meet to be suitable for any particular purpose. Instead, they expected that people writing compilers would exercise a certain amount of common sense whether the Standard requires it or not.
The question of whether something invokes UB is seldom in and of itself useful. The real questions of importance are:
Should someone who is trying to write a quality compiler make it behave in predictable fashion? For the described scenario the answer is clearly yes.
Should programmers be entitled to expect that quality compilers for anything resembling normal platforms will behave in predictable fashion? In the described scenario, I would say the answer is yes.
Might some obtuse compiler writers stretch the interpretation of the Standard so as to justify doing something weird? I would hope not, but wouldn't rule it out.
Should sanitizing compilers squawk about the behavior? That would depend upon the paranoia level of their users; a sanitizing compiler probably shouldn't default to squawking about such behavior, but perhaps provide a configuration option to do in case programs might be ported to "clever"/dumb compilers that behave weirdly.
If a reasonable interpretation of the Standard would imply a behavior is defined, but some compiler writers stretch the interpretation to justify doing otherwise, does it really matter what the Standard says?
Related
A. printf("Values: X=%s Y=%s\n", x,y,z);
B. printf("Values: x=%s, Y=%s\n", x);
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters. I would like to choose between the lesser evil with an explanation. Can a modern C compiler help catch such problems? If yes, how does printf() implementor need to assist the compiler?
Both of the above printf() statements are incorrect: one has extra parameters, other has fewer parameters.
The first one is not incorrect according to the C standard. The rules for function calls in general, in C 2018 6.5.2.2, do not make it an error to pass unused arguments for a ... in the function prototype. For printf specifically, C 2018 7.21.6.1 2 (about fprintf, which the specification for printf refers to) says extra arguments are harmless:
… If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored…
Certainly if a programmer writes printf("Values: X=%s. Y=%s.\n", x, y, z);, they might have made a mistake, and a compiler would be reasonable in pointing out this possibility. However, consider code such as:
printf(ComputedFormat, x, y, z);
Here it is reasonable that we wish to print different numbers of values in different circumstances, and the ComputedFormat reflects this. It would be tedious to write code for each case and dispatch to them with a switch statement. It is simpler to write one call and let the computed format determine how many values are printed. So it is not always an error to have more arguments than the conversion specifications use.
I would like to choose between the lesser evil with an explanation.
The behavior of the latter code is not defined by the C standard. C 2018 7.21.6.1 2 also says:
… If there are insufficient arguments for the format, the behavior is undefined…
Thus, no behavior may be relied on from the latter code, unless there is some guarantee from the C implementation.
Can a modern C compiler help catch such problems?
Good modern C compilers have information about the specification of printf and, when the format argument is a string literal, they compare the number and types of the arguments to the conversion specifications in the string.
If yes, how does printf() implementor need to assist the compiler?
The implementor of printf does not need to do anything except conform to the specification of printf in the C standard. The aid described above is performed by the C compiler with reference to the C standard; it does not rely on features of the particular printf implementation.
In some platforms, information about the number of arguments passed is provided to the called routine. In such platforms, a printf implementor could check whether too few arguments are provided and signal an error in some method.
Eric Postpischil has already made a great answer that uses the most reliable source (the C standard), but I just want to post my own answer about why printf may behave as it does in both cases.
printf is a variadic function which can take a variable number of arguments. The way it knows how many you have passed is solely through the format string; every time it finds a format specifier, it takes the next argument out of the list (and assumes its type from which specifier has been used). Nothing would really happen to any extra arguments because since there is no specifier for them, the function will not even try to take them and they will not be printed. So you may be warned about the extra arguments by the compiler, but the behavior in the first example is well-defined.
The second, on the other hand, is definitely undefined behavior. Since there are not enough arguments to match the number of format specifiers in the string, eventually when it finds the second %s, it will try to take the next variadic argument, but the issue is that you haven't passed any. When this happens for me, it prints some garbage value in place of the format specifier that doesn't look too nice. Anything could happen in undefined behavior though. In this case, the function seems to try to take the next variadic argument from a CPU register / the stack (memory) and fetches some garbage value that happened to be there (though again, anything could happen with undefined behavior).
So in short:
printf("%s\n", "Hello", "World");
| | ^^^^^^^ Ignored
-------
and
printf("%s\n"); ?
| |
----------
Does the following code have defined beavior:
#include <stdio.h>
int main() {
int x;
if (scanf("%x", &x) == 1) {
printf("decimal: %d\n", x);
}
return 0;
}
clang compiles it without any warnings even with all warnings enabled, including -pedantic. The C Standard seems unambiguous about this:
C17 7.21.6.2 The fscanf function
...
... the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
...
The conversion specifiers and their meanings are:
...
x Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
On two's complement architectures, converting -1 with %x seems to work, but it would not on ancient sign/magnitude or ones complement systems.
Is there any provision to make this behavior defined or at least implementation defined?
This falls in the category of behaviors which quality implementations should support unless they document a good reason for doing otherwise, but which the Standard does not mandate. The authors of the Standard seem to have refrained from trying to list all such behaviors, and there are at least three good reasons for that:
Doing so would have made the Standard longer, and spending ink describing obvious behaviors that readers would expect anyway would distract from the places where the Standard needed to call readers' attention to things that they might not otherwise expect.
The authors of the Standard may not have wanted to preclude the possibility that an implementation might have a good reason for doing something unusual. I don't know whether that was a consideration in your particular case, but it could have been.
Consider, for example, a (likely theoretical) environment whose calling convention that requires passing information about argument types fed to variadic functions, and that supplies a scanf function that validates those argument types and squawks if int* is passed to a %X argument. The authors of the Standard were almost certainly not aware of any such environment [I doubt any ever existed], and thus would be in no position to weigh the benefits of using the environment's scanf routine versus the benefits of supporting the common behavior. Thus, it would make sense to leave such judgment up to people who would be in a better position to assess the costs and benefits of each approach.
It would be extremely difficult for the authors of the Standard to ensure that they exhaustively enumerated all such cases without missing any, and the more exhaustively they were to attempt to enumerate such cases, the more likely it would be that accidental omissions would be misconstrued as deliberate.
In practice, some compiler writers seem to regard most situations where the Standard fails to mandate the behavior of some action as an invitation to assume code will never attempt it, even if all implementations prior to the Standard had behaved consistently and it's unlikely there would ever be any good reason for an implementation to do otherwise. Consequently, using %X to read an int falls in the category of behaviors that will be reliable on implementations that make any effort to be compatible with common idioms, but could fail on implementations whose designers place a higher value on being able to process useless programs more efficiently, or on implementations that are designed to squawk when given programs that could be undermined by such implementations.
I was reading through the C11 standard. As per the C11 standard undefined behavior is classified into four different types. The parenthesized numbers refer to the subclause of the C Standard (C11) that identifies the undefined behavior.
Example 1: The program attempts to modify a string literal (6.4.5). This undefined behavior is classified as: Undefined Behavior (information/confirmation needed)
Example 2 : An lvalue does not designate an object when evaluated (6.3.2.1). This undefined behavior is classified as: Critical Undefined Behavior
Example 3: An object has its stored value accessed other than by an lvalue of an allowable type (6.5). This undefined behavior is classified as: Bounded Undefined Behavior
Example 4: The string pointed to by the mode argument in a call to the fopen function does not exactly match one of the specified character sequences (7.21.5.3). This undefined behavior is classified as: Possible Conforming Language Extension
What is the meaning of the classifications? What do these classification convey to the programmer?
I only have access to a draft of the standard, but from what I’m reading, it seems like this classification of undefined behavior isn’t mandated by the standard and only matters from the perspective of compilers and environments that specifically indicate that they want to create C programs that can be more easily analyzed for different classes of errors. (These environments have to define a special symbol __STDC_ANALYZABLE__.)
It seems like the key idea here is an “out of bounds write,” which is defined as a write operation that modifies data that isn’t otherwise allocated as part of an object. For example, if you clobber the bytes of an existing variable accidentally, that’s not an out of bounds write, but if you jumped to a random region of memory and decorated it with your favorite bit pattern you’d be performing an out of bounds write.
A specific behavior is bounded undefined behavior if the result is undefined, but won’t ever do an out of bounds write. In other words, the behavior is undefined, but you won’t jump to a random address not associated with any objects or allocated space and put bytes there. A behavior is critical undefined behavior if you get undefined behavior that cannot promise that it won’t do an out-of-bounds write.
The standard then goes on to talk about what can lead to critical undefined behavior. By default undefined behaviors are bounded undefined behaviors, but there are exceptions for UB that result from memory errors like like accessing deallocated memory or using an uninitialized pointer, which have critical undefined behavior. Remember, though, that these classifications only exist and have meaning in the context of implementations of C that choose to specifically separate out these sorts of behaviors. Unless your C environment guarantees it’s analyzable, all undefined behaviors can potentially do absolutely anything!
My guess is that this is intended for environments like building drivers or kernel plugins where you’d like to be able to analyze a piece of code and say “well, if you're going to shoot someone in the foot, it had better be your foot that you’re shooting and not mine!” If you compile a C program with these constraints, the runtime environment can instrument the very few operations that are allowed to be critical undefined behavior and have those operations trap to the OS, and assume that all other undefined behaviors will at most destroy memory that’s specifically associated with the program itself.
All of these are cases where the behaviour is undefined, i.e. the standard "imposes no requirements". Traditionally, within undefined behaviour and considering one implementation (i.e. C compiler + C standard library), one could see two kinds of undefined behaviour:
constructs for which the behaviour would not be documented, or would be documented to cause a crash, or the behaviour would be erratic,
constructs that the standard left undefined but for which the implementation defines some useful behaviour.
Sometimes these can be controlled by compiler switches. E.g. example 1 usually always causes bad behaviour - a trap, or crash, or modifies a shared value. Earlier versions of GCC allowed one to have modifiable string literals with -fwritable-strings; therefore if that switch was given, the implementation defined the behaviour in that case.
C11 added an optional orthogonal classification: bounded undefined behaviour and critical undefined behaviour. Bounded undefined behaviour is that which does not perform an out-of-bounds store, i.e. it cannot cause values being written in arbitrary locations in memory. Any undefined behaviour that is not bounded undefined behaviour is critical undefined behaviour.
Iff __STDC_ANALYZABLE__ is defined, the implementation will conform to the appendix L, which has this definitive list of critical undefined behaviour:
An object is referred to outside of its lifetime (6.2.4).
A store is performed to an object that has two incompatible declarations (6.2.7),
A pointer is used to call a function whose type is not compatible with the referenced type (6.2.7, 6.3.2.3, 6.5.2.2).
An lvalue does not designate an object when evaluated (6.3.2.1).
The program attempts to modify a string literal (6.4.5).
The operand of the unary * operator has an invalid value (6.5.3.2).
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just
beyond the array object and is used as the operand of a unary *
operator that is evaluated (6.5.6).
An attempt is made to modify an object defined with a const-qualified type through use of an lvalue with
non-const-qualified type (6.7.3).
An argument to a function or macro defined in the standard library has an invalid value or a type not expected by a function
with variable number of arguments (7.1.4).
The longjmp function is called with a jmp_buf argument where the most recent invocation of the setjmp macro in the same invocation of
the program with the corresponding jmp_buf argument is nonexistent,
or the invocation was from another thread of execution, or the
function containing the invocation has terminated execution in the
interim, or the invocation was within the scope of an identifier with
variably modified type and execution has left that scope in the
interim (7.13.2.1).
The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3).
A string or wide string utility function accesses an array beyond the end of an object (7.24.1, 7.29.4).
For the bounded undefined behaviour, the standard imposes no requirements other than that an out-of-bounds write is not allowed to happen.
The example 1: modification of a string literal is also. classified as critical undefined behaviour. The example 4 is critical undefined behaviour too - the value is not one expected by the standard library.
For example 4, the standard hints that while the behaviour is undefined in case of mode that is not defined by the standard, there are implementations that might define behaviour for other flags. For example glibc supports many more mode flags, such as c, e, m and x, and allow setting the character encoding of the input with ,ccs=charset modifier (and putting the stream into wide mode right away).
Some programs are intended solely for use with input that is known to be valid, or at least come from trustworthy sources. Others are not. Certain kinds of optimizations which might be useful when processing only trusted data are stupid and dangerous when used with untrusted data. The authors of Annex L unfortunately wrote it excessively vaguely, but the clear intention is to allow compilers that they won't do certain kinds of "optimizations" that are stupid and dangerous when using data from untrustworthy sources.
Consider the function (assume "int" is 32 bits):
int32_t triplet_may_be_interesting(int32_t a, int32_t b, int32_t c)
{
return a*b > c;
}
invoked from the context:
#define SCALE_FACTOR 123456
int my_array[20000];
int32_t foo(uint16_t x, uint16_t y)
{
if (x < 20000)
my_array[x]++;
if (triplet_may_be_interesting(x, SCALE_FACTOR, y))
return examine_triplet(x, SCALE_FACTOR, y);
else
return 0;
}
When C89 was written, the most common way a 32-bit compiler would process that code would have been to do a 32-bit multiply and then do a signed comparison with y. A few optimizations are possible, however, especially if a compiler in-lines the function invocation:
On platforms where unsigned compares are faster than signed compares, a compiler could infer that since none of a, b, or c can be negative, the arithmetical value of a*b is non-negative, and it may thus use an unsigned compare instead of a signed comparison. This optimization would be allowable even if __STDC_ANALYZABLE__ is non-zero.
A compiler could likewise infer that if x is non-zero, the arithmetical value of x*123456 will be greater than every possible value of y, and if x is zero, then x*123456 won't be greater than any. It could thus replace the second if condition with simply if (x). This optimization is also allowable even if __STDC_ANALYzABLE__ is non-zero.
A compiler whose authors either intend it for use only with trusted data, or else wrongly believe that cleverness and stupidity are antonyms, could infer that since any value of x larger than 17395 will result in an integer overflow, x may be safely presumed to be 17395 or less. It could thus perform my_array[x]++; unconditionally. A compiler may not define __STDC_ANALYZABLE__ with a non-zero value if it would perform this optimization. It is this latter kind of optimization which Annex L is designed to address. If an implementation can guarantee that the effect of overflow will be limited to yielding a possibly-meaningless value, it may be cheaper and easier for code to deal with the possibly of the value being meaningless than to prevent the overflow. If overflow could instead cause objects to behave as though their values were corrupted by future computations, however, there would be no way a program could deal with things like overflow after the fact, even in cases where the result of the computation would end up being irrelevant.
In this example, if the effect of integer overflow would be limited to yielding a possibly-meaningless value, and if calling examine_triplet() unnecessarily would waste time but would otherwise be harmless, a compiler may be able to usefully optimize triplet_may_be_interesting in ways that would not be possible if it were written to avoid integer overflow at all costs. Aggressive
"optimization" will thus result in less efficient code than would be possible with a compiler that instead used its freedom to offer some loose behavioral guarantees.
Annex L would be much more useful if it allowed implementations to offer specific behavioral guarantees (e.g. overflow will yield a possibly-meaningless result, but have no other side-effects). No single set of guarantees would be optimal for all programs, but the amount of text Annex L spent on its impractical proposed trapping mechanism could have been better spent specifying macros to indicate what guarantees various implementations could offer.
According to cppreference :
Critical undefined behavior
Critical UB is undefined behavior that might perform a memory write or
a volatile memory read out of bounds of any object. A program that has
critical undefined behavior may be susceptible to security exploits.
Only the following undefined behaviors are critical:
access to an object outside of its lifetime (e.g. through a dangling pointer)
write to an object whose declarations are not compatible
function call through a function pointer whose type is not compatible with the type of the function it points to
lvalue expression is evaluated, but does not designate an object attempted modification of a string literal
dereferencing an invalid (null, indeterminate, etc) or past-the-end pointer
modification of a const object through a non-const pointer
call to a standard library function or macro with an invalid argument
call to a variadic standard library function with unexpected argument type (e.g. call to printf with an argument of the type that
doesn't match its conversion specifier)
longjmp where there is no setjmp up the calling scope, across threads, or from within the scope of a VM type.
any use of the pointer that was deallocated by free or realloc
any string or wide string library function accesses an array out of bounds
Bounded undefined behavior
Bounded UB is undefined behavior that cannot perform an illegal memory
write, although it may trap and may produce or store indeterminate
values.
All undefined behavior not listed as critical is bounded, including
multithreaded data races
use of a indeterminate values with automatic storage duration
strict aliasing violations
misaligned object access
signed integer overflow
unsequenced side-effects modify the same scalar or modify and read the same scalar
floating-to-integer or pointer-to-integer conversion overflow
bitwise shift by a negative or too large bit count
integer division by zero
use of a void expression
direct assignment or memcpy of inexactly-overlapped objects
restrict violations
etc.. ALL undefined behavior that's not in the critical list.
"I was reading through the C11 standard. As per the C11 standard undefined behavior is classified into four different types."
I wonder what you were actually reading. The 2011 ISO C standard does not mention these four different classifications of undefined behavior. In fact it's quite explicit in not making any distinctions among different kinds of undefined behavior.
Here's ISO C11 section 4 paragraph 2:
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is
undefined. Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe "behavior
that is undefined".
All the examples you cite are undefined behavior, which, as far as the Standard is concerned, means nothing more or less than:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
If you have some other reference, that discusses different kinds of undefined behavior, please update your question to cite it. Your question would then be about what that document means by its classification system, not (just) about the ISO C standard.
Some of the wording in your question appears similar to some of the information in C11 Annex L, "Analyzability" (which is optional for conforming C11 implementations), but your first example refers to "Undefined Behavior (information/confirmation needed)", and the word "confirmation" appears nowhere in the ISO C standard.
I've been playing with pointers and accidentally typed the wrong argument to printf
#include <stdio.h>
int
main (void)
{
double * p1;
double * p2;
double d1, d2;
d1 = 1.2345;
d2 = 2.3456;
p1 = &d1;
p2 = &d2;
printf ("p1=%p\n, *p1=%g\n", (void *)p1, *p1);
printf ("p2=%p\n, *p2=%g\n", (void *)p2, p2); /* third argument should be *p2 */
return 0;
}
The output was
warning: format ‘%g’ expects argument of type ‘double’, but argument 3 has type ‘double *’
p1=0x7ffc9aec46b8, *p1=1.2345
p2=0x7ffc9aec46c0, *p2=1.2345
Why in this case the output of p2 is always equal to the output of *p1?
I use gcc (v5.4.0) compiler with its default standard for C (gnu11).
Code that invokes undefined behavior can do anything -- that's why it's undefined.
That said, one could make a good guess at why it happens to do this particular thing on your particular machine using your specific compiler with exactly the options you used and compiled on the same weekday of a year with a 6, you get the point, right? It's undefined, and there is no explanation that you can rely on even if you think you know all the variables. One day, the humidity drops, or something, and your program could decide to do something different. Even without recompiling. Even in two iterations of the same loop. That's just what undefined behavior is.
Anyway, on your platform floating-point arguments are probably passed in dedicated floating-point registers (or a dedicated floating-point stack) rather than on the main stack. printf("%g") expects a floating-point argument, so it looks in a floating-point register. But you didn't pass anything in a floating-point register; all you passed were two pointer arguments, which both went on the stack (or wherever pointer arguments go; this is also outside the scope of the C standard). So the second printf call gets whatever garbage was in that particular floating point register last time it was loaded. It just so happens that the last thing you loaded into that register was the value of *p1, in the last printf call, so that value gets reused.
The rules that determine (among other things) where function arguments are placed so the function knows where to look for them are collectively called a calling convention. You're probably using an x86 or derivative, so you might find the Wikipedia page on x86 calling conventions interesting. But if you want to know specifically what your compiler is doing, ask it to emit assembly language (gcc -S).
It's not defined -- that's the whole point. What you're seeing is likely the result of the old value remaining in whatever register is used for passing a floating-point argument.
At language level there's usually little value in this kind of research.
But one possible practical scenario might look as follows:
The compiler uses different passing conventions (memory areas, stacks, registers) to pass different types of arguments. Pointers are passed in one way (say, CPU stack), while double values are passed in a different way (say, FPU register stack). You passed a pointer, but told printf that it was a double. printf went into the area for passing doubles (e.g., top of FPU register stack) and read the "garbage" value that was left over there by the previous printf call.
undefined behavior - there are no restrictions on the behavior of the
program. Examples of undefined behavior are memory accesses outside of
array bounds, signed integer overflow, null pointer dereference,
modification of the same scalar more than once in an expression
without sequence points, access to an object through a pointer of a
different type, etc. Compilers are not required to diagnose undefined
behavior (although many simple situations are diagnosed), and the
compiled program is not required to do anything meaningful.1
Undefined behavior doesn't mean random behavior, but "not covered by the standard" behavior. So it may be anything the implementator chooses to do with it.
The standard specifies UBs because it allows for compilation optimizations that might not be possible otherwise.
Other answers have covered what undefined behaviour is. Here's an interesting article that describes why there is so much undefined behaviour in C, and what the benefits may be. It's not because K&R were lazy or didn't care.
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
In short, undefined behaviour and implementation-defined behaviour can open opportunities for optimisation, and for more efficient implementation on different platforms.
The C language was in wide use well before the C89 Standard was published, and the authors of the Standard did not want to require that conforming compilers not be able to do everything existing compilers could do as efficiently as they were already doing it. If mandating that all compilers implement some behavior would make some compiler somewhere less suitable for its task, that would be justification for leaving the behavior undefined. Even if the behavior was useful, and in common use, on 99% of platforms, the authors of the Standard saw no reason to believe that leaving the behavior undefined should affect that. If compiler writers thought it was practical and useful to support a behavior in the days before any Standard mandated anything, there was no reason to expect that they'd need a mandate in order to maintain such support. Evidence of that view can be found in the rationale about promoting short unsigned integer types to signed.
Somehow a bizarre view has taken hold that everything must either be mandated by the Standard or unpredictable. The Standard describes common consequences of Undefined Behavior, and one of the most common in 1989 was that the implementation will behave in a documented fashion characteristic to the implementation.
If your implementation specifies the means via which floating-point values are passed to variadic functions, and if the method it uses is to create a temporary and pass its address, then the behavior of your code might be defined on that particular implementation, in which case it would hardly be surprising that it works as it does. If the implementation handles arguments that way but doesn't document them well enough to guarantee the behavior, it should hardly be surprising that the implementation's behavior isn't affected by the lack of documentation.
I ran into some code I couldn't find an answer to on Google or SO. I am looking at a thread function which returns void* as you could expect. However, before the thread function ends it suddenly pulls this stunt,
return (void*) 0;
What is the purpose of that? I can't make any sense of it.
edit:
After understanding this is the same as NULL-- it is my thought they used this to skip including stdlib.
(void*)0 is the null pointer, a.k.a. NULL (which actually is a macro defined in several header files, e.g. stddef.h or stdio.h, that basically amounts to the same thing as (void*)0).
Update:
How to explain null pointers and their usefulness? Basically, it's a special value that says, "This pointer doesn't point anywhere," or, "This pointer is not set to a valid object reference."
Historical note: Tony Hoare, who is said to have invented null references in 1965, is known to regret that invention and thus calls it his "Billion Dollar Mistake":
Whenever you work with pointers, you must make sure to never dereference a null pointer (because it doesn't reference anything by definition). If you do it anyway, you'll either get abnormal program termination, a general protection fault, or unexpected program behaviour at the very least.
Well, I have not encountered any C++ compiler saying NULL or 0 cannot be converted to void* (or to/from int*, for example). But there might be some smart compilers or static-analysis tools that would report 0 to void-pointer conversion as a warning.
That statement is commonly found in callback implementation (like a thread-routine), which must adhere to prototype of callback being demanded (pthread_create, CreateThread etc). Therefore, when you implement that function, you must return the same type it was demanded for. For pthread_create routine, you must return a void* - and that's why return (void*)0; is there.