Is 'printf("%n %d", &a, a)' well defined? - c

#include <stdio.h>
int main(void) {
int i = 0;
printf("abc %n %d", &i, i);
printf("\n%d\n", i);
}
When I executed that, I got the following result.
abc 0
4
I think this result is the expected one.
But when I ran the next one, I got a different result.
int main(void) {
int i; // not initialized
printf("abc %n %d", &i, i);
printf("\n%d\n", i);
}
which produced:
abc 1
4
I don't know why the value of i in the first printf() is 1.
I then found even stranger behavior:
int main(void) {
int sdf;
printf("abc %n %d", &sdf, sdf);
printf("\n%d\n", sdf);
int i;
printf("abc %n %d", &i, i);
printf("\n%d\n", i);
}
with this output:
abc 1
4
abc Random_Value
4
The first one always shows 1, but the others show a random value (I think this is a garbage value).
I expected a garbage value, but I don't understand why the first one gives a different result.

The value of an uninitialized local integer variable of storage class auto, which is typically stored in the stack area, is indeterminate. For this reason, seeing an arbitrary value when printing it is entirely expected.
That's why, in your output
abc 1
4
abc Random_Value
4
the value 1 is garbage as well.
In fact it is the first location in the stack, and its value could differ on another system and/or with another compiler. My guess is that its value is 1 because it represents a "ghost argc" that, main being a very special function, is present even if the function is defined without parameters.
Since argc represents the number of arguments used to call your program from the command line (at least 1: the executable name), there's a way to verify this hypothesis: call your program in this way
executableName foo
This would make argc become 2, so the value shown by that first printf should become 2 as well.
Out of curiosity, I tested this hypothesis by compiling your second example on my machine (DevC++ with gcc compiler under W10 64bit OS). I confirmed two statements:
When undefined behavior occurs, changing environment leads to a change in the output
argc is present in the stack and affects the initial value of the uninitialized local variables
Output after executing uninitVars.exe
abc
0
abc
1
Output after executing uninitVars.exe dog
abc
0
abc
2
Output after executing uninitVars.exe dog cat
abc
0
abc
3
So it seems that the first four bytes of my stack are always set to 0 (are they the location of the return value?) and that the second one is actually argc, even if it is not explicitly defined in the main() prototype.
The second print, instead, shows different values because the variable is defined after two printf calls, and their execution writes several bytes to the stack (some of them are addresses, which explains why the value is always different, as the addresses of a process are virtual and change from run to run).

The result is well defined. The address of i and the value of i are both passed to printf. printf then assigns a new value to i, but because the value of i was passed before that happened, printf prints the value i had before the call.
The later snippets use an uninitialised variable, and its indeterminate value is printed first. The behaviour after that is the same as in snippet 1.
The uninitialised variable will have an indeterminate value. Reading it can also be UB if the type of the variable has trap representations (which is not the case here). But because the execution environment is the same, it is very likely that you will see the same values each time. If you run it on another OS, on another computer, or with another compiler, they will probably be different.

Is printf("%n %d", &a, a) defined well in printf?
If a has type int (equivalently, signed int) and is not const-qualified then yes, otherwise, no. In the well-defined case, the value of a at the time of the call is printed, and after the call returns, a will have the value 0.
Perhaps the key points here are that the arguments to a function call are evaluated before the called function is entered, and they are passed by value. There is a sequence point between the evaluation of the arguments and execution of the first statement in the function body, so the fact that a is read by the calling function and written by printf does not present any particular issue.
[...] when I execute next one, I got a different result.
int main(void)
{
int i; // not initialized
printf("abc %n %d", &i, i);
printf("\n%d\n", i);
}
abc 1
4
I don't know why the result of 'i' in the first printf() is 1.
No one does. The value of an automatic variable that has neither been initialized nor assigned to is indeterminate. The same considerations apply here as in the title question: the expression i in printf's argument list is evaluated before any part of printf executes, so the fact that printf will later assign a value to it does not affect the value it receives.

With the variable in question not being initialized, the behavior is at best unspecified and at worst undefined.
The local variables i and sdf are uninitialized, which means that their values are indeterminate. The formal definition is in section 3.19 of the C standard and is as follows:
3.19.2
1 indeterminate value
either an unspecified value or a trap representation
3.19.3
1 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on
which value is chosen in any instance
2 NOTE An unspecified value cannot be a trap representation.
3.19.4
1 trap representation
an object representation that need not represent a value of the object type
This basically means that the value is unpredictable. In fact, simply reading an indeterminate value can in some cases lead to undefined behavior. This can happen if the indeterminate value happens to be a trap representation as defined above.
It can also be undefined behavior if the variable in question never had its address taken, however that doesn't apply in this case because you did take the address. This behavior is documented in section 6.3.2.1p2:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
So assuming your implementation doesn't have trap representations, the values of sdf and i are unspecified, which means they could be any value, including 0 or 1. As an example, you get the values 1 and (some random value) for sdf and i. When I run the same code I get this:
abc 0
4
abc 0
4
And if I compile with -O3 which sets a higher optimization level, I get this:
abc 1446280512
4
abc 0
4
As you can see, running the same code as you did, which reads an unspecified value, can produce different results on different machines, or even on the same machine with different compiler settings.
There's nothing special about the value 0 that I got or the value 1 that you got. They're just as random as 1446280512.

First of all, you cannot print the result of %n in the same call: the i passed as an argument keeps its previous value. As for the random value, since you have not initialized i and sdf, they hold arbitrary values (quite often 0, but there is no guarantee). In C, global variables are set to 0 if they are not explicitly initialized, but uninitialized local variables hold indeterminate (random) values.

Related

Reading two integers, but inputting only one results in an indeterminate value being printed

I've just started learning C and would need some help here:
#include <stdio.h>
int main (){
int x, y;
scanf("%d/%d", &x, &y);
printf("%d,%d", x, y);
return 0;
}
If I run the above code and only input 1 integer ('12'), it will print ('12,16'). Can someone explain how did the '16' come about?
If input stream contains just one number then the second value is not read. You would know that if you checked the return value:
int numbersread = scanf("%d/%d", &x, &y);
Then the y variable is uninitialized and it contains some garbage value. In your case it's 16.
The scanf function will fail to find a / character in the input and will stop reading there. It will not assign any value to y, which will remain uninitialized and will thus contain an indeterminate value. In your case, this was 16, but it could have been any other value, and this value could change when executing the program multiple times. The value is typically something that was used by the same process and was left in that memory location, and is now being interpreted as an int.
If the address of y was not taken, printing its value would have been undefined behavior (C11 6.3.2.1-2):
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However, since the address of y was taken (&y), its value is simply indeterminate. The lvalue y "is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion".
It is a good practice to always check the return value of scanf - which contains the "number of receiving arguments successfully assigned" - which allows you to handle the case of incorrect input.
Acknowledgements: Credits to Andreas Wenzel and Eric Postpischil for the correction on the origin of the indeterminate value.

Is an optimized out variable allowed to hold a value out of its range? [duplicate]

If I have:
unsigned int x;
x -= x;
it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).
Two questions:
Is the behavior of this code indeed undefined?
(E.g. Might the code crash [or worse] on a compliant system?)
If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?
i.e. What is the advantage given by not defining the behavior here?
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Yes this behavior is undefined but for different reasons than most people are aware of.
First, using an uninitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.
What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.
Edit: The relevant phrase of the standard is 6.3.2.1p2:
If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.
And to make it clearer, the following code is legal under all circumstances:
unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
Here the addresses of a and b are taken, so their value is just
indeterminate.
Since unsigned char never has trap representations
that indeterminate value is just unspecified, any value of unsigned char could
happen.
At the end a must hold the value 0.
Edit2: a and b have unspecified values:
3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance
Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.
The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1)-1) and 2^(n-1)-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is read is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
during the first loop iteration, x is uninitialized by the time -x is evaluated.
-x has undefined behavior, so its value is whatever-is-convenient.
The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which unhandled could crash the program.
(6.2.6.1 on a late C11 draft says)
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)
(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)
The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)
3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.
So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.
Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N - 1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).
If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.
Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?
Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.
For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:
In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].
The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.
[1]: C11 6.3.2.1:
If the lvalue designates an
object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior to use), the behavior
is undefined.
[2]: C11 6.2.6.1:
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. If such a representation is produced
by a side effect that modifies all or any part of the object by an lvalue expression that
does not have character type, the behavior is undefined.50) Such a representation is called
a trap representation.
[3] C11:
3.19.2
indeterminate value
either an unspecified value or a trap representation
3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no
requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
3.19.4
trap representation
an object representation that need not represent a value of the object type
While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:
volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
uint16_t temp;
if (a)
temp = y;
else if (b)
temp = z;
return temp;
}
a compiler for a platform like the ARM where all instructions other than
loads and stores operate on 32-bit registers might reasonably process the
code in a fashion equivalent to:
volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
// Since x is never used past this point, and since the return value
// will need to be in r0, a compiler could map temp to r0
uint32_t temp;
if (a)
temp = y;
else if (b)
temp = z & 0xFFFF;
return temp;
}
If either volatile read yields a non-zero value, r0 will get loaded with a value in the range 0..65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of a value whose type is uint16_t but whose value is outside the range of 0..65535, except to say that any action which could produce such behavior invokes UB.

Why is the dereference of float pointer set to int variable's address printing as `0`?

I created the following code while playing with pointers -
#include <stdio.h>
int main()
{
float a=1000;
int *c=&a;
float *d=&a;
printf("\nValue of a is %f",a);
printf("\nValue of a is %f",*c);
printf("\nValue of a is %f",*d);
printf("\nValue of a is %f",*&*c);
printf("\nValue of a is %f\n",*&*d);
int b=2000;
int *e=&b;
float *f=&b;
printf("\nValue of b is %d",b);
printf("\nValue of b is %d",*e);
printf("\nValue of b is %d",*f); //Will produce 0 output
printf("\nValue of b is %d",*&*e);
printf("\nValue of b is %d\n",*&*f); //Will produce 0 output
float g=3000;
float *h=&g;
printf("\nValue of g is %f\n",*h);
}
Which has produced the output -
aalpanigrahi#aalpanigrahi-HP-Pavilion-g4-Notebook-PC:~/Desktop/C/Daily programs/pointers$ ./pointer004
Value of a is 1000.000000
Value of a is 1000.000000
Value of a is 1000.000000
Value of a is 1000.000000
Value of a is 1000.000000
Value of b is 2000
Value of b is 2000
Value of b is 0
Value of b is 2000
Value of b is 0
Value of g is 3000.000000
In the second part, where b is declared as an int and e and f are pointers to it, the value of b prints as 0 through *f and *&*f (as shown in the code), but the same approach worked in the first part, where a is declared as a floating point number.
Why is this happening ??
This issue is system dependent.
On some platforms you will get 0 and on others -1.
That is because you are printing a float as an int using %d.
On most platforms a float is 4 bytes (check it using sizeof(float)).
The binary representation of the float number 2000 is 01000100111110100000000000000000, and it is passed as a floating-point value, so trying to print it with %d results in undefined behavior.
Why is this happening ??
-shrugs- strange things happen when you try to fly planes backwards, with blindfolds and earplugs in. Thus, you should always learn to fly a plane by reading as much as possible before you jump into the cockpit.
Take off your blindfold and earplugs. Open your eyes, and get to reading a book. If you're already reading a book, get a new one, because this one's not working for you.
printf("\nValue of a is %f",*c); is undefined behaviour, because an attempt is made to print an expression which has type int (*c) using a format specifier which claims it's a double. C11/7.21.6.1p9 states this blatantly:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Note that I didn't imply any consequences, positive or negative. It's undefined behaviour, meaning we can't tell whether it'll "work" (whatever that means) or whether it'll "crash".
Thus is the nature of UB. You probably didn't notice this error, because it "works", but it's not required to... moving on, we come to the example you ask about:
printf("\nValue of b is %d",*f);
A similar situation applies here; *f is a float expression, and %d tells printf to expect an int value. The behaviour is undefined, which happens to mean in this case, you see a confusing "0".
You can not define the undefined by claiming it shouldn't print 0. It's allowed to print 0, because you've allowed it to by causing undefined behaviour. Similarly, if you were to see a crash, it's allowed to crash because you've allowed it to crash by causing undefined behaviour.
Also of relevance is C11/6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
Hence, more often than not, if you're casting something you're likely making a mistake... a common mistake, involving not reading a book, for example... so... which book are you reading?

issue with assignment operator inside printf()

Here is the code
int main()
{
int x=15;
printf("%d %d %d %d",x=1,x<20,x*1,x>10);
return 0;
}
And output is 1 1 1 1
I was expecting 1 1 15 1 as output,
x*1 equals to 15 but here x*1 is 1 , Why ?
Using assignment operator or modifying value inside printf() results in undefined behaviour?
Your code produces undefined behavior. Function argument evaluations are not sequenced relative to each other. Which means that modifying access to x in x=1 is not sequenced with relation to other accesses, like the one in x*1. The behavior is undefined.
Once again, it is undefined not because you "used assignment operator or modifying value inside printf()", but because you made a modifying access to variable that was not sequenced with relation to other accesses to the same variable. This code
(x = 1) + x * 1
also has undefined behavior for the very same reason, even though there's no printf in it. Meanwhile, this code
int x, y;
printf("%d %d", x = 1, y = 5);
is perfectly fine, even though it "uses assignment operator or modifying value inside printf()".
Within a function call, the function arguments may be evaluated in any order.
Since one of the arguments modifies x and the others access it, the behavior is undefined.
The Standard states that:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
It doesn't impose an order of evaluation on sub-expressions unless there's a sequence point between them, and rather than requiring some unspecified order of evaluation, it says that modifying an object twice produces undefined behaviour.

How does this program duplicate itself?

This code is from Hacker's Delight. It says this is the shortest such program in C and is 64 characters in length, but I don't understand it:
main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}
I tried to compile it. It compiles with 3 warnings and no error.
This program relies upon the assumptions that
return type of main is int
function's parameter type is int by default and
the argument a="main(a){printf(a,34,a=%c%s%c,34);}" will be evaluated first.
It will invoke undefined behavior. Order of evaluation of arguments of a function is not guaranteed in C.
Albeit, this program works as follows:
The assignment expression a="main(a){printf(a,34,a=%c%s%c,34);}" will assign the string "main(a){printf(a,34,a=%c%s%c,34);}" to a and the value of the assignment expression would be "main(a){printf(a,34,a=%c%s%c,34);}" too as per C standard --C11: 6.5.16
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment [...]
Taking in mind the above semantic of assignment operator the program will be expanded as
main(a){
printf("main(a){printf(a,34,a=%c%s%c,34);}",34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);
}
ASCII 34 is ". Specifiers and its corresponding arguments:
%c ---> 34
%s ---> "main(a){printf(a,34,a=%c%s%c,34);}"
%c ---> 34
A better version would be
main(a){a="main(a){a=%c%s%c;printf(a,34,a,34);}";printf(a,34,a,34);}
It is 4 characters longer but at least follows K&R C.
It relies on several quirks of the C language and (what I think is) undefined behavior.
First, it defines the main function. It is legal to declare a function without a return type or parameter types, and they will be presumed to be int. This is why the main(a){ part works.
Then, it calls printf with 4 parameters. Since it has no prototype, it is assumed to return int and accept int parameters (unless your compiler implicitly declares it otherwise, like Clang does).
The first parameter is presumed int and is argc at the beginning of the program. The second parameter is 34 (which is ASCII for the double-quote character). The third parameter is an assignment expression that assigns the format string to a and returns it. It relies on a pointer-to-int conversion, which is legal in C. The last parameter is another quote character in numeric form.
At runtime, the %c format specifiers are substituted with quotes, the %s is substituted with the format string, and you get the original source again.
As far as I know, the order of argument evaluation is unspecified. This quine works because the assignment a="main(a){printf(a,34,a=%c%s%c,34);}" is evaluated before a is passed as the first parameter to printf, but as far as I know, there is no rule to enforce it. Additionally, this can't work on typical 64-bit platforms because the pointer-to-int conversion will truncate the pointer to a 32-bit value. As a matter of fact, even though I can see how it works on some platforms, it doesn't work on my computer with my compiler.
This works based on lots of quirks that C allows you to do, and some undefined behavior that happens to work in your favor. In order:
main(a) { ...
Types are assumed to be int if unspecified, so this is equivalent to:
int main(int a) { ...
Even though main is supposed to take either 0 or 2 arguments, and this is undefined behavior, this can be allowed as just ignoring the missing second argument.
Next, the body, which I will space out. Note that a is an int as per main:
printf(a,
34,
a = "main(a){printf(a,34,a=%c%s%c,34);}",
34);
The order of evaluation of arguments is unspecified, but we're relying on the 3rd argument - the assignment - getting evaluated first. We're also relying on the undefined behavior of being able to assign a char * to an int. Also, note that 34 is the ASCII value of ". Thus, the intended effect of the program is:
int main(int a, char** ) {
printf("main(a){printf(a,34,a=%c%s%c,34);}",
'"',
"main(a){printf(a,34,a=%c%s%c,34);}",
'"');
return 0; // also left off
}
Which, when evaluated, produces:
main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}
which was the original program. Tada!
The program is supposed to print its own code. Note the similarity of the string literal to the overall program code. The idea is that the literal will be used as the printf() format string because its value is assigned to variable a (albeit in the argument list) and that it will also be passed as the string to print (because an assignment expression evaluates to the value that was assigned). The 34 is the ASCII code for the double quote character ("); using it avoids a format string containing escaped literal quotation mark characters.
The code relies on unspecified behavior in the form of the order of evaluation of the function arguments. If they are evaluated in argument list order then the program is likely to fail because the value of a would then be used as a pointer to the format string before the correct value was actually assigned to it.
Additionally, the type of a defaults to int, and there is no guarantee that int is wide enough to hold an object pointer without truncating it.
Furthermore, the C standard specifies only two permitted signatures for main(), and the signature used is not among them.
Moreover, the type of printf() inferred by the compiler in the absence of a prototype is incorrect. It is by no means guaranteed that the compiler will generate a calling sequence that works for it.

Resources