What is the expected output of initializing a variable to itself? [duplicate]

I noticed just now that the following code can be compiled with clang/gcc/clang++/g++ using the c99, c11, and c++11 standards.
int main(void) {
    int i = i;
}
and even with -Wall -Wextra, none of the compilers reports any warnings.
By modifying the code to int i = i + 1; and with -Wall, they may report:
why.c:2:13: warning: variable 'i' is uninitialized when used within its own initialization [-Wuninitialized]
    int i = i + 1;
        ~   ^
1 warning generated.
My questions:
Why is this even allowed by compilers?
What do the C/C++ standards say about this? Specifically, what's the behavior: UB or implementation-defined?

Because i is uninitialized when used to initialize itself, it has an indeterminate value at that time. An indeterminate value can be either an unspecified value or a trap representation.
If your implementation supports padding bits in integer types and if the indeterminate value in question happens to be a trap representation, then using it results in undefined behavior.
If your implementation does not have padding in integers, then the value is simply unspecified and there is no undefined behavior.
EDIT:
To elaborate further, the behavior is undefined if the address of i is never taken. This is detailed in section 6.3.2.1p2 of the C11 standard:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
So if you never take the address of i, then you have undefined behavior. Otherwise, the statements above apply.
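To illustrate, a minimal sketch (the variable names are mine, and "unspecified" here assumes int has no trap representations in your implementation):
#include <stdio.h>

int main(void) {
    int i = i;       /* i's address is never taken: reading i here is
                        undefined behavior per 6.3.2.1p2 */
    int j = j;       /* j's address IS taken below, so that clause does not
                        apply; j merely holds an unspecified value
                        (assuming no trap representations) */
    int *pj = &j;
    printf("%d\n", *pj);   /* prints some unspecified int */
    (void)i;
    return 0;
}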

This is about warnings; it's not governed by the standard.
Warnings are heuristic, with an "optimistic" approach: a warning is issued only when the compiler is sure it's going to be a problem. In cases like this you have better luck with clang or the newest versions of gcc, as stated in the comments (see another related question of mine: why am I not getting an "used uninitialized" warning from gcc in this trivial example?).
anyway, in the first case:
int i = i;
does nothing, since i==i already. It is possible that the assignment is completely optimized out as it's useless. With compilers which don't "see" self-initialization as a problem you can do this without a warning:
int i = i;
printf("%d\n",i);
Whereas this triggers a warning all right:
int i;
printf("%d\n",i);
Still, it's bad enough not to be warned about this, since from now on i is seen as initialized.
In the second case:
int i = i + 1;
A computation involving an uninitialized value and 1 must be performed, and that is where the undefined behaviour happens.

I believe you are fine with getting the warning in the case of
int i = i + 1;
as expected; however, you expect the warning to be displayed in the case of
int i = i;
as well.
Why is this even allowed by compilers?
There is nothing inherently wrong with the statement. See the related discussions:
Why does the compiler allow initializing a variable with itself?
Why is initialization of a new variable by itself valid?
for more insight.
What do the C/C++ standards say about this? Specifically, what's the behavior: UB or implementation-defined?
This is undefined behavior, as the type int can have trap representations and you have never taken the address of the variable in question. So, technically, you'll face UB as soon as you try to use the (indeterminate) value stored in the variable i.
You should turn on compiler warnings. In gcc, compile with -Winit-self to get a warning in C. For C++, -Winit-self is already enabled by -Wall.
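A minimal sketch (the file name is hypothetical) of how to reproduce that:
/* self.c -- per the answer above, compile as:
 *   gcc -Wall -Winit-self self.c   (C: -Winit-self is needed for the warning)
 *   g++ -Wall self.c               (C++: -Wall already implies -Winit-self)
 */
int main(void) {
    int i = i;   /* expected: warning that 'i' is used uninitialized */
    return i;
}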

Related

What is the difference between directly accessing a variable by name and accessing a variable by using *(&variable) in C?

Suppose we declare a variable
int i = 10;
And we have these 2 statements:
printf("%d",i);
printf("%d",*(&i));
These two statements print the same value, i.e. 10
From my understanding of pointers, not only is their output the same, the above two statements mean exactly the same thing. They are just two different ways of writing the same statement.
However, I came across an interesting code-
#include <stdio.h>

int main(){
    const int i = 10;
    int* pt1 = &i;
    *pt1 = 20;
    printf("%d\n", i);
    printf("%d\n", *(&i));
    return 0;
}
To my surprise, the result is-
10
20
This suggests that i and *(&i) don't mean the same thing if i is declared with the const qualifier. Can anyone explain?
The behavior of *pt1 = 20; is not defined by the C standard, because pt1 has been improperly set to point to the const int i. C 2018 6.7.3 says:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined…
Because of this, the behavior of the entire program is not defined by the C standard.
In C code with defined behavior, *(&i) is defined to produce the value of i. However, in code with behavior not defined by the C standard, the normal rules that would apply to *(&i) are canceled. It could produce the value that the const i was initialized to, it could produce the value the program attempted to change i to, it could produce some other value, it could cause the program to crash, or it could cause other behavior.
Think of what it means to declare something as const - you're telling the compiler you don't want the value of i to change, and that it should flag any code that has an expression like i = 20.
However, you're going behind the compiler's back and trying to change the value of i indirectly through pt1. The behavior of this action is undefined - the language standard places no requirements on the compiler or the runtime environment to handle this situation in any particular way. The code is erroneous and you should not expect a meaningful result.
One possible explanation (of many) for this result is that the i in the first printf statement was replaced with the constant 10. After all, you promised that the value of i would never change, so the compiler is allowed to make that optimization.
But since you take the address of i, storage must be allocated for it somewhere, and in this case it apparently allocated storage in a writable segment (const does not mean "put this in read-only memory"), so the update through pt1 was successful. The compiler actually evaluates the expression *(&i) to retrieve the current value of i, rather than replacing it with a constant. On a different platform, or in a different program, or in a different build of the same program, you may get a different result.
The moral of this story is "don't do that". Don't attempt to modify a const object through a non-const expression.
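For contrast, a minimal sketch of the well-defined variant: casting away const is fine when the underlying object itself was not defined const.
#include <stdio.h>

int main(void) {
    int i = 10;            /* the object itself is not const */
    const int *cp = &i;    /* just a read-only view of a mutable object */
    *(int *)cp = 20;       /* well-defined: the object was never const */
    printf("%d\n", i);     /* reliably prints 20 */
    printf("%d\n", *(&i)); /* also 20 */
    return 0;
}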

Different ways of suppressing 'uninitialized variable warnings' in C

I have encountered multiple uses of the uninitialized_var() macro designed to get rid of warnings like:
warning: ‘ptr’ is used uninitialized in this function [-Wuninitialized]
For GCC (<linux/compiler-gcc.h>) it is defined this way:
/*
 * A trick to suppress an uninitialized variable warning without generating
 * any code
 */
#define uninitialized_var(x) x = x
But also I discovered that <linux/compiler-clang.h> has the same macro defined in a different way:
#define uninitialized_var(x) x = *(&(x))
Why do we have two different definitions? For what reason might the first way be insufficient? Is the first way insufficient just for Clang, or in some other cases too?
C example:
#include <stdio.h>

#define uninitialized_var(x) x = x

struct some {
    int a;
    char b;
};

int main(void) {
    struct some *ptr;
    struct some *uninitialized_var(ptr2);
    if (1)
        printf("%d %d\n", ptr->a, ptr2->a); // warning about ptr, not ptr2
}
Compilers are made to recognize certain constructs as indications that the author intended something deliberately, when the compiler would otherwise warn about it. For example, given if (b = a), GCC and Clang both warn that an assignment is being used as a conditional, but they do not warn about if ((b = a)) even though it is equivalent in terms of the C standard. This particular construct with extra parentheses has simply been set as a way to tell the compiler the author truly intends this code.
Similarly, x = x has been set as a way to tell GCC not to warn about x being uninitialized. There are times where a function may appear to have a code path in which an object is used without being initialized, but the author knows the function is intended not to be used with parameters that would ever cause that particular code path to be executed and, for reasons of efficiency, they want to silence the compiler warning rather than add an initialization that is not actually necessary for program correctness.
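A minimal sketch (the function is hypothetical) of the kind of false positive being silenced: val is read only on paths where it has been assigned, but a compiler's flow analysis may fail to prove that.
int first_even(const int *a, int n) {
    int val = val;   /* GCC idiom: suppress a spurious -Wuninitialized */
    int found = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            val = a[i];
            found = 1;
            break;
        }
    }
    return found ? val : -1;   /* val is only read when it was assigned */
}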
Clang presumably was not designed to recognize GCC's idiom for this, so a different method was needed.
Why do we have two different definitions?
Unclear, but I speculate that it's because Clang still produces a warning for x = x when x is uninitialized, but not for x = *(&(x)). Under almost every circumstance* in which one of those expressions has well-defined behavior, the other has the same well-defined behavior. Under other circumstances, such as when the value of x is undefined or indeterminate, both have undefined behavior, or the behavior of x = x is defined and that of x = *(&(x)) undefined, so the latter provides no advantage.
For what reason might the first way be insufficient?
Because the behavior of both is undefined in the use cases for which they seem to be intended. It is not at all surprising, then, that different compilers handle them differently.
Is the first way insufficient just for Clang, or in some other cases too?
Both expressions' meaning and behavior are undefined. In one sense, then, one cannot safely conclude that either is sufficient for anything. In the empirical sense of whether using one or the other fools certain compilers into not emitting warnings that they otherwise would, and still should, emit, it is likely that there were, are, and/or will be compilers that handle the undefined behavior associated with both of those expressions differently than GCC and Clang do.
* The exception being when x is declared with the register storage class, in which case the second expression violates a constraint (the address of a register object may not be taken), regardless of whether x has a well-defined value.
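Putting the two forms side by side (the macro names here are mine; which compiler stays quiet for which form depends on the compiler version):
#define uninitialized_var_gcc(x)   x = x         /* <linux/compiler-gcc.h> form */
#define uninitialized_var_clang(x) x = *(&(x))   /* <linux/compiler-clang.h> form */

int main(void) {
    int uninitialized_var_gcc(a);     /* expands to: int a = a; */
    int uninitialized_var_clang(b);   /* expands to: int b = *(&(b)); */
    (void)a; (void)b;                 /* silence unused-variable warnings */
    return 0;
}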

Explain the surprising behavior of shadowed variable initialization in c/++ [duplicate]

I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++ program (or an even more complex expression involving the newly created variable x), my gcc/g++ compiles it without errors. In the above case x is 1 afterwards. Note that there is no variable x in scope from a previous declaration.
So I'd like to know whether this is correct behaviour (and might even be useful in some situations) or just a parser peculiarity of my gcc version or of gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour since the x that comes into existence has an arbitrary value.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however, is not possible in C++ even at the parser level! The previous snippet could be parsed but was semantically wrong; this one can't even be parsed.
In the first case you simply use the value already present at the place in memory where the variable lives. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and hard-to-find bugs in the future.
For the second case, it's simply a syntax error. You cannot mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so the code is valid. When the variable is globally defined, it is zero-initialized, so in that case the behavior is defined; otherwise the variable was uninitialized and as such is still uninitialized (but incremented by 1).
Note that it is still not very sane or useful code.
3.3.1 Point of declaration
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The standard says just that: the second x has an indeterminate value, and you were lucky to get 1.
Your code has two possibilities:
If x is a local variable, you have undefined behavior, since you use the value of an object before its lifetime begins.
If x has static or thread-local lifetime, it is pre-initialized to zero, and your static initialization will reliably set it to 1. This is well-defined.
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed.
This is undefined behaviour, and the compiler should at least issue a warning. Try compiling with g++ -ansi .... The second example is just a syntax error.

GCC: why constant variables not placed in .rodata

I've always believed that GCC would place a static const variable in the .rodata segment (or in the .text segment for optimizations) of an ELF or similar file. But it seems that's not the case.
I'm currently using gcc (GCC) 4.7.0 20120505 (prerelease) on a laptop with GNU/Linux, and it places a static constant variable in the .bss segment:
/*
 * this is a.c, and in its generated asm file a.s, the following line gives:
 *     .comm a,4,4
 * which would place variable a in .bss but not .rodata (or .text)
 */
static const int a;

int main()
{
    int *p = (int*)&a;
    *p = 0; /* since a is in .bss, write access to that region */
            /* won't trigger an exception */
    return 0;
}
So, is this a bug or a feature? I've decided to file this as a bug on bugzilla, but it might be better to ask for help first.
Are there any reasons that GCC can't place a const variable in .rodata?
UPDATED:
As tested, a constant variable with an explicit initialization (like const int a = 0;) would be placed into .rodata by GCC, whereas I left the variable uninitialized. Thus this question might be closed later -- maybe I didn't present a correct question.
Also, I previously wrote that the variable a is placed in the '.data' section, which is incorrect. It's actually placed in the .bss section since it is not initialized. The text above is now corrected.
The compiler has made it a common, which can be merged with other compatible symbols, and which can go in bss (taking no space on disk) if it ends up with no explicitly initialized definition. Putting it in rodata would be a trade-off; you'd save memory (commit charge) at runtime, but would use more space on disk (potentially a lot for a huge array).
If you'd rather it go in rodata, use the -fno-common option to GCC.
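A minimal sketch (section placement is what a typical GCC on an ELF target reports; it is not guaranteed by the language):
/* a.c -- per the answer above:
 *   gcc -S a.c              -> 'a' is emitted as a local common (.comm),
 *                              ending up in .bss
 *   gcc -S -fno-common a.c  -> 'a' becomes a regular zero-initialized
 *                              definition, eligible for .rodata like 'b'
 */
static const int a;        /* tentative definition, implicitly zero */
static const int b = 0;    /* explicit initializer: typically .rodata */

const int *addr_a(void) { return &a; }   /* reference them so they are emitted */
const int *addr_b(void) { return &b; }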
Why does GCC do it? I can't really answer that question without asking the developers themselves. If I'm allowed to speculate, I'd wager it has to do with optimization -- compilers don't have to enforce const.
That said, I think it's better if we look at the language itself, particularly undefined behavior. There are a few mentions of undefined behavior, but none of them go in-depth.
Modifying a constant is undefined behavior. Const is a contract, and that is especially true in C (and C++).
"But what if I const_cast away the const and modify y anyway?" Then you have undefined behavior.
What undefined behavior means is that the compiler is allowed to do quite literally anything it wants, and whatever the compiler decides to do will not be considered a violation of the ISO 9899 standard.
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
ISO/IEC 9899:1999, §3.4.3
What this means is that, because you have invoked undefined behavior, anything the compiler does is technically correct by way of not being incorrect. Ergo, it is correct for GCC to take...
static const int a = 0;
...and turn it into a .rodata symbol, while taking...
static const int a; // guaranteed to be zero
...and turning it into a .bss symbol.
In the former case, any attempt to modify a--even by proxy--will typically result in a segmentation violation, causing the kernel to force-kill the running program. In the latter case, the program will probably run without crashing.
That said, it is not reasonable to guess which one the compiler will do. Const is a contract, and it is up to you, the programmer, to uphold that contract by not modifying data that is supposed to be constant. Violating that contract means undefined behavior, and all the portability issues and program bugs that come with it.
So GCC can do a couple things.
It might write the symbol to .rodata, giving it protection under the OS kernel
It might write the object to somewhere where memory protection is not guaranteed, in which case...
It might change the value
It might change the value and immediately change it back
It might completely delete the offending code under the rationale that the value isn't changing (0 -> 0), essentially optimizing...
int main(){
    int *p = (int*)&a;
    *p = 0;
    return 0;
}
...to...
int main(void){
    return 0;
}
It might even send a model T-800 back in time to terminate your parents before you're born.
All of these behaviors are legal (well, legal in the sense of adhering to the standard), so the bug report was not warranted.
Writing to an object that has been declared const-qualified is undefined behavior: anything can happen, even that.
There is no way in C to declare the object itself to be immutable; you only forbid modification through the particular access path that you have to it. Here you have an int*, so modification is "allowed" in the sense that the compiler is not forced to issue a diagnostic. Doing a cast in C means that you are supposed to know what you are doing.
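A minimal sketch of that distinction: the const on the access path only controls diagnostics, while whether the write is defined depends on how the object itself was defined.
static const int c = 1;    /* the object itself is const: writing it is UB */
static int m = 1;          /* the object itself is mutable */

void demo(void) {
    int *pc = (int *)&c;   /* the cast silences the compiler's diagnostic... */
    int *pm = (int *)(const int *)&m;
    (void)pc;              /* ...but *pc = 2 would still be undefined behavior */
    *pm = 2;               /* fine: the underlying object was never const */
}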
Are there any reasons that GCC can't place a const variable in .rodata?
Your program is optimized by the compiler (even at -O0 some optimizations are done). Constant propagation is performed: http://en.wikipedia.org/wiki/Constant_folding
Try to deceive the compiler like this (note that this program is still technically undefined behavior):
#include <stdio.h>

static const int a;

int main(void)
{
    *(int *) &a = printf(""); // compiler cannot assume it is 0
    printf("%d\n", a);
    return 0;
}

How undefined is undefined behavior?

I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>

int main()
{
    int v = 0;
    scanf("%d", &v);
    if (v != 0)
    {
        int *p;
        *p = v; // Oops
    }
    return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?
I'd say that the behavior is undefined only if the user enters any number different from 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).
A hint of this can be found in the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that adds user-provided data without doing extensive checks on it is subject to this kind of undefined behavior - but the addition is UB only when the user provides such particular data.
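A minimal sketch of such a check (the helper name is mine): the addition is only performed when it cannot overflow, so the behavior is defined for every input.
#include <limits.h>

/* Returns 1 and stores a + b in *sum when the sum is representable,
 * 0 otherwise. The UB-prone addition is never executed on bad data. */
int checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}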
Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
    /* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
    if (0) main(42); /* not a constraint violation */
}

int main(void) {
    if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.
Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>

int main()
{
    int v;
    scanf("%d", &v);
    if (v != 0)
    {
        printf("Hello\n");
        int *p;
        *p = v; // Oops
    }
    return v;
}
What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case it may assume that v is never non-zero, and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
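Here is a sketch of that little test program (note the INT_MAX + 1 line is itself the undefined overflow under discussion; it is there precisely to see what an optimized build does with it):
#include <limits.h>
#include <stdio.h>

int test(int x) { return x + 1 > x; }

int main(void) {
    printf("INT_MAX       = %d\n", INT_MAX);
    printf("INT_MAX + 1   = %d\n", INT_MAX + 1);     /* deliberate signed overflow: UB */
    printf("test(INT_MAX) = %d\n", test(INT_MAX));   /* with -O2, GCC typically prints 1 */
    return 0;
}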
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.
If v is 0, your random pointer assignment never gets executed, and the function will return zero, so it is not undefined behaviour.
When you declare a variable (especially an explicit pointer), a piece of memory is allocated for it (for a pointer, usually the size of a machine word). That piece of memory is handed to your program without its old contents being cleared (whether it gets zero-filled depends on the implementation), so your int *p will hold a random (junk) value that gets interpreted as an address. When you try to dereference it (i.e., access the memory it points to), that location will almost always be memory your program is not allowed to touch, and trying to read or modify it will result in an access violation from the memory manager.
So in this example, any value other than 0 will result in undefined behavior, because no one knows where p will point at that moment.
I hope this explanation is of any help.
Edit: Ah, sorry, a few answers got ahead of me again :)
It is simple: if a piece of code doesn't execute, it doesn't have a behavior at all, whether defined or not!
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.
I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants to when it sees that statement. Now, every compiler will handle it as expected, but they still have every right to do whatever they want to - including changing parts unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.
Your program is pretty well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer; its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero the memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that uninitialised variables do not have a known value, and your program (though it might compile) will not be correct.
