I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++ program (or an even more complex expression involving the newly created variable x), my gcc/g++ compiles it without errors. In the above case x is 1 afterwards. Note that there is no variable x in scope from a previous declaration.
So I'd like to know whether this is correct behaviour (and might even be useful in some situation) or just a parser peculiarity of my gcc version or of gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour since the x that comes into existence has an arbitrary value.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
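To see the point-of-declaration rule at work without triggering undefined behaviour, here is a minimal C sketch (the function name point_of_declaration_demo is made up for illustration). Both initializers name the variable being declared, which only compiles because the name is already in scope, but neither of them reads the variable's value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo function; returns 1 when both checks hold. */
int point_of_declaration_demo(void)
{
    /* n is already in scope inside its own initializer. sizeof does not
       evaluate its operand here, so no indeterminate value is read. */
    size_t n = sizeof n;

    /* Taking the address of self does not read its value either. */
    void *self = &self;

    assert(n == sizeof(size_t));
    return n == sizeof(size_t) && self == (void *)&self;
}
```

Calling point_of_declaration_demo() from main should return 1. The contrast with int x = x + 1; is that the latter reads x before it holds a value.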
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however is not possible in C++ at a parser level! The previous could be parsed but was semantically wrong. The second one can't even be parsed.
In the first case you simply use whatever value is already at the place in memory where the variable lives. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and for hard-to-find bugs in the future.
For the second case, it's simply a syntax error. You cannot mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so the code is valid. When the variable is defined globally, it is initialized to zero, so in that case the behavior is defined; in other cases the variable was uninitialized and as such still is uninitialized (but increased by 1).
Remark that it still is not very sane or useful code.
3.3.1 Point of declaration
1 The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own
(indeterminate) value. —end example ]
As the quote above states, x has an indeterminate value at the point where it is read. You are simply lucky to get 1.
Your code has two possibilities:
If x is a local variable, you have undefined behavior, since you use the value of an object before it has been initialized.
If x has static or thread-local storage duration, it is pre-initialized to zero, and your initializer will reliably set it to 1. This is well-defined.
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed
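A rough C sketch of the two cases follows. One caveat: in C, unlike C++, an object with static storage duration needs a constant-expression initializer, so int x = x + 1; at file scope is rejected there. The sketch therefore relies on zero-initialization and does the increment inside a function. The names counter, bump_static and bump_local are invented for this example.

```c
/* Static storage duration: zero-initialized before the program starts. */
static int counter;

/* Well-defined: the first call reads the zero-initialized value. */
int bump_static(void)
{
    counter = counter + 1;
    return counter;
}

/* Automatic storage duration: give the object a value before reading it,
   instead of writing int x = x + 1; */
int bump_local(void)
{
    int x = 0;
    x = x + 1;
    return x;
}
```

Here bump_local() returns 1 on every call, while bump_static() returns 1, then 2, and so on, because counter persists between calls.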
This is undefined behaviour and the compiler should at least issue a warning. Try to compile using g++ -ansi .... The second example is just a syntax error.
Related
Suppose we declare a variable
int i = 10;
And we have these 2 statements-
printf("%d",i);
printf("%d",*(&i));
These two statements print the same value, i.e. 10
From my understanding of pointers, not only is their output the same, but the two statements above mean exactly the same thing. They are just two different ways of writing the same statement.
However, I came across an interesting code-
#include <stdio.h>
int main(){
const int i = 10;
int* pt1 = &i;
*pt1 = 20;
printf("%d\n", i);
printf("%d\n", *(&i));
return 0;
}
To my surprise, the result is-
10
20
This suggests that i and *(&i) don't mean the same if i is declared with the const qualifier. Can anyone explain?
The behavior of *pt1 = 20; is not defined by the C standard, because pt1 has been improperly set to point to the const int i. C 2018 6.7.3 says:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined…
Because of this, the behavior of the entire program is not defined by the C standard.
In C code with defined behavior, *(&i) is defined to produce the value of i. However, in code with behavior not defined by the C standard, the normal rules that would apply to *(&i) are canceled. It could produce the value that the const i was initialized to, it could produce the value the program attempted to change i to, it could produce some other value, it could cause the program to crash, or it could cause other behavior.
Think of what it means to declare something as const - you're telling the compiler you don't want the value of i to change, and that it should flag any code that has an expression like i = 20.
However, you're going behind the compiler's back and trying to change the value of i indirectly through pt1. The behavior of this action is undefined - the language standard places no requirements on the compiler or the runtime environment to handle this situation in any particular way. The code is erroneous and you should not expect a meaningful result.
One possible explanation (of many) for this result is that the i in the first printf statement was replaced with the constant 10. After all, you promised that the value of i would never change, so the compiler is allowed to make that optimization.
But since you take the address of i, storage must be allocated for it somewhere, and in this case it apparently allocated storage in a writable segment (const does not mean "put this in read-only memory"), so the update through pt1 was successful. The compiler actually evaluates the expression *(&i) to retrieve the current value of i, rather than replacing it with a constant. On a different platform, or in a different program, or in a different build of the same program, you may get a different result.
The moral of this story is "don't do that". Don't attempt to modify a const object through a non-const expression.
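For contrast, here is a small sketch of the two well-defined alternatives, assuming the goal is either to read a const object through a pointer or to change a value through a pointer. The function names read_const and modify_plain are made up for this example.

```c
/* Reading a const object through a pointer-to-const is well-defined,
   and *(&i) then really is the same as i. Returns 1 if they agree. */
int read_const(void)
{
    const int i = 10;
    const int *p = &i;   /* the pointer type keeps the const qualifier */
    return *p == i && *(&i) == i;
}

/* If the value has to change, do not declare the object const. */
int modify_plain(void)
{
    int i = 10;
    int *p = &i;
    *p = 20;             /* fine: i is not const */
    return i;
}
```

With these definitions read_const() returns 1 and modify_plain() returns 20, and neither depends on how the compiler happens to optimize.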
I noticed just now that the following code can be compiled with clang/gcc/clang++/g++, using c99, c11, c++11 standards.
int main(void) {
int i = i;
}
and even with -Wall -Wextra, none of the compilers even reports warnings.
By modifying the code to int i = i + 1; and with -Wall, they may report:
why.c:2:13: warning: variable 'i' is uninitialized when used within its own initialization [-Wuninitialized]
int i = i + 1;
~ ^
1 warning generated.
My questions:
Why is this even allowed by compilers?
What do the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
Because i is uninitialized when used to initialize itself, it has an indeterminate value at that time. An indeterminate value can be either an unspecified value or a trap representation.
If your implementation supports padding bits in integer types and if the indeterminate value in question happens to be a trap representation, then using it results in undefined behavior.
If your implementation does not have padding in integers, then the value is simply unspecified and there is no undefined behavior.
EDIT:
To elaborate further, the behavior can still be undefined if i never has its address taken at some point. This is detailed in section 6.3.2.1p2 of the C11 standard:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
So if you never take the address of i, then you have undefined behavior. Otherwise, the statements above apply.
This is a warning, it's not related to the standard.
Warnings are heuristic and take an "optimistic" approach: a warning is issued only when the compiler is sure that there is going to be a problem. In cases like this you have better luck with clang or the newest versions of gcc, as stated in the comments (see another related question of mine: why am I not getting an "used uninitialized" warning from gcc in this trivial example?).
Anyway, in the first case:
int i = i;
does nothing, since i==i already. It is possible that the assignment is completely optimized out as it's useless. With compilers which don't "see" self-initialization as a problem you can do this without a warning:
int i = i;
printf("%d\n",i);
Whereas this triggers a warning all right:
int i;
printf("%d\n",i);
Still, it's bad enough not to be warned about this, since from now on i is seen as initialized.
In the second case:
int i = i + 1;
A computation between an uninitialized value and 1 must be performed. Undefined behaviour happens there.
I believe you are fine with getting the warning in the case of
int i = i + 1;
as expected; however, you expected the warning to be displayed in the case of
int i = i;
as well.
Why is this even allowed by compilers?
There is nothing inherently wrong with the statement. See the related discussions:
Why does the compiler allow initializing a variable with itself?
Why is initialization of a new variable by itself valid?
for more insight.
What do the C/C++ standards say about this? Specifically, what's the behavior of this? UB or implementation dependent?
This is undefined behavior, as the type int can have a trap representation and you have never taken the address of the variable under discussion. So, technically, you'll face UB as soon as you try to use the (indeterminate) value stored in variable i.
You should turn on your compiler warnings. In gcc, compile with -Winit-self to get a warning in C.
For C++, -Winit-self is already enabled by -Wall.
I have encountered multiple uses of the uninitialized_var() macro designed to get rid of warnings like:
warning: ‘ptr’ is used uninitialized in this function [-Wuninitialized]
For GCC (<linux/compiler-gcc.h>) it is defined this way:
/*
* A trick to suppress uninitialized variable warning without generating any
* code
*/
#define uninitialized_var(x) x = x
But also I discovered that <linux/compiler-clang.h> has the same macro defined in a different way:
#define uninitialized_var(x) x = *(&(x))
Why do we have two different definitions? For what reason may the first way be insufficient? Is the first way insufficient just for Clang or in some other cases too?
C example:
#include <stdio.h>
#define uninitialized_var(x) x = x
struct some {
int a;
char b;
};
int main(void) {
struct some *ptr;
struct some *uninitialized_var(ptr2);
if (1)
printf("%d %d\n", ptr->a, ptr2->a); // warning about ptr, not ptr2
}
Compilers are made to recognize certain constructs as indications that the author intended something deliberately, when the compiler would otherwise warn about it. For example, given if (b = a), GCC and Clang both warn that an assignment is being used as a conditional, but they do not warn about if ((b = a)) even though it is equivalent in terms of the C standard. This particular construct with extra parentheses has simply been set as a way to tell the compiler the author truly intends this code.
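As a small illustration of the extra-parentheses idiom just described (copy_and_test is a hypothetical helper, not taken from any real code base):

```c
/* Copies a into *b and reports whether the copied value was nonzero. */
int copy_and_test(int a, int *b)
{
    if ((*b = a))   /* extra parentheses mark the assignment as deliberate */
        return 1;
    return 0;
}
```

Written as if (*b = a), the function would behave identically as far as the C standard is concerned; the only difference is the warning that the answer above says GCC and Clang emit for the single-parenthesis form.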
Similarly, x = x has been set as a way to tell GCC not to warn about x being uninitialized. There are times where a function may appear to have a code path in which an object is used without being initialized, but the author knows the function is intended not to be used with parameters that would ever cause that particular code path to be executed and, for reasons of efficiency, they want to silence the compiler warning rather than add an initialization that is not actually necessary for program correctness.
Clang was presumably designed not to recognize GCC’s idiom for this and needed a different method.
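For comparison, the alternative mentioned above, adding an explicit initialization instead of silencing the warning, might look like the following sketch. It reuses struct some from the question; pick_value, its parameters and the fallback argument are assumptions made up for illustration.

```c
#include <stddef.h>

struct some {
    int a;
    char b;
};

/* Hypothetical helper: returns s->a when flag is set, otherwise fallback. */
int pick_value(int flag, const struct some *s, int fallback)
{
    const struct some *ptr = NULL;   /* explicit initialization, no macro */

    if (flag)
        ptr = s;

    return ptr ? ptr->a : fallback;  /* ptr always holds a definite value */
}
```

This costs one store that may not be strictly needed, which is the efficiency trade-off described above, but the behaviour is defined on every code path.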
Why do we have two different definitions?
Unclear, but I speculate that it's because Clang still produces a warning for x = x when x is uninitialized, but not for x = *(&(x)). Under almost every circumstance* in which one of those expressions has well-defined behavior, the other has the same well-defined behavior. Under other circumstances, such as when the value of x is undefined or indeterminate, both have undefined behavior, or the behavior of x = x is defined and that of x = *(&(x)) undefined, so the latter provides no advantage.
For what reason the first way may be insufficient?
Because the behavior of both is undefined in the use cases for which they seem to be intended. It is not at all surprising, then, that different compilers handle them differently.
Is the first way insufficient just for Clang or in some other cases too?
Both expressions' meaning and behavior are undefined. In one sense, then, one cannot safely conclude that either is sufficient for anything. In the empirical sense of whether using one or the other fools certain compilers into not emitting warnings that they otherwise would, and still should, emit, it is likely that there were, are, and / or will be compilers that handle the undefined behavior associated with both of those expressions differently than GCC and Clang do.
* The exception being when x is declared with register storage class, in which case the second expression has undefined behavior regardless of whether x has a well-defined value.
Is there a reason this kind of declaration is wrong in C?
short foo()
{
short x,y,z;
y=24;
z = x + y;
return z;
}
The declarations are not wrong per se, but there are issues with the code:
In modern C, short foo(void) is preferred over short foo(). The former says the function takes no parameters. The latter leaves it flexible, which involves a number of issues that can allow bugs to occur.
In z = x + y;, x has not been given a value. The behavior of the code is then undefined. (This is due to a special rule that using an object with automatic storage duration that has neither been given a value nor had its address taken has behavior not defined by the C standard.)
Computing z is undefined behavior because x is never initialized in this function. The return value may therefore be inconsistent: sometimes it will be 24, sometimes some other value.
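A corrected version could look like this. Note that the starting value of x is a guess; the question does not say what x was meant to hold.

```c
short foo(void)          /* (void): the function takes no parameters */
{
    short x = 0;         /* assumption: 0 is the intended starting value */
    short y = 24;
    short z = (short)(x + y);   /* x + y is computed as int, then narrowed */

    return z;
}
```

With x initialized, foo() returns 24 every time.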
#include<stdio.h>
int main()
{
int i = 10;
printf("0 i %d %p\n",i,&i);
if (i == 10)
goto f;
{
int i = 20;
printf("1 i %d\n",i);
}
{
int i = 30;
f:
printf("2 i %d %p\n",i,&i); //statement X
}
return 0;
}
Output:
[test]$ ./a.out
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
I have difficulty understanding how statement X works. As you can see, the output is junk. Shouldn't it rather say that i is not declared?
That's because goto skips the shadowing variable i's initialization.
This is one of the minor differences between C and C++. In strict C++, a goto that crosses a variable's initialization is an error, while in C it is not. GCC confirms this: when you compile with -std=c11 it accepts the code, while with -std=c++11 it complains: jump to label 'f' crosses initialization of 'int i'.
From C99:
A goto statement shall not jump from outside the scope of an identifier having a variably modified type to inside the scope of that identifier.
VLAs are of variably modified type. Jumps inside a scope not containing VM types are allowed.
From C++11 (emphasis mine):
A program that jumps from a point where a variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the preceding types and is declared without an initializer.
From the output, it is clear that the addresses of the two i's are distinct, since they are declared in different scopes.
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
how statement X works? As you can see, the output is junk. Shouldn't it rather say that i is not declared?
i is also declared in the local scope of statement X, but the initialization of i to 30 is skipped because of the goto statement. Therefore the local variable i contains a garbage value.
In the first printf statement, you accessed the i in address 0xbfbeaea8 which was declared and initialized in the statement int i = 10;
Once you hit the goto f; statement, you are in the scope of the 2nd i, which is declared at this point and resides in address 0xbfbeaea4 but which is not initialized as you skipped the initialization statement.
That's why you were getting rubbish.
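If a jump into the block really is needed in C, one way to avoid the garbage value is to declare the inner variable without an initializer and assign to it after the label, so that no path can skip the assignment. This is only a sketch; value_after_jump and its parameter are invented names.

```c
/* Returns the same result whether or not the jump is taken. */
int value_after_jump(int take_jump)
{
    int i = 10;          /* first i */
    int result = i;      /* remember the outer value */

    if (take_jump)
        goto f;
    {
        int i;           /* second i, shadows the first inside this block */
f:
        i = 30;          /* placed after the label, so it always runs */
        result += i;
    }
    return result;       /* 10 + 30 on both paths */
}
```

Both value_after_jump(0) and value_after_jump(1) return 40, because the assignment to the inner i is executed on either path.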
When control reaches the third block, a new i is declared as far as the compiler is concerned, so the name i now refers to that new object's memory and the program reads from there. Since the original i is hidden by the new declaration, you cannot expect to see the value it originally had.
My suggestion to understand somewhat complex code is to strip out, one by one, all "unnecessary" code and leave the bare problem. How do you know what's unnecessary? Initially, when you're not fluent with the language, you'll be removing parts of the code at random, but very quickly you'll learn what's necessary and what is not.
Give it a try: my hint is to start removing or commenting out the "goto" statement. Recompile and, if there are no errors, see what changed when you run the program again.
Another suggestion would be: try to recreate the problem "from scratch": imagine you are working on a top-secret project and you cannot show any single line of code to anyone, let alone post on Stack Overflow. Now, try to replicate the problem by rewriting equivalent source code, that would show the same behaviour.
As they say, "asking the right question is often solving half the problem".
The i you print in the printf("2 i %d %p\n",i,&i); statement is not the i whose value was 10 in the if statement, and because you skip the int i = 30; statement with goto, you print garbage. That int i = 30; is the actual definition of the i that gets printed, i.e. where the compiler allocates room for i and gives it a value.
The problem is that your goto is skipping the assignment to the second i, which shadows (conceals) the first i whose value you've set, so you're printing out an uninitialized variable.
You'll get a similar wrong answer from this:
#include<stdio.h>
int main()
{
int i = 10; /* First "i" */
printf("0 i %d %p\n",i,&i);
{ /* New block scope */
int i; /* Second "i" shadows first "i" */
printf("2 i %d %p\n",i,&i);
}
return 0;
}
Three lessons: don't shadow variables; don't create blocks ({ ... }) for no reason; and turn on compiler warnings.
Just to clarify: variable scope is a compile-time concept based on where variables are declared, not something that is subject to what happens at runtime. The declaration of i#2 conceals i#1 inside the block that i#2 is declared in. It doesn't matter if the runtime control path jumps into the middle of the block — i#2 is the i that will be used and i#1 is hidden (shadowed). Runtime control flow doesn't carry scope around in a satchel.
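The compile-time nature of scope can be seen in a tiny sketch with no goto at all (shadow_demo is a made-up name):

```c
/* Combines the inner and outer values to show which i each name refers to. */
int shadow_demo(void)
{
    int i = 10;          /* i#1 */
    int inner;

    {
        int i = 30;      /* i#2 hides i#1 until the closing brace */
        inner = i;       /* reads i#2 */
    }

    return inner * 100 + i;   /* i is i#1 again here */
}
```

shadow_demo() returns 3010: inside the block the name i means i#2 (30), and after the block it means i#1 (10) again, regardless of how control got there.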