#include<stdio.h>
int main()
{
int i = 10;
printf("0 i %d %p\n",i,&i);
if (i == 10)
goto f;
{
int i = 20;
printf("1 i %d\n",i);
}
{
int i = 30;
f:
printf("2 i %d %p\n",i,&i); //statement X
}
return 0;
}
Output:
[test]$ ./a.out
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
I have difficulty in understanding how statement X works?? As you see the output it is junk. It should rather say i not declared??
That's because goto skips the shadowing variable i's initialization.
This is one of the minor nuances of the differences between C and C++. In strict C++ go to crossing variable initialization is an error, while in C it's not. GCC also confirms this, when you compile with -std=c11 it allows while with std=c++11 it complains: jump to label 'f' crosses initialization of 'int i'.
From C99:
A goto statement shall not jump from outside the scope of an identifier having a variably modified type to inside the scope of that identifier.
VLAs are of variably modified type. Jumps inside a scope not containing VM types are allowed.
From C++11 (emphasis mine):
A program that jumps from a point where a variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the preceding types and is declared without an initializer.
From the output, it is clear that the address of 'i's are unique, since they are declared in different scopes.
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
how statement X works?? As you see the output it is junk. It should
rather say I not declared??
i is also declared in the local scope of statement x but the initialization of i to 30 is skipped because of goto statement. Therefore the local variable i contains a garbage value.
In the first printf statement, you accessed the i in address 0xbfbeaea8 which was declared and initialized in the statement int i = 10;
Once you hit the goto f; statement, you are in the scope of the 2nd i, which is declared at this point and resides in address 0xbfbeaea4 but which is not initialized as you skipped the initialization statement.
That's why you were getting rubbish.
When control reaches the third block, i is declared for the compiler, hence i represents some memory address therefore compiler tries to read it again. But since i has now become out-of-scope, you cannot be sure that it will contain the same value what it originally had.
My suggestion to understand somewhat complex code is to strip out, one by one, all "unnecessary" code and leave the bare problem. How do you know what's unnecessary? Initially, when you're not fluent with the language, you'll be removing parts of the code at random, but very quickly you'll learn what's necessary and what is not.
Give it a try: my hint is to start removing or commenting out the "goto" statement. Recompile and, if there are no errors, see what changed when you run the program again.
Another suggestion would be: try to recreate the problem "from scratch": imagine you are working on a top-secret project and you cannot show any single line of code to anyone, let alone post on Stack Overflow. Now, try to replicate the problem by rewriting equivalent source code, that would show the same behaviour.
As they say, "asking the right question is often solving half the problem".
The i you print in this printf("2 i %d %p\n",i,&i); statement, is not the i which value was 10 in if statement, and as you skip this int i = 30; statement with goto you print garbage. This int i = 30; is actual definition of the i that would be printed, i.e. where compiler allocates room and value of i.
The problem is that your goto is skipping the assignment to the second i, which shadows (conceals) the first i whose value you've set, so you're printing out an uninitialized variable.
You'll get a similar wrong answer from this:
#include<stdio.h>
int main()
{
int i = 10; /* First "i" */
printf("0 i %d %p\n",i,&i);
{ /* New block scope */
int i; /* Second "i" shadows first "i" */
printf("2 i %d %p\n",i,&i);
}
return 0;
}
Three lessons: don't shadow variables; don't create blocks ({ ... }) for no reason; and turn on compiler warnings.
Just to clarify: variable scope is a compile-time concept based on where variables are declared, not something that is subject to what happens at runtime. The declaration of i#2 conceals i#1 inside the block that i#2 is declared in. It doesn't matter if the runtime control path jumps into the middle of the block — i#2 is the i that will be used and i#1 is hidden (shadowed). Runtime control flow doesn't carry scope around in a satchel.
Related
I'm new to C. I cannot understand the result in following code. I use goto and jump the declaration of int a[N] array and int x. Although x is not initilized to 10, I can still get access to these variables.
#include <stdlib.h>
#include <stdio.h>
#define N 4
void printArray(int a[], int length) {
for (int i = 0; i < length; i++) {
printf("%d, ", a[i]);
}
printf("\n");
}
int main(void) {
goto done;
int a[N];
int x=10;
printf("x=%d\n", x);
done:
for (int i = 0; i < N; i++) {
a[i] = i;
}
printArray(a, N);
printf("x=%d\n", x);
return EXIT_SUCCESS;
}
result
0, 1, 2, 3
x=0
My question:
Why I can get access to these variables whose declarations have been jumped? How are variables declared in C? It seems that variable declarations are not run line by line.
Aside from "Variable Length Arrays (VLAs)", an automatic variable's "lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." (§6.2.4) The initialization (if any) occurs a bit later, when the program passes through the declaration. It's legal to jump over the declaration if the program does not depend on the initialization of the variable. But regardless of whether the initialization will eventually happen, the variable exists from the moment your program enters the block, however that is done. (Jumping into a block from outside is also legal, and may also prevent initialization. But as soon as you're in the block, the variable exists.)
If the program attempts to read the value of an uninitialised variable, it receives an indeterminate value. (Most compilers attempt to detect the possibility that this might happen, but you'll need to enable warnings in order to see the report.)
The consequence of "hoisting" the lifetime of a variable to its enclosing block is that there is a portion of the program in which the variable exists but is not visible (since its scope starts where it is defined.) If you save the address of the variable and then jump back into this region of the program, you will be able to access the value of the variable through the pointer, which shows that scope and lifetime are distinct.
If the variable is a VLA, then it's lifetime starts at the declaration and the compiler will not allow you to jump over the declaration. VLAs cannot be initialised, so you must assign a value to every element in a VLA which you intend to access. Not all compilers implement VLAs. Your example does not show a VLA, since N is a macro which expands to an integer constant.
For objective info about the C standard, see rici's answer.
But I see you are asking how this behavior is possible from a C program, and remarking that:
It seems that variable declarifications are not run line by line.
In fact, most computer languages are not run line by line. There is almost always some kind of multi-line parsing step that happens beforehand. For example, even the Bash shell language is processed multiple lines at a time. When Bash finds a while loop, it seems to do extra parsing to make sure the done command is found before it runs any of the code in the while loop. If you don't believe me, try running this Bash script:
while [ 1 ]
do
echo hi
sleep 1
# file terminates before the expected "done" command that ends the loop
Similarly, a C compiler is capable of parsing your code, checking its syntax, and checking that your program is well-formed before it even executes a single line of your code inside any of your functions. When the compiler sees that you are using a or x, it knows what those things are because you declared them above, even if the execution of the program never passed through those lines.
One way that you could imagine that a compiler might work is that (after checking the validity of your code) it moves all the local variable declarations to the top of the function. So your main function might look like this in the internal representation in the compiler:
int main(void) {
int a[N];
int x;
int i;
goto done;
x = 10;
printf("x=%d\n", x);
done:
for (i = 0; i < N; i++) {
a[i] = i;
}
printArray(a, N);
printf("x=%d\n", x);
return EXIT_SUCCESS;
}
This could actually be a useful transformation that would help the compiler generate code for your function because it would make it easy for the compiler to allocate memory on the stack for all your local variables when the function starts.
I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++-program (or even more complex expression involving the newly created variable x) my gcc/g++ compiles without errors. In the above case X is 1 afterwards. Note that there is no variable x in scope by a previous declaration.
So I'd like to know whether this is correct behaviour (and even might be useful in some situation) or just a parser pecularity with my gcc version or gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour since the x that comes into existence has an arbitrary value.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however is not possible in C++ at a parser level! The previous could be parsed but was semantically wrong. The second one can't even be parsed.
In the first case you simply use the value already at the place in memory where the variable is. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and hard to find bugs in the future.
For the second case, it's simply a syntax error. You can not mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so it is valid and when it is globally defined, it is initialized as zero, so in that case it is defined behavior, in others the variable was unintialized as as such still is unitialized (but increased with 1).
Remark that it still is not very sane or useful code.
3.3.1 Point of declaration 1 The point of declaration for a name is immediately after its complete declarator (clause 8) and before its
initializer (if any), except as noted below. [ Example: int x = 12; {
int x = x; } Here the second x is initialized with its own
(indeterminate) value. —end example ]
The above states so and should have indeterminate value, You are lucky with 1.
Your code has two possiblities:
If x is a local variable, you have undefined behavior, since you use the value of an object before its lifetime begins.
If x has static or thread-local lifetime, it is pre-initialized to zero, and your static initialization will reliably set it to 1. This is well-defined.
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed
This is undefined behaviour and the compiler should at least to issue a warning. Try to compile using g++ -ansi .... The second example is just a syntax error.
I unknowingly named a variable twice but with different data type. It missed the compilation error as one is in main() and other is in while() loop of main().
So I made a code like this.
#include <stdio.h>
int main()
{
int t;
scanf("%d",&t);
while(t>0)
{
double t;
scanf("%lf",&t);
printf("%lf\n",t);
t--;
}
return 0;
}
And here I notice that the program never ends! For any input value of double t either a negative or positive or zero, while() loop never gets terminated.
Can anyone care to explain why is this happening? How does the while loop get terminated there?
You said it yourself. You have two variables named t with different scope.
The double t you declared has the scope of the block that executes inside of the while loop. The conditional in the while loop uses the t that is declared in the scope surrounding the while loop (the int t) which is never modified (because the loop hides t and modifies the double t) and so it never reaches 0.
Here are some points about scope of variables in C:
blocks inherit all global variables
variables declared inside a block are only valid inside the block
blocks can be nested
a nested block inherits the variables from the outer block,
variables may be declared in a nested block to hide variables in the outer block (such as in your case)
EDIT
As #pmg suggested, here are a couple of extra points:
a hidden variable cannot be accessed at all, except through use of pointers created when it was still visible
although strictly speaking, hiding variables is not an error, it's almost never a good thing and most compilers will issue warnings when a variable is hidden!
Can anyone care to explain why is this happening?
There are two variables in your program called t. One is an int, at the scope of main, and one is a double, in the scope of the while loop. Inside the while loop the double t is hiding the int t, and your scanf is setting the double t.
How does the while loop get terminated there?
The program can't be terminated from inside the while loop as written.
while(t>0)//int t;
{
{
double t;
scanf("%lf",&t);
printf("%lf\n",t);
}
t--;
}
When the statement t-- is executed, it refers to the (double t) variable declared earlier in the same block, so your int t in the outer block scope, which is shadowed by the double t, never gets decremented.
Suppose I have a function that declares and initializes two local variables – which by default have the storage duration auto. This function then calls a second function, to which it passes the addresses of these two local variables. Can this second function safely use these pointers?
A trivial programmatic example, to supplement that description:
#include <stdio.h>
int adder(int *a, int *b)
{
return *a + *b;
}
int main()
{
auto int a = 5; // `auto' is redundant; included for clarity
auto int b = 3;
// adder() gets the addresses of two auto variables! is this an issue?
int result = adder(&a, &b);
printf("5 + 3 = %d\n", result);
return 0;
}
This program works as expected, printing 5 + 3 = 8.
Usually, when I have questions about C, I turn to the standard, and this was no exception. Specifically, I checked ISO/IEC 9899, §6.2.4. It says there, in part:
4
An object whose identifier is declared with no linkage and without
the storage-class specifier static has automatic storage duration.
5
For such an object that does not have a variable length array type,
its lifetime extends from entry into the block with which it is
associated until execution of that block ends in any way. (Entering an
enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively,
a new instance of the object is created each time. The initial value
of the object is indeterminate. If an initialization is specified for
the object, it is performed each time the declaration is reached in
the execution of the block; otherwise, the value becomes indeterminate
each time the declaration is reached.
Reading this, I reason the following points:
Variables a and b have storage duration auto, which I've made explicit using the auto keyword.
Calling the adder() function corresponds to the parenthetical in clause 5, in the partial quote above. That is, entering the adder() function "suspends, but does not end," the execution of the current block (which is main()).
Since the main() block is not "end[ed] in any way," storage for a and b is guaranteed. Thus, accessing them using the addresses &a and &b, even inside adder(), should be safe.
My question, then, is: am I correct in this? Or am I just getting "lucky," and accessing memory locations that, by happenstance, have not been overwritten?
P.S. I was unable to find an exact answer to this question through either Google or SO's search. If you can, mark this as a duplicate and I'll delete it.
Yes, it is safe and basically your assumptions are correct. The lifetime of an automatic object is from the entry in the block where it has been declared until the block terminates.
(C99, 6.2.4p5) "For such an object [...] its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way.
Your reasoning is correct for your particular function call chain, and you have read and quoted the relevant portions of the standard. This is a perfectly valid use of pointers to local variables.
Where you have to be wary is if the function stores the pointer values in a structure that has a lifetime longer than its own call. Consider two functions, foo(), and bar():
int *g_ptr;
void bar (int *p) {
g_ptr = p;
}
void foo () {
int x = 10;
bar(&x);
}
int main () {
foo ();
/* ...do something with g_ptr? */
return 0;
}
In this case, the variable xs lifetime ends with foo() returns. However, the pointer to x has been stored in g_ptr by bar(). In this case, it was an error for foo() to pass a pointer to its local variable x to bar().
What this means is that in order to know whether or not it is valid to pass a pointer to a local variable to a function, you have to know what that function will do with it.
Those variables are allocated in the stack. As long as you do not return from the function that declared them, they remain valid.
As I'm not yet allowed to comment, I'd rather write another answer as amendment to jxh's answer above:
Please see my elaborate answer here for a similar question. This contains a real world example where the aliasing in the called function makes your code break even though it follows all the c-language rules.
Even though it is legal in the C-language I consider it as harmful to pass pointers to automatic variables in a function call. You never know (and often you don't want to know) what exactly the called function does with the passed values. When the called function establishes an alias, you get in big trouble.
I heard (probably from a teacher) that one should declare all variables on top of the program/function, and that declaring new ones among the statements could cause problems.
But then I was reading K&R and I came across this sentence: "Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function". He follows with an example:
if (n > 0){
int i;
for (i=0;i<n;i++)
...
}
I played a bit with the concept, and it works even with arrays. For example:
int main(){
int x = 0 ;
while (x<10){
if (x>5){
int y[x];
y[0] = 10;
printf("%d %d\n",y[0],y[4]);
}
x++;
}
}
So when exactly I am not allowed to declare variables? For example, what if my variable declaration is not right after the opening brace? Like here:
int main(){
int x = 10;
x++;
printf("%d\n",x);
int z = 6;
printf("%d\n",z);
}
Could this cause trouble depending on the program/machine?
I also often hear that putting variables at the top of the function is the best way to do things, but I strongly disagree. I prefer to confine variables to the smallest scope possible so they have less chance to be misused and so I have less stuff filling up my mental space in each line on the program.
While all versions of C allow lexical block scope, where you can declare the variables depends of the version of the C standard that you are targeting:
C99 onwards or C++
Modern C compilers such as gcc and clang support the C99 and C11 standards, which allow you to declare a variable anywhere a statement could go. The variable's scope starts from the point of the declaration to the end of the block (next closing brace).
if( x < 10 ){
printf("%d", 17); // z is not in scope in this line
int z = 42;
printf("%d", z); // z is in scope in this line
}
You can also declare variables inside for loop initializers. The variable will only exist only inside the loop.
for(int i=0; i<10; i++){
printf("%d", i);
}
ANSI C (C90)
If you are targeting the older ANSI C standard, then you are limited to declaring variables immediately after an opening brace1.
This doesn't mean you have to declare all your variables at the top of your functions though. In C you can put a brace-delimited block anywhere a statement could go (not just after things like if or for) and you can use this to introduce new variable scopes. The following is the ANSI C version of the previous C99 examples:
if( x < 10 ){
printf("%d", 17); // z is not in scope in this line
{
int z = 42;
printf("%d", z); // z is in scope in this line
}
}
{int i; for(i=0; i<10; i++){
printf("%d", i);
}}
1 Note that if you are using gcc you need to pass the --pedantic flag to make it actually enforce the C90 standard and complain that the variables are declared in the wrong place. If you just use -std=c90 it makes gcc accept a superset of C90 which also allows the more flexible C99 variable declarations.
missingno covers what ANSI C allows, but he doesn't address why your teachers told you to declare your variables at the top of your functions. Declaring variables in odd places can make your code harder to read, and that can cause bugs.
Take the following code as an example.
#include <stdio.h>
int main() {
int i, j;
i = 20;
j = 30;
printf("(1) i: %d, j: %d\n", i, j);
{
int i;
i = 88;
j = 99;
printf("(2) i: %d, j: %d\n", i, j);
}
printf("(3) i: %d, j: %d\n", i, j);
return 0;
}
As you can see, I've declared i twice. Well, to be more precise, I've declared two variables, both with the name i. You might think this would cause an error, but it doesn't, because the two i variables are in different scopes. You can see this more clearly when you look at the output of this function.
(1) i: 20, j: 30
(2) i: 88, j: 99
(3) i: 20, j: 99
First, we assign 20 and 30 to i and j respectively. Then, inside the curly braces, we assign 88 and 99. So, why then does the j keep its value, but i goes back to being 20 again? It's because of the two different i variables.
Between the inner set of curly braces the i variable with the value 20 is hidden and inaccessible, but since we have not declared a new j, we are still using the j from the outer scope. When we leave the inner set of curly braces, the i holding the value 88 goes away, and we again have access to the i with the value 20.
Sometimes this behavior is a good thing, other times, maybe not, but it should be clear that if you use this feature of C indiscriminately, you can really make your code confusing and hard to understand.
If your compiler allows it then its fine to declare anywhere you want. In fact the code is more readable (IMHO) when you declare the variable where you use instead of at the top of a function because it makes it easier to spot errors e.g. forgetting to initialize the variable or accidently hiding the variable.
A post shows the following code:
//C99
printf("%d", 17);
int z=42;
printf("%d", z);
//ANSI C
printf("%d", 17);
{
int z=42;
printf("%d", z);
}
and I think the implication is that these are equivalent. They are not. If int z is placed at the bottom of this code snippet, it causes a redefinition error against the first z definition but not against the second.
However, multiple lines of:
//C99
for(int i=0; i<10; i++){}
does work. Showing the subtlety of this C99 rule.
Personally, I passionately shun this C99 feature.
The argument that it narrows the scope of a variable is false, as shown by these examples. Under the new rule, you cannot safely declare a variable until you have scanned the entire block, whereas formerly you only needed to understand what was going on at the head of each block.
Internally all variables local to a function are allocated on a stack or inside CPU registers, and then the generated machine code swaps between the registers and the stack (called register spill), if compiler is bad or if CPU doesn't have enough registers to keep all the balls juggling in the air.
To allocate stuff on stack, CPU has two special registers, one called Stack Pointer (SP) and another -- Base Pointer (BP) or frame pointer (meaning the stack frame local to the current function scope). SP points inside the current location on a stack, while BP points to the working dataset (above it) and the function arguments (below it). When function is invoked, it pushes the BP of the caller/parent function onto the stack (pointed by SP), and sets the current SP as the new BP, then increases SP by the number of bytes spilled from registers onto stack, does computation, and on return, it restores its parent's BP, by poping it from the stack.
Generally, keeping your variables inside their own {}-scope could speedup compilation and improve the generated code by reducing the size of the graph the compiler has to walk to determine which variables are used where and how. In some cases (especially when goto is involved) compiler can miss the fact the variable wont be used anymore, unless you explicitly tell compiler its use scope. Compilers could have time/depth limit to search the program graph.
Compiler could place variables declared near each other to the same stack area, which means loading one will preload all other into cache. Same way, declaring variable register, could give compiler a hint that you want to avoid said variable being spilled on stack at all costs.
Strict C99 standard requires explicit { before declarations, while extensions introduced by C++ and GCC allow declaring vars further into the body, which complicates goto and case statements. C++ further allows declaring stuff inside for loop initialization, which is limited to the scope of the loop.
Last but not least, for another human being reading your code, it would be overwhelming when he sees the top of a function littered with half a hundred variables declarations, instead of them localized at their use places. It also makes easier to comment out their use.
TLDR: using {} to explicitly state variables scope can help both compiler and human reader.
With clang and gcc, I encountered major issues with the following.
gcc version 8.2.1 20181011
clang version 6.0.1
{
char f1[]="This_is_part1 This_is_part2";
char f2[64]; char f3[64];
sscanf(f1,"%s %s",f2,f3); //split part1 to f2, part2 to f3
}
neither compiler liked f1,f2 or f3, to be within the block. I had to relocate f1,f2,f3 to the function definition area.
the compiler did not mind the definition of an integer with the block.
As per the The C Programming Language By K&R -
In C, all variables must be declared before they are used, usually at the
beginning of the function before any executable statements.
Here you can see word usually it is not must..