I heard (probably from a teacher) that one should declare all variables on top of the program/function, and that declaring new ones among the statements could cause problems.
But then I was reading K&R and I came across this sentence: "Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function". He follows with an example:
if (n > 0){
    int i;
    for (i=0;i<n;i++)
        ...
}
I played a bit with the concept, and it works even with arrays. For example:
#include <stdio.h>
int main(){
    int x = 0 ;
    while (x<10){
        if (x>5){
            int y[x];
            y[0] = 10;
            /* y[4] is never assigned, so the value printed for it is indeterminate */
            printf("%d %d\n",y[0],y[4]);
        }
        x++;
    }
}
So when exactly am I not allowed to declare variables? For example, what if my variable declaration is not right after the opening brace? Like here:
int main(){
    int x = 10;
    x++;
    printf("%d\n",x);
    int z = 6;
    printf("%d\n",z);
}
Could this cause trouble depending on the program/machine?
I also often hear that putting variables at the top of the function is the best way to do things, but I strongly disagree. I prefer to confine variables to the smallest scope possible so they have less chance of being misused and so I have less stuff filling up my mental space on each line of the program.
While all versions of C allow lexical block scope, where you can declare the variables depends on the version of the C standard that you are targeting:
C99 onwards or C++
Modern C compilers such as gcc and clang support the C99 and C11 standards, which allow you to declare a variable anywhere a statement could go. The variable's scope starts from the point of the declaration to the end of the block (next closing brace).
if( x < 10 ){
    printf("%d", 17); // z is not in scope in this line
    int z = 42;
    printf("%d", z); // z is in scope in this line
}
You can also declare variables inside for loop initializers. The variable will exist only inside the loop.
for(int i=0; i<10; i++){
    printf("%d", i);
}
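Because a loop variable declared in the initializer disappears when the loop ends, the same name can be reused by a later loop in the same block. A minimal sketch of this (the function name sum_two_loops is made up for illustration):

```c
/* Hypothetical helper: two consecutive loops each declare their own i. */
int sum_two_loops(void) {
    int total = 0;
    for (int i = 0; i < 3; i++) {
        total += i;          /* adds 0 + 1 + 2 */
    }
    for (int i = 0; i < 2; i++) {
        total += 10;         /* a different i; the first one no longer exists */
    }
    return total;            /* 3 + 20 */
}
```

Calling sum_two_loops() should return 23, and neither i is visible after its own loop.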
ANSI C (C90)
If you are targeting the older ANSI C standard, then you are limited to declaring variables immediately after an opening brace [1].
This doesn't mean you have to declare all your variables at the top of your functions though. In C you can put a brace-delimited block anywhere a statement could go (not just after things like if or for) and you can use this to introduce new variable scopes. The following is the ANSI C version of the previous C99 examples:
if( x < 10 ){
    printf("%d", 17); // z is not in scope in this line
    {
        int z = 42;
        printf("%d", z); // z is in scope in this line
    }
}
{int i; for(i=0; i<10; i++){
    printf("%d", i);
}}
[1] Note that if you are using gcc you need to pass the --pedantic flag to make it actually enforce the C90 standard and complain that the variables are declared in the wrong place. If you just use -std=c90 it makes gcc accept a superset of C90 which also allows the more flexible C99 variable declarations.
missingno covers what ANSI C allows, but he doesn't address why your teachers told you to declare your variables at the top of your functions. Declaring variables in odd places can make your code harder to read, and that can cause bugs.
Take the following code as an example.
#include <stdio.h>
int main() {
    int i, j;
    i = 20;
    j = 30;
    printf("(1) i: %d, j: %d\n", i, j);
    {
        int i;
        i = 88;
        j = 99;
        printf("(2) i: %d, j: %d\n", i, j);
    }
    printf("(3) i: %d, j: %d\n", i, j);
    return 0;
}
As you can see, I've declared i twice. Well, to be more precise, I've declared two variables, both with the name i. You might think this would cause an error, but it doesn't, because the two i variables are in different scopes. You can see this more clearly when you look at the output of this function.
(1) i: 20, j: 30
(2) i: 88, j: 99
(3) i: 20, j: 99
First, we assign 20 and 30 to i and j respectively. Then, inside the curly braces, we assign 88 and 99. So, why then does the j keep its value, but i goes back to being 20 again? It's because of the two different i variables.
Between the inner set of curly braces the i variable with the value 20 is hidden and inaccessible, but since we have not declared a new j, we are still using the j from the outer scope. When we leave the inner set of curly braces, the i holding the value 88 goes away, and we again have access to the i with the value 20.
Sometimes this behavior is a good thing, other times, maybe not, but it should be clear that if you use this feature of C indiscriminately, you can really make your code confusing and hard to understand.
If your compiler allows it, then it's fine to declare anywhere you want. In fact the code is more readable (IMHO) when you declare the variable where you use it instead of at the top of a function, because it makes it easier to spot errors, e.g. forgetting to initialize the variable or accidentally hiding the variable.
A post shows the following code:
//C99
printf("%d", 17);
int z=42;
printf("%d", z);
//ANSI C
printf("%d", 17);
{
    int z=42;
    printf("%d", z);
}
and I think the implication is that these are equivalent. They are not. If int z is placed at the bottom of this code snippet, it causes a redefinition error against the first z definition but not against the second.
However, multiple lines of:
//C99
for(int i=0; i<10; i++){}
does work, which shows the subtlety of this C99 rule.
Personally, I passionately shun this C99 feature.
The argument that it narrows the scope of a variable is false, as shown by these examples. Under the new rule, you cannot safely declare a variable until you have scanned the entire block, whereas formerly you only needed to understand what was going on at the head of each block.
Internally, all variables local to a function are allocated on the stack or in CPU registers, and the generated machine code moves values between the registers and the stack (called register spilling) if the compiler is bad or if the CPU doesn't have enough registers to keep all the balls juggling in the air.
To allocate stuff on the stack, the CPU has two special registers: one called the Stack Pointer (SP) and another called the Base Pointer (BP) or frame pointer (meaning the stack frame local to the current function scope). SP points to the current location on the stack, while BP points to the working dataset (above it) and the function arguments (below it). When a function is invoked, it pushes the BP of the caller/parent function onto the stack (pointed to by SP), sets the current SP as the new BP, then moves SP by the number of bytes spilled from registers onto the stack, does its computation, and on return restores its parent's BP by popping it from the stack.
Generally, keeping your variables inside their own {}-scope could speed up compilation and improve the generated code by reducing the size of the graph the compiler has to walk to determine which variables are used where and how. In some cases (especially when goto is involved) the compiler can miss the fact that a variable won't be used anymore unless you explicitly tell the compiler its scope of use. Compilers could have a time/depth limit when searching the program graph.
The compiler could place variables declared near each other in the same stack area, which means loading one will preload all the others into cache. In the same way, declaring a variable register could give the compiler a hint that you want to avoid that variable being spilled onto the stack at all costs.
Strict C90 requires an explicit { before declarations, while C99, C++ and GCC (as an extension in C90 mode) allow declaring variables further into the body, which complicates goto and case statements. C99 and C++ further allow declaring variables inside the for loop initialization, where they are limited to the scope of the loop.
Last but not least, another human being reading your code would be overwhelmed to see the top of a function littered with half a hundred variable declarations instead of having them localized at their places of use. Localizing them also makes it easier to comment out their use.
TLDR: using {} to explicitly state variable scope can help both the compiler and the human reader.
With clang and gcc, I encountered major issues with the following.
gcc version 8.2.1 20181011
clang version 6.0.1
{
    char f1[]="This_is_part1 This_is_part2";
    char f2[64]; char f3[64];
    sscanf(f1,"%s %s",f2,f3); //split part1 to f2, part2 to f3
}
Neither compiler liked f1, f2 or f3 being declared within the block. I had to relocate f1, f2 and f3 to the function definition area.
The compilers did not mind the definition of an integer within the block.
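For comparison, standard C places no restriction on declaring arrays in a nested block, so the snippet above would normally be expected to compile. The following is a minimal sketch of the same split done entirely inside a block; the function name split_in_block and the %63s width limits are additions for illustration, and a missing #include <stdio.h> is only a guess at what might have gone wrong in the original.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both parts were split as expected. */
int split_in_block(void) {
    int ok = 0;
    {
        char f1[] = "This_is_part1 This_is_part2";
        char f2[64];
        char f3[64];
        /* %63s leaves room for the terminating '\0' in a 64-byte buffer */
        if (sscanf(f1, "%63s %63s", f2, f3) == 2) {
            ok = strcmp(f2, "This_is_part1") == 0 &&
                 strcmp(f3, "This_is_part2") == 0;
        }
    }   /* f1, f2 and f3 go out of scope here */
    return ok;
}
```

If this compiles and returns 1 with your toolchain, the block itself is probably not the cause of the original errors.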
As per The C Programming Language by K&R:
In C, all variables must be declared before they are used, usually at the
beginning of the function before any executable statements.
Note the word "usually": it is not a must.
I'm new to C. I cannot understand the result of the following code. I use goto to jump over the declarations of the int a[N] array and int x. Although x is not initialized to 10, I can still access these variables.
#include <stdlib.h>
#include <stdio.h>
#define N 4
void printArray(int a[], int length) {
    for (int i = 0; i < length; i++) {
        printf("%d, ", a[i]);
    }
    printf("\n");
}
int main(void) {
    goto done;
    int a[N];
    int x=10;
    printf("x=%d\n", x);
done:
    for (int i = 0; i < N; i++) {
        a[i] = i;
    }
    printArray(a, N);
    printf("x=%d\n", x);
    return EXIT_SUCCESS;
}
result
0, 1, 2, 3
x=0
My question:
Why can I access these variables when their declarations have been jumped over? How are variables declared in C? It seems that variable declarations are not run line by line.
Aside from "Variable Length Arrays (VLAs)", an automatic variable's "lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." (§6.2.4) The initialization (if any) occurs a bit later, when the program passes through the declaration. It's legal to jump over the declaration if the program does not depend on the initialization of the variable. But regardless of whether the initialization will eventually happen, the variable exists from the moment your program enters the block, however that is done. (Jumping into a block from outside is also legal, and may also prevent initialization. But as soon as you're in the block, the variable exists.)
If the program attempts to read the value of an uninitialised variable, it receives an indeterminate value. (Most compilers attempt to detect the possibility that this might happen, but you'll need to enable warnings in order to see the report.)
The consequence of "hoisting" the lifetime of a variable to its enclosing block is that there is a portion of the program in which the variable exists but is not visible (since its scope starts where it is defined.) If you save the address of the variable and then jump back into this region of the program, you will be able to access the value of the variable through the pointer, which shows that scope and lifetime are distinct.
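The pointer trick described above can be sketched as follows. This is only an illustration under the rules quoted from §6.2.4; the function name read_through_saved_pointer is invented for the example.

```c
#include <stddef.h>

/* Hypothetical demo: reads x through a pointer at a point where the
   name x is not in scope, although the object is still alive. */
int read_through_saved_pointer(void) {
    int *saved = NULL;
    int seen = -1;
again:
    if (saved != NULL) {
        /* The name x is not visible here, but execution is still inside
           the block that owns x, so the object keeps the stored value. */
        seen = *saved;
        return seen;
    }
    int x = 42;      /* the scope of x starts here */
    saved = &x;
    goto again;      /* jump backwards without leaving the block */
}
```

The function should return 42: the jump goes back to before the declaration, yet the value written to x is still reachable through saved.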
If the variable is a VLA, then its lifetime starts at the declaration and the compiler will not allow you to jump over the declaration. VLAs cannot be initialised, so you must assign a value to every element in a VLA which you intend to access. Not all compilers implement VLAs. Your example does not show a VLA, since N is a macro which expands to an integer constant.
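As a small sketch of the "assign before you access" rule for VLAs under C99/C11, assuming a compiler that supports them (the function name sum_vla is made up, and n is assumed to be positive):

```c
/* Hypothetical helper: fills a VLA element by element, then sums it. */
int sum_vla(int n) {
    int values[n];                 /* VLA: no initializer is allowed here */
    for (int i = 0; i < n; i++) {
        values[i] = i + 1;         /* every element is assigned before use */
    }
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```

For example, sum_vla(4) adds 1 + 2 + 3 + 4 and should return 10.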
For objective info about the C standard, see rici's answer.
But I see you are asking how this behavior is possible from a C program, and remarking that:
It seems that variable declarations are not run line by line.
In fact, most computer languages are not run line by line. There is almost always some kind of multi-line parsing step that happens beforehand. For example, even the Bash shell language is processed multiple lines at a time. When Bash finds a while loop, it seems to do extra parsing to make sure the done command is found before it runs any of the code in the while loop. If you don't believe me, try running this Bash script:
while [ 1 ]
do
echo hi
sleep 1
# file terminates before the expected "done" command that ends the loop
Similarly, a C compiler is capable of parsing your code, checking its syntax, and checking that your program is well-formed before it even executes a single line of your code inside any of your functions. When the compiler sees that you are using a or x, it knows what those things are because you declared them above, even if the execution of the program never passed through those lines.
One way that you could imagine that a compiler might work is that (after checking the validity of your code) it moves all the local variable declarations to the top of the function. So your main function might look like this in the internal representation in the compiler:
int main(void) {
    int a[N];
    int x;
    int i;
    goto done;
    x = 10;
    printf("x=%d\n", x);
done:
    for (i = 0; i < N; i++) {
        a[i] = i;
    }
    printArray(a, N);
    printf("x=%d\n", x);
    return EXIT_SUCCESS;
}
This could actually be a useful transformation that would help the compiler generate code for your function because it would make it easy for the compiler to allocate memory on the stack for all your local variables when the function starts.
#include<stdio.h>
int main()
{
    int i = 10;
    printf("0 i %d %p\n",i,&i);
    if (i == 10)
        goto f;
    {
        int i = 20;
        printf("1 i %d\n",i);
    }
    {
        int i = 30;
f:
        printf("2 i %d %p\n",i,&i); //statement X
    }
    return 0;
}
Output:
[test]$ ./a.out
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
I have difficulty in understanding how statement X works?? As you see the output it is junk. It should rather say i not declared??
That's because goto skips the shadowing variable i's initialization.
This is one of the minor differences between C and C++. In strict C++ a goto that crosses a variable initialization is an error, while in C it's not. GCC confirms this: when you compile with -std=c11 it accepts the code, while with -std=c++11 it complains: jump to label 'f' crosses initialization of 'int i'.
From C99:
A goto statement shall not jump from outside the scope of an identifier having a variably modified type to inside the scope of that identifier.
VLAs are of variably modified type. Jumps inside a scope not containing VM types are allowed.
From C++11 (emphasis mine):
A program that jumps from a point where a variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the preceding types and is declared without an initializer.
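To illustrate the C side of this, here is a minimal sketch of a jump into a block that skips an initializer and then assigns the variable explicitly before reading it. It is meant to be compiled as C, not C++, and the function name assign_after_jump is invented for the example.

```c
/* Hypothetical demo: legal C, rejected by a C++ compiler. */
int assign_after_jump(void) {
    goto inside;             /* jumps over the declaration below */
    {
        int i = 30;          /* this initializer is skipped on this path */
inside:
        i = 7;               /* so give i a value before reading it */
        return i;
    }
}
```

The function should return 7. Without the explicit assignment, reading i would give an indeterminate value, which is what the question's output shows.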
From the output, it is clear that the two i variables have different addresses, since they are declared in different scopes.
0 i 10 0xbfbeaea8
2 i 134513744 0xbfbeaea4
how statement X works?? As you see the output it is junk. It should
rather say i not declared??
i is also declared in the local scope of statement X, but the initialization of i to 30 is skipped because of the goto statement. Therefore the local variable i contains a garbage value.
In the first printf statement, you accessed the i in address 0xbfbeaea8 which was declared and initialized in the statement int i = 10;
Once you hit the goto f; statement, you are in the scope of the 2nd i, which is declared at this point and resides in address 0xbfbeaea4 but which is not initialized as you skipped the initialization statement.
That's why you were getting rubbish.
When control reaches the third block, the i declared there is already known to the compiler, so i refers to that variable's memory address and the program reads from it. But since the first i is hidden inside this block and the new i's initialization was skipped, you cannot be sure what value it will contain.
My suggestion to understand somewhat complex code is to strip out, one by one, all "unnecessary" code and leave the bare problem. How do you know what's unnecessary? Initially, when you're not fluent with the language, you'll be removing parts of the code at random, but very quickly you'll learn what's necessary and what is not.
Give it a try: my hint is to start removing or commenting out the "goto" statement. Recompile and, if there are no errors, see what changed when you run the program again.
Another suggestion would be: try to recreate the problem "from scratch": imagine you are working on a top-secret project and you cannot show any single line of code to anyone, let alone post on Stack Overflow. Now, try to replicate the problem by rewriting equivalent source code, that would show the same behaviour.
As they say, "asking the right question is often solving half the problem".
The i you print in the statement printf("2 i %d %p\n",i,&i); is not the i whose value was 10 in the if statement, and because you skip int i = 30; with the goto, you print garbage. That int i = 30; is the actual definition of the i that is printed, i.e. where the compiler allocates room for i and gives it a value.
The problem is that your goto is skipping the assignment to the second i, which shadows (conceals) the first i whose value you've set, so you're printing out an uninitialized variable.
You'll get a similar wrong answer from this:
#include<stdio.h>
int main()
{
    int i = 10; /* First "i" */
    printf("0 i %d %p\n",i,&i);
    { /* New block scope */
        int i; /* Second "i" shadows first "i" */
        printf("2 i %d %p\n",i,&i);
    }
    return 0;
}
Three lessons: don't shadow variables; don't create blocks ({ ... }) for no reason; and turn on compiler warnings.
Just to clarify: variable scope is a compile-time concept based on where variables are declared, not something that is subject to what happens at runtime. The declaration of i#2 conceals i#1 inside the block that i#2 is declared in. It doesn't matter if the runtime control path jumps into the middle of the block — i#2 is the i that will be used and i#1 is hidden (shadowed). Runtime control flow doesn't carry scope around in a satchel.
Recently I had to modify legacy code that was compiled with a very old version of GCC (somewhere around version 2.3). Within a function, variables had to be declared before being used. I believe this comes from the C89 standard. This limitation was later removed.
My question is: back then, why did they enforce this rule? Was there any concern that could jeopardise the integrity of the software?
Variables still have to be declared before being used -- and they've never had to be declared just at the top of a function.
The C89 requirement is that a block consists of an opening {, followed by zero or more declarations, followed by zero or more statements, followed by the closing }.
For example, this is legal C89 (and, without the void, even K&R C, going back to 1978 or earlier):
int foo(void) {
    int outer = 10;
    {
        int inner = 20;
        printf("outer = %d, inner = %d\n", outer, inner);
    }
    printf("outer = %d, inner is not visible\n", outer);
    return 0;
}
C99 loosened this, allowing declarations and statements to be mixed within a block:
int foo(void) {
    int x = 10;
    printf("x = %d\n", x);
    int y = 20;
    printf("y = %d\n", y);
    return 0;
}
As for the reason for the original restriction, I think it goes back to C's ancestor languages: B, BCPL, and even Algol. It probably did make the compiler's job a bit easier. (I was thinking that it would make parsing easier, but I don't think it does; it still has to be able to distinguish whether something is a declaration or a statement without knowing in advance from the context.)
It was mainly to make compilers easier to write. If all the declarations were at the top of the function, it would be easy for the compiler to parse all the locals and determine how much stack is needed.
Of course now, compilers are a lot more mature than they were 30 years ago. So it makes sense to get rid of this restriction as it's become a nuisance to programmers.
We are currently developing an application for an MSP430 MCU, and are running into some weird problems. We discovered that declaring arrays within a scope after the declaration of "normal" variables sometimes causes what seems to be undefined behavior. Like this:
foo(int a, int *b);
int main(void)
{
    int x = 2;
    int arr[5];
    foo(x, arr);
    return 0;
}
foo is passed a pointer as the second argument that sometimes does not point to the arr array. We verified this by single-stepping through the program and seeing that the value of the arr array-as-a-pointer variable in the main scope is not the same as the value of the b pointer variable in the foo scope. And no, this is not really reproducible; we have just observed this behavior once in a while.
This is observable even before a single line of the foo function is executed, the passed pointer parameter (b) is simply not pointing to the address that arr is.
Changing the example seems to solve the problem, like this:
foo(int a, int *b);
int main(void)
{
    int arr[5];
    int x = 2;
    foo(x, arr);
    return 0;
}
Does anybody have any input or hints as to why we experience this behavior? Or similar experiences? The MSP430 programming guide specifies that code should conform to the ANSI C89 spec, and so I was wondering if it says that arrays have to be declared before non-array variables?
Any input on this would be appreciated.
Update
@Adam Shiemke and tomlogic:
I'm wondering what C89 specifies about different ways of initializing values within declarations. Are you allowed to write something like:
int bar(void)
{
    int x = 2;
    int y;
    foo(x);
}
And if so, what about:
int bar(int z)
{
    int x = z;
    int y;
    foo(x);
}
Is that allowed? I assume the following must be illegal C89:
int bar(void)
{
    int x = baz();
    int y;
    foo(x);
}
Thanks in advance.
Update 2
Problem solved. Basically we were disabling interrupts before calling the function (foo) and after the declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.
If anybody is interested I can post the complete example reproducing the problem, and the fix?
Thanks for all the input on this.
That looks like a compiler bug.
If you use your first example (the problematic one) and write your function call as foo(x, &arr[0]);, do you see the same results? What about if you initialize the array like int arr[5] = {0};? Neither of these should change anything, but if they do it would hint at a compiler bug.
In your updated question:
Basically we were disabling interrupts before calling the function (foo) and after the declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.
It sounds as if the interrupt disabling intrinsic/function/macro (or however interrupts are disabled) might be causing an instruction to be 'skipped' or something. I'd investigate whether it is coded/working correctly.
You should be able to determine if it is a compiler bug based on the assembly code that is produced. Is the assembly different when you change the order of the variable declarations? If your debugger allows you, try single stepping through the assembly.
If you do find a compiler bug, also, check your optimization. I have seen bugs like this introduced by the optimizer.
Both examples look to be conforming C89 to me. There should be no observable difference in behaviour assuming that foo isn't accessing beyond the bounds of the array.
For C89, the variables need to be declared in a list at the start of the scope prior to any assignment. C99 allows you to mix assignment and declaration. So:
{
    int x;
    int arr[5];
    x=5;
    ...
is legal c89 style. I'm surprised your compiler didn't throw some sort of error on that if it doesn't support c99.
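On the question of initializers: as far as the C89 rules go, only objects with static storage duration and brace-enclosed initializer lists for aggregates need constant expressions, so an automatic scalar may be initialized from a parameter or a function call as long as all declarations still come before the first statement. A minimal sketch, where baz and foo are hypothetical stand-ins for the functions in the question:

```c
static int baz(void) { return 5; }        /* hypothetical stand-in */
static int foo(int v) { return v * 2; }   /* hypothetical stand-in */

int bar(int z) {
    int x = z;        /* initialized from a parameter */
    int w = baz();    /* initialized from a function call */
    int y;            /* no initializer */
    y = foo(x) + w;   /* first statement, after all declarations */
    return y;
}
```

With these stand-ins, bar(3) should return 11, and the code is intended to be accepted by gcc -std=c90 -pedantic because no declaration follows a statement.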
Assuming the real code is much more complex, here are some things I would check; keep in mind they are guesses:
Could you be overflowing the stack on occasion? If so, could this be some artifact of "stack defense" by the compiler/uC? Does the incorrect value of &foo fall inside a predictable memory range? If so, does that range have any significance (inside the stack, etc.)?
Does the mcu430 have different ranges for RAM and ROM addressing? That is, is the address space for RAM 16-bit while the program address space is 24-bit? PICs have such an architecture, for example. If so, it would be feasible that arr is getting allocated as ROM (24-bit) while the function expects a pointer to RAM (16-bit). The code would work when arr was allocated in the first 16 bits of address space but break if it's above that range.
Maybe somewhere in your program there is an illegal memory write that corrupts your stack.
Did you have a look at the disassembly?
Is the following code legal according to C99?
...
for(....) {
    int x = 4;
    ...
}
...
You can assume that before line 3 the variable x was never declared.
C99 (PDF)
Until now I have only found the following, but I don't think that this is enough:
A block allows a set of declarations and statements to be grouped into one syntactic unit.
The initializers of objects that have automatic storage duration, and the variable length
array declarators of ordinary identifiers with block scope, are evaluated and the values are
stored in the objects (including storing an indeterminate value in objects without an
initializer) each time the declaration is reached in the order of execution, as if it were a
statement, and within each declaration in the order that declarators appear.
From page 145 of that PDF.
This is legal in both C99 and C89.
Look at 6.8.2, which defines the compound statement.
Yes, you can declare or define a variable anywhere you want in C99 (at the start of a block in C89).
You said:
"You can assume that before line 3 the
variable x was never declared."
Even if it was previously declared, you could declare a new variable with the same name. Doing that prevents you from accessing the old variable within that block.
int x = 0; /* old x */
printf("%d\n", x); /* old x, prints 0 */
do {
    int x = 42; /* new x */
    printf("%d\n", x); /* new x, prints 42 */
} while (0);
printf("%d\n", x); /* old x, prints 0 */
I've never tried the following in C99. I really don't know what happens :)
I'll try later, when I get access to a (almost) C99 compiler
int x = 0;
do {
    printf("%d\n", x); /* old x? new x? crash? Undefined Behaviour? */
    int x = 42;
} while (0);
The C99 feature of declaring/defining variables wherever one wants is not a feature that makes me want to change :)
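For what it's worth, the untested case can be reasoned through: in C99 the scope of an identifier begins only after its declarator, so a use that appears earlier in the block still refers to the outer variable. A minimal sketch (the function name read_before_inner_declaration is made up for illustration):

```c
/* Hypothetical demo: which x is read before the inner declaration? */
int read_before_inner_declaration(void) {
    int x = 0;                 /* old x */
    int seen;
    do {
        seen = x;              /* inner x is not in scope yet: reads old x */
        int x = 42;            /* new x, visible only from here on */
        (void)x;               /* silence the unused-variable warning */
    } while (0);
    return seen;
}
```

The function should return 0, so there is no crash and no undefined behaviour here, although the shadowing is easy to misread.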
Yes, you can create a variable at the beginning of any block. The variable is initialised each time the block is entered. In C++, you can create them anywhere within the block.
for(....)
{
    int x=4;
    /*More code*/
}
Yeah, this is legal in C99, but you are not allowed to access 'x' after the block. Outside the block the name 'x' is no longer in scope, so the compiler rejects any use of it there (unless an outer 'x' is visible).