GCC attributes for init-on-first-use functions

I've been using the GCC const and pure attributes for functions which return a pointer to "constant" data that's allocated and initialized on first use, i.e. where the function will return the same value each time it's called. As an example (not my use case, but a well-known one), think of a function that allocates and computes trig lookup tables on the first call and just returns a pointer to the existing tables on every call after that.
The problem: I've been told this usage is incorrect because these attributes forbid side effects, and that the compiler could even optimize out the call completely in some cases if the return value is not used. Is my usage of const/pure attributes safe, or is there any other way to tell the compiler that N>1 calls to the function are equivalent to 1 call to the function, but that 1 call to the function is not equivalent to 0 calls to the function? Or in other words, that the function only has side effects the first time it's called?
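(For concreteness, here is a minimal sketch of the pattern I mean; this is not my real code, and the name, table size, and contents are made up:)

#include <math.h>
#include <stdlib.h>

__attribute__((const)) /* the attribute whose safety is in question */
const double *get_sin_table(void)
{
    static double *table; /* NULL until the first call */
    if (!table) {
        /* side effects happen on the first call only */
        table = malloc(512 * sizeof *table);
        for (int i = 0; i < 512; i++)
            table[i] = sin(i * (3.141592653589793 / 256));
    }
    return table; /* every later call returns the same pointer */
}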

I'd say what you were told is correct, based on my understanding of pure and const, but if anyone has a precise definition of the two, please speak up. This gets tricky because the GCC documentation doesn't lay out exactly what it means for a function to have "no effects except the return value" (for pure) or to "not examine any values except their arguments" (for const). Obviously all functions have some effects (they use processor cycles, modify memory) and examine some values (the function code, constants).
"Side effects" would have to be defined in terms of the semantics of the C programming language, but we can guess what the GCC folks mean based on the purpose of these attributes, which is to enable additional optimizations (at least, that's what I assume they are for).
Forgive me if some of the following is too basic...
Pure functions can participate in common subexpression elimination. Their defining feature is that they don't modify the environment, so the compiler is free to call them fewer times without changing the semantics of the program.
z = f(x);
y = f(x);
becomes:
z = y = f(x);
Or gets eliminated entirely if z and y are unused.
So my best guess is that a working definition of "pure" is "any function which can be called fewer times without changing the semantics of the program". However, function calls may not be moved, e.g.,
size_t l = strlen(str); // strlen is pure
*some_ptr = '\0';
// Obviously, strlen can't be moved below the store: some_ptr might point into str.
Const functions can be reordered, because they do not depend on the dynamic environment.
// Assuming some_ptr and other_ptr don't alias x or y, sin can be moved anywhere
*some_ptr = '\0';
double y = sin(x);
*other_ptr = '\0';
So my best guess is that a working definition of "const" is "any function which can be called at any point without changing the semantics of the program". However, there is a danger:
__attribute__((const))
double big_math_func(double x, double theta, double iota)
{
    static double table[512];
    static bool initted = false;
    if (!initted) {
        ...
        initted = true;
    }
    ...
    return result;
}
Since it's const, the compiler could reorder it...
pthread_mutex_lock(&mutex);
...
z = big_math_func(x, theta, iota);
...
pthread_mutex_unlock(&mutex);
// big_math_func might go here, if the compiler wants to
In this case, it could be called simultaneously from two processors even though it only appears inside a critical section in your code. Then the processor could make the change to initted visible before the changes to table have gone through, which is bad news. You can solve this with memory barriers, or with pthread_once.
I don't think this bug will ever show up on x86, and I don't think it shows up on many systems that don't have multiple physical processors (not cores). So it will work fine for ages and then fail suddenly on a dual-socket POWER computer.
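For reference, here is a minimal sketch of the pthread_once fix (my illustration: the signature is simplified and the table contents are placeholders):

#include <pthread.h>

static double table[512];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

static void init_table(void)
{
    /* Runs exactly once even with concurrent callers; pthread_once
       also provides the memory ordering that the initted flag lacked. */
    for (int i = 0; i < 512; i++)
        table[i] = i * 0.001; /* placeholder values */
}

/* Deliberately NOT __attribute__((const)): the first call has a side effect. */
double big_math_func(double x)
{
    pthread_once(&table_once, init_table);
    return x * table[42]; /* placeholder use of the table */
}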
Conclusion: The advantage of these definitions is that they make it clear what kind of changes the compiler is allowed to make in the presence of these attributes, which (I think) is somewhat vague in the GCC docs. The disadvantage is that it's not clear that these are the definitions used by the GCC team.
If you look at the Haskell language specification, for example, you'll find a much more precise definition of purity, since purity is so important to the Haskell language.
Edit: I have not been able to coerce GCC or Clang into moving a solitary __attribute__((const)) function call across another function call, but it seems entirely possible that in the future, something like that would happen. Remember when -fstrict-aliasing became the default, and everybody suddenly had a lot more bugs in their programs? It's stuff like that that makes me cautious.
It seems to me that when you mark a function __attribute__((const)), you are promising the compiler that the result of the function call is the same no matter when it is called during your program's execution, as long as the parameters are the same.
However, I did come up with a way of moving a const function out of a critical section, although the way I did it could be called "cheating" of a sort.
__attribute__((const))
extern int const_func(int x);

int func(int x)
{
    int y1, y2;
    y1 = const_func(x);
    pthread_mutex_lock(&mutex);
    y2 = const_func(x);
    pthread_mutex_unlock(&mutex);
    return y1 + y2;
}
The compiler translates this into the following code (from the assembly):
int func(int x)
{
    int y;
    y = const_func(x);
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
    return y * 2;
}
Note that this won't happen with __attribute__((pure)) alone; the const attribute, and only the const attribute, triggers this behavior.
As you can see, the call inside the critical section disappeared. It seems rather arbitrary that the earlier call was kept, and I would not be willing to wager that the compiler won't, in some future version, make a different decision about which call to keep, or whether it might move the function call somewhere else entirely.
Conclusion 2: Tread carefully, because if you don't know what promises you are making to the compiler, a future version of the compiler might surprise you.

Related

How consistently do C compilers optimize unreachable code?

Say you have (for reasons that are not important here) the following code:
int k = 0;
... /* no change to k can happen here */
if (k) {
    do_something();
}
Using the -O2 flag, GCC will not generate any code for it, recognizing that the if test is always false.
I'm wondering if this is a pretty common behaviour across compilers or it is something I should not rely on.
Does anybody know?
Dead code elimination in this case is trivial to do for any modern optimizing compiler. I would definitely rely on it, given that optimizations are turned on and you are absolutely sure that the compiler can prove that the value is zero at the moment of the check.
However, you should be aware that sometimes your code has more potential side effects than you think.
The first source of problems is calling non-inlined functions. Whenever you call a function which is not inlined (e.g. because its definition is located in another translation unit), the compiler assumes that all global variables and the whole contents of the heap may change inside this call. Local variables are the lucky exception, because the compiler knows that it is illegal to modify them indirectly... unless you save the address of a local variable somewhere. For instance, in this case the dead code won't be eliminated:
#include <cstdio>

int function_with_unpredictable_side_effects(const int &x);

void doit() {
    int k = 0;
    function_with_unpredictable_side_effects(k);
    if (k)
        printf("Never reached\n");
}
So the compiler has to do some work, and may fail even for local variables. By the way, I believe the problem being solved in this case is called escape analysis.
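A sketch of the distinction (my own example; opaque() stands in for any non-inlined function):

#include <stdio.h>

void opaque(void); /* defined in another translation unit */
int *leaked;       /* a global through which an address can escape */

void no_escape(void)
{
    int k = 0;
    opaque();      /* cannot touch k: its address never escaped */
    if (k)         /* provably false, so the branch is dead code */
        printf("Never reached\n");
}

void escapes(void)
{
    int k = 0;
    leaked = &k;   /* the address escapes... */
    opaque();      /* ...so this call might perform *leaked = 1 */
    if (k)         /* no longer provably false; must be kept */
        printf("Maybe reached\n");
}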
The second source of problems is pointer aliasing: the compiler has to take into account that all sorts of pointers and references in your code may be equal, so changing something via one pointer may change the contents at the other one. Here is one example:
#include <cstdio>

struct MyArray {
    int num;
    int arr[100];
};

void doit(int idx) {
    MyArray x;
    x.num = 0;
    x.arr[idx] = 7;
    if (x.num)
        printf("Never reached\n");
}
The Visual C++ compiler does not eliminate the dead code, because it thinks that you may access x.num as x.arr[-1]. That may sound like an awful thing to do, but this compiler has been used in the gamedev area for years, and such hacks are not uncommon there, so the compiler stays on the safe side. GCC, on the other hand, removes the dead code; maybe that is related to its exploitation of the strict aliasing rule.
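To make that hack concrete (this is undefined behavior, shown only to illustrate what the conservative compiler is guarding against):

struct MyArray {
    int num;
    int arr[100];
};

void hack(struct MyArray *x)
{
    /* On typical layouts arr sits immediately after num, so old game
       code would "reach back" like this; it is out of bounds and UB: */
    x->arr[-1] = 42; /* may alias x->num */
}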
P.S. The const keyword is never used by the optimizer; it is only present in the C/C++ language for the programmers' convenience.
There is no common behaviour guaranteed across compilers. But there is a way to explore how different compilers act on a specific piece of code.
Compiler Explorer will help you answer every question about code generation, but of course you must be familiar with assembly language.

Is there a static C analyzer which detects uninitialized static variables?

I need to debug an ugly and huge math C library, probably once produced by f2c. The code is abusing local static variables, and unfortunately somewhere it seems to exploit the fact that these are automatically initialized to 0. If its entry function is called with the same input twice, it is giving different results. If I unload the library and reload it again, it works correctly. It needs to be fast, so I would like to get rid of the load/unload.
My question is: how can I uncover these errors with valgrind or any other tool, without manually walking through the entire code?
I am hunting places where a local static variable is declared, read first, and written only later. The problem is even further complicated by the fact that the static variables are sometimes passed further via pointers (yep - it is so ugly).
I understand that one can argue that mistakes like this should not necessarily be detected by an automatic tool, as in some scenarios this is exactly the intended behaviour. Still, is there a way to make the auto-initialized local static variables "dirty"?
The devil is in the details, but this may work for you:
First, get Frama-C. If you are using Unix, your distribution may have a package. The package won't be the latest version, but it may be good enough, and it will save you some time if you install it this way.
Say your example is as below, only so much bigger that it's not obvious what is wrong:
int add(int x, int y)
{
    static int state;
    int result = x + y + state; // I tested it once and it worked.
    state++;
    return result;
}
Type a command like:
frama-c -lib-entry -main add -deps ugly.c
Options -lib-entry -main add mean "look at function add". Option -deps computes functional dependencies. You'll find these "functional dependencies" in the log:
[from] Function add:
  state FROM state; (and default:false)
  \result FROM x; y; state; (and default:false)
This lists the actual inputs the results of add depend on, and the actual outputs computed from these inputs, including static variables read from and modified. A static variable that was properly initialized before being used would normally not appear as input, unless the analyzer was unable to determine that it was always initialized before being read from.
The log shows state as dependency of \result. If you expected the returned result to depend only on the arguments (meaning two calls with the same arguments produce the same result), it's a hint there may be something wrong here, with the variable state.
Another hint shown in the above lines is that the function modifies state.
This may or may not help. Option -lib-entry means that the analyzer does not assume that any non-const static variable has kept its value at the time the function under analysis is called, so that may be too imprecise for your code. There are ways around that, but then it is up to you whether you want to gamble the time it takes to learn these ways.
EDIT: here is a more complex example:
void initialize_1(int *p)
{
    *p = 0;
}

void initialize_2(int *p)
{
    *p; // I made a mistake here.
}

int add(int x, int y)
{
    static int state1;
    static int state2;
    initialize_1(&state1);
    initialize_2(&state2);
    // This is safe because I have initialized state1 and state2:
    int result = x + y + state1 + state2;
    state1++;
    state2++;
    return result;
}
For this example, the same command produces these results:
[from] Function initialize_1:
  state1 FROM p
[from] Function initialize_2:
[from] Function add:
  state1 FROM \nothing
  state2 FROM state2
  \result FROM x; y; state2
What you see for initialize_2 is an empty list of dependencies, meaning the function assigns nothing. I will make this case clearer by displaying an explicit message rather than just an empty list. If you know what any of the functions initialize_1, initialize_2 or add is supposed to do, you can compare this a priori knowledge to the results of the analysis and see that something is wrong for initialize_2 and add.
SECOND EDIT: and now my example shows something strange for initialize_1, so perhaps I should explain that. Variable state1 depends on p in the sense that p is used to write to state1, and if p had been different, then the final value of state1 would have been different. Here is a last example:
int t[10];

void initialize_index(int i)
{
    t[i] = 1;
}

int main(int argc, char **argv)
{
    initialize_index(argv[1][0] - '0');
}
With the command frama-c -deps t.c, the dependencies computed for initialize_index are:
[from] Function initialize_index:
  t[0..9] FROM i (and SELF)
This means that each of the cells depends on i (it may be modified if i is the index of that particular cell). Each cell may also keep its value (if i indicates another cell): this is indicated with the (and SELF) mention in the latest version, and was indicated with a more obscure (and default:true) in previous versions.
Static code analysis tools are pretty good at finding typical programming errors like the use of uninitialized variables. Here is a list of free tools that do this for C.
Unfortunately I can't recommend any of the tools in the list. I am only familiar with two commercial products, Coverity and Klocwork. Coverity is very good (and expensive). Klocwork is so-so (but less expensive).
What I did in the end was remove all static qualifiers from the code with '#define static'. This turns uninitialised static usage into invalid use of uninitialised locals, and the kind of abuse I am hunting can then be uncovered by the tools.
In my actual case this was enough to pin down the bug, but in a more general situation the approach needs refining if the statics are actually doing something important: gradually re-add 'static' wherever the code fails without it.
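One way to apply that trick without editing every file (my assumption about the mechanics, reusing the ugly.c name from above) is to define static away on the compiler command line, turning the former statics into ordinary uninitialised locals that the tools can flag:

gcc -Dstatic= -c ugly.c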
I don't know of any library that does this for you, but I would look into using regular expressions to find them. Something like
rgrep "static\s*int" path/to/src/root | grep -v = | grep -v "("
That should return all static int variables declared without an equals sign, and the last pipe should remove anything with parentheses in it (getting rid of functions). There's a good chance that this won't work exactly for you, but playing around with grep may be the fastest way to track this down.
Of course, once you find one that works you can replace int with all of the other kinds of variables to search for those too. HTH
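For instance, a follow-up sweep over a few more scalar types might look like this (untested, same caveats as above):

rgrep "static\s*\(int\|long\|float\|double\|char\)" path/to/src/root | grep -v = | grep -v "("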
My question is: how can I uncover these errors ...
But these aren't errors: the expectation that a static variable is initialized to 0 is perfectly valid, as is assigning some other value to it.
So asking for a tool that will automatically find non-errors is unlikely to produce a satisfying result.
From your description, it appears that somefunc() returns the correct result the first time it is called, and incorrect results on subsequent calls.
The simplest way to debug such problems is to have two GDB sessions side-by-side: one freshly-loaded (will compute correct answer), and one with "second iteration" (will compute wrong answer). Then step through both sessions "in parallel", and see where their computation or control flow starts to diverge.
Since you can usually effectively divide the problem in half, it often doesn't take long to find the bug. Bugs that always reproduce are the easiest ones to find. Just do it.

Usage of the const keyword

I know that using the const keyword on function arguments provides better performance, but I always forget to add it. Is the compiler (GCC in this case) smart enough to notice that the variable never changes during the function, and compile it as if I had added const explicitly?
You have a common misunderstanding about const. Making an object itself const means that its value never changes; and it's not just during a function, it never changes at all.
Making a parameter to a function const does not mean its value never changes; it just means that the function cannot change the value through that const pointer. The value can change in other ways.
For example, look at this function:
#include <iostream>
using std::cout;
using std::endl;

void f(const int* x, int* y)
{
    cout << "x = " << *x << endl;
    *y = 5;
    cout << "x = " << *x << endl;
}
Notice that it takes a const pointer to x. However, what if you call it like this:
int x = 10;
f(&x, &x);
Now, f has a const pointer, but it's to a non-const object. So the value can change, and it does since y is a non-const pointer to the same object. All of this is perfectly legal code. There's no funny business here.
So there's really no answer to your question since it's based entirely on false premises.
"Is the compiler (GCC in this case) smart enough to notice that the variable never changes during the function, and compile it as if I had added const explicitly?"
Not necessarily. For example:
void some_function(int *ptr); // defined in another translation unit

int foo(int a) {
    some_function(&a);
    return a + 1;
}
The compiler can't see what some_function does, so it can't assume that it won't modify a.
Link-time optimization could perhaps see what some_function really does and act accordingly, but as far as this answer is concerned I'll consider only optimization for which the definition of some_function is unavailable.
int bar(const int a) {
    some_function((int*)&a);
    return a + 1;
}
The compiler can't see what some_function does, but it can assume that the value of a does not change anyway. Therefore it can make any optimizations that apply: maybe it can keep a in a callee-saves register across the call to some_function; maybe it computes the return value before making the call instead of after, and zaps a. The program has undefined behavior if some_function modifies a, and so from the compiler's POV once that happens it doesn't matter whether it uses the "right" or "wrong" value for a.
So, in this example, by marking a const you have told the compiler something that it cannot otherwise know -- that some_function will not modify *ptr. Or anyway, that if it does modify it, then you don't care what your program's behavior is.
int baz(int a) {
    some_function(NULL);
    return a + 1;
}
Here the compiler can see all relevant code as far as the standard is concerned. It doesn't know what some_function does, but it does know that it doesn't have any standard means to access a. So it should make no difference whether a is marked const or not because the compiler knows it doesn't change anyway.
Debugger support can complicate this situation, of course -- I don't know how things stand with gcc and gdb, but in theory at least if the compiler wants to support you breaking in with the debugger and modifying a manually then it might not treat it as unmodifiable. The same applies to the possibility that some_function uses platform-specific functionality to walk up the stack and mess with a. Platforms don't have to provide such functionality, but if they do then it conflicts with optimization.
I've seen an old version of gcc (3.x, can't remember x) that failed to make certain optimizations where I failed to make a local int variable const, but in my case gcc 4 did make the optimization. Anyway, the case I'm thinking of wasn't a parameter, it was an automatic variable initialized with a constant value.
There's nothing special about a being a parameter in any of what I've said; it could just as well be any automatic variable defined in the function. Mind you, the only way for a parameter to get the effect of initialization with a constant value is for the function to be called with a constant value, and for the compiler to observe the value for that call. This tends to happen only when the function is inlined. So inlined calls to functions can have additional optimizations applied to them that the "out-of-line" function body isn't eligible for.
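A small illustration of that last point (my own example):

static inline int twice(int a)
{
    return a * 2;
}

int caller(void)
{
    /* After inlining, the compiler sees the parameter a initialized with
       the constant 21 and can fold the whole call down to 42, an
       optimization the out-of-line body of twice() never gets. */
    return twice(21);
}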
const, much like inline, is only a hint for the compiler and does not guarantee any performance gains. The more important job of const is to protect programmers from themselves, so they do not unwittingly modify variables that shouldn't be modified.
1) const does not really affect your performance directly. It may in some cases make points-to analysis simpler (so prefer const char* to char*), but const is more about the semantics and readability of your code.
2) A cv-qualified type is a distinct type in C and C++. So your compiler, even if it sees a benefit in making something const by default, will not do it, because that would change the type and could lead to surprisingly odd things.
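For example, a minimal illustration (mine) of why the distinct types matter:

void takes_mutable(int *p);

void demo(const int *cp)
{
    takes_mutable(cp);        /* constraint violation: const int * does not
                                 implicitly convert to int *; the compiler
                                 must diagnose this */
    takes_mutable((int *)cp); /* compiles, but writing through the pointer is
                                 undefined if the object is really const */
}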
As part of optimization, the compiler takes a deep look at when memory locations are read or written. So the compiler is quite good at detecting when a variable is not changed (const) and when it is changed. The optimizer does not need you to tell it when a variable is const.
Nevertheless, you should always use const when appropriate. Why? Because it makes interfaces clearer and easier to understand. And it helps detect bugs when you are changing a variable that you did not want to change.

C89, Mixing Variable Declarations and Code

I'm very curious to know why exactly C89 compilers will dump on you when you try to mix variable declarations and code, like this for example:
rutski@imac:~$ cat test.c
#include <stdio.h>

int
main(void)
{
    printf("Hello World!\n");
    int x = 7;
    printf("%d!\n", x);
    return 0;
}
rutski@imac:~$ gcc -std=c89 -pedantic test.c
test.c: In function ‘main’:
test.c:7: warning: ISO C90 forbids mixed declarations and code
rutski@imac:~$
Yes, you can avoid this sort of thing by staying away from -pedantic. But then your code is no longer standards compliant. And as anybody capable of answering this post probably already knows, this is not just a theoretical concern: platforms like Microsoft's C compiler enforce this quirk of the standard under any and all circumstances.
Given how ancient C is, I would imagine that this feature is due to some historical issue dating back to the extraordinary hardware limitations of the 70's, but I don't know the details. Or am I totally wrong there?
The C standard said "thou shalt not", because it was not allowed in the earlier C compilers which the C89 standard standardized. It was a radical enough step to create a language that could be used for writing an operating system and its utilities. The concept probably simply wasn't considered - no other language at the time allowed it (Pascal, Algol, PL/1, Fortran, COBOL), so C didn't need to either. And it probably makes the compiler slightly harder to handle (bigger), and the original compilers were space-constrained by the 64 KiB code and 64 KiB data space allowed in the PDP 11 series (the big machines; the littler ones only allowed 64 KiB for both code and data, AFAIK). So, extra complexity was not a good idea.
It was C++ that allowed declarations and variables to be interleaved, but C++ is not, and never has been, C. However, C99 finally caught up with C++ (it is a useful feature). Sadly, though, Microsoft never implemented C99.
It was probably never implemented that way, because it was never needed.
Suppose you want to write something like this in plain C:
int myfunction(int value)
{
    if (value == 0)
        return 0;

    int result = value * 2;
    return result;
}
Then you can easily rewrite this in valid C, like this:
int myfunction(int value)
{
    int result;

    if (value == 0)
        return 0;

    result = value * 2;
    return result;
}
There is absolutely no performance impact in first declaring the variable and then setting its value.
However, in C++, this is not the case anymore.
In the following example, function2 will be slower than function1:
double function1(const Factory &factory)
{
    if (!factory.isWorking())
        return 0;

    Product product(factory.makeProduct());
    return product.getQuantity();
}

double function2(const Factory &factory)
{
    Product product;

    if (!factory.isWorking())
        return 0;

    product = factory.makeProduct();
    return product.getQuantity();
}
In function2 the product variable needs to be constructed, even when the factory is not working.
Later, the factory makes the product and then the assignment operator needs to copy the product (from the return value of makeProduct to the product variable). In function1, product is only constructed when the factory is working, and even then, the copy constructor is called, not the normal constructor and assignment operator.
However, nowadays I would expect a good C++ compiler to optimize this code; in the first C++ compilers this probably wasn't the case.
A second example is the following:
double function1(const Factory &factory)
{
    if (!factory.isWorking())
        return 0;

    Product &product = factory.getProduct();
    return product.getQuantity();
}

double function2(const Factory &factory)
{
    Product &product; // Invalid: a reference must be bound when it is declared.

    if (!factory.isWorking())
        return 0;

    product = factory.getProduct(); // Even if it compiled, this would assign to
                                    // the referent, not rebind the reference.
    return product.getQuantity();
}
In this example, function2 is simply invalid. A reference must be bound to an object when it is declared; it cannot be left unbound and assigned later.
This means that in this example, the only way to write valid code is to write the declaration at the moment where the variable is really initialized. Not sooner.
Both examples show why it was really needed in C++ to allow variable declarations after other executable statements, and not in the beginning of the block like in C.
This explains why this was added to C++, and not to C (and other languages) where it isn't really needed.
It is similar to requiring functions to be declared before they are used - it allows a simple-minded compiler to operate in one pass, from top to bottom, emitting object code as it goes.
In this particular case, the compiler can go through the declarations, adding up the stack space required. When it reaches the first statement, it can output the code to adjust the stack, allocating space for the locals, immediately before the start of the function code proper.
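A sketch of what that buys a one-pass compiler (my own illustration):

void f(void)
{
    int a;      /* Declarations first: the compiler can total up the frame */
    int b[10];  /* size (space for a and b) and emit a single stack        */
                /* adjustment before the first statement...                */
    a = 1;      /* ...then translate statements strictly top to bottom,    */
    b[0] = a;   /* never revisiting the prologue.                          */
}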
It is much easier to write a compiler for a language which requires all variables to be declared at the start of a function. Some languages even require variables to be declared in a specific clause outside of the function code (Pascal and Smalltalk come to mind).
The reason is that it's easier to map these variables to the stack (or to registers, if your compiler is smart enough) if they are known up front and don't change.
Any other statements (especially function calls) may modify the stack or registers, making variable mapping more complex.
