Depended of the version of the C compiler and compiler flags it is possible to initialize variables on any place in your functions (As far as I am aware).
I'm used to put it all the variables at the top of the function, but the discussion started about the memory use of the variables if defined in any other place in the function.
Below I have written 2 short examples, and I wondered if anyone could explain me (or verify) how the memory gets allocated.
Example 1: Variable y is defined after a possible return statement, there is a possibility this variable won't be used for that reason, as far as I'm aware this doesn't matter and the code (memory allocation) would be the same if the variable was placed at the top of the function. Is this correct?
Example 2: Variable x is initialized in a loop, meaning that the scope of this variable is only within this loop, but what about the memory use of this variable? Would it be any different if placed on the top of the functions? Or just initialized on the stack at the function call?
Edit: To conclude a main question:
Does reducing the scope of the variable or change the location of the first use (so anywhere else instead of top) have any effects on the memory use?
Code example 1
static void Function(void){
uint8_t x = 0;
//code changing x
if(x == 2)
{
return;
}
uint8_t y = 0;
//more code changing y
}
Code example 2
static void LoopFunction(void){
uint8_t i = 0;
for(i =0; i < 100; i ++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%d", x);
}
//more code
}
I'm used to put it all the variables at the top of the function
This used to be required in the older versions of C, but modern compilers dropped that requirement. As long as they know the type of the variable at the point of its first use, the compilers have all the information they need.
I wondered if anyone could explain me how the memory gets allocated.
The compiler decides how to allocate memory in the automatic storage area. Implementations are not limited to the approach that gives each variable you declare a separate location. They are allowed to reuse locations of variables that go out of scope, and also of variables no longer used after a certain point.
In your first example, variable y is allowed to use the space formerly occupied by variable x, because the first point of use of y is after the last point of use of x.
In your second example the space used for x inside the loop can be reused for other variables that you may declare in the // more code area.
Basically, the story goes like this. When calling a function in raw assembler, it is custom to store everything used by the function on the stack upon entering the function, and clean it up upon leaving. Certain CPUs and ABIs may have a calling convention which involves automatic stacking of parameters.
Likely because of this, C and many other old languages had the requirement that all variables must be declared at the top of the function (or on top of the scope), so that the { } reflect push/pop on the stack.
Somewhere around the 80s/90s, compilers started to optimize such code efficiently, as in they would only allocate room for a local variable at the point where it was first used, and de-allocate it when there was no further use for it. Regardless of where that variable was declared - it didn't matter for the optimizing compiler.
Around the same time, C++ lifted the variable declaration restrictions that C had, and allowed variables to be declared anywhere. However, C did not actually fix this before the year 1999 with the updated C99 standard. In modern C you can declare variables everywhere.
So there is absolutely no performance difference between your two examples, unless you are using an incredibly ancient compiler. It is however considered good programming practice to narrow the scope of a variable as much as possible - though it shouldn't be done at the expense of readability.
Although it is only a matter of style, I would personally prefer to write your function like this:
(note that you are using the wrong printf format specifier for uint8_t)
#include <inttypes.h>
static void LoopFunction (void)
{
for(uint8_t i=0; i < 100; i++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%" PRIu8, x);
}
//more code
}
Old C allowed only to declare (and initialize) variables at the top of a block. You where allowed to init a new block (a pair of { and } characters) anywhere inside a block, so you had then the possibility of declaring variables next to the code using them:
... /* inside a block */
{ int x = 3;
/* use x */
} /* x is not adressabel past this point */
And you where permitted to do this in switch statements, if statements and while and do statements (everywhere where you can init a new block)
Now, you are permitted to declare a variable anywhere where a statement is allowed, and the scope of that variable goes from the point of declaration to the end of the inner nested block you have declared it into.
Compilers decide when they allocate storage for local variables so, you can all of them allocated when you create a stack frame (this is the gcc way, as it allocates local variables only once) or when you enter in the block of definition (for example, Microsoft C does this way) Allocating space at runtime is something that requires advancing the stack pointer at runtime, so if you do this only once per stack frame you are saving cpu cycles (but wasting memory locations). The important thing here is that you are not allowed to refer to a variable location outside of its scoping definition, so if you try to do, you'll get undefined behaviour. I discovered an old bug for a long time running over internet, because nobody take the time to compile that program using Microsoft-C compiler (which failed in a core dump) instead of the commmon use of compiling it with GCC. The code was using a local variable defined in an inner scope (the then part of an if statement) by reference in some other part of the code (as everything was on main function, the stack frame was present all the time) Microsoft-C just reallocated the space on exiting the if statement, but GCC waited to do it until main finished. The bug solved by just adding a static modifier to the variable declaration (making it global) and no more refactoring was neccesary.
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
struct bla_bla x;
...
pointer_to_x = &x;
}
/* x does not exist (but it did in gcc) */
do_something_to_bla_bla(pointer_to_x); /* wrong, x doesn't exist */
} /* main */
when changed to:
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
static struct bla_bla x; /* now global ---even if scoped */
...
pointer_to_x = &x;
}
/* x is not visible, but exists, so pointer_to_x continues to be valid */
do_something_to_bla_bla(pointer_to_x); /* correct now */
} /* main */
Related
If I define a variable anywhere in a function (not at the beginning), when the program is compiled and executed to this function, will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?
If it runs in order, Will it reduce some overhead when problems arise?
like so:
if(!(Size && Packs))
{
ret = false;
return SendAck(ret);
}
uint8_t *pSrc = (uint8_t *)pRcv->data;
uint8_t crc = 0;
The Standard C Computing Model
The C standard specifies memory reservation requirements in terms of storage duration and lifetime. Objects declared inside functions without extern, static, or _Thread_local have automatic storage duration. (Other storage durations, not discussed here, are static, thread, allocated, and temporary.) This includes parameters of functions, since they are declared inside the function declaration.
Each declaration inside a function has an associated block. Blocks are groups of statements (sometimes just a single statement). A compound statement bracketed with { and } is a block, each selection statement (if, switch) and each loop statement (for, while, do) is a block, and each substatement of those statements is a block. The block associated with a declaration inside a function is the innermost block it is in. For a function parameter, its associated block is the compound statement that defines the function.
For an automatic object that is not a variable length array, its lifetime starts when execution enters the block it is in, and it ends when execution of the block ends. (Calling a function suspends execution of the block b ut does not end it.) So in:
{
Label:
foo();
int A;
}
A exists as soon as execution reaches Label, because execution of A’s block has started.
This means that, as soon as the block is entered, all automatic objects in it other than variable length arrays should have memory reserved for them.
This fact is generally of little use, as there is no way to refer to A at Label. However, if we do this:
{
int i = 0;
int *p;
Label:
if (0 < i)
*p += foo();
int A = 0;
p = &A;
if (++i < 3)
goto Label;
bar(A);
}
then we can use A at Label after the first iteration because p points to it. We could imagine motivation for code like this could arise in a loop that needs to treat its first iteration specially. However, I have never seen it used in practice.
For an automatic object that is a variable length array, its lifetime starts when execution reaches its declaration and ends when execution leaves the scope of the declaration. So with this code:
int N = baz();
{
int i = 0;
Label:
foo();
int A[N];
if (++i < 3)
goto Label;
}
A does not exist at Label. Its lifetime begins each time execution reaches the declaration int A[N]; and ends, in the first few iterations, when the goto Label; transfers execution out of the scope of the declaration or, in the last iteration, when execution of the block ends.
Practical Implementation
In general-purpose C implementations, automatic objects are implemented with a hardware stack. A region of memory is set aside to be used as a stack, and a particular processor register, call the stack pointer, keeps track of how much is in use, by recording the address of the current “top” of stack. (For historic reasons, stacks usually start at high addresses and grow toward lower addresses, so the logical top of a stack is at the bottom of its used addresses.)
Because the lifetimes of automatic objects other than variable length arrays start when execution of their associate blocks begins, a compiler could implement this by adjusting the stack pointer whenever entering or ending a block. This can provide memory efficiency in at least two ways. Consider this code:
if (foo(0))
{
int A[100];
bar(A, x, 0);
}
if (foo(1))
{
int B[100];
bar(B, x, 1);
}
if (foo(2))
{
int C[1000];
bar(C, x, 2);
}
Because A and B do not exist at the same time, the compiler does not have to reserve memory for both of them when the function starts. It can adjust the stack pointer when each block is entered and ended. And for the large array C, the space might never be reserved at all; a compiler could choose to allocate space for 1000 int only if the block is actually entered.
However, I do not think GCC and Clang are taking advantage of this. In practice, I think they generally figure out the maximum space will be needed at any one time in the function and allocate that much space on the stack and use it through the function. This does include optimizations like using the same space for A and B, since they are never in use at the same time, but it does not include optimizating for the possibility that C is never used. However, I could be wrong; I have not checked on this compiler behavior lately.
In the cases of variable length arrays, the compiler generally cannot plan the memory use in advance, since it does not know the array size. So, for a variable length array, space has to be reserved for it on the stack when its declaration is reached.
Additionally, note that the compiler does not have to implement the computing model the C standard uses literally. It can make any optimizations that get the same observable behavior. This means that, for example, if it can tell part of an array is not used, it does not have to allocate memory for that at all. This means the answer to your question, “… will space be allocated to the defined variable first or will it be allocated when it runs to the defined statement?”, is that a compiler designer may choose either method, as long as the observable behavior of the resulting program matches what the C standard specifies.
The observable behavior includes data written to files, input/output interactions, and accesses to volatile objects.
If I define an array in if statement then does memory gets allocated during compile time eg.
if(1)
{
int a[1000];
}
else
{
float b[1000];
}
Then a memory of 2 * 1000 for ints + 4 * 1000 for floats get allocated?
It is reserved on the stack at run-time (assuming a non-trivial condition - in your case, the compiler would just exclude the else part). That means it only exists inside the scope block (between the {}).
In your example, only the memory for the ints gets allocated on the stack (1000 * sizeof(int)).
As you can guess, this is happening at run time. The generated code has instructions to allocate the space on the stack when the corresponding block of code is entered.
Keep in mind that this is happening because of the semantics of the language. The block structure introduces a new scope, and any automatic variables allocated in that scope have a lifetime that lasts as long as the scope does. In C, this is implemented by allocating it on the stack, which collapses as the scope disappears.
Just to drive home the point, note that the allocation would be different had the variables been of different nature.
if(1)
{
static int a[1000];
}
else
{
static float b[1000];
}
In this case, space is allocated for both the ints and the floats. The lifetime of these variables is the program. But the visibility is within the block scope they are allocated in.
Scope
Variables declared inside the scope of a pair of { } are on the stack. This applies to variables declared at the beginning of a function or in any pair of { } within the function.
int myfunc()
{
int i = 0; // On the stack, scoped: myfunc
printf("%i\n");
if (1)
{
int j = 1; // On the stack, scope: this if statement
printf("%i %i\n",i,j);
}
printf("%i %i\n",i,j); // Won't work, no j
}
These days the scope of the variables is limited to the surrounding { }. I recall that some older Microsoft compilers didn't limit the scope, and that in the example above the final printf() would compile.
So Where is it in memory?
The memory of i and j is merely reserved on the stack. This is not the same as memory allocation done with malloc(). That is important, because calling malloc() is very slow in comparison. Also with memory dynamically allocated using malloc() you have to call free().
In effect the compiler knows ahead of time what space is needed for a function's variables and will generate code that refers to memory relative to whatever the stack pointer is when myfunc() is called. So long as the stack is big enough (2MBytes normally, depends on the OS), all is good.
Stack overflow occurs in the situation where myfunc() is called with the stack pointer already close to the end of the stack (i.e. myfunc() is called by a function which in turn had been called by another which it self was called by yet another, etc. Each layer of nested calls to functions moves the stack pointer on a bit more, and is only moved back when functions return).
If the space between the stack pointer and the end of the stack isn't big enough to hold all the variables that are declared in myfunc(), the code for myfunc() will simply try to use locations beyond the end of the stack. That is almost always a bad thing, and exactly how bad and how hard it is to notice that something has gone wrong depends on the operating system. On small embedded micro controllers it can be a nightmare as it usually means some other part of the program's data (eg global variables) get silently overwritten, and it can be very hard to debug. On bigger systems (Linux, Windows) the OS will tell you what's happened, or will merely make the stack bigger.
Runtime Efficiency Considerations
In the example above I'm assigning values to i and j. This does actually take up a small amount of runtime. j is assigned 1 only after evaluation of the if statement and subsequent branch into where j is declared.
Say for example the if statement hadn't evaluated as true; in that case j is never assigned 1. If j was declared at the start of myfunc() then it would always get assigned the value of 1 regardless of whether the if statement was true - a minor waste of time. But consider a less trivial example where a large array is declared an initialised; that would take more execution time.
int myfunc()
{
int i = 0; // On the stack, scoped: myfunc
int k[10000] = {0} // On the stack, scoped: myfunc. A complete waste of time
// when the if statement evaluates to false.
printf("%i\n");
if (0)
{
int j = 1; // On the stack, scope: this if statement
// It would be better to move the declaration of k to here
// so that it is initialised only when the if evaluates to true.
printf("%i %i %i\n",i,j,k[500]);
}
printf("%i %i\n",i,j); // Won't work, no j
}
Placing the declaration of k at the top of myfunc() means that a loop 10,000 long is executed to initialise k every time myfunc() is called. However it never gets used, so that loop is a complete waste of time.
Of course, in these trivial examples compilers will optimise out the unnecessary code, etc. In real code where the compiler cannot predict ahead of time what the execution flow will be then things are left in place.
Memory for the array in the if block will be allocated on stack at run time. else part will be optimized (removed) by the compiler. For more on where the variables will be allocated memory, see Segmentation Fault when writing to a string
As DCoder & paddy corrected me, the memory will be calculated at compile time but allocated at run-time in stack memory segment, but with the scope & lifetime of the block in which the array is defined. The size of memory allocated depends on size of int & float in your system. Read this for an overview on C memory map
What is the difference between the block and function scopes in C99 in terms of what happens on stack when a function / block is entered and left?
In theory, a compiler could generate code to allocate a stack frame on entry to any block that contains local variables. In such a case, there wouldn't be much difference at all.
In practice, most compilers compute the maximum size of local variables that could be used by any path through a function, then allocate that size of stack frame on entry. Variables in any block inside the function are simply different offsets from the stack pointer. Note that in such a case, two (or more) blocks may use the same addresses. For example, with source code like this:
void f(int x) {
if (x) {
long y;
}
else {
float z;
}
}
...chances are quite good that y and z will end up at the same address.
Like this:
void foo(int n) // <-- beginning of function scope
{ // <-- beginning of function body scope
int x = n;
for (;;)
{ // <-- beginning of block scope
int q = n;
x *= q;
} // <-- end of block scope
foo(x);
{ // <-- another block scope
int w = x;
}
} // <-- end of function body scope
// and of function scope
Nothing "happens" when a scope ends, but a variable only lives inside the scope where it is declared (with some arcane exceptions). It is up to the implementation to reuse the space of variables of previous, nested scopes that have ended.
The only thing an implementation is required to do when control enters either a function or a block scope is to behave as if new instances have been created of all data objects directly in that scope with "automatic storage duration." Behave as if means it can do something different as long as the program being compiled can't tell the difference (or could only tell the difference by doing something whose behavior is undefined). For instance, if a variable is declared at function scope but only used within one subblock, the compiler can collapse its live range to that subblock, and probably will, because this makes register allocation easier.
An implementation is not required to do anything when control exits a function or block scope. The lifetimes of all automatic-storage-duration objects directly in that scope end, but no program can tell that this has happened without triggering undefined behavior.
There is no requirement for a C implementation to have a stack, and a stack is not the only way to implement the above requirements. See for instance "Cheney on the M.T.A." and c2:SpaghettiStack.
C implementations that do have a stack will normally try to avoid adjusting the stack pointer in the middle of a function, for reasons too complicated to go into here. This can mean that a value with block scope survives on the stack longer than its declared lifetime, but it's still undefined behavior to access it. The compiler is allowed to recycle storage for values that are no longer in scope, but it is also allowed to recycle storage for values that are still in scope but will not be accessed anymore ("dead" in compiler jargon). Historically compilers have been much more aggressive about doing that for values in registers than for values in stack slots, but again, that's a distinction that doesn't necessarily exist on your implementation.
I've been trying to get into the habit of defining trivial variables at the point they're needed. I've been cautious about writing code like this:
while (n < 10000) {
int x = foo();
[...]
}
I know that the standard is absolutely clear that x exists only inside the loop, but does this technically mean that the integer will be allocated and deallocated on the stack with every iteration? I realise that an optimising compiler isn't likely to do this, but it that guaranteed?
For example, is it ever better to write:
int x;
while (n < 10000) {
x = foo();
[...]
}
I don't mean with this code specifically, but in any kind of loop like this.
I did a quick test with gcc 4.7.2 for a simple loop differing in this way and the same assembly was produced, but my question is really are these two, according to the standard, identical?
Note that "allocating" automatic variables like this is pretty much free; on most machines it's either a single-instruction stack pointer adjustment, or the compiler uses registers in which case nothing needs to be done.
Also, since the variable remains in scope until the loop exits, there's absolutely no reason to "delete" (=readjust the stack pointer) it until the loop exits, I certainly wouldn't expect there to be any overhead per-iteration for code like this.
Also, of course the compiler is free to "move" the allocation out of the loop altogether if it feels like it, making the code equivalent to your second example with the int x; before the while. The important thing is that the first version is easier to read and more tighly localized, i.e. better for humans.
Yes, the variable x inside the loop is technically defined on each iteration, and initialized via the call to foo() on each iteration. If foo() produces a different answer each time, this is fine; if it produces the same answer each time, it is an optimization opportunity – move the initialization out of the loop. For a simple variable like this, the compiler typically just reserves sizeof(int) bytes on the stack — if it can't keep x in a register — that it uses for x when x is in scope, and may reuse that space for other variables elsewhere in the same function. If the variable was a VLA — variable length array — then the allocation is more complex.
The two fragments in isolation are equivalent, but the difference is the scope of x. In the example with x declared outside the loop, the value persists after the loop exits. With x declared inside the loop, it is inaccessible once the loop exits. If you wrote:
{
int x;
while (n < 10000)
{
x = foo();
...other stuff...
}
}
then the two fragments are near enough equivalent. At the assembler level, you'll be hard pressed to spot the difference in either case.
My personal point of view is that once you start worrying about such micro-optimisations, you're doomed to failure. The gain is:
a) Likely to be very small
b) Non-portable
I'd stick with code that makes your intention clear (i.e. declare x inside the loop) and let the compiler care about efficiency.
There is nothing in the C standard that says how the compiler should generate code in either case. It could adjust the stack pointer on every iteration of the loop if it fancies.
That being said, unless you start doing something crazy with VLAs like this:
void bar(char *, char *);
void
foo(int x)
{
int i;
for (i = 0; i < x; i++) {
char a[i], b[x - i];
bar(a, b);
}
}
the compiler will most likely just allocate one big stack frame at the beginning of the function. It's harder to generate code for creating and destroying variables in blocks instead of just allocating all you need at the beginning of the function.
I know that sometimes if you don't initialize an int, you will get a random number if you print the integer.
But initializing everything to zero seems kind of silly.
I ask because I'm commenting up my C project and I'm pretty straight on the indenting and it compiles fully (90/90 thank you Stackoverflow) but I want to get 10/10 on the style points.
So, the question: when is it appropriate to initialize, and when should you just declare a variable:
int a = 0;
vs.
int a;
There are several circumstances where you should not initialize a variable:
When it has static storage duration (static keyword or global var) and you want the initial value to be zero. Most compilers will actually store zeros in the binary if you explicitly initialize, which is usually just a waste of space (possibly a huge waste for large arrays).
When you will be immediately passing the address of the variable to another function that fills its value. Here, initializing is just a waste of time and may be confusing to readers of the code who wonder why you're storing something in a variable that's about to be overwritten.
When a meaningful value for the variable can't be determined until subsequent code has completed execution. In this case, it's actively harmful to initialize the variable with a dummy value such as zero/NULL, as this prevents the compiler from warning you if you have some code paths where a meaningful value is never assigned. Compilers are good at warning you about accessing uninitialized variables, but can't warn you about "still contains dummy value" variables.
Aside from these issues, I'd say it's generally good practice to initialize your non-static variables when possible.
A rule that hasn't been mentioned yet is this: when the variable is declared inside a function it is not initialised, and when it is declared in static or global scope it's set to 0:
int a; // is set to 0
void foo() {
int b; // set to whatever happens to be in memory there
}
However - for readability I would usually initialise everything at declaration time.
If you're interested in learning this sort of thing in detail, I'd recommend this presentation and this book
I can think of a couple of reason off the top of my head:
When you're going to be initializing it later on in your code.
int x;
if(condition)
{
func();
x = 2;
}
else
{
x = 3;
}
anotherFunc(x); // x will have been set a value no matter what
When you need some memory to store a value set by a function or another piece of code:
int x; // Would be pointless to give x a value here
scanf("%d", &x);
If the variable is in the scope of of a function and not a member of a class I always initialize it because otherwise you will get warnings. Even if this variable will be used later I prefer to assign it on declaration.
As for member variables, you should initialize them in the constructor of your class.
For pointers, always initialize them to some default, particularly NULL, even if they are to be used later, they are dangerous when uninitialized.
Also it is recommended to build your code with the highest level of warnings that your compiler supports, it helps to identify bad practices and potential errors.
Static and global variables will be initialized to zero for you so you may skip initialization. Automatic variables (e.g. non-static variables defined in function body) may contain garbage and should probably always be initialized.
If there is a non-zero specific value you need at initialization then you should always initialize explicitly.
It's always good practice to initialize your variables, but sometimes it's not strictly necessary. Consider the following:
int a;
for (a = 0; a < 10; a++) { } // a is initialized later
or
void myfunc(int& num) {
num = 10;
}
int a;
myfunc(&a); // myfunc sets, but does not read, the value in a
or
char a;
cin >> a; // perhaps the most common example in code of where
// initialization isn't strictly necessary
These are just a couple of examples where it isn't strictly necessary to initialize a variable, since it's set later (but not accessed between declaration and initialization).
In general though, it doesn't hurt to always initialize your variables at declaration (and indeed, this is probably best practice).
In general, there's no need to initialize a variable, with 2 notable exceptions:
You're declaring a pointer (and not assigning it immediately) - you
should always set these to NULL as good style and defensive
programming.
If, when you declare the variable, you already know
what value is going to be assigned to it. Further assignments use up
more CPU cycles.
Beyond that, it's about getting the variables into the right state that you want them in for the operation you're going to perform. If you're not going to be reading them before an operation changes their value (and the operation doesn't care what state it is in), there's no need to initialize them.
Personally, I always like to initialize them anyway; if you forgot to assign it a value, and it's passed into a function by mistake (like a remaining buffer length) 0 is usually cleanly handled - 32532556 wouldn't be.
There is absolutely no reason why variables shouldn't be initialised, the compiler is clever enough to ignore the first assignment if a variable is being assigned twice. It is easy for code to grow in size where things you took for granted (such as assigning a variable before being used) are no longer true. Consider:
int MyVariable;
void Simplistic(int aArg){
MyVariable=aArg;
}
//Months later:
int MyVariable;
void Simplistic(int aArg){
MyVariable+=aArg; // Unsafe, since MyVariable was never initialized.
}
One is fine, the other lands you in a heap of trouble. Occasionally you'll have issues where your application will run in debug mode, but release mode will throw an exception, one reason for this is using an uninitialised variable.
As long as I have not read from a variable before writing to it, I have not had to bother with initializing it.
Reading before writing can cause serious and hard to catch bugs. I think this class of bugs is notorious enough to gain a mention in the popular SICP lecture videos.
Initializing a variable, even if it is not strictly required, is ALWAYS a good practice. The few extra characters (like "= 0") typed during development may save hours of debugging time later, particularly when it is forgotten that some variables remained uninitialized.
In passing, I feel it is good to declare a variable close to its use.
The following is bad:
int a; // line 30
...
a = 0; // line 40
The following is good:
int a = 0; // line 40
Also, if the variable is to be overwritten right after initialization, like
int a = 0;
a = foo();
it is better to write it as
int a = foo();