Variable reuse in C - c

The code I'm looking at is this:
for (i = 0; i < linesToFree; ++i ){
printf("Parsing line[%d]\n", i);
memset( &line, 0x00, 65 );
strcpy( line, lines[i] );
//get Number of words:
int numWords = 0;
tok = strtok(line , " \t");
while (tok != NULL) {
++numWords;
printf("Number of words is: %d\n", numWords);
println(tok);
tok = strtok(NULL, " \t");
}
}
My question centers around the use of numWords. Does the runtime system reuse this variable or does it allocate a new int every time it runs through the for loop? If you're wondering why I'm asking this, I'm a Java programmer by trade who wants to get into HPC and am therefore trying to learn C. Typically I know you want to avoid code like this, so this question is really exploratory.
I'm aware the answer is probably reliant upon the compiler... I'm looking for a deeper explanation than that. Assume the compiler of your choice.

Your conception about how this works in Java might be misinformed - Java doesn't "allocate" a new int every time through a loop like that either. Primitive type variables like int aren't allocated on the Java heap, and the compiler will reuse the same local storage for each loop iteration.
On the other hand, if you call new anything in Java every time through a loop, then yes, a new object will be allocated every time. However, you're not doing that in this case. C also won't allocate anything from the heap unless you call malloc or similar (or in C++, new).

Please note the difference between automatic and dynamic memory allocation. In Java only the latter exists.
This is automatic allocation:
int numWords = 0;
This is dynamic allocation:
int *pNumWords = malloc(sizeof(int));
*pNumWords = 0;
The dynamic allocation in C only happens explicitly (when you call malloc or its derivatives).
In your code, only the value is set to your variable, no new one is allocated.

From a performance standpoint, it's not going to matter. (Variables map to registers or memory locations, so it has to be reused.)
From a logical standpoint, yes, it will be reused because you declared it outside the loop.
From a logical standpoint:
numWords will not be reused in the outer loop because it is declared inside it.
numWords will be reused in the inner loop because it isn't declared inside.

This is what is called "block", "automatic" or "local" scope in C. It is a form of lexical scoping, i.e., a name refers to its local environment. In C, it is top down, meaning that it happens as the file is parsed and compiled and visible only after defined in the program.
When the variable goes out of scope, the lexical name is no longer valid (visible) and the memory may be reused.
The variable is declared in a local scope or a block defined by curly braces { /* block */ }. This defines a whole group of C and C99 idioms, such as:
for(int i=0; i<10; ++i){ // C99 only. int i is local to the loop
// do something with i
} // i goes out of scope here...
There are subtleties, such as:
int x = 5;
int y = x + 10; // this works
int x = y + 10;
int y = 5; // compiler error
and:
int g; // static by default and init to 0
extern int x; // defined and allocated elsewhere - resolved by the linker
int main (int argc, const char * argv[])
{
int j=0; // automatic by default
while (++j<=2) {
int i=1,j=22,k=3; // j from outer scope is lexically redefined
for (int i=0; i<10; i++){
int j=i+10,k=0;
k++; // k will always be 1 when printed below
printf("INNER: i=%i, j=%i, k=%i\n",i,j,k);
}
printf("MIDDLE: i=%i, j=%i, k=%i\n",i,j,k); // prints middle j
}
// printf("i=%i, j=%i, k=%i\n",i,j,k); compiler error
return 0;
}
There are idiosyncrasies:
In K&R C, ANSI C89, and Visual Studio, All variables must be declared at the beginning of the function or compound statement (i.e., before the first statement)
In gcc, Variables may be declared anywhere in the function or compound statement and is only visible from that point on.
In C99 and C++, Loop variables may be declared in for statement and are visible until end of loop body.
In a loop block, the allocation is performed ONCE and the RH assignment (if any) is performed each time.
In the particular example you posted, you enquired about int numWords = 0; and if a new int is allocated each time through the loop. No, there is only one int allocated in a loop block, but the right hand side of the = is executed every time. This can be demonstrated so:
#include <stdio.h>
#include <time.h>
#include <unistd.h>
volatile time_t ti(void){
return time(NULL);
}
void t1(void){
time_t t1;
for(int i=0; i<=10; i++){
time_t t2=ti(); // The allocation once, the assignment every time
sleep(1);
printf("t1=%ld:%p t2=%ld:%p\n",t1,(void *)&t1,t2,(void *)&t2);
}
}
Compile that with any gcc (clang, eclipse, etc) compatible compiler with optimizations off (-O0) or on. The address of t2 will always be the same.
Now compare with a recursive function:
int factorial(int n) {
if(n <= 1)
return 1;
printf("n=%i:%p\n",n,(void *)&n);
return n * factorial(n - 1);
}
The address of n will be different each time because a new automatic n is allocated with each recursive call.
Compare with an iterative version of factorial forced to used a loop-block allocation:
int fac2(int num) {
int r=0; // needed because 'result' goes out of scope
for (unsigned int i=1; i<=num; i++) {
int result=result*i; // only RH is executed after the first time through
r=result;
printf("result=%i:%p\n",result,(void *)&result); // address is always the same
}
return r;
}
In conclusion, you asked about int numWords = 0; inside the for loop. The variable is reused in this example.
The way the code is written, the programmer is relying on the RH of int numWords = 0; after the first to be executed and resetting the variable to 0 for use in the while loop that follows.

The scope of the numWords variable is inside the for loop. Just as Java, you can only use the variable inside the loop, so theoretically its memory would have to be freed on exit - since it is also on the stack in your case.
Any good compiler however would use the same memory and simply re-set the variable to 0 on each iteration.
If you were using a class instead of an int, you would see the destructor being called every time the for loops.
Even consider this:
class A;
A* pA = new A;
delete pA;
pA = new A;
The two objects created here will probably reside at the same memory.

It will be allocated every time through the loop (the compiler can optimize out that allocation)
for (i = 0; i < 100; i++) {
int n = 0;
printf("%d : %p\n", i, (void*)&n);
}
No guarantees all 100 lines will have the same address (though probably they will).
Edit: The C99 Standard, in 6.2.4/5 says: "[the object] lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." and, in 6.8.5/5, it says that the body of a for statement is in fact a block ... so the paragraph 6.2.4/5 applies.

Related

C - Avoiding warning "address of stack memory associated with local variable returned"

I've written the following simple program that sums up the numbers from 0 to 9:
#include <stdio.h>
#include <stdlib.h>
int* allocArray() {
int arr[10];
return arr;
}
int main(void){
int* p;
int summe = 0;
p = allocArray();
for(int i = 0; i != 10; ++i) {
p[i] = i;
}
for(int i = 0; i != 10; ++i) {
summe += p[i];
}
printf("Sum = %d\n", summe);
}
The code compiles and delivers the expected result "45". However I get the following warning: 'address of stack memory associated with local variable
'arr' returned'. What am I doing wrong?
This is undefined behaviour, plain and simple. The only reason it "works" is because with this particular compiler the stack hasn't been trashed yet, but it is living on borrowed time.
The lifetime of arr ends immediately when the function call is complete. After that point any pointers to arr are invalidated and cannot be used.1
Your compiler uses stack memory to store local variables, and the warning indicates that you're returning an address to a now-invalidated stack variable.
The only way to work around this is to allocate memory dynamically and return that:
int* allocArray() {
int* arr = calloc(10, sizeof(int));
return arr;
}
Where you're now responsible for releasing that memory later.
You can also use static to extend the lifetime of arr:
int* allocArray() {
static int arr[10];
return arr;
}
Though this is not without consequences, either, as now arr is shared, not unique to each function call.
1 Like a lot of things in C there's significant overlap between what you "cannot do" because they lead to undefined behaviour and/or crashes and what you "can do" because it's permitted by the syntax and compiler. It's often your responsibility to know the consequences of any code you write, implied or otherwise.
To keep it in your code:
int arr[10];
will allocate the memory on the stack. As soon as you are leaving the function, the content of that array will be overwritten pretty soon. You want to allocate this on the heap.
You would need to use
int* arr = malloc(sizeof(int)*10);
and in the main function, after you've used it (at the end of main), call
delete[] arr;
Nevertheless, this code could be better if the ownership of the array would be properly handled. You want to make yourself familiar with C++ containers and shared/unique pointers.

Declaring memory on stack overwrites previously declared memory

How can I allocate memory on the stack and have it point to different memory addresses so I can use it later? For example. this code:
for (int i = 0; i < 5; i++) {
int nums[5];
nums[0] = 1;
printf("%p\n", &nums[0]);
}
Will print out the same address every time. How can I write memory to stack (not the heap, no malloc) and have it not overwrite something else that's on the stack already.
You could use alloca to allocate a different array from the runtime stack for each iteration in the loop. The array contents will remain valid until you exit the function:
#include <stdlib.h>
#include <stdio.h>
void function() {
for (int i = 0; i < 5; i++) {
int *nums = alloca(5 * sizeof(*nums));
nums[0] = 1;
printf("%p\n", (void *)nums);
/* store the value of `num` so the array can be used elsewhere.
* the arrays must only be used before `function` returns to its caller.
*/
...
}
/* no need to free the arrays */
}
Note however that alloca() is not part of the C Standard and might not be available on all architectures. There are further restrictions on how it can be used, see the documentation for your system.
I believe you are looking for:
a way to control how memory is being allocated on the stack, at least in the context of not overwriting already-used memory
Of course, that's taken care by the OS! The low level system calls will make sure that a newly created automatic variable will not be written upon an already used memory block.
In your example:
for (int i = 0; i < 5; i++) {
int nums[5];
...
}
this is not the case, since nums will go out of scope, when the i-th iteration of the for loop terminates.
As a result, the memory block nums was stored into during the first iteration, will be marked as free when the second iteration initiates, which means that when nums of the first iteration is going to be allocated in the stack, it will not be aware of any existence of the nums of the first iteration, since that has gone already out of scope - it doesn't exist!

printing strings produces garbage even when (I think) they are null terminated

When I run print_puzzle(create_puzzle(input)), I get a bunch of gobbledegook at the bottom of the output, only in the last row. I have no idea why this keeps happening. The output is supposed to be 9 rows of 9 numbers (the input is a sudoku puzzle with zeroes representing empty spaces).
This bunch of code should take that input, make a 2d array of strings and then, with print_puzzle, print those strings out in a grid. They are string because eventually I will implement a way to display all the values the square could possibly be. But for now, when I print it out, things are screwed up. I even tried putting the null value in every single element of all 81 strings but it still get's screwed up when it goes to print the strings. I'm lost!
typedef struct square {
char vals[10]; // string of possible values
} square_t;
typedef struct puzzle {
square_t squares[9][9];
} puzzle_t;
static puzzle_t *create_puzzle(unsigned char vals[9][9]) {
puzzle_t puz;
puzzle_t *p = &puz;
int i, j, k, valnum;
for (i = 0; i < 9; i++) {
for (j = 0; j < 9; j++) {
puz.squares[i][j].vals[0] = '\0';
puz.squares[i][j].vals[1] = '\0';
puz.squares[i][j].vals[2] = '\0';
puz.squares[i][j].vals[3] = '\0';
puz.squares[i][j].vals[4] = '\0';
puz.squares[i][j].vals[5] = '\0';
puz.squares[i][j].vals[6] = '\0';
puz.squares[i][j].vals[7] = '\0';
puz.squares[i][j].vals[8] = '\0';
puz.squares[i][j].vals[9] = '\0';
valnum = vals[i][j] -'0';
for (k = 0; k < 10; k++){
if ((char)(k + '0') == (char)(valnum + '0')){
char tmpStr[2] = {(char)(valnum +'0'),'\0'};
strcat(puz.squares[i][j].vals, tmpStr);
}
}
}
}
return p;
}
void print_puzzle(puzzle_t *p) {
int i, j;
for (i=0; i<9; i++) {
for (j=0; j<9; j++) {
printf(" %2s", p->squares[i][j].vals);
}
printf("\n");
}
}
In short:
In function create_puzzle(), you are returning a pointer to the local variable puz. Local variables are only known to function inside their own. So the content referenced by the pointer returned by create_puzzle is indeterminate.
More details:
In C++, local variables are usually generated as storage on a "stack" data structure. when create_puzzle() method is entered, its local variables come alive. A function's local variables will be dead when the method is over. An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can access it's original content. C++ is not a safe language, implementations let you make mistake and get away with it. Other memory-safe languages solve this problem by restricting your power. For example in C# you can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends.
This answer is very awesome:
Can a local variable's memory be accessed outside its scope?
In function create_puzzle(), you are returning a pointer of the type puzzle_t. But, the address of variable puz of the type puzzle_t is invalid once you return from the function.
Variables that are declared inside a function are local variables. They can be used only by statements that are inside that function. These Local variables are not known to functions outside their own, so returning an address of a local variable doesn't make sense as when the function returns, the local storage it was using on the stack is considered invalid by the program, though it may not get cleared right away. Logically, the value at puz is indeterminate, and accessing it results in undefined behavior.
You can make puz a global variable, and use it the way you are doing right now.
You are returning a local variable here:
return p;
Declare p and puz outside of the function, then it should work.
p point to local memory that is unavailable after the function ends. Returning that leads to problems. Instead allocate memory.
// puzzle_t puz;
// puzzle_t *p = &puz;
puzzle_t *p = malloc(sizeof *p);
assert(p);
Be sure to free() the memory after the calling code completes using it.

When is memory deallocated in c?

I am trying to find out when memory used for variables in c is freed. As an example, when would the integer i be freed in the following code snippet?
int function()
{
int i = 1;
// do some things
return 0;
}
In C, as in all other languages, lexically scoped variables, such as i here, are only valid within their scopes -- the scope of i is from its declaration through the closing brace of the function. When exactly they are freed is often not specified, but in practical C implementations local variables are allocated on the call stack and their memory is reused once the function returns.
Consider something like
int function()
{
int i; // beginning of i's scope
{
int j; // beginning of j's scope
...
} // end of j's scope
{
int k; // beginning of k's scope
...
} // end of k's scope
return 0; // all locals of the function are deallocated by the time it is exited
} // end of i's scope
Scope determines when the variables can be accessed by name and, for local (auto) variables, when their content can be validly accessed (e.g., if you set a pointer to the address of a local variable, dereferencing the pointer outside the variable's scope is undefined behavior). Deallocation is a somewhat different matter ... most implementations won't do anything at the end of j or k's scope to "deallocate" them, although they will likely reuse the same memory for both variables. When function returns, most implementations will "pop" all locals off the stack, along with the return address, by a single decrement of the stack pointer, in effect "deallocating" their memory ... although the memory is still right there on the stack, ready to be "allocated" to the next function that is called.
Note that the terminology of your question is somewhat confused ... variables have scope, but it's memory, not variables, that is allocated and deallocated. Some variables may not even have any memory allocated for them if, for instance, they are constants or are never used in the program. And only the memory for local variables is allocated or freed as described above ... the memory for static and file scope variables is never freed, and only allocated when the program is loaded. And there is other memory -- heap memory -- that is explicitly allocated and freed by your program (via calls to malloc/realloc/calloc/strdup/free etc.). But although heap memory can be referenced by pointer variables, the memory for the pointer variables themselves consists just of the reference (memory address), with the variables having either local or static/file scope.
It will be freed when it goes out of scope. Since it has function scope, this will happen when the function returns.
i is allocated on the stack. It is freed when return is executed.
Automatic variables are local to a scope in C:
A scope is basically marked by '{' '}' so when you are inside a {} pair you are inside a scope. Of course scopes can be nested.
This is why in older C standard the local variables had to be defined at the top of a scope because it made writing the C compiler easier (the compiler would see a '{' and then all the variables and then statements) so it knew the variables it had to deal with. This changed in C99 (I think?) and later on where you can define variables anywhere in between statements.
This can be quite helpful of course:
int foo()
{
int k = 0; /* inside the scope of foo() function */
for(; k < 10; k++) { /* don't need k = 0; here we set it to zero earlier */
int i = 1; /* initialize inside the scope of the for loop */
i = i * 2; /* do something with it */
printf ("k = %d, i = %d\n", k, i);
}
#if 0
printf ("i = %d\n", i); /* would cause an unknown identifier error
* because i would be out of scope, if you changed
* #if 0 to #if 1
*/
#endif
return 0;
}
int main()
{
foo();
foo();
return 0;
}
Notice that i = 2 all the time for the iterations of the loop in foo()
A more fun thing to see is how the keyword static modifies the persistence of the variable.
int bar()
{
int k = 0; /* inside the scope of bar() function */
for(; k < 10; k++) { /* don't need k = 0; here we set it to zero earlier */
static int i = 1; /* initialize inside the scope of the for loop */
i = i * 2; /* do something with it */
printf ("k = %d, i = %d\n", k, i);
}
#if 0
printf ("i = %d\n", i); /* would cause an unknown identifier error
* because i would be out of scope, if you changed
* #if 0 to #if 1
*/
#endif
return 0;
}
int main()
{
foo();
foo();
return 0;
}
Note the changes and note how a static variable is treated.
What would happen to the for loop if you put the keyword static in front of int k = 0; ?
Interesting uses
C macros are very useful. Sometimes you want to define a complicated local macro rather than a function for speed and overall simplicity. In a project I wanted to manipulate large bit maps, but using functions to perform 'and' 'or' 'xor' operations was a bit of a pain. The size of bitmaps was fixed so I created some macros:
#define BMPOR(m1, m2) do { \
int j; \
for (j = 0; j < sizeof(bitmap_t); j++ ) \
((char *)(m1))[j] |= ((char *)(m2))[j]; \
} while (0)
the do { } while(0) is a fun trick to let a scoped block be attached to if/for/while etc. without an issue, the block executes exactly once (since the while is checked at the END of the block), and when compiler sees while(0) it just removes the loop; and of course this lets you put a semi-colon at the end so IDEs and you (later) don't get confused about what it is.
The macro above was used as under:
int foo()
{
bitmap_t map_a = some_map(), map_b = some_other_map();
BITOR(map_a, map_b); /* or map_a and map_b and put the results in map_a */
}
here the do {} while(0) allowed a local scope with a local variable j, a for loop and anything else that I might have needed.
Variable gets deallocated when the variable gets out of scope.In your code variable i will get deallocated after return 0 statement is executed.
You can find more on variable scope here

Declaring variablle within a for loop

I was trying out the following C code:
void main()
{
int i;
for(i = 0; i< 10; i++)
{
int num;
printf("\nthe variable address is: %p", &num);
}
getch();
}
I had expected it to either throw an error or declare num multiple times but instead, the output shows the same value for &num, for all the iterations of the for loop.
What is the reason behind this behavior? It seems that irrespective of having the declaration in the for loop, the actual declaration/definition happens just once.
Can someone help me understand this behavior?
You're printing the address of a stack-allocated variable. The variable's scope is the for loop. Theoretically, the variable is created at the line int num; and its memory released at the closing for bracket. The memory layout is strictly compiler-dependent.
It might be that your compiler is smart enough to know it can reuse that memory, or it may be that memory is free and is chosen by the compiler for your variable storage.
It can also be that the optimizer is telling the compiler its okay to reuse num.
It's all down to the compiler, however, just because it has the same address doesn't mean it is only declared/defined once.
To help illustrate this, compare this:
int i;
int val = 0;
for(i = 0; i< 5; i++)
{
int num = val++;
printf("\nthe variable address is: %p", &num);
printf("\nthe value is: %d", num);
}
This again shows that num always has the same address, but also is initialised with a distinct value each iteration.
The idea with the stack is that its layout is defined at compile time; each stack variable maps to an address on the stack with the stack frame.
Another thing to hlep you get this is to consider that if each iteration "allocated" a new variable, how would a small machine handle a large loop?
See: Call Stack

Resources