Strange stack behavior in C - c

I'm worried that I am misunderstanding something about stack behavior in C.
Suppose that I have the following code:
int main (int argc, const char * argv[])
{
int a = 20, b = 25;
{
int temp1;
printf("&temp1 is %ld\n" , &temp1);
}
{
int temp2;
printf("&temp2 is %ld\n" , &temp2);
}
return 0;
}
Why am I not getting the same address in both printouts? I am getting that temp2 is one int away from temp1, as if temp1 was never recycled.
My expectation is for the stack to contain 20, and 25.
Then have temp1 on top, then have it removed, then have temp2 on top, then have it removed.
I am using gcc on Mac OS X.
Note that I am using the -O0 flag for compiling without optimizations.
Tho those wondering about the background for this question: I am preparing teaching materials on C, and I am trying to show the students that they should not only avoid returning pointers to automatic variables from functions, but also to avoid taking the address of variables from nested blocks and dereferencing them outside. I was trying to demonstrate how this causes problems, and couldn't get the screenshot.

The compiler is completely within its rights not to optimize temp1 and temp2 into the same location. It has been many years since compilers generated code for one stack operation at a time; these days the whole stack frame is laid out at one go. (A few years back a colleague and I figured out a particularly clever way to do this.) Naive stack layout probably puts each variable in its own slot, even when, as in your example, their lifetimes don't overlap.
If you're curious, you might get different results with gcc -O1 or gcc -O2.

There is no guarantee what address stack objects will receive regardless of the order they are declared.
The compiler can happily reorder the creation and duration of stack variables providing it does not affect the results of the function.

I believe the C standard just talks about the scope and lifetime of variables defined in a block. It makes no promises about how the variables interact with the stack or if a stack even exists.

I remember reading something about it. All I have now is this obscure link.
Just to let everybody know (and for the sake of the archives), it appears to be our kernel extension is running into a known limitation of GCC. Just to recap, we have a function in a very portable, very lightweight library, that for some reason is getting compiled with a 1600+ byte stack when compiled on/for Darwin. No matter what compiler options I tried, and what optimization levels I used, the stack was no smaller than 1400 "machine check" panic in pretty reproducible (but not frequent) situations.
After a lot of searching on the Web, learning some i386 assembly and talking to some people who are much better at assembly, I have learned that GCC is somewhat notorious for having horrid stack allocation. [...]
Apparently this is gcc's dirty little secret, except it's not much of a secret to some--Linus Torvalds has complained several times on various lists about the gcc stack allocation (search lkml.org for "gcc stack usage"). Once I knew what to search for, there was plenty of griping about gcc's subpar allocation of stack variables, and in particular, it's inability to re-use stack space for variables in different scopes.
With that said, my Linux version of gcc properly re-uses stack space, I get same address for both variables. Not sure what C standard says about it, but strict scope enforcement is only important for code correctness in C++ (due to destruction at the end of the scope), but not in C.

There is no standard that sets how variables are placed on the stack. What happens in the compiler is much more complicated. In your code, the compiler may even choose to completely ignore and suppress variables a and b.
During the many stages of the compiler, the code may be converted to it's SSA form, and all stack variables lose their addresses and meanings in this form (it may even make it harder for the debugger).
Stack space is very cheap, in the sense that the time to allocate either 2 or 20 variables is constant. Also, stack space is very dynamic for most function calls, since with the exception of a few functions (those nearer main() and thread-entry functions, with long-lived event loops or so), they tend to complete quickly. So, you just don't bother with them.

This is completely dependent on the compiler and how it is configured.

Related

How much memory is used in the following C program?

Suppose int size is 4 bytes.
Following the Code snippet in C, how much bytes is requested to store the variables?
* I read that some can be stored in the registers / stack, but I asked for the total size, therefore it doesn't matter.
{
int a,b;
{
int c;
}
{
int d, e;
}
}
Thanks in advance.
You should not care, and it depends a lot upon the optimization flags and the compiler.
A variable could stay entirely in a processor register, and then it does not consume memory (and sometimes it does not appear in the generated machine code, because the compiler figured out that it is useless). But read about the call stack and call frames and register allocation. Of course, a common sense rule is to avoid huge call frames (e.g. avoid declaring very large automatic variables such as double hugelocalarr[1000000];). A reasonable call frame should (in general) be at most a kilobyte or a few of them (often, the total call stack should not exceed a megabyte or a few of them, and you need to think about recursive functions or deeply nested calls).
In practice, if you compile with GCC, look into the command options such as -Wstack-usage=X (use it with various optimization flags, such as -O1 or -O2 ...) etc... You'll get warnings about functions using a lot of stack (more than X bytes).
Be also aware of tail calls. Recent compilers are sometimes able to cleverly optimize them. And think also of inline expansion. Compilers are able to do that when optimizing (even without any inline keyword).
Read the C is not a low-level language paper by David Chisnall.

How to undeclare (delete) variable in C?

Like we do with macros:
#undef SOMEMACRO
Can we also undeclare or delete the variables in C, so that we can save a lot of memory?
I know about malloc() and free(), but I want to delete the variables completely so that if I use printf("%d", a); I should get error
test.c:4:14: error: ‘a’ undeclared (first use in this function)
No, but you can create small minimum scopes to achieve this since all scope local variables are destroyed when the scope is exit. Something like this:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}
It's not a direct answer to the question, but it might bring some order and understanding on why this question has no proper answer and why "deleting" variables is impossible in C.
Point #1 What are variables?
Variables are a way for a programmer to assign a name to a memory space. This is important, because this means that a variable doesn't have to occupy any actual space! As long as the compiler has a way to keep track of the memory in question, a defined variable could be translated in many ways to occupy no space at all.
Consider: const int i = 10; A compiler could easily choose to substitute all instances of i into an immediate value. i would occupy 0 data memory in this case (depending on architecture it could increase code size). Alternatively, the compiler could store the value in a register and again, no stack nor heap space will be used. There's no point in "undefining" a label that exists mostly in the code and not necessarily in runtime.
Point #2 Where are variables stored?
After point #1 you already understand that this is not an easy question to answer as the compiler could do anything it wants without breaking your logic, but generally speaking, variables are stored on the stack. How the stack works is quite important for your question.
When a function is being called the machine takes the current location of the CPU's instruction pointer and the current stack pointer and pushes them into the stack, replacing the stack pointer to the next location on stack. It then jumps into the code of the function being called.
That function knows how many variables it has and how much space they need, so it moves the frame pointer to capture a frame that could occupy all the function's variables and then just uses stack. To simplify things, the function captures enough space for all it's variables right from the start and each variable has a well defined offset from the beginning of the function's stack frame*. The variables are also stored one after the other.
While you could manipulate the frame pointer after this action, it'll be too costly and mostly pointless - The running code only uses the last stack frame and could occupy all remaining stack if needed (stack is allocated at thread start) so "releasing" variables gives little benefit. Releasing a variable from the middle of the stack frame would require a defrag operation which would be very CPU costly and pointless to recover few bytes of memory.
Point #3: Let the compiler do its job
The last issue here is the simple fact that a compiler could do a much better job at optimizing your program than you probably could. Given the need, the compiler could detect variable scopes and overlap memory which can't be accessed simultaneously to reduce the programs memory consumption (-O3 compile flag).
There's no need for you to "release" variables since the compiler could do that without your knowledge anyway.
This is to complement all said before me about the variables being too small to matter and the fact that there's no mechanism to achieve what you asked.
* Languages that support dynamic-sized arrays could alter the stack frame to allocate space for that array only after the size of the array was calculated.
There is no way to do that in C nor in the vast majority of programming languages, certainly in all programming languages that I know.
And you would not save "a lot of memory". The amount of memory you would save if you did such a thing would be minuscule. Tiny. Not worth talking about.
The mechanism that would facilitate the purging of variables in such a way would probably occupy more memory than the variables you would purge.
The invocation of the code that would reclaim the code of individual variables would also occupy more space than the variables themselves.
So if there was a magic method purge() that purges variables, not only the implementation of purge() would be larger than any amount of memory you would ever hope to reclaim by purging variables in your program, but also, in int a; purge(a); the call to purge() would occupy more space than a itself.
That's because the variables that you are talking about are very small. The printf("%d", a); example that you provided shows that you are thinking of somehow reclaiming the memory occupied by individual int variables. Even if there was a way to do that, you would be saving something of the order of 4 bytes. The total amount of memory occupied by such variables is extremely small, because it is a direct function of how many variables you, as a programmer, declare by hand-typing their declarations. It would take years of typing on a keyboard doing nothing but mindlessly declaring variables before you would declare a number of int variables occupying an amount of memory worth speaking of.
Well, you can use blocks ({ }) and defining a variable as late as possible to limit the scope where it exists.
But unless the variable's address is taken, doing so has no influence on the generated code at all, as the compiler's determination of the scope where it has to keep the variable's value is not significantly impacted.
If the variable's address is taken, failure of escape-analysis, mostly due to inlining-barriers like separate compilation or allowing semantic interpositioning, can make the compiler assume it has to keep it alive till later in the block than strictly neccessary. That's rarely significant (don't worry about a handful of ints, and most often a few lines of code longer keeping it alive are insignificant), but best to keep it in mind for the rare case where it might matter.
If you are that concerned about the tiny amount of memory that is on the stack, then you're probably going to be interested in understanding the specifics of your compiler as well. You'll need to find out what it does when it compiles. The actual shape of the stack-frame is not specified by the C language. It is left to the compiler to figure out. To take an example from the currently accepted answer:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}
This may or may not affect the memory usage of the function. If you were to do this in a mainstream compiler like gcc or Visual Studio, you would find that they optimize for speed rather than stack size, so they pre-allocate all of the stack space they need at the start of the function. They will do analysis to figure out the minimum pre-allocation needed, using your scoping and variable-usage analysis, but those algorithms literally wont' be affected by extra scoping. They're already smarter than that.
Other compilers, especially those for embedded platforms, may allocate the stack frame differently. On these platforms, such scoping may be the trick you needed. How do you tell the difference? The only options are:
Read the documentation
Try it, and see what works
Also, make sure you understand the exact nature of your problem. I worked on a particular embedded project which eschewed the stack for everything except return values and a few ints. When I pressed the senior developers about this silliness, they explained that on this particular application, stack space was at more of a premium than space for globally allocated variables. They had a process they had to go through to prove that the system would operate as intended, and this process was much easier for them if they allocated everything up front and avoided recursion. I guarantee you would never arrive at such a convoluted solution unless you first knew the exact nature of what you were solving.
As another solution you could look at, you could always build your own stack frames. Make a union of structs, where each struct contains the variables for one stack frame. Then keep track of them yourself. You could also look at functions like alloca, which can allow for growing the stack frame during the function call, if your compiler supports it.
Would a union of structs work? Try it. The answer is compiler dependent. If all variables are stored in memory on your particular device, then this approach will likely minimize stack usage. However, it could also substantially confuse register coloring algorithms, and result in an increase in stack usage! Try and see how it goes for you!

Economizing on variable use

I am working on some (embedded) device, recently I just started thinking maybe to use less memory, in case stack size isn't that big.
I have long functions (unfortunately).
And inside I was thinking to save space in this way.
Imagine there is code
1. void f()
2. {
3. ...
4. char someArray[300];
5. char someOtherArray[300];
6. someFunc(someArray, someOtherArray);
7. ...
8. }
Now, imagine, someArray and someOtherArray are never used in f function beyond line: 6.
Would following save some stack space??
1. void f()
2. {
3. ...
4. {//added
5. char someArray[300];
6. char someOtherArray[300];
7. someFunc(someArray, someOtherArray);
8. }//added
9. ...
8. }
nb: removed second part of the question
For the compiler proper both are exactly the same and thus makes no difference. The preprocessor would replace all instances of TEXT1 with the string constant.
#define TEXT1 "SomeLongStringLiteral"
someFunc(TEXT1)
someOtherFunc(TEXT1)
After the preprocessor's job is done, the above snippet becomes
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
Thus it makes no difference performance or memory-wise.
Aside: The reason #define TEXT1 "SomeLongStringLiteral" is done is to have a single place to change all instances of TEXT1s usage; but that's a convinience only for the programmer and has no effect on the produced output.
recently I just started thinking maybe to use less memory, in case stack size isn't that big.
Never micro optimise or prematurely optimise. In case the stack size isn't that big, you'll get to know it when you benchmark/measure it. Don't make any assumptions when you optimise; 99% of the times it'd be wrong.
I am working on some device
Really? Are you? I wouldn't have thought that.
Now, imagine, someArray and someOtherArray are never used in f function beyond line 6. Would following save some stack space?
On a good compiler, it wouldn't make a difference. By the standard, it isn't specified if it saves or not, it isn't even specified if there is a stack or not.
But on a not so good compiler, the one with the additional {} may be better. It is worth a test: compile it and look at the generated assembler code.
it seems my compiler doesn't allow me to do this (this is C), so never mind...
But it should so. What happens then? Maybe you are just confusing levels of {} ...
I'll ask another one here.
Would better be a separate question...
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
vs.
someFunc(TEXT1)
someOtherFunc(TEXT1)
A #define is processed before any compilation step, so it makes absolutely no difference.
If it happens within the same compilation unit, the compiler will tie them together anyway. (At least, in this case. On an ATXmega, if you use PSTR("whatever") for having them in flash space only, each occurrence of them will be put into flash separately. But that's a completely different thing...)
Modern compilers should push stack variables before they are used, and pop them when they are no longer needed. The old thinking with { ... } marking the start and end of a stack push/pop should be rather obsolete by now.
Since 1999, C allows stack variables to be allocated anywhere and not just immediately after a {. C++ allowed this far earlier. Today, where the local variable is declared inside the scope has little to do with when it actually starts to exist in the machine code. And similarly, the } has little to do with when is ceases to exist.
So regarding adding extra { }, don't bother. It is premature optimization and only adds pointless clutter.
Regarding the #define it absolutely makes no difference in terms of efficiency. Macros are just text replacement.
Furthermore, from the generic point-of-view, data must always be allocated somewhere. Data used by a program cannot be allocated in thin air! That's a very common misunderstanding. For example, many people incorrectly believe that
int x = func();
if(x == something)
consumes more memory than
if(func() == something)
But both examples compile into identical machine code. The result of func must be stored somewhere, it cannot be stored in thin air. In the first example, the result is stored in a memory segment that the programmer may refer to as x.
In the second example, it is stored in the very same memory segment, taking up the same amount of space, for the same duration of program execution. The only difference is that the memory segment is anonymous and the programmer has no name for it. As far as the machine code is concerned, that doesn't matter, since no variable names exist in machine code.
And this would be why every professional C programmer needs to understand a certain amount of assembler. You cannot hope to ever do any kind of manual code optimization if you don't.
(Please don't ask two questions in one, this is really annoying since you get two types of answer for your two different questions.)
For your first question. Probably putting {} around the use of a variable will not help. The lifetime of automatic variables that are not VLA (see below) is not bound to the scope in which it is declared. So compilers may have a hard time in figuring out how the use of the stack may be optimized, and maybe don't do such an optimization at all. In your case this is most likely the case, since you are exporting pointers to your data to a function that is perhaps not visible, here. The compiler has no way to figure out if there is a valid use of the arrays later on in the code.
I see two ways to "force" the compiler into optimizing that space, functions or VLA. The first, functions is simple: instead of putting the block around the code, put it in a static function. Function calls are quite optimized on modern platforms, and here the compiler knows exactly how he may clear the stack at the end.
The second alternative in your case is a VLA, variable length array, if you compiler supports that c99 feature. Arrays that have a size that doesn't depend on a compile time constant have a special rule for their lifetime. That lifetime exactly ends at the end of the scope where they are defined. Even a const-qualified variable could be used for that:
{
size_t const len = 300;
char someArray[len];
char someOtherArray[len];
someFunc(someArray, someOtherArray);
}
At the end, on a given platform, you'd really have to inspect what assembler your compiler produces.

Why does C not define minimum size for an array?

C standard defines a lot of lower/upper limits (translation limits) and imposes an implementation should satisfy for each translation. Why there's no such minimum limit defined for an array size? The following program is going to compile fine and likely produce runtime error/segfault and would invoke undefined behaviour.
int main()
{
int a[99999999];
int i;
for(i=0;i<99999999;i++)
a[i]=i;
return 0;
}
A possible reason could be local arrays are allocated on automatic storage and it depends on the size of the stack frame allocated. But why not a minimum limit like other limits defined by C?
Let's forget about the undefined cases like above. Consider the following:
int main()
{
int a[10];
int i;
for(i=0;i<10;i++)
a[i]=i;
return 0;
}
In the above, what gives me the guarantee that the local array (despite a very small one) is going to work as expected and won't cause undefined behaviour due to allocation failure?
Although it's unlikely that an allocation for such a small array would fail on any modern systems. But the C standard doesn't define any requirements to satisfy and compilers don't (at least GCC doesn't) report allocation failures. Only a runtime error/undefined behaviour is possibility. The hard part is nobody can tell whether an arbitrary sized array is going cause undefined behaviour due to allocation failure.
Note that I am aware I can use dynamic arrays (via malloc & friends) for this purpose and have a better control over allocation failures. I am more interested in why there's no such limit defined for local arrays. Also, global arrays are going to be stored in static storage and is going to increase executable size which compilers can handle.
Because C, the language, should not be imposing limitations on your available stack size. C operates in many (many) different environments. How could it possibly come up with a reasonable number? Hell, automatic storage duration != stack, a stack is an implementation detail. C, the language, says nothing of a "stack".
The environment decides this stuff, and for good reason. What if a certain environment implements automatic storage duration via an alternative method which imposes no such limitation? What if a breakthrough in hardware occurs and all of a sudden modern machines do not require such a limitation?
Should we rev the standard in such an event? We would have to if C, the language, specified such implementation details.
You've already answered your own question; it's due to stack limitation.* Even this might not work:
void foo(void) {
int a;
...
}
if the ... is actually a recursive call to foo.
In other words, this is nothing to do with arrays, as the same problem affects all local variables. The standard couldn't enforce a requirement, because in practice that would translate into a requirement for an infinite-sized stack.
* Yes, I know the C standard(s) don't talk about stacks. But that's the implicit model, in the sense that the standard was really a formalisation of the implementations that existed at the time.
The MINIMUM limit is an array of 1 element. Why would you have a "limit" for that? Of course, if you call a function recursively forever, an array of 1 may not fit on the stack, or the call that calls the function next call around may not fit on the stack - the only way to solve that would be to know the size of the stack in the compiler - but the compiler doesn't actually know at that stage how big the stack is - never mind the problems of extremely complex call hierarchies were several different functions call into the same function, possibly with recursion and/or several layers of rather large consumers of stack - how do you size the stack for that - the worst possible case may not be ever encountered, because other things dictate that this doesn't happen - for example, the worst case in one function is only when an input file is empty, but the worst case in another function is when there is lots of data stored in the same file. Lots and lots of variations like this. It's just too unreliable to determine, so sooner or later it would just become guess-work or of lots of false positives.
Consider a program with thousands of functions, all of which call the same logging function that needs a 200 byte array on the stack for temporarily storing the log output. It's called from just about every function from main upwards.
The MAXIMUM for a local variable depends on the size of the stack, which, like I said above, is not something the compiler knows when compiling your code [the linker MAY know, but that's later on]. For global arrays and those allocated on the heap, the limit is "how much memory your process can get", so there's no upper limit there.
There's just no easy way to determine this. And many of the limits provided by the standard is there to guarantee that code can be compiled on "any compiler" as long as your code follows the rules. Be compiled and be able to run to completion is two different things.
int main()
{
while(1);
}
will never run to completion - but it will compile in every compiler I know of, and most won't say a thing about there being an infinite loop - it's your choice to do that.
It's also your choice to put large arrays on the stack. And it could well be that the linker is given several gigabytes of stack, in which case it'll be fine - or the stack is 200K, and you can't have 50000 array of integers...

Is variable declaration within a loop bad?

I'm referring to the main static languages today (C, C++, java, C#,). I've heard some contradicting answers about this, so I wanted to know:
If I have some code such as:
loop(...) {
type x = val;
...
}
('loop' is some type of loop, e.g. for, while)
Will it cause memory allocation in each iteration of the loop, or just once? Is it different from writing this:
type x;
loop(...) {
x = val;
...
}
where memory is only allocated once for x?
The strictly correct answer is that it depends on the implementation, as both are semantically correct. No language specification would require or prohibit such implementation details.
That said, any implementation worth its salt will be able to reuse the same stack slot or even CPU register (with native compilation, especially likely in presence of a JIT). Even the bytecode will likely be completely identical.
And finally, there's that thing with premature optimization... Unless proven otherwise, you shouldn't even bother thinking about low-level details like this (if you think knowledge and control over such issues matters, perhaps you should just program in assembler), because:
Unless you're doing a microbenchmark (or a really huge number-crunching task - but how many people freaking out about performance actually do those?), you won't even notice any difference even if it isn't optimized. If you're doing anything of interest in the loop body, it will dwarf the difference (again, if any). Especially if you're doing any I/O.
Even if there is memory allocation, it boils down to pushing and popping a few bytes on the native stack, which in turn boils down to adding an integer constant to a hardware register. All C and C++ programs use that stack for their local variables, and non of those ever complained about its performance... if you have to reserve space, you can't get faster than using the stack.
If you have to ask this kind of question, you're not someone who could do anything about it. Those people know to just (1) measure it, (2) look at the generated code and (3) look for large-scale optimizations before even thinking on this level ;)

Resources