How much memory is used in the following C program?

Suppose the size of int is 4 bytes.
In the following C snippet, how many bytes are requested to store the variables?
* I have read that some variables can be stored in registers or on the stack, but I am asking about the total size, so that does not matter.
{
    int a, b;
    {
        int c;
    }
    {
        int d, e;
    }
}
Thanks in advance.

You should not care: the answer depends a lot on the compiler and on the optimization flags.
A variable could stay entirely in a processor register, in which case it consumes no memory (and sometimes it does not appear in the generated machine code at all, because the compiler figured out that it is useless). Read about the call stack, call frames, and register allocation. Of course, a common-sense rule is to avoid huge call frames (e.g. avoid declaring very large automatic variables such as double hugelocalarr[1000000];). A reasonable call frame should in general be at most a kilobyte or a few kilobytes (often the total call stack should not exceed a megabyte or a few megabytes, and you need to think about recursive functions and deeply nested calls).
In practice, if you compile with GCC, look into command options such as -Wstack-usage=X (use it with various optimization flags, such as -O1 or -O2). You will get warnings about functions that use more than X bytes of stack.
Also be aware of tail calls: recent compilers are sometimes able to optimize them cleverly. Think also of inline expansion, which compilers can do when optimizing (even without any inline keyword).
Read the paper "C Is Not a Low-Level Language" by David Chisnall.

Related

Does C first try to assign a certain address? [duplicate]

I'm trying to understand how C allocates memory on the stack. I always thought variables on the stack could be pictured like struct member variables: they occupy a block of successive, contiguous bytes within the stack. To illustrate an issue I found somewhere, I created this small program, which reproduces the phenomenon.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void function(int *i) {
    /* Reads outside the object that i points to: undefined behaviour, shown only to illustrate the question. */
    int *_prev_int = (int *) ((long unsigned int) i - sizeof(int)) ;
    printf("%d\n", *_prev_int );
}
int main(void)
{
    int x = 152;
    int y = 234;
    function(&y);
    return 0;
}
See what I'm doing? Suppose sizeof(int) is 4: I'm looking 4 bytes behind the passed pointer, since that should read the 4 bytes just before int y in the caller's stack frame.
It did not print 152. Strangely, when I look at the next 4 bytes instead:
int *_prev_int = (int *) ((long unsigned int) i + sizeof(int)) ;
now it works and prints whatever is in x inside the caller's stack frame. Why does x have a higher address than y? Are stack variables stored upside down?

Stack organization is completely unspecified and implementation-specific. In practice, it depends a lot on the compiler (even on its version) and on the optimization flags.
Some variables don't even sit on the stack (e.g. because they are kept in registers, or because the compiler optimized them away, e.g. by inlining, constant folding, etc.).
BTW, you could have some hypothetical C implementation which does not use any stack (even if I cannot name such an implementation).
To understand more about stacks:
Read the wikipages on call stacks, tail calls, threads, and continuations.
Become familiar with your computer's architecture and instruction set (e.g. x86) and ABI, then ...
Ask your compiler to show the assembler code and/or some intermediate compiler representations. If using GCC, compile some simple code with gcc -S -fverbose-asm (to get assembler code foo.s when compiling foo.c) and try several optimization levels (at least -O0, -O1, -O2, ...). Also try the -fdump-tree-all option (it dumps hundreds of files showing some internal representations of the compiler for your source code). Notice that GCC also provides return address builtins.
Read Appel's old paper "Garbage Collection Can Be Faster Than Stack Allocation", and understand garbage collection techniques (since they often need to inspect and possibly change some pointers inside call stack frames). To know more about GC, read the GC handbook.
Sadly, I know no low-level language (like C, D, Rust, C++, Go, ...) where the call stack is accessible at the language level. This is why coding a garbage collector for C is difficult (since GCs need to scan the call stack pointers)... But see Boehm's conservative GC for a very practical and pragmatic solution.

Almost all processor architectures nowadays support stack manipulation instructions (e.g. the LDM and STM instructions on ARM). Compilers implement the stack with the help of those. In most cases the stack pointer decrements when data is pushed onto the stack (the stack grows downwards) and increments when data is popped from the stack.
So how the stack is implemented depends on the processor architecture and the compiler.

Depends on the compiler and platform. The same thing can be done in more than one way as long as it is done consistently by a program (in this case the compiler's translation to assembly, i.e. machine code) and the platform supports it (good compilers try to optimize the assembly to get the “most” out of each platform).
A very good source for deeply understanding what goes on behind the scenes of C, what happens when compiling a program and why, is the free book Reverse Engineering for Beginners (Understanding Assembly Language) by Dennis Yurichev; the latest version can be found at his site.

Why is the size of a function argument increased to the word size?

I read the System V ABI for i386 and AMD64. They say that arguments must be rounded up to a multiple of the word size, and I don't understand why.
Here is the situation: if you pass 4 char arguments to a function on the i386 architecture, they take 16 bytes (4 bytes for each char argument). Wouldn't it be more efficient to allocate only 4 bytes for all 4 arguments, as would be done for local variables?
Alignment is not the answer, because 4 to 12 bytes of padding could be needed for 16-byte stack alignment in both situations.

Putting the 4 chars in a single register (or stack location) would require packing the individual parameters before the call and extracting them afterwards, which is costly in terms of instructions. Note that even if you are talking about the stack, the memory access should be very quick, given that it will most likely be in the cache.
If you really want to save that much space, you can still do it yourself using a single 4-byte argument.

Isn't it more efficient to ...
You always have to say what you want to optimize:
Fast execution speed
Small program size
Less stack usage
Simpler compilers
...
If you want to optimize for less stack usage, passing bytes to the function really would be more efficient.
However, normally you want to optimize for fast execution speed or small program size.
Unlike modern compilers (which mov the arguments onto the stack), most of the compilers written in the 1990s that I know push the arguments onto the stack. If a compiler uses push operations, putting single bytes on the stack would be rather complex: it would make the program slower and longer.
(Note that I have never seen a pop operation being done on a parameter.)

I think the original C authors had their eye on portability and maintainability more than squeezing every byte and cycle. Not that C is careless with resources, but appropriate trade-offs were made.
Promoting each parameter to the stack granule size made sense, and really still does. If you are desperate to squeeze it in, you could always replace:
int f(int a, int b, int c, int d) { ... }
with
struct fparm { char a,b,c,d; }; int f(struct fparm a) { ... }
Modern C compilers are not so user friendly; or rather their only friend is a luser named benchmark....

Why does C not define minimum size for an array?

The C standard defines a lot of lower/upper limits (translation limits) that an implementation must satisfy for each translation. Why is there no such minimum limit defined for array size? The following program compiles fine, but it is likely to produce a runtime error/segfault and invokes undefined behaviour.
int main()
{
    int a[99999999];
    int i;
    for(i=0;i<99999999;i++)
        a[i]=i;
    return 0;
}
A possible reason could be that local arrays are allocated in automatic storage, so the outcome depends on the size of the stack frame allocated. But why not define a minimum limit, like the other limits defined by C?
Let's forget about undefined cases like the one above. Consider the following:
int main()
{
    int a[10];
    int i;
    for(i=0;i<10;i++)
        a[i]=i;
    return 0;
}
In the above, what gives me the guarantee that the local array (albeit a very small one) is going to work as expected and won't cause undefined behaviour due to allocation failure?
Admittedly, it's unlikely that allocating such a small array would fail on any modern system. But the C standard doesn't define any requirement to satisfy, and compilers (at least GCC) don't report allocation failures. A runtime error/undefined behaviour is the only possibility. The hard part is that nobody can tell whether an arbitrarily sized array is going to cause undefined behaviour due to allocation failure.
Note that I am aware I can use dynamic arrays (via malloc & friends) for this purpose and have better control over allocation failures. I am more interested in why there's no such limit defined for local arrays. Also, global arrays are stored in static storage and increase the executable size, which compilers can handle.

Because C, the language, should not be imposing limitations on your available stack size. C operates in many (many) different environments. How could it possibly come up with a reasonable number? Hell, automatic storage duration != stack, a stack is an implementation detail. C, the language, says nothing of a "stack".
The environment decides this stuff, and for good reason. What if a certain environment implements automatic storage duration via an alternative method which imposes no such limitation? What if a breakthrough in hardware occurs and all of a sudden modern machines do not require such a limitation?
Should we rev the standard in such an event? We would have to if C, the language, specified such implementation details.

You've already answered your own question; it's due to stack limitations.* Even this might not work:
void foo(void) {
    int a;
    ...
}
if the ... is actually a recursive call to foo.
In other words, this has nothing to do with arrays, as the same problem affects all local variables. The standard couldn't enforce a requirement, because in practice that would translate into a requirement for an infinite-sized stack.
* Yes, I know the C standard(s) don't talk about stacks. But that's the implicit model, in the sense that the standard was really a formalisation of the implementations that existed at the time.

The MINIMUM limit is an array of 1 element. Why would you have a "limit" for that? Of course, if you call a function recursively forever, an array of 1 may not fit on the stack, or the next call in the chain may not fit on the stack. The only way to solve that would be for the compiler to know the size of the stack, but the compiler doesn't actually know at that stage how big the stack is. Never mind the problems of extremely complex call hierarchies, where several different functions call into the same function, possibly with recursion and/or several layers of rather large consumers of stack. How do you size the stack for that? The worst possible case may never be encountered, because other things dictate that it doesn't happen: for example, the worst case in one function occurs only when an input file is empty, but the worst case in another function occurs when there is lots of data stored in the same file. There are lots and lots of variations like this. It's just too unreliable to determine, so sooner or later it would become guesswork or produce lots of false positives.
Consider a program with thousands of functions, all of which call the same logging function that needs a 200 byte array on the stack for temporarily storing the log output. It's called from just about every function from main upwards.
The MAXIMUM for a local variable depends on the size of the stack, which, like I said above, is not something the compiler knows when compiling your code [the linker MAY know, but that's later on]. For global arrays and those allocated on the heap, the limit is "how much memory your process can get", so there's no upper limit there.
There's just no easy way to determine this. And many of the limits provided by the standard are there to guarantee that code can be compiled on "any compiler" as long as your code follows the rules. Being compiled and being able to run to completion are two different things.
int main()
{
    while(1);
}
will never run to completion - but it will compile in every compiler I know of, and most won't say a thing about there being an infinite loop - it's your choice to do that.
It's also your choice to put large arrays on the stack. And it could well be that the linker is given several gigabytes of stack, in which case it'll be fine - or the stack is 200K, and you can't have an array of 50000 integers...

Strange stack behavior in C

I'm worried that I am misunderstanding something about stack behavior in C.
Suppose that I have the following code:
#include <stdio.h>

int main (int argc, const char * argv[])
{
    int a = 20, b = 25;
    {
        int temp1;
        printf("&temp1 is %p\n" , (void *) &temp1);
    }
    {
        int temp2;
        printf("&temp2 is %p\n" , (void *) &temp2);
    }
    return 0;
}
Why am I not getting the same address in both printouts? I am getting that temp2 is one int away from temp1, as if temp1 was never recycled.
My expectation is for the stack to contain 20 and 25.
Then have temp1 on top, then have it removed, then have temp2 on top, then have it removed.
I am using gcc on Mac OS X.
Note that I am using the -O0 flag for compiling without optimizations.
To those wondering about the background for this question: I am preparing teaching materials on C, and I am trying to show the students that they should not only avoid returning pointers to automatic variables from functions, but should also avoid taking the address of variables in nested blocks and dereferencing it outside those blocks. I was trying to demonstrate how this causes problems, and couldn't get the screenshot.

The compiler is completely within its rights not to optimize temp1 and temp2 into the same location. It has been many years since compilers generated code for one stack operation at a time; these days the whole stack frame is laid out at one go. (A few years back a colleague and I figured out a particularly clever way to do this.) Naive stack layout probably puts each variable in its own slot, even when, as in your example, their lifetimes don't overlap.
If you're curious, you might get different results with gcc -O1 or gcc -O2.

There is no guarantee what address stack objects will receive regardless of the order they are declared.
The compiler can happily reorder the creation and duration of stack variables providing it does not affect the results of the function.

I believe the C standard just talks about the scope and lifetime of variables defined in a block. It makes no promises about how the variables interact with the stack or if a stack even exists.

I remember reading something about it. All I have now is this obscure link.
Just to let everybody know (and for the sake of the archives), it appears to be our kernel extension is running into a known limitation of GCC. Just to recap, we have a function in a very portable, very lightweight library, that for some reason is getting compiled with a 1600+ byte stack when compiled on/for Darwin. No matter what compiler options I tried, and what optimization levels I used, the stack was no smaller than 1400 "machine check" panic in pretty reproducible (but not frequent) situations.
After a lot of searching on the Web, learning some i386 assembly and talking to some people who are much better at assembly, I have learned that GCC is somewhat notorious for having horrid stack allocation. [...]
Apparently this is gcc's dirty little secret, except it's not much of a secret to some--Linus Torvalds has complained several times on various lists about the gcc stack allocation (search lkml.org for "gcc stack usage"). Once I knew what to search for, there was plenty of griping about gcc's subpar allocation of stack variables, and in particular, its inability to re-use stack space for variables in different scopes.
With that said, my Linux version of gcc properly re-uses stack space: I get the same address for both variables. I'm not sure what the C standard says about it, but strict scope enforcement is only important for code correctness in C++ (due to destruction at the end of the scope), not in C.

There is no standard that sets how variables are placed on the stack. What happens in the compiler is much more complicated. In your code, the compiler may even choose to completely ignore and suppress variables a and b.
During the many stages of the compiler, the code may be converted to its SSA form, and all stack variables lose their addresses and meanings in this form (it may even make things harder for the debugger).
Stack space is very cheap, in the sense that the time to allocate either 2 or 20 variables is constant. Also, stack space is very dynamic for most function calls, since with the exception of a few functions (those nearer main() and thread-entry functions, with long-lived event loops or so), they tend to complete quickly. So, you just don't bother with them.

This is completely dependent on the compiler and how it is configured.
