I have learnt that memory for global variables is allocated at program startup, whereas memory for local variables is allocated whenever a function call is made.
Case 1:
I have declared a global integer array of size 63500000 and memory used is 256 MB
#include <stdio.h>
int a[63500000];
int main()
{
printf ("This code requires about 250 MB memory\n");
return 0;
}
Case 2:
I have declared a local integer array of same size in main() and memory used is 1.6 MB
#include <stdio.h>
int main()
{
int a[63500000]= {1,5,0};
printf ("This code requires only 1.6 MB \n");
//printf ("%d\n", a[0]);
return 0;
}
Case 3:
I have declared a local integer array of same size in another function and memory used is 1.6 MB
#include <stdio.h>
void f()
{
int a[63500000];
}
int main()
{
f();
return 0;
}
Please explain why there is a difference in the memory used, or whether my understanding of memory allocation is wrong.
First of all: the ideone compiler is GCC.
So, what does GCC do when you compile this?:
void foo ()
{
int a[63500000];
}
gcc -S -O2 foo.c generates:
foo:
pushl %ebp
movl %esp, %ebp
popl %ebp
ret
i.e. nothing is allocated on the stack, at all.
The array is simply optimized away by GCC because it is never used.
GCC won't do this with a global, because it is possible that a global is used in another compilation unit, and so it isn't sure that it is never used. Also: The global is not on the stack (since it is a global).
Now, let's see what happens when you actually use the local array:
int bar (int a, int b, int c)
{
int f[63500000];
f[a] = 9;
f[b] = 7;
return f[c];
}
Things are very different:
bar:
pushl %ebp
movl %esp, %ebp
subl $254000000, %esp
movl 8(%ebp), %eax
movl $9, -254000000(%ebp,%eax,4)
movl 12(%ebp), %eax
movl $7, -254000000(%ebp,%eax,4)
movl 16(%ebp), %eax
movl -254000000(%ebp,%eax,4), %eax
leave
ret
This line: subl $254000000, %esp corresponds to the size of the array (63500000 elements of 4 bytes each), i.e. memory is allocated on the stack.
Now, what if I tried to use the bar function in a program:
int bar (int a, int b, int c)
{
int f[63500000];
f[a] = 9;
f[b] = 7;
return f[c];
}
int main (void)
{
return bar (0, 0, 0);
}
We already saw that the bar function allocates roughly 250 megabytes on the stack. On my default GNU/Linux install, the stack size is limited to 8 MB, so when the program runs, it causes a "Segmentation fault". I can increase the limit if I want, by executing the following in a shell:
ulimit -s 1000000 #i.e. allow stack size to grow close to 1GB
Then I can run the program, and it will indeed run.
The reason why it fails on the ideone website is that they have limited the stack size when executing programs (and they should, otherwise malicious users could mess up their system).
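The limit that ulimit -s changes can also be read from inside a program. The following is a minimal sketch, assuming a POSIX system that provides getrlimit; stack_limit_bytes and fits_on_stack are hypothetical helper names introduced only for illustration:

```c
#include <sys/resource.h>  /* getrlimit(), struct rlimit, RLIMIT_STACK (POSIX) */

/* Returns the current soft stack limit in bytes, or 0 if it cannot be read.
   An unlimited stack (RLIM_INFINITY) shows up as a very large value. */
unsigned long long stack_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Hypothetical helper: returns 1 if an automatic array of 'bytes' bytes
   is below the current limit, 0 otherwise. It ignores the stack space
   that is already in use, so it is only a rough check. */
int fits_on_stack(unsigned long long bytes)
{
    unsigned long long limit = stack_limit_bytes();
    return limit != 0 && bytes < limit;
}
```

With the default 8 MB limit, fits_on_stack(254000000ULL) would report that the array from bar does not fit, which matches the segmentation fault described above.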
Cases 2, 3
Variables that you define inside functions are allocated on the stack. That means that the associated memory is cleaned up (the stack is "popped") when the function exits.
Case 1
Variables defined in global scope are allocated in a data segment (or, generally, a memory space requested from the operating system) that exists for the lifetime of the process.
Additionally
Memory allocated using malloc is allocated from a heap and remains allocated until explicitly released using free.
Note that a modern OS may well provide address space requested by a program, but not physically back that address space with RAM until the memory (or a portion of the memory often called a page) is physically accessed.
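As an illustration of the malloc case, here is a minimal sketch of heap memory that outlives the function that allocated it. make_squares is a hypothetical name chosen for this example:

```c
#include <stdlib.h>  /* malloc(), free() */

/* Returns a heap array holding 0*0, 1*1, ..., (n-1)*(n-1),
   or NULL if the allocation fails.
   The memory stays allocated after this function returns;
   the caller owns it and must release it with free(). */
int *make_squares(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)(i * i);
    return p;
}
```

A local array declared inside make_squares could not be returned this way, because its stack storage is gone once the function exits.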
Cases 2 and 3 would result in a stack overflow if the array were actually allocated: you are asking for about 254 MB (63500000 * 4 bytes) of stack memory, whereas the stack is typically limited to 8 MB on Linux. This would result in random bad things, core dumps, or crashes.
This answer explains the various sections of the process address space (.text, .bss, .data) and how variables are allocated in each of them.
Related
In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
A bit of context here: I told a friend that I wrote a program that does not use dynamic memory allocation ("malloc") and allocates all memory statically (by modeling all my state variables in a struct, which I then declared global). He then told me that if I'm using automatic variables, I still make use of dynamic memory. I argued that my automatic variables are not state variables but control variables, so my program is still to be considered static. We then discussed that there has to be a way to make a statement about the absolute worst-case behaviour of my program, so I came up with the above question.
Bonus question: If the assumptions above hold, I could simply declare all automatic variables static and would end up with a "truly" static program?
Even if array sizes are constant, a C implementation could allocate arrays and even structures dynamically. I'm not aware of any that do (anyone?), and it would appear quite unhelpful. But the C Standard doesn't make such guarantees.
There is also (almost certainly) some further overhead in the stack frame (the data added to the stack on call and released on return).
You would need to declare all your functions as taking no parameters and returning void to ensure no program variables in the stack. Finally the 'return address' of where execution of a function is to continue after return is pushed onto the stack (at least logically).
So having moved all parameters, automatic variables and return values into your 'state' struct, there will still be something going onto the stack - probably.
I say probably because I'm aware of a (non-standard) embedded C compiler that forbids recursion and can determine the maximum size of the stack by examining the call tree of the whole program and identifying the call chain that reaches the peak size of the stack.
You could achieve this with a monstrous pile of goto statements (some conditional, where a function is logically called from two places) or by duplicating code.
It's often important in embedded code on devices with tiny memory to avoid any dynamic memory allocation and know that any 'stack-space' will never overflow.
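The call-tree analysis mentioned above can be sketched as a walk over a recursion-free call graph that adds each function's frame size to the deepest chain below it. This is only an illustration under stated assumptions: struct func_info, MAX_CALLEES and worst_case_stack are hypothetical names, and the per-function frame sizes would have to come from the compiler or linker, not from the C source.

```c
#include <stddef.h>  /* size_t */

#define MAX_CALLEES 4  /* assumed upper bound, for illustration only */

/* One node of a hypothetical call graph. */
struct func_info {
    size_t frame_bytes;        /* stack used by one activation of the function */
    int callees[MAX_CALLEES];  /* indices of the functions it may call */
    int n_callees;             /* number of valid entries in callees[] */
};

/* Returns the worst-case stack usage of the call chain starting at 'root'.
   Terminates only if the call graph has no cycles (no recursion). */
size_t worst_case_stack(const struct func_info *funcs, int root)
{
    size_t deepest = 0;
    for (int i = 0; i < funcs[root].n_callees; i++) {
        size_t d = worst_case_stack(funcs, funcs[root].callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return funcs[root].frame_bytes + deepest;
}
```

Function pointers, interrupts and compiler-inserted calls are not modelled here, so a real tool has to be more conservative than this sketch.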
I'm happy this is a theoretical discussion. What you suggest is a mad way to write code and would throw away most of the (ultimately limited) services C provides as infrastructure for procedural coding (pretty much the call stack).
Footnote: See the comment below about the 8-bit PIC architecture.
Bonus question: If the assumptions above hold, I could simply declare
all automatic variables static and would end up with a "truly" static
program?
No. This would change the function of the program. static variables are initialized only once.
Compare these two functions:
int canReturn0Or1(void)
{
static unsigned a=0;
a++;
if(a>1)
{
return 1;
}
return 0;
}
int willAlwaysReturn0(void)
{
unsigned a=0;
a++;
if(a>1)
{
return 1;
}
return 0;
}
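If a variable is made static anyway, the original behaviour can be kept by assigning the initial value explicitly on every call instead of relying on the initializer. A minimal sketch, where staticButAlwaysReturns0 is a hypothetical name for this variant:

```c
/* 'a' has static storage duration, so it occupies no stack space,
   but it is reset by an ordinary assignment on every call. */
int staticButAlwaysReturns0(void)
{
    static unsigned a;
    a = 0;      /* executed on each call, unlike a static initializer */
    a++;
    if (a > 1)
    {
        return 1;
    }
    return 0;
}
```

Such a function is no longer reentrant, so this only works under the question's assumption that no function is ever active twice at the same time.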
In a C99 program, under the (theoretical) assumption that I'm not using variable-length arrays, and each of my automatic variables can only exist once at a time in the whole stack (by forbidding circular function calls and explicit recursion), if I sum up all the space they are consuming, could I declare that this is the maximal stack size that can ever happen?
No, because of function pointers..... Read n1570.
Consider the following code, where rand(3) is some pseudo random number generator (it could also be some input from a sensor) :
#include <stdlib.h> /* rand(), NULL */
typedef int foosig(int);
int foo(int x) {
foosig* fptr = (x>rand())?&foo:NULL;
if (fptr)
return (*fptr)(x);
else
return x+rand();
}
An optimizing compiler (such as some recent GCC suitably invoked with enough optimizations) would make a tail-recursive call for (*fptr)(x). Some other compiler won't.
Depending on how you compile that code, it would use a bounded stack or could produce a stack overflow. With some ABIs and calling conventions, both the argument and the result could go through a processor register and would not consume any stack space.
Experiment with a recent GCC (e.g. on Linux/x86-64, some GCC 10 in 2020) invoked as gcc -O2 -fverbose-asm -S foo.c then look inside foo.s. Change the -O2 to a -O0.
Observe that the naive recursive factorial function could be compiled into some iterative machine code with a good enough C compiler and optimizer. In practice GCC 10 on Linux compiling the below code:
int fact(int n)
{
if (n<1) return 1;
else return n*fact(n-1);
}
as gcc -O3 -fverbose-asm tmp/fact.c -S -o tmp/fact.s produces the following assembler code:
.type fact, @function
fact:
.LFB0:
.cfi_startproc
endbr64
# tmp/fact.c:3: if (n<1) return 1;
movl $1, %eax #, <retval>
testl %edi, %edi # n
jle .L1 #,
.p2align 4,,10
.p2align 3
.L2:
imull %edi, %eax # n, <retval>
subl $1, %edi #, n
jne .L2 #,
.L1:
# tmp/fact.c:5: }
ret
.cfi_endproc
.LFE0:
.size fact, .-fact
.ident "GCC: (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0"
And you can observe that the call stack is not increasing above.
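Read back into C, the loop in that assembler output corresponds roughly to the following hand-written iterative version. This is a sketch of what the optimizer effectively produced, not compiler output; fact_iter is a hypothetical name:

```c
/* Iterative equivalent of the recursive fact(): one multiplication
   per loop iteration, with no call and no growing stack. */
int fact_iter(int n)
{
    int result = 1;          /* movl $1, %eax */
    while (n >= 1) {         /* testl/jle on entry, jne at the loop end */
        result *= n;         /* imull %edi, %eax */
        n -= 1;              /* subl $1, %edi */
    }
    return result;
}
```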
If you have serious and documented arguments against GCC, please submit a bug report.
BTW, you could write your own GCC plugin which would choose to randomly apply or not such an optimization. I believe it stays conforming to the C standard.
The above optimization is essential for many compilers generating C code, such as Chicken/Scheme or Bigloo.
A related theorem is Rice's theorem. See also this draft report funded by the CHARIOT project.
See also the Compcert project.
I wrote a simple function in order to check that malloc works. I create a large array (1024*1024*1024 doubles), fill it with numbers, but the heap usage does not seem to change. Here is the code:
#include <stdio.h>
#include <assert.h> // For assert()
#include <stdlib.h> // For malloc(), free() and realloc()
#include <unistd.h> // For sleep()
static void create_array_in_heap()
{
double* b;
b = (double*)malloc(sizeof(double) * 1024 * 1024 * 1024);
assert(b != NULL); // Check that the allocation succeeded
int i;
for (i=0; i<1024*1024*1024; i++);
b[i] = 1;
sleep(10);
free(b);
}
int main()
{
create_array_in_heap();
return 0;
}
[Screenshot of the Linux system monitor]
Any ideas why ?
EDIT: a simpler explanation is given in the comments. But my answer applies once the ; has been removed.
An aggressive optimizing compiler, such as Clang, can see that the only important part of your function create_array_in_heap is the call to sleep. The rest has no functional value, since you only fill a memory block to eventually discard it, and is removed by the compiler. This is the entirety of your program compiled by Clang 7.0.0 with -O2:
main: # #main
pushq %rax
movl $10, %edi
callq sleep
xorl %eax, %eax
popq %rcx
retq
In order to benchmark any aspect of a program, the program should have been designed to output a result (computing and discarding the result is too easy for the compiler to optimize into nothing). The result should also be computed from inputs that aren't known at compile-time, otherwise the computation always produces the same result and can be optimized by constant propagation.
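A minimal sketch of that advice: the function below returns a value that depends on every element it wrote, and takes its size and fill value as parameters so that a caller can supply them at run time (for example from the command line) and print the result. fill_and_sum is a hypothetical name, and a sufficiently clever compiler may still simplify parts of it:

```c
#include <stdlib.h>  /* malloc(), free() */

/* Allocates n doubles, fills them with 'value', and returns their sum.
   Returns -1.0 if the allocation fails. Because the result depends on
   the stored data, the fill cannot simply be discarded as dead code. */
double fill_and_sum(size_t n, double value)
{
    double *b = malloc(n * sizeof *b);
    if (b == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)   /* note: no stray ';' after the for */
        b[i] = value;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += b[i];
    free(b);
    return sum;
}
```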
Consider a C function (with external linkage) like the following one:
void f(void **p)
{
/* do something with *p */
}
Now assume that f is being called in a way such that p points to the return address of f on the stack, as in the following code (assuming the System V AMD64 ABI):
leaq -8(%rsp), %rdi
callq f
What may happen is that the code of f modifies the return address on the stack by assigning a value to *p. Thus the compiler will have to treat the return address on the stack as a volatile value. How can I tell the compiler, gcc in my case, that the return address is volatile?
Otherwise, the compiler could, at least in principle, generate the following code for f:
pushq %rbp
movq 8(%rsp), %r10
pushq %r10
## do something with (%rdi)
popq %r10
popq %rbp
addq $8, %rsp
jmpq *%r10
Admittedly, it is unlikely that a compiler would ever generate code like this but it does not seem to be forbidden without any further function attributes. And this code wouldn't notice if the return address on the stack is being modified in the middle of the function because the original return address is already retrieved at the beginning of the function.
P.S.: As has been suggested by Peter Cordes, I should better explain the purpose of my question: It is about garbage collecting dynamically generated machine code using a moving garbage collector. The function f stands for the garbage collector. The caller of f may be a function whose code is being moved around while f is running, so I came up with the idea of letting f know the return address so that f may modify it according to whether the memory area the return address points to has been moved or not.
Using the SysV ABI (Linux, FreeBSD, Solaris, Mac OS X / macOS) on AMD64/x86-64, you only need a trivial assembly function wrapped around the actual garbage collector function.
The following f.s defines void f(void *), and calls the real GC, real_f(void *, void **), with the added second parameter pointing to the return address.
.file "f.s"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
movq %rsp, %rsi
call real_f
ret
.size f, .-f
If real_f() already has two other parameters, use %rdx (for the third) instead of %rsi. If three to five, use %rcx, %r8, or %r9, respectively. SysV ABI on AMD64/x86-64 only supports up to six non-floating-point parameters in registers.
Let's test the above with a small example.c:
#include <stdlib.h>
#include <stdio.h>
extern void f(void *);
void real_f(void *arg, void **retval)
{
printf("real_f(): Returning to %p instead of %p.\n", arg, *retval);
*retval = arg;
}
int main(void)
{
printf("Function and label addresses:\n");
printf("%p f()\n", f);
printf("%p real_f()\n", real_f);
printf("%p one_call:\n", &&one_call);
printf("%p one_fail:\n", &&one_fail);
printf("%p one_skip:\n", &&one_skip);
printf("\n");
printf("f(one_skip):\n");
fflush(stdout);
one_call:
f(&&one_skip);
one_fail:
printf("At one_fail.\n");
fflush(stdout);
one_skip:
printf("At one_skip.\n");
fflush(stdout);
return EXIT_SUCCESS;
}
Note that the above relies on both GCC behaviour (&& providing the address of a label) as well as GCC behaviour on AMD64/x86-64 architecture (object and function pointers being interchangeable), as well as the C compiler not making any of the myriad optimizations they are allowed to do to the code in main().
(It does not matter if real_f() is optimized; it's just that I was too lazy to work out a better example in main(). For example, one that creates a small function in an executable data segment that calls f(), with real_f() moving that data segment, and correspondingly adjusting the return address. That would match OP's scenario, and is just about the only practical use case for this kind of manipulation I can think of. Instead, I just hacked a crude example that might or might not work for others.)
Also, we might wish to declare f() as having two parameters (they would be passed in %rdi and %rsi) too, with the second being irrelevant, to make sure the compiler does not expect %rsi to stay unchanged. (If I recall correctly, the SysV ABI lets us clobber it, but I might remember wrong.)
On this particular machine, compiling the above with
gcc -Wall -O0 f.s example.c -o example
running it
./example
produces
Function and label addresses:
0x400650 f()
0x400659 real_f()
0x400729 one_call:
0x400733 one_fail:
0x40074c one_skip:
f(one_skip):
real_f(): Returning to 0x40074c instead of 0x400733.
At one_skip.
Note that if you tell GCC to optimize the code (say, -O2), it will make assumptions about the code in main() it is perfectly allowed to do by the C standard, but which may lead to all three labels having the exact same address. This happens on my particular machine and GCC-5.4.0, and of course causes an endless loop. It does not reflect on the implementation of f() or real_f() at all, only that my example in main() is quite poor. I'm lazy.
When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go are to either:
make the function return a struct pointer instead of a struct by value, but then my caller function will look funny as I have to change y.b to y->b, and the operation may be slower due to the extra step of fetching data from memory.
allocate extra memory, zero-fill it, and set the return value to the struct at that location in memory (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
long a;
char b;
}ab;
static char dat[500];
ab f2(int set){
ab* x=(ab*)dat;
if (set==2){return NULL;}
if (set==1){
x->a=1;
x->b='1';
x++;
x->a=2;
x->b='2';
x=(ab*)dat;
}
return x[0];
}
int main(){
ab y;
y=f2(1);
printf("%c",y.b);
y.b='D';
y=f2(0);
printf("%c",y.b);
return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI specifies that a struct of (exactly) two scalar members (that is, integers, doubles, or pointers, which each fit in a single machine register, but not structs etc., which are aggregate data) is returned through two registers (without going through the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (ab){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest always returning with statements like return (ab){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, @function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
You can see that all the code uses registers, and no memory is used at all.
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise
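A minimal sketch of how that signature could be filled in, reusing the asker's ab type and set values. Storing the data in a typed array instead of the original char buffer is an assumption made here to sidestep the alignment concern; the parameter is named out in this sketch:

```c
#include <stddef.h>  /* NULL */

typedef struct {
    long a;
    char b;
} ab;

/* Typed storage (an assumption of this sketch, replacing char dat[500]). */
static ab dat[2];

/* Returns 0 and fills *out on success,
   or -1 if the input is invalid (set == 2, or out is NULL). */
int f2(int set, ab *out)
{
    if (set == 2 || out == NULL)
        return -1;
    if (set == 1) {
        dat[0].a = 1;
        dat[0].b = '1';
        dat[1].a = 2;
        dat[1].b = '2';
    }
    *out = dat[0];   /* copy the first element to the caller's struct */
    return 0;
}
```

The caller keeps using y.b on its own struct and checks the return value, for example: if (f2(1, &y) == 0) printf("%c", y.b);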
(Sorry for bad English.)
Question 1.
void foo(void)
{
goto inside;
for (;;) {
int stack_var = 42;
inside:
...
}
}
Will space on the stack be allocated for stack_var when I goto the inside label? I.e. can I correctly use the stack_var variable within ...?
Question 2.
void foo(void)
{
for (;;) {
int stack_var = 42;
...
goto outside;
}
outside:
...
}
Will the stack space of stack_var be deallocated when I goto the outside label? E.g. is it correct to do a return within ...?
In other words, is goto smart enough to work correctly with stack variables (automatic (de)allocation when I move between blocks), or is it just a stupid jump?
Question 1:
can I correctly use the stack_var variable within ...?
The code in ... can write to stack_var. However, this variable is uninitialized because the execution flow jumped over the initialization, so the code should not read from it without having written to it first.
From the C99 standard, 6.8:3
The initializers of objects that have automatic storage duration […] are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution
My compiler compiles the function below to a piece of assembly that sometimes returns the uninitialized contents of x:
int f(int c){
if (c) goto L;
int x = 42;
L:
return x;
}
cmpl $0, %eax
jne LBB1_2
movl $42, -16(%rbp)
LBB1_2:
movl -16(%rbp), %eax
...
popq %rbp
ret
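One way to avoid reading an indeterminate value is to place the declaration and its initializer before the jump, so the initialization is always reached. A minimal sketch, where f_fixed is a hypothetical name for the reworked function:

```c
/* Same control flow as f(), but x is declared and initialized
   before the goto, so the jump can no longer skip the initializer. */
int f_fixed(int c)
{
    int x = 42;
    if (c)
        goto L;
    /* ... work that the jump skips ... */
L:
    return x;
}
```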
Question 2:
Will the stack space of stack_var be deallocated when I goto the outside label?
Yes, you can expect the memory reserved for stack_var to be reclaimed as soon as the variable goes out of scope.
There are two different issues:
lexical scoping of variables inside C code. A C variable only makes sense inside the block in which it is declared. You could imagine that the compiler renames variables to unique names, which make sense only inside the enclosing block.
call frames in the generated code. A good optimizing compiler usually allocates the call frame of the current function on the machine call stack at the beginning of the function. A given location in that call frame, called a slot, can be (and usually is) reused by the compiler for several local variables (or for other purposes).
And a local variable can be kept in a register only (without any slot in the call frame), and that register will obviously be reused for various purposes.
You are probably hitting undefined behavior in your first case. After the goto inside, stack_var is uninitialized.
I suggest compiling with gcc -Wall and improving the code until no warnings are given.