Mechanism of allocating memory for variable size array - c

I am not able to understand how variable size arrays work: whether the memory for them is allocated on the stack or somewhere else, and how information about their size is obtained.
I tried the following code:
#include <stdio.h>
int main()
{
    int n;
    scanf("%d", &n);
    int arr[n];
    printf("%zu\n", sizeof(arr));   /* %zu: sizeof yields a size_t */
    return 0;
}
I mean, if memory is allocated on the stack, then before running this function the stack frame has to be set up and memory for local variables allocated, but the size of the array is only known after the function calls scanf().

On most contemporary systems with memory protection and so on, you can just grow the stack. If accessing the grown stack causes accesses to memory which is actually outside the valid range of virtual memory for the process, the operating system will catch that and map some more memory your way.
So there's no problem in doing that "on the fly", and of course "allocating n bytes on the stack" is generally about as complex as "stackpointer -= n".
There might be some additional complexity if the function has many exit paths, since each needs to unwind the proper amount of stack depending on whether the variable-length array has been allocated or not; I'm not sure how that is generally solved, but it would be an easy code-reading exercise to find out.
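To make the "stackpointer -= n" idea concrete, here is a rough sketch of what the compiler conceptually generates for a VLA, using the non-standard alloca() purely as an analogy (alloca() and <alloca.h> are platform-specific assumptions, not standard C, and no compiler literally emits this):

#include <stdio.h>
#include <alloca.h>   /* non-standard; common on Unix-like systems */

void demo(int n)
{
    if (n <= 0)
        return;
    /* the compiler typically saves the stack pointer here ... */
    int *arr = alloca(n * sizeof *arr);   /* ... then bumps it by n ints */
    int i;
    for (i = 0; i < n; i++)
        arr[i] = i;
    printf("%d\n", arr[n - 1]);
    /* ... and restores the saved stack pointer on every exit path,
       which is how functions with multiple returns unwind correctly */
}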

In C++, this isn't allowed, although some compilers support it as a non-standard extension. Dynamically-sized arrays, similar to those in C, were proposed for C++14, though they were ultimately not adopted into the standard.
How to implement this is up to the compiler writer, as long as the memory is allocated somewhere and freed automatically. This is typically done by extending the stack frame once the size is known. There may or may not be a check that the stack is large enough, so beware creating large arrays like this.
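For instance, this variation on the question's code shows that the array's size is latched when its declaration is reached, and that sizeof applied to a VLA is evaluated at run time rather than at compile time (a minimal sketch, assuming a C99 compiler):

#include <stdio.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)
        return 1;

    int arr[n];                    /* size fixed at this point ... */
    printf("%zu\n", sizeof arr);   /* ... and sizeof is evaluated at run time */

    n = n * 2;                     /* changing n later ... */
    printf("%zu\n", sizeof arr);   /* ... does not change the array's size */
    return 0;
}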

Related

Why use fixed length allocation instead of static allocation?

I understand that what I'm asking may be quite simple for some of you, but bear with me; I'm trying to understand memory management. Since we use a certain fixed length (N) for the size, why do we also do this:
int *arr = (int*)malloc(N * sizeof(int));
...instead of the conventional static way:
int arr[N];
malloc provides memory that remains available after execution of the block it is in ends, such as when the function it is in returns.
int A[N], if it appears inside a function, uses memory that is guaranteed to be available only while execution of the block it is in has not ended. If it appears outside a function, the memory is available for all of program execution, but N must be a constant. (Further, even inside a function, int A[N] where N is not a constant is not supported in some C implementations.)
In typical C implementations in general-purpose operating systems, malloc has a large amount of memory available to provide, but int A[N] inside a function uses a stack region that is limited, commonly one to eight mebibytes total, depending on the system.
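A minimal sketch of that lifetime difference (the function names are made up for illustration):

#include <stdlib.h>

/* Returning heap memory is fine: it outlives the call. */
int *make_heap_array(size_t n)
{
    return malloc(n * sizeof(int));   /* caller must check for NULL and free() it */
}

/* Returning an automatic array is a bug: its lifetime ends at return. */
int *make_stack_array(void)
{
    int arr[100];
    return arr;   /* dangling pointer: the memory is reclaimed on return */
}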
int arr[N]; goes either into static memory, if it's at file scope (or is preceded by static), or on the stack.
A static-memory int arr[N]; is cheapest to allocate (both in terms of size overhead and allocation time) but it can never be freed and the memory won't always be cache-local.
If the int arr[N]; is inside a block (and not preceded by static), int arr[N]; goes on the stack. This memory will pretty much certainly be cache-local, but you might not wish to use this form if N is large and your stack is limited (or you're risking stack overflow) or if the lifetime of arr needs to exceed that of its containing block.
malloc'd memory takes some time to allocate (tens to hundreds of ns), it may carry some size overhead, and the allocation may fail, but you can free the memory later and it stays until you do free it. It might also be resizable (via realloc) without the need to copy memory. Cache-locality-wise, it's sort of like static memory (i.e., the block may be potentially far from the cache-hot end of the stack).
So those are the considerations: lifetime, stack size, allocation time, size overhead, free-ability, and possibly cache-locality.
A final consideration might be a peculiar feature of C: effective type. A declared array has a fixed effective type and therefore cannot be retyped without violating C's strict aliasing rules but malloc'd memory can be. In other words, you can use a malloc'd block as a backing storage for a generic allocator of your own, but a declared char array cannot be used that way, strictly speaking.
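A minimal sketch of that difference, relying on C's effective-type rules (the names here are illustrative):

#include <stdlib.h>

void demo(void)
{
    void *block = malloc(sizeof(double));
    if (block == NULL)
        return;

    double *d = block;
    *d = 3.14;        /* the block's effective type is now double */

    int *i = block;
    *i = 42;          /* a new store changes the effective type to int: allowed */

    free(block);

    char buf[sizeof(int)];   /* a declared array's effective type is fixed */
    int *p = (int *)buf;
    /* *p = 42;  -- would violate strict aliasing: buf's type stays char[] */
    (void)p;
}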
malloc: this is dynamic allocation. You can free or change the size of the allocated memory, and the heap is usually much larger than the stack or the global-variables memory area. The allocated memory has the same storage duration as static objects: it stays until you free it.
The latter cannot be resized or freed (automatic objects are released only by exiting their scope). Static and automatic storage is usually smaller than the heap.

Check array created successfully in C

In C there are 2 ways to create arrays:
int array[100];
and
int * array = malloc(sizeof(int)*100);
With the second statement it's easy to check whether there was enough memory available to create the array, for example:
if(array == NULL){
    goto OutOfMemory;
}
But how would you check that the first worked successfully? Assuming this was running on a microcontroller and not a computer.
There is no such thing as a recoverable failure from allocation of an array on the stack (the first way). It will only fail if allocating it causes a stack overflow, at which point your program has aborted/terminated anyway.
When you allocate the array the first way, it is being allocated on the stack, usually at function call time. If there isn't enough room on the stack to allocate it, the program aborts with a stack overflow/segfault error.
When you allocate the second way, you are asking the memory manager for memory on the heap at the time you actually call malloc.
EDIT: As mentioned by @Deduplicator, if you're on a system without memory protection, not having enough free stack space to allocate an array can lead to overruns and much subtler problems (although most likely it will fail on an illegal instruction soon enough).
The first piece of code stores the array on the stack.
The second stores the array on the heap.
Stack memory is pre-allocated for the thread. That said, unless you're allocating huge amounts of data on the stack, you generally shouldn't be worried about stack space.
Checking available stack size in C
Edit:
In that case you ought to make sure, in advance, that you have enough stack (as defined in your IDE/compiler/linker/proprietary software) for the deepest of the call chains throughout your code's execution. This can be determined at compile time; there is no need for runtime checks.

why is array size limited when declared at compile time?

For example, I can do this:
int *arr;
arr = (int *)malloc(sizeof(int) * 1048575);
but I cannot do this without the program crashing:
int arr[1048575];
why is this so?
Assuming arr is a local variable, declaring it as an array uses memory from the (relatively limited) stack, while malloc() uses memory from the (comparatively limitless) heap.
If you're allocating these as local variables in functions (which is the only place you could have the pointer declaration immediately followed by a malloc call), then the difference is that malloc will allocate a chunk of memory from the heap and give you its address, while directly doing int arr[1048575]; will attempt to allocate the memory on the stack. The stack has much less space available to it.
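A sketch of the heap version with the failure check that the stack version cannot give you (the size is taken from the question):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1048575;

    /* heap allocation reports failure via a NULL return ... */
    int *arr = malloc(n * sizeof *arr);
    if (arr == NULL) {
        fprintf(stderr, "allocating %zu ints failed\n", n);
        return 1;
    }

    arr[n - 1] = 42;   /* ... so using the block here is known to be safe */
    free(arr);
    return 0;
}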
The stack is limited in size for two main reasons that I'm aware of:
Traditional imperative programming makes very little use of recursion, so deep recursion (and heavy stack growth) is "probably" a sign of infinite recursion, and hence a bug that's going to kill the process. It's therefore best if it is caught before the process consumes the gigabytes of virtual memory (on a 32 bit architecture) that will cause the process to exhaust its address space (at which point the machine is probably using far more virtual memory than it actually has RAM, and is therefore operating extremely slowly).
Multi-threaded programs need multiple stacks. Therefore, the runtime system needs to know that the stack will never grow beyond a certain bound, so it can put another stack after that bound if a new thread is created.
When you declare an array, you are placing it on the stack.
When you call malloc(), the memory is taken from the heap.
The stack is usually more limited compared to the heap, and is usually transient (though that depends on how often you enter and exit the function that the array is declared in).
For such a large amount of memory (maybe not by today's standards?), it is good practice to malloc it, assuming you want the array to stick around for a while.

Why would you ever want to allocate memory on the heap rather than the stack? [duplicate]

Possible Duplicate:
When is it best to use a Stack instead of a Heap and vice versa?
I've read a few of the other questions regarding the heap vs stack, but they seem to focus more on what the heap/stack do rather than why you would use them.
It seems to me that stack allocation would almost always be preferred, since it is quicker (just moving the stack pointer versus looking for free space in the heap), and you don't have to manually free allocated memory when you're done using it. The only reason I can see for using heap allocation is if you want to create an object in a function and then use it outside that function's scope, since stack-allocated memory is automatically deallocated after the function returns.
Are there other reasons for using heap allocation instead of stack allocation that I am not aware of?
There are a few reasons:
The main one is that with heap allocation, you have the most flexible control over the object's lifetime (from malloc/calloc to free);
Stack space is typically a more limited resource than heap space, at least in default configurations;
A failure to allocate heap space can be handled gracefully, whereas running out of stack space is often unrecoverable.
Without the flexible object lifetime, useful data structures such as binary trees and linked lists would be virtually impossible to write.
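For instance, a linked list's nodes must outlive the call that creates them, which is exactly what the heap provides (a minimal sketch; the names are illustrative):

#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Pushes a value onto the front of a list; the node survives the call,
   which would be impossible with a stack-allocated node. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;   /* allocation failed: leave the list unchanged */
    n->value = value;
    n->next = head;
    return n;
}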
You want an allocation to live beyond a function invocation
You want to conserve stack space (which is typically limited to a few MBs)
You're working with relocatable memory (Win16, databases, etc.), or want to recover from allocation failures.
Variable length anything. You can fake around this, but your code will be really nasty.
The big one is #1. As soon as you get into any sort of concurrency or IPC #1 is everywhere. Even most non-trivial single threaded applications are tricky to devise without some heap allocation. That'd practically be faking a functional language in C/C++.
So I want to make a string. I can make it on the heap or on the stack. Let's try both:
char *heap = malloc(14);
if(heap == NULL)
{
    // bad things happened!
}
strcpy(heap, "Hello, world!");   // strcpy, not strcat: malloc'd memory is uninitialized
And for the stack:
char stack[] = "Hello, world!";
So now I have these two strings in their respective places. Later, I want to make them longer:
char *tmp = realloc(heap, 20);
if(tmp == NULL)
{
    // bad things happened!
}
heap = tmp;
memmove(heap + 13, heap + 7, 7);   // shift "world!" (plus its NUL) right by 6
memcpy(heap + 7, "cruel ", 6);
And for the stack:
// umm... What?
This is only one benefit, and others have mentioned more, but it's a rather nice one. With the heap, we can at least try to make our allocated space larger. With the stack, we're stuck with what we have. If we want room to grow, we have to declare it all up front, and we all know how it stinks to see this:
char username[MAX_BUF_SIZE];
The most obvious rationale for using the heap is when you call a function and need something of unknown length returned. Sometimes the caller may pass a memory block and size to the function, but at other times this is just impractical, especially if the returned stuff is complex (e.g. a collection of different objects with pointers flying around, etc.).
Size limits are a huge dealbreaker in a lot of cases. The stack is usually measured in the low megabytes or even kilobytes (that's for everything on the stack), whereas all modern PCs allow you a few gigabytes of heap. So if you're going to be using a large amount of data, you absolutely need the heap.
Just to add:
You can use alloca to allocate memory on the stack, but again, memory on the stack is limited, and the space exists only while the function is executing.
That does not mean everything should be allocated on the heap. Like all design decisions, this one is also somewhat difficult; a "judicious" combination of both should be used.
Besides manual control of an object's lifetime (which you mentioned), the other reasons for using the heap include:
Run-time control over the object's size (both its initial size and its "later" size, during the program's execution).
For example, you can allocate an array of certain size, which is only known at run time.
With the introduction of VLA (Variable Length Arrays) in C99, it became possible to allocate arrays of fixed run-time size without using heap (this is basically a language-level implementation of 'alloca' functionality). However, in other cases you'd still need heap even in C99.
Run-time control over the total number of objects.
For example, when you build a binary tree structure, you can't meaningfully allocate the nodes of the tree on the stack in advance. You have to use the heap to allocate them "on demand".
Low-level technical considerations, such as limited stack space (others have already mentioned that).
When you need a large, say, I/O buffer, even for a short time (inside a single function) it makes more sense to request it from the heap instead of declaring a large automatic array.
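A sketch of that pattern, with an illustrative 4 MB buffer size (the names here are made up):

#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* illustrative: too big for many stacks */

int copy_stream(FILE *in, FILE *out)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return -1;                   /* recoverable, unlike a stack overflow */

    size_t n;
    while ((n = fread(buf, 1, BUF_SIZE, in)) > 0)
        fwrite(buf, 1, n, out);

    free(buf);                       /* released as soon as we're done */
    return 0;
}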
Stack variables (often called 'automatic variables') are best used for things you want to always be the same, and always be small.
int x;
char foo[32];
These are all stack allocations, and their sizes are fixed at compile time.
The best reason for heap allocation is that you can't always know how much space you need. You often only know this once the program is running. You might have an idea of limits, but you would only want to use the exact amount of space required.
If you had to read in a file that could be anything from 1 KB to 50 MB, you would not do this:
int readdata ( FILE * f ) {
    char inputdata[50*1024*1024];
    ...
    return x;
}
That would try to allocate 50 MB on the stack, which would usually fail: the stack is commonly limited to a few megabytes, and on some systems to as little as 256 KB.
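A sketch of the heap-based alternative, growing the buffer as the file is read instead of reserving the worst case up front (the starting capacity and the doubling policy are illustrative choices):

#include <stdio.h>
#include <stdlib.h>

long readdata(FILE *f)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);
    if (data == NULL)
        return -1;

    size_t got;
    while ((got = fread(data + len, 1, cap - len, f)) > 0) {
        len += got;
        if (len == cap) {            /* buffer full: double it */
            char *tmp = realloc(data, cap * 2);
            if (tmp == NULL) {
                free(data);
                return -1;
            }
            data = tmp;
            cap *= 2;
        }
    }

    free(data);                      /* real code would keep the data around */
    return (long)len;
}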
On some systems the stack and the heap share the same "open" memory space and grow toward each other, so they will eventually meet if you use the entire segment. Keeping a balance between the space each of them uses keeps the amortized cost of allocating and deallocating memory lower.

Why should I use malloc() when "char bigchar[ 1u << 31 - 1 ];" works just fine?

What's the advantage of using malloc (besides the NULL return on failure) over static arrays? The following program will eat up all my RAM and start filling swap only if the loops are uncommented. It does not crash.
...
#include <stdio.h>

unsigned int bigint[ 1u << 29 - 1 ];
unsigned char bigchar[ 1u << 31 - 1 ];

int main (int argc, char **argv) {
    int i;
    /* for (i = 0; i < 1u << 29 - 1; i++) bigint[i] = i; */
    /* for (i = 0; i < 1u << 31 - 1; i++) bigchar[i] = i & 0xFF; */
    getchar();
    return 0;
}
...
After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit? Apparently I can have as many of them as I want. It will segfault, but only if I ask for (and try to use) more than malloc would give me anyway.
Is there a way to determine if a static array was actually allocated and safe to use?
EDIT: I'm interested in why malloc is used to manage the heap instead of letting the virtual memory system handle it. Apparently I can size an array to many times the size I think I'll need, and the virtual memory system will only keep in RAM what is actually needed. If I never write to, e.g., the end (or beginning) of these huge arrays, then the program doesn't use the physical memory. Furthermore, if I can write to every location, then what does malloc do besides increment a pointer in the heap or search around previous allocations in the same process?
Editor's note: 1 << 31 causes undefined behaviour if int is 32-bit, so I have modified the question to read 1u. The intent of the question is to ask about allocating large static buffers.
Well, for two reasons really:
Because of portability, since some systems won't do the virtual memory management for you.
You'll inevitably need to divide this array into smaller chunks for it to be useful, then keep track of all the chunks, and then, as you start "freeing" some of the chunks you no longer require, you'll hit the problem of memory fragmentation.
All in all, you'll end up implementing a lot of memory-management functionality (in fact, pretty much reimplementing malloc) without the benefit of portability.
Hence the reasons:
Code portability via memory management encapsulation and standardisation.
Personal productivity enhancement by the way of code re-use.
Please see:
malloc() and the C/C++ heap
Should a list of objects be stored on the heap or stack?
C++ Which is faster: Stack allocation or Heap allocation
Proper stack and heap usage in C++?
About C/C++ stack allocation
Stack,Static and Heap in C++
Of Memory Management, Heap Corruption, and C++
new on stack instead of heap (like alloca vs malloc)
With malloc you can grow and shrink your array: it becomes dynamic, so you can allocate exactly what you need.
This is called custom memory management, I guess.
You can do that, but you'll have to manage that chunk of memory yourself.
You'd end up writing your own malloc() working over this chunk.
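As a taste of what that involves, here is a minimal bump-allocator sketch over a static chunk (the pool size and names are illustrative, and _Alignas/max_align_t assume C11); it cannot free individual blocks, which is precisely one of the problems a real malloc has to solve:

#include <stddef.h>

static _Alignas(max_align_t) unsigned char pool[1u << 20];   /* the 1 MiB "big array" */
static size_t used = 0;

void *my_alloc(size_t n)
{
    /* round the request up so later allocations stay suitably aligned */
    size_t align = sizeof(max_align_t);
    n = (n + align - 1) / align * align;

    if (n > sizeof(pool) - used)
        return NULL;                   /* out of pool space */

    void *p = pool + used;
    used += n;
    return p;                          /* note: no way to free individual blocks */
}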
Regarding:
"After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit?"
One upper bound will depend on how the 4GB (32-bit) virtual address space is partitioned between user-space and kernel-space. For Linux, I believe the most common partitioning scheme has a 3 GB range of addresses for user-space and a 1 GB range of addresses for kernel-space. The partitioning is configurable at kernel build-time, 2GB/2GB and 1GB/3GB splits are also in use. When the executable is loaded, virtual address space must be allocated for every object regardless of whether real memory is allocated to back it up.
You may be able to allocate that gigantic array in one context, but not others. For example, if your array is a member of a struct and you wish to pass the struct around. Some environments have a 32K limit on struct size.
As previously mentioned, you can also resize your memory to use exactly what you need. It's important in performance-critical contexts to not be paging out to virtual memory if it can be avoided.
There is no way to free a stack allocation other than going out of scope. So when you use a global allocation and the VM has to give you real, hard memory, it is allocated and stays there until your program ends. This means that any such process can only grow its virtual memory use (functions' local stack allocations are "freed" when they return).
You cannot keep stack memory once it goes out of the function's scope; it is always freed. So you must know at compile time how much memory you will use.
Which then boils down to how many int foo[1<<29]'s you can have. Each one takes 2 GiB, so the first (say it starts at address 0x00000000) already fills half of a 32-bit address space, the second ends somewhere around 0xffffffff, and a third would need addresses that 32-bit pointers cannot express. (Remember that stack reservations are resolved partly at compile time and partly at run time, via offsets: how far the stack pointer is adjusted when this or that variable is allocated.)
So the answer is pretty much that once you have int foo[1<<29], you can't have any reasonable depth of function calls with other local stack variables any more.
You really should avoid doing this unless you know what you're doing. Try to request only as much memory as you need. Even if the excess is not being used or getting in the way of other programs, it can mess up the process itself. There are two reasons for this. First, on certain systems, particularly 32-bit ones, it can cause the address space to be exhausted prematurely in rare circumstances. Second, many kernels have some kind of per-process limit on reserved/virtual/not-in-use memory, and the kernel can kill the process if it asks to reserve memory beyond that limit. I've seen programs that crashed or exited due to a failed malloc because they were reserving GBs of memory while only using a few MB.

Resources