Segfault while statically allocating an int array - C

If I do this
int wsIdx[length];
I get a segfault,
but if I do this
int *wsIdx;
wsIdx = (int *)malloc(sizeof(int) * length);
there's no problem.
The problem appears only when length is large, 2560000 in my tests, and I have plenty of memory. Could you explain the difference between the two allocation methods, and why the first one does not work? Thank you.

The first one gets allocated on the "stack" (an area usually used for local variables), while the second one gets allocated on the "heap", an area for dynamically allocated memory.
You don't have enough stack space to allocate the array the first way; your heap, on the other hand, is large enough.
This SO discussion might be helpful: What and where are the stack and heap?.
When you are allocating memory dynamically, you can always check for success or failure of the allocation by examining the return value of malloc/calloc/etc. Unfortunately, no such mechanism exists for memory allocated on the stack.
Aside: You might enjoy reading this in the context of this question, especially this part :)

Assuming length is not a constant, then the first form is a variable-length array (VLA), and you have just encountered one of their biggest problems.
Best practice is to avoid VLAs for "large" arrays, and use malloc instead, for two reasons:
There is no mechanism for them to report allocation failure, other than to crash or cause some other undefined behaviour.
VLAs are typically allocated on the stack, which is typically relatively limited in size. So the chance of it failing to allocate is much higher!
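For example, here is a minimal sketch of the malloc route for the array size from the question, with the failure check that a VLA cannot give you (the variable names mirror the question):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t length = 2560000;                  /* the size from the question */
    int *wsIdx = malloc(length * sizeof *wsIdx);
    if (wsIdx == NULL) {                      /* unlike a VLA, failure is reportable */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    /* ... use wsIdx[0] .. wsIdx[length - 1] ... */
    free(wsIdx);
    return 0;
}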

implied auto considered harmful
To concur with the answers already given, if the faulting code had been written with an explicit storage class, this common problem might be more obvious.
#include <stdio.h>

void
not_enough_stack(void)
{
    auto int on_stack[2560 * 1000];   /* roughly 10 MB of automatic storage: more than a typical default stack */
    printf("sizeof(on_stack) = %zu\n", sizeof(on_stack));
}
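If you want to see what stack limit you are actually working with, here is a hedged sketch for POSIX systems (this assumes getrlimit is available; on Linux it reports the same number as ulimit -s):

#include <stdio.h>
#include <sys/resource.h>

void print_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("stack limit: unlimited\n");
        else
            printf("stack limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
    }
}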

Related

Growing an array on the stack

This is my problem in essence. In the life of a function, I generate some integers, then use the array of integers in an algorithm that is also part of the same function. The array of integers will only be used within the function, so naturally it makes sense to store the array on the stack.
The problem is I don't know the size of the array until I'm finished generating all the integers.
I know how to allocate a fixed-size and a variable-sized array on the stack. However, I do not know how to grow an array on the stack, and that seems like the best way to solve my problem. I'm fairly certain this is possible in assembly: you just move the stack pointer and store an int for each int generated, so the array of ints would sit at the end of the stack frame. Is this possible to do in C, though?
I would disagree with your assertion that "so naturally it makes sense to store the array on the stack". Stack memory is really designed for when you know the size at compile time. I would argue that dynamic memory is the way to go here.
C doesn't define what the "stack" is. It only has static, automatic and dynamic allocations. Static and automatic allocations are handled by the compiler, and only dynamic allocation puts the controls in your hands. Thus, if you want to manually deallocate an object and allocate a bigger one, you must use dynamic allocation.
Don't use dynamic arrays on the stack (compare Why is the use of alloca() not considered good practice?); instead, allocate memory from the heap using malloc and resize it using realloc.
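A hedged sketch of that approach, growing a heap array as integers are generated; generate_next() is a hypothetical stand-in for whatever produces your values:

#include <stdlib.h>

/* hypothetical producer: returns nonzero while there are more values */
int generate_next(int *out);

int *collect_ints(size_t *out_count)
{
    size_t count = 0, capacity = 16;
    int *data = malloc(capacity * sizeof *data);
    if (data == NULL)
        return NULL;

    int value;
    while (generate_next(&value)) {
        if (count == capacity) {               /* out of room: grow geometrically */
            capacity *= 2;
            int *tmp = realloc(data, capacity * sizeof *data);
            if (tmp == NULL) {
                free(data);
                return NULL;
            }
            data = tmp;
        }
        data[count++] = value;
    }
    *out_count = count;
    return data;                               /* caller frees */
}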
Never Use alloca()
IMHO this point hasn't been made well enough in the standard references.
One rule of thumb is:
If you're not prepared to statically allocate the maximum possible size as a
fixed length C array then you shouldn't do it dynamically with alloca() either.
Why? The reason you're trying to avoid malloc() is performance.
alloca() will be slower and won't work in any circumstance where static allocation would fail. It's generally less likely to succeed than malloc(), too.
One thing is sure. Statically allocating the maximum will outdo both malloc() and alloca().
Static allocation is typically damn near a no-op. Most systems will advance the stack pointer for the function call anyway. There's no appreciable difference for how far.
So what you're telling me is you care about performance but want to hold back on a no-op solution? Think about why you feel like that.
The overwhelming likelihood is you're concerned about the size allocated.
But as explained it's free and it gets taken back. What's the worry?
If the worry is "I don't have a maximum" or "I don't know whether it will overflow the stack", then you shouldn't be using alloca(), precisely because you don't have a maximum and don't know whether it will overflow the stack.
If you do have a maximum and know it isn't going to blow the stack then statically allocate the maximum and go home. It's a free lunch - remember?
That makes alloca() either wrong or sub-optimal.
Every time you use alloca() you're either wasting your time or coding in one of the difficult-to-test-for arbitrary scaling ceilings that sleep quietly until things really matter then f**k up someone's day.
Don't.
PS: If you need a big 'workspace' but the malloc()/free() overhead is a bottleneck, for example because it is called repeatedly in a big loop, then consider allocating the workspace outside the loop and carrying it from iteration to iteration. You may need to reallocate the workspace if you find a 'big' case, but it's often possible to divide the number of allocations by 100 or even 1000.
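A hedged sketch of that workspace pattern; process_case() and the sizes are placeholders for illustration, not part of any real API:

#include <stdlib.h>

/* hypothetical work function operating on a caller-supplied buffer */
void process_case(int *workspace, size_t n);

void run_all(const size_t *case_sizes, size_t n_cases)
{
    size_t capacity = 4096;                        /* arbitrary starting size */
    int *workspace = malloc(capacity * sizeof *workspace);
    if (workspace == NULL)
        return;

    for (size_t i = 0; i < n_cases; i++) {
        if (case_sizes[i] > capacity) {            /* reallocate only when a bigger case shows up */
            capacity = case_sizes[i];
            int *tmp = realloc(workspace, capacity * sizeof *tmp);
            if (tmp == NULL)
                break;
            workspace = tmp;
        }
        process_case(workspace, case_sizes[i]);
    }
    free(workspace);                               /* a handful of allocations instead of one per iteration */
}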
Footnote:
There must be some theoretical algorithm where a() calls b() and if a() requires a massive environment b() doesn't and vice versa.
In that event there could be some kind of freaky play-off where the stack overflow is prevented by alloca(). I have never heard of or seen such an algorithm. Plausible specimens will be gratefully received!
The innards of the C compiler require stack sizes to be fixed or calculable at compile time. It's been a while since I used C (I'm now a C++ convert) and I don't know exactly why this is. http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html provides a useful comparison of the pros and cons of the two approaches.
I appreciate your assembly-code analogy, but C is largely managed, if that makes any sense, by the operating system, which imposes/provides the task, process and stack notions.
In order to address your issue, dynamic memory allocation looks ideal.
int *a = malloc(sizeof(int));
and dereference it to store the value.
Each time a new integer needs to be added to the existing list of integers:
int *temp = realloc(a, sizeof(int) * (n + 1)); /* n = number of elements currently stored */
if (temp != NULL)
    a = temp;
Once you are done using this memory, free() it.
Is there an upper limit on the size? If you can impose one, so the size is at most a few tens of KiB, then yes alloca is appropriate (especially if this is a leaf function, not one calling other functions that might also allocate non-tiny arrays this way).
Or since this is C, not C++, use a variable-length array like int foo[n];.
But always sanity-check your size, otherwise it's a stack-clash vulnerability waiting to happen. (Where a huge allocation moves the stack pointer so far that it ends up in the middle of another memory region, where other things get overwritten by local variables and return addresses.) Some distros enable hardening options that make GCC generate code to touch every page in between when moving the stack pointer by more than a page.
It's usually not worth it to check the size and use alloca for small sizes and malloc for large ones, since you then also need another check at the end of your function to call free only if the size was large. It might give a speedup, but it makes your code more complicated and more likely to get broken during maintenance if future editors don't notice that the memory is only sometimes malloced. So only consider a dual strategy if profiling shows it actually matters, and you care about performance more than simplicity / human-readability / maintainability for this particular project.
A size check for an upper limit (else log an error and exit) is more reasonable, but then you have to choose an upper limit beyond which your program will intentionally bail out, even though there's plenty of RAM you're choosing not to use. If there is a reasonable limit where you can be pretty sure something's gone wrong, like the input being intentionally malicious from an exploit, then great, if(size>limit) error(); int arr[size];.
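A hedged sketch of that capped pattern; the 4096-element limit here is an arbitrary illustration, not a recommendation:

#include <stdio.h>
#include <stdlib.h>

void handle_request(size_t size)
{
    if (size == 0 || size > 4096) {     /* arbitrary example cap; tune for your use case */
        fprintf(stderr, "unreasonable request size: %zu\n", size);
        exit(EXIT_FAILURE);
    }
    int arr[size];                      /* VLA: only safe because size is bounded above */
    /* ... use arr ... */
    (void)arr;
}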
If neither of those conditions can be satisfied, your use case is not appropriate for C automatic storage (stack memory), because it might need to be large. Just use dynamic allocation, even if you were hoping to avoid malloc.
On Windows x86/x64 the default user-space stack size is 1 MiB, I think. On x86-64 Linux it's 8 MiB (ulimit -s). Thread stacks are allocated with the same size. But remember, your function will be part of a chain of function calls (so if every function used a large fraction of the total size, you'd have a problem if they called each other). And any stack memory you dirty won't get handed back to the OS even after the function returns, unlike malloc/free, where a large allocation can give the memory back instead of leaving it on the free list.
Kernel thread stacks are much smaller, like 16 KiB total for x86-64 Linux, so you never want VLAs or alloca in kernel code, except maybe for a tiny max size, like up to 16 or maybe 32 bytes, not large compared to the size of a pointer that would be needed to store a kmalloc return value.

Is it a bad manner to use the stack to avoid dynamic allocation?

I understand the answer given by this very similar question:
When is it best to use the stack instead of the heap and vice versa?
But I would like to confirm:
Here is a very simple program:
#include <stdio.h>

void update_p(double *p, int size){
    //do something to the values referred to by p
}
void test_alloc(int size){
    double p[size];
    int i;
    for(i = 0; i < size; i++) p[i] = (double)i;
    update_p(p, size);
    for(i = 0; i < size; i++) printf("%f\n", p[i]);
}
int main(){
    test_alloc(5);
    return 0;
}
and an alternative version of test_alloc:
#include <stdlib.h>   /* for calloc and free */

void test_alloc(int size){
    double *p = calloc(size, sizeof(double));
    int i;
    for(i = 0; i < size; i++) p[i] = (double)i;
    update_p(p, size);
    for(i = 0; i < size; i++) printf("%f\n", p[i]);
    free(p);
}
It seems that both versions are valid (?). One uses the stack, the other uses the heap (?).
Are both approaches valid? What are their advantages and issues?
If the array is indeed not used after test_alloc returns, is the first approach preferable, since it requires no memory management (i.e., there is no free to forget)?
If test_alloc is called very frequently, is it more time consuming to reserve memory on the heap, and might this have an impact on performance?
Would there be any reason to use the second approach?
The choice between stack and heap allocation depends on the following parameters:
The scope and lifetime over which you want the variable to be available: say your program needs to use members of the array of doubles after test_alloc, back in main; then you should go for dynamic allocation.
The size of your stack: say you are using the test_alloc function in the context of a thread that has a 4K stack, and your array size, which in turn depends on the size variable, comes to 1K; then stack allocation is not advisable and you should go for dynamic allocation.
Complexity of code: with dynamic allocation, you must take care to avoid memory leaks, dangling pointers, etc. This adds some complexity to the code and makes it more prone to bugs.
The frequency of allocation: calling malloc and free in a loop may hurt performance. Allocation on the stack is just a matter of offsetting the stack pointer, whereas allocation on the heap requires more bookkeeping.
With the stack version, you can't allocate an array with 100000000 elements. Allocating a small array on the stack is faster; for very large arrays, use the heap.
For more information about stack and heap memory allocation, see these write-ups:
Stack-based memory allocation, Heap-based memory allocation
"If test_alloc is called massively, is it more time consuming to reserve memory in the heap and might this have an impact on performance?"
Well, I guess you are comparing the performance of allocating from the stack and from the heap. Stack allocation is always faster, as all it does is move the stack pointer. Heap allocation definitely takes more time, as it has to do memory management. So if the function doesn't put so much on the stack that it risks a stack overflow (creating too many or too large objects on the stack increases that risk), and provided you don't need the variable outside the function's scope, you can probably use the stack itself.
So to answer your question: it is not bad practice to use the stack to avoid dynamic allocation, as long as you do not put so many objects on the stack that you cause a stack overflow.
Hope this helps.

C: Malloc and Free

I am trying to understand the C functions malloc and free. I know this has been discussed a lot on StackOverflow. However, I think I kind of know what these functions do by now. I want to know why to use them. Let's take a look at this piece of code:
int n = 10;
char* array;
array = (char*) malloc(n * sizeof(char));
// Check whether memory could be allocated or not...
// Do whatever with array...
free(array);
array = NULL;
I created a pointer to char which I called array. Then I used malloc to find a chunk of memory that is currently not used and is (10 * sizeof(char)) bytes large. I cast that address to char pointer before assigning it to my previously created char pointer. Now I can work with my char array. When I am done, I'll use free to release that chunk of memory, since it's not being used anymore.
I have one question: Why wouldn't I just do char array[10];? Wikipedia has only one small sentence to give to answer that, and that sentence I unfortunately don't understand:
However, the size of the array is fixed at compile time. If one wishes to allocate a similar array dynamically...
The slide from my university is similarly concise:
It is also possible to allocate memory from the heap.
What is the heap? I know a data structure called heap. :)
However, if someone could explain to me in which cases it makes sense to use malloc and free instead of the regular declaration of a variable, that'd be great. :)
C provides three different possible "storage durations" for objects:
Automatic - local storage that's specific to the invocation of the function it's in. There may be more than one instance of objects created with automatic storage, if a function is called recursively or from multiple threads. Or there may be no instances (if/when the function isn't being called).
Static - storage that exists, in exactly one instance, for the entire duration of the running program.
Allocated (dynamic) - created by malloc, and persists until free is called to free it or the program terminates. Allocated storage is the only type of storage with which you can create arbitrarily large or arbitrarily many objects which you can keep even when functions return. This is what malloc is useful for.
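A small sketch contrasting the three durations (make_buffer and its names are purely illustrative):

#include <stdlib.h>

int global_counter;                       /* static storage: one instance, lives for the whole program */

int *make_buffer(size_t n)
{
    int scratch[16];                      /* automatic storage: per call, gone when the function returns */
    (void)scratch;

    int *buf = malloc(n * sizeof *buf);   /* allocated storage: survives the return */
    return buf;                           /* the caller must eventually free(buf) */
}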
First of all, there is no need to cast the result of malloc:
array = malloc(n * sizeof(char));
I have one question: Why wouldn't I just do char array[10];?
What will you do if you don't know in advance how much storage space you need (say, if you want an array of arbitrary size, or a structure like a stack or linked list)?
In that case you have to rely on malloc (in C99 you can use variable-length arrays, but only for small sizes).
The function malloc is used to allocate a certain amount of memory during the execution of a program. The malloc function will request a block of memory from the heap. If the request is granted, the operating system will reserve the requested amount of memory.
When the amount of memory is not needed anymore, you must return it to the operating system by calling the function free.
Put simply: you use an array when you know at compile time how many elements it needs to hold; you use malloc and a pointer when you don't know that until run time.
For more detail read Heap Management With malloc() and free().
Imagine you want to allocate 1,000 arrays.
If you did not have malloc and free... but needed a declaration in your source for each array, then you'd have to make 1,000 declarations. You'd have to give them all names. (array1, array2, ... array1000).
The idea in general of dynamic memory management is to handle items when the quantity of items is not something you can know in advance at the time you are writing your program.
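For instance, here is a sketch of allocating a run-time number of arrays through an array of pointers, something no fixed set of declarations can express (the function name is just for illustration):

#include <stdlib.h>

double **make_arrays(size_t how_many, size_t each_len)
{
    double **arrays = malloc(how_many * sizeof *arrays);
    if (arrays == NULL)
        return NULL;
    for (size_t i = 0; i < how_many; i++) {
        arrays[i] = malloc(each_len * sizeof **arrays);
        if (arrays[i] == NULL) {
            while (i--) free(arrays[i]);      /* unwind what was already allocated */
            free(arrays);
            return NULL;
        }
    }
    return arrays;   /* free each arrays[i], then arrays itself, when done */
}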
Regarding your question "Why wouldn't I just do char array[10];?": you can, and most of the time that will be completely sufficient. However, what if you wanted to do something similar but much, much bigger? Or what if the size of your data needs to change during execution? These are a few of the situations that point to using dynamically allocated memory (calloc() or malloc()).
Understanding a little about how and when the stack and heap are used would be good: when you use malloc() or calloc(), the memory comes from the heap, whereas automatic variables are given memory on the stack and are freed when you leave the scope of the variable, i.e. the function or block it was declared in.
malloc and calloc become very useful when the size of the data you need is not known until run time. Once the size is determined, you can call one of them to allocate memory on the heap, and when you are finished, release it with free().
Regarding What is the heap? There is a good discussion on that topic here (slightly different topic, but good discussion)
In response to "However, if someone could explain to me in which case it makes sense to use malloc() and free()...?":
In short, If you know what your memory requirements are at build time (before run-time) for a particular variable(s), use static / automatic creation of variables (and corresponding memory usage). If you do not know what size is necessary until run-time, use malloc() or calloc() with a corresponding call to free() (for each use) to create memory. This is of course a rule-of-thumb, and a gross generalization. As you gain experience using memory, you will find scenarios where even when size information is known before run-time, you will choose to dynamically allocate due to some other criteria. (size comes to mind)
If you know in advance that you only require an array of 10 chars, you should just say char array[10]. malloc is useful if you don't know in advance how much storage you need. It is also useful if you need storage that is valid after the current function returns. If you declare array as char array[10], it will be allocated on the stack. This data will not be valid after your function returns. Storage that you obtain from malloc is valid until you call free on it.
Also, there is no need to cast the return value of malloc.
Why use free after malloc? It is good style to free memory as soon as you no longer need it. If you don't free the memory it won't do much harm, but your program will keep holding memory it no longer needs for the rest of its run.
You may also choose to leave memory unfreed when you exit the program. malloc() uses the heap, and the complete heap of a process is released when the process exits. The main reason people insist on freeing memory is to avoid memory leaks.
From here:
Allocation Myth 4: Non-garbage-collected programs should always deallocate all memory they allocate.
The Truth: Omitted deallocations in frequently executed code cause growing leaks. They are rarely acceptable. But programs that retain most allocated memory until program exit often perform better without any intervening deallocation. Malloc is much easier to implement if there is no free.
In most cases, deallocating memory just before program exit is pointless. The OS will reclaim it anyway. Free will touch and page in the dead objects; the OS won't.
Consequence: Be careful with "leak detectors" that count allocations. Some "leaks" are good!
The wiki also has a good point under Heap-based memory allocation:
The heap method suffers from a few inherent flaws, stemming entirely from fragmentation. Like any method of memory allocation, the heap will become fragmented; that is, there will be sections of used and unused memory in the allocated space on the heap. A good allocator will attempt to find an unused area of already allocated memory to use before resorting to expanding the heap. The major problem with this method is that the heap has only two significant attributes: base, or the beginning of the heap in virtual memory space; and length, or its size. The heap requires enough system memory to fill its entire length, and its base can never change. Thus, any large areas of unused memory are wasted. The heap can get "stuck" in this position if a small used segment exists at the end of the heap, which could waste any magnitude of address space, from a few megabytes to a few hundred.

Arrays of which size require malloc (or global assignment)?

While taking my first steps with C, I quickly noticed that int array[big number] causes my programs to crash when called inside a function. Not quite as quickly, I discovered that I can prevent this from happening by defining the array with global scope (outside the functions) or using malloc.
My question is:
Starting at which size is it necessary to use one of the above methods to make sure my programs won't crash?
I mean, is it safe to use just, e.g., int i; for counters and int chars[256]; for small arrays or should I just use malloc for all local variables?
You should understand what the difference is between int chars[256] in a function and using malloc().
In short, the former places the entire array on the stack. The latter allocates the memory you requested from the heap. Generally speaking, the heap is much larger than the stack, but the size of each can be adjusted.
Another key difference is that a variable allocated on the stack will technically be gone after you return from the method. (Oh, your program may function as though it's not gone for a bit if you continue to access that array, but ho ho ho danger lurks.) A hunk of memory allocated with malloc will remain allocated until you explicitly free it or the program exits.
You should use malloc for your dynamic memory allocation. For statically sized arrays (or any other object) within functions, if the memory required is too big you will quickly get a segmentation fault. I don't think a 'safe limit' can be defined; it's probably implementation-specific, and other factors come into play too, like the current stack and the objects placed on it by callers of the current function. I would be tempted to say that anything below the page size (usually 4kb) should be safe as long as there is no recursion involved, but I do not think there are such guarantees.
It depends. If you have some guarantee that a line will never be longer than 100 ... 1000 characters you can get away with a fixed size buffer. If you don't: you don't. There is a difference between reading in a neat x KB config file and a x GB XML file (with no CR/LF). It depends.
The next choice is: do you want your program to die gracefully? It is only a design choice.

Why would you ever want to allocate memory on the heap rather than the stack? [duplicate]

Possible Duplicate:
When is it best to use a Stack instead of a Heap and vice versa?
I've read a few of the other questions regarding the heap vs stack, but they seem to focus more on what the heap/stack do rather than why you would use them.
It seems to me that stack allocation would almost always be preferred, since it is quicker (just moving the stack pointer vs. looking for free space in the heap), and you don't have to manually free allocated memory when you're done using it. The only reason I can see for using heap allocation is if you wanted to create an object in a function and then use it outside that function's scope, since stack-allocated memory is automatically deallocated after returning from the function.
Are there other reasons for using heap allocation instead of stack allocation that I am not aware of?
There are a few reasons:
The main one is that with heap allocation, you have the most flexible control over the object's lifetime (from malloc/calloc to free);
Stack space is typically a more limited resource than heap space, at least in default configurations;
A failure to allocate heap space can be handled gracefully, whereas running out of stack space is often unrecoverable.
Without the flexible object lifetime, useful data structures such as binary trees and linked lists would be virtually impossible to write.
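For example, each node of a linked list is typically created on demand with malloc, which is what gives it a lifetime independent of any one function call; a minimal sketch:

#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* push a new node onto the front of the list; returns the new head */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);   /* lives until free(), not until this function returns */
    if (n == NULL)
        return head;                      /* allocation failed: leave the list unchanged */
    n->value = value;
    n->next = head;
    return n;
}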
You want an allocation to live beyond a function invocation
You want to conserve stack space (which is typically limited to a few MBs)
You're working with re-locatable memory (Win16, databases, etc.), or want to recover from allocation failures.
Variable length anything. You can fake around this, but your code will be really nasty.
The big one is #1. As soon as you get into any sort of concurrency or IPC #1 is everywhere. Even most non-trivial single threaded applications are tricky to devise without some heap allocation. That'd practically be faking a functional language in C/C++.
So I want to make a string. I can make it on the heap or on the stack. Let's try both:
char *heap = malloc(14);
if(heap == NULL)
{
    // bad things happened!
}
strcpy(heap, "Hello, world!");  // 13 characters plus the terminating NUL fit in 14 bytes
And for the stack:
char stack[] = "Hello, world!";
So now I have these two strings in their respective places. Later, I want to make them longer:
char *tmp = realloc(heap, 20);
if(tmp == NULL)
{
    // bad things happened!
}
heap = tmp;
memmove(heap + 13, heap + 7, 7);  // shift "world!" and its NUL (7 bytes) to the right
memcpy(heap + 7, "cruel ", 6);    // splice in "cruel ": "Hello, cruel world!"
And for the stack:
// umm... What?
This is only one benefit, and others have mentioned other benefits, but this is a rather nice one. With the heap, we can at least try to make our allocated space larger. With the stack, we're stuck with what we have. If we want room to grow, we have to declare it all up front, and we all know how it stinks to see this:
char username[MAX_BUF_SIZE];
The most obvious rationale for using the heap is when you call a function and need something of unknown length returned. Sometimes the caller may pass a memory block and size to the function, but at other times this is just impractical, especially if the returned stuff is complex (e.g. a collection of different objects with pointers flying around, etc.).
Size limits are a huge dealbreaker in a lot of cases. The stack is usually measured in the low megabytes or even kilobytes (that's for everything on the stack), whereas all modern PCs allow you a few gigabytes of heap. So if you're going to be using a large amount of data, you absolutely need the heap.
Just to add:
you can use alloca to allocate memory on the stack, but again, memory on the stack is limited and the space exists only during the function's execution.
That does not mean everything should be allocated on the heap. Like all design decisions, this is somewhat difficult; a "judicious" combination of both should be used.
Besides manual control of object's lifetime (which you mentioned), the other reasons for using heap would include:
Run-time control over an object's size (both its initial size and its "later" size, during the program's execution).
For example, you can allocate an array of certain size, which is only known at run time.
With the introduction of VLA (Variable Length Arrays) in C99, it became possible to allocate arrays of fixed run-time size without using heap (this is basically a language-level implementation of 'alloca' functionality). However, in other cases you'd still need heap even in C99.
Run-time control over the total number of objects.
For example, when you build a binary tree structure, you can't meaningfully allocate the nodes of the tree on the stack in advance. You have to use the heap to allocate them "on demand".
Low-level technical considerations, as limited stack space (others already mentioned that).
When you need a large, say, I/O buffer, even for a short time (inside a single function) it makes more sense to request it from the heap instead of declaring a large automatic array.
Stack variables (often called 'automatic variables') are best used for things you want to always be the same size, and always be small.
int x;
char foo[32];
are both stack allocations, and their sizes are fixed at compile time.
The best reason for heap allocation is that you can't always know how much space you need. You often only know this once the program is running. You might have an idea of limits, but you would only want to use the exact amount of space required.
If you had to read in a file that could be anything from 1 KB to 50 MB, you would not do this:
int readdata ( FILE * f ) {
    char inputdata[50*1024*1024];
    ...
    return x;
}
That would try to allocate 50 MB on the stack, which would usually fail, as the stack is often limited to 256 KB anyway.
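Instead, a hedged sketch of the heap version of that function, with minimal error handling:

#include <stdio.h>
#include <stdlib.h>

int readdata(FILE *f)
{
    size_t bufsize = 50 * 1024 * 1024;      /* 50 MB on the heap, not the stack */
    char *inputdata = malloc(bufsize);
    if (inputdata == NULL)
        return -1;

    size_t got = fread(inputdata, 1, bufsize, f);
    /* ... process the got bytes in inputdata ... */
    (void)got;

    free(inputdata);
    return 0;
}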
The stack and the heap traditionally grow toward each other within the same address space, so if you use too much of one you can eventually collide with the other. Keeping a sensible balance between the space each of them uses keeps the amortized cost of allocating and deallocating memory small.

Resources