I am writing a program to do some analysis on DNA sequences.
Everything works fine except for this thing.
I want to declare a 2D array of size m*n where m and n are read from an input file.
Now the issue is that if m and n goes too large. As an example if m = 200 and n = 50000
then I get a seg fault at the line where I declare my array.
array[m][n];
Any ideas how to overcome this. I do need such an array as my entire logic depends on how to process this array.
Probably you are running out of stack space.
Can you not allocate the array dynamically on heap using malloc?
You may want to have a look at this answer if you do not know how to do that.
As others have said it is not a good idea to allocate a large VLA (variable length array) on the stack. Allocate it with malloc:
double (*array)[n] = malloc(sizeof(double[m][n]));
and you have an object as before, that is that the compiler perfectly knows how to address individual elements array[i][j] and the allocation still gives you one consecutive blob in memory.
Just don't forget to do
free(array);
at the end of your scope.
Not sure what type you're using but for the following code I've assumed int.
Rather than doing this:
int array[200][50000];
Try doing this:
int** array = (int**)malloc(200);
for (int i = 0; i < 200; i++)
{
array[i] = (int*)malloc(50000);
}
This will allocate "heap" memory rather than "stack" memory. You are asking for over 300mb (if you're using a 32bit type) so you probably don't have that much "stack" memory.
Make sure to cleanup after you're done with the array with:
for (int i = 0; i < 200; i++)
{
free(array[i]);
}
free(array);
Feel free to use m and n instead of the constants I used above!
Edit: I originally wrote this in C++, and converted to C. I am a little more rusty with C memory allocation/deallocation, but I believe I got it right.
You are likely running out of stack space.
Windows for instance gives each thread 1MB stack. Assuming the array contains integers and you are creating it on the stack you are creating a 40MB stack variable.
You should instead dynamically allocate it on the heap.
The array (if local) is allocated in the stack. There is certain limits imposed on the stack size for a process/thread. If the stack is overgrown it will cause issues.
But you can allocate the array in heap using malloc . Typical heap size could be 4GB (this can be more or less depending on OS/Architecture). Check the return value of malloc to make sure that memory for the array is correctly allocated.
Related
What is the difference between defining an array of a length that is definied before runtime (depends on command line arguments) with array[size] and using malloc()? array[i] leads to the data put on the stack, malloc() uses the heap[see this stackoverflow]
So with large Data I can run into stackoverflows, but on a new machine a total of 30 chars and ints should not be problematic (Stack is around 1MB on windows according to this).
So I am probably missing something obvious.
As far as I understand, when defined in main(),the two should be the same:
Example 1
int size; // depends on some logic
char array[size];
Example 2
int size; // depends on some logic
array= (char *)malloc(size * sizeof(char));
free(array); // later after use
But when I use the array inside of functions and hand it over as a pointer (func(char* array)) or as an array (funct(char array[])), sometimes the gdb-debugger let's me know that the function gets handed corrupted data in #1 , using malloc() fixed the issue.
Is array[i], not okay to use when it is not determined at compile time? Is it some scoping issue? This answer has a comment suggesting such a thing, but I don't quite understand if that applies here.
I am using C99.
The main difference is that arrays declared with a fixed size are stack allocated (and references to it or its elements are valid only within its scope) while mallocd arrays are heap allocated.
Stack allocation means that variables are stored directly to the memory. Access to this memory is usually very fast as and it's allocation is done during compilation.
On the other hand, variables allocated on the heap have their memory allocated at runtime and - while accessing this memory is slower - your only limit is the size of virtual memory.
Have a look here:
https://gribblelab.org/CBootCamp/7_Memory_Stack_vs_Heap.html
I'm just starting to learn coding in c, and I have a few questions regarding 2d matrices in combination with the free() command.
I know that you first need to create an array with pointer, pointing to the different columns of the matrix:
double **array = (double **)malloc(5*sizeof(double *));
for(int n = 0; n<5; n++){
array[n] = (double *) malloc(6*sizeof(double));
I know that the correct way to then deallocate this matrix is to first deallocate the individual rows and then the array pointer itself. Something along the lines of:
for (i = 0; i < nX; i++){
free(array[i]); }
free(array);
My question is: Why is this necessary? I know that this incorrect, but why can't you just use: free(array)? This would deallocate the pointer array, to my understanding. Won't the memory that is used by the columns just be overwritten when something else needs acces to it? Would free(array) lead to corrupted memory in any way?
Any help is much appreciated!
Your code, not only allocate memory for array of pointers (the blue array), but in the for loop, you also allocate memory for the red arrays as well. So, free(array) line, alone, will just free the memory allocated by the blue array, but not the red ones. You need to free the red ones, just before loosing the contact with them; that is, before freeing the blue array.
And btw;
Won't the memory that is used by the columns just be overwritten when something else needs acces to it?
No. The operating system will keep track of the memory allocated by your process (program) and will not allow any other process to access the allocated memory until your process terminates. Under normal circumstances —I mean, remembering the C language not having a garbage collector— the OS never knows that you've lost connection with the allocated memory space and will never attempt like, "well, this memory space is not useful for this process anymore; so, let's de-allocate it and serve it for another process."
It would not lead to corruption, no, but would create a memory leak.
If done once in your program, it probably doesn't matter much (a lot of professional/expensive applications have - small,unintentional - memory leaks), but repeat this in a loop, and you may run out of memory after a while. Same thing if your code is called from an external program (if your code is in a library).
Aside: Not freeing buffers can be a useful way (temporarily) to check if the crashes you're getting in your programs originate from corrupt memory allocation or deallocation (when you cannot use Valgrind). But in the end you want to free everything, once.
If you want to perform only one malloc, you could also allocate one big chunk, then compute the addresses of the rows. In the end, just deallocate the big chunk (example here: How do we allocate a 2-D array using One malloc statement)
This is needed because C does not have a garbage collector.
Once you allocate memory with malloc or similar function, it is marked as "in use" for as long as your program is running.
It does not matter if you no longer hold a pointer to this memory in your program.
There is no mechanism in the C language to check this and automatically free the memory.
Also, when you allocate memory with malloc the function does not know what you are using the memory for. For the allocator it is all just bytes.
So when you free a pointer (or array of pointers), there is no logic to "realize" these are pointers that contain memory addresses.
This is simply how the C language is designed: the dynamic memory management is almost1 completely manual - left to the programmer, so you must call free for every call to malloc.
1 C language does handle some of the more tedious tasks needed to dynamically allocate memory in a program such as finding where to get a free continuous chunk of memory of the size you asked for.
Let's take this simple example:
int **ptr = malloc(2*sizeof *ptr);
int *foo = malloc(sizeof *foo);
int *bar = malloc(sizeof *bar);
ptr[0] = foo;
ptr[1] = bar;
free(ptr);
If your suggestion were implemented, foo and bar would now be dangling pointers. How would you solve the scenario if you just want to free ptr?
I want to operate on 10^9 elements. For this they should be stored somewhere but in c, it seems that an array can only store 10^6 elements. So is there any way to operate on such a large number of elements in c?
The error thrown is error: size of array ‘arr’ is too large".
For this they should be stored somewhere but in c it seems that an
array only takes 10^6 elements.
Not at all. I think you're allocating the array in a wrong way. Just writing
int myarray[big_number];
won't work, as it will try to allocate memory on the stack, which is very limited (several MB in size, often, so 10^6 is a good rule of thumb). A better way is to dynamically allocate:
int* myarray;
int main() {
// Allocate the memory
myarray = malloc(big_number * sizeof(int));
if (!myarray) {
printf("Not enough space\n");
return -1;
}
// ...
// Free the allocated memory
free(myarray);
return 0;
}
This will allocate the memory (or, more precise, big_number * 4 bytes on a 32-bit machine) on the heap. Note: This might fail, too, but is mainly limited by the amount of free RAM which is much closer to or even above 10^9 (1 GB).
An array uses a contiguous memory space. Therefore, if your memory is fragmented, you won't be able to use such array. Use a different data structure, like a linked list.
About linked lists:
Wikipedia definition - http://en.wikipedia.org/wiki/Linked_list
Implementation in C - http://www.macs.hw.ac.uk/~rjp/Coursewww/Cwww/linklist.html
On a side note, I tried on my computer, and while I can't create an int[1000000], a malloc(1000000*sizeof(int)) works.
int numbers*;
numbers = malloc ( sizeof(int) * 10 );
I want to know how is this dynamic memory allocation, if I can store just 10 int items to the memory block ? I could just use the array and store elemets dynamically using index. Why is the above approach better ?
I am new to C, and this is my 2nd day and I may sound stupid, so please bear with me.
In this case you could replace 10 with a variable that is assigned at run time. That way you can decide how much memory space you need. But with arrays, you have to specify an integer constant during declaration. So you cannot decide whether the user would actually need as many locations as was declared, or even worse , it might not be enough.
With a dynamic allocation like this, you could assign a larger memory location and copy the contents of the first location to the new one to give the impression that the array has grown as needed.
This helps to ensure optimum memory utilization.
The main reason why malloc() is useful is not because the size of the array can be determined at runtime - modern versions of C allow that with normal arrays too. There are two reasons:
Objects allocated with malloc() have flexible lifetimes;
That is, you get runtime control over when to create the object, and when to destroy it. The array allocated with malloc() exists from the time of the malloc() call until the corresponding free() call; in contrast, declared arrays either exist until the function they're declared in exits, or until the program finishes.
malloc() reports failure, allowing the program to handle it in a graceful way.
On a failure to allocate the requested memory, malloc() can return NULL, which allows your program to detect and handle the condition. There is no such mechanism for declared arrays - on a failure to allocate sufficient space, either the program crashes at runtime, or fails to load altogether.
There is a difference with where the memory is allocated. Using the array syntax, the memory is allocated on the stack (assuming you are in a function), while malloc'ed arrays/bytes are allocated on the heap.
/* Allocates 4*1000 bytes on the stack (which might be a bit much depending on your system) */
int a[1000];
/* Allocates 4*1000 bytes on the heap */
int *b = malloc(1000 * sizeof(int))
Stack allocations are fast - and often preferred when:
"Small" amount of memory is required
Pointer to the array is not to be returned from the function
Heap allocations are slower, but has the advantages:
Available heap memory is (normally) >> than available stack memory
You can freely pass the pointer to the allocated bytes around, e.g. returning it from a function -- just remember to free it at some point.
A third option is to use statically initialized arrays if you have some common task, that always requires an array of some max size. Given you can spare the memory statically consumed by the array, you avoid the hit for heap memory allocation, gain the flexibility to pass the pointer around, and avoid having to keep track of ownership of the pointer to ensure the memory is freed.
Edit: If you are using C99 (default with the gnu c compiler i think?), you can do variable-length stack arrays like
int a = 4;
int b[a*a];
In the example you gave
int *numbers;
numbers = malloc ( sizeof(int) * 10 );
there are no explicit benefits. Though, imagine 10 is a value that changes at runtime (e.g. user input), and that you need to return this array from a function. E.g.
int *aFunction(size_t howMany, ...)
{
int *r = malloc(sizeof(int)*howMany);
// do something, fill the array...
return r;
}
The malloc takes room from the heap, while something like
int *aFunction(size_t howMany, ...)
{
int r[howMany];
// do something, fill the array...
// you can't return r unless you make it static, but this is in general
// not good
return somethingElse;
}
would consume the stack that is not so big as the whole heap available.
More complex example exists. E.g. if you have to build a binary tree that grows according to some computation done at runtime, you basically have no other choices but to use dynamic memory allocation.
Array size is defined at compilation time whereas dynamic allocation is done at run time.
Thus, in your case, you can use your pointer as an array : numbers[5] is valid.
If you don't know the size of your array when writing the program, using runtime allocation is not a choice. Otherwise, you're free to use an array, it might be simpler (less risk to forget to free memory for example)
Example:
to store a 3-D position, you might want to use an array as it's alwaays 3 coordinates
to create a sieve to calculate prime numbers, you might want to use a parameter to give the max value and thus use dynamic allocation to create the memory area
Array is used to allocate memory statically and in one go.
To allocate memory dynamically malloc is required.
e.g. int numbers[10];
This will allocate memory statically and it will be contiguous memory.
If you are not aware of the count of the numbers then use variable like count.
int count;
int *numbers;
scanf("%d", count);
numbers = malloc ( sizeof(int) * count );
This is not possible in case of arrays.
Dynamic does not refer to the access. Dynamic is the size of malloc. If you just use a constant number, e.g. like 10 in your example, it is nothing better than an array. The advantage is when you dont know in advance how big it must be, e.g. because the user can enter at runtime the size. Then you can allocate with a variable, e.g. like malloc(sizeof(int) * userEnteredNumber). This is not possible with array, as you have to know there at compile time the (maximum) size.
Here is what I am working with:
char* qdat[][NUMTBLCOLS];
char** tdat[];
char* ptr_web_data;
// Loop thru each table row of the query result set
for(row_index = 0; row_index < number_rows; row_index++)
{
// Loop thru each column of the query result set and extract the data
for(col_index = 0; col_index < number_cols; col_index++)
{
ptr_web_data = (char*) malloc((strlen(Data) + 1) * sizeof(char));
memcpy (ptr_web_data, column_text, strlen(column_text) + 1);
qdat[row_index][web_data_index] = ptr_web_data;
}
}
tdat[row_index] = qdat[col_index];
After the data is used, the memory allocated is released one at a time using free().
for(row_index = 0; row_index < number_rows; row_index++)
{
// Loop thru all columns used
for(col_index = 0; col_index < SARWEBTBLCOLS; col_index++)
{
// Free memory block pointed to by results set array
free(tdat[row_index][col_index]);
}
}
Is there a way to release all the allocated memory at once, for this array?
Thank You.
Not with the standard malloc() allocator - you need to investigate the use of memory pools. These work by allocating a big block of memory up-front, allocating from it and freeing back to it as you request via their own allocation functions, and then freeing the whole lot with a special "deallocate all" function.
I must say I've always found these things a bit ugly - it really isn't that hard to write code that doesn't leak. The only reason I can see for using them is to mitigate heap fragmentation, if that is a real problem for you.
No there is not. Memory which is separately allocated must be separately freed.
The only way you could free it as once is if you allocated it at once is a giant block. You would then have to do a bit of pointer math to assign every row the correct index into the array but it's not terribly difficult. This approach does have a few downsides though
Extra pointer math
Requires one giant contiguous block of memory vs. N smaller blocks of memory. Can be an issue in low memory or high fragmentation environments.
Extra work for no real stated gain.
If you want to release it all at once, you have to allocate it all at once.
A simple manual solution, if you know the total size you'll need in advance, is to allocate it all in once chunk and index into it as appropriate. If you don't know the size in advance you can use realloc to grow the memory, so long as you only access it indexed from the initial pointer, and don't store additional pointers anywhere.
That being said, direct allocation and deallocation is a simple solution, and harder to get wrong than the alternatives. Unless the loop to deallocate is causing you real difficulties, I would stick with what you have.