How to insert more than 10^6 elements in an array - c

I want to operate on 10^9 elements. For this they should be stored somewhere but in c, it seems that an array can only store 10^6 elements. So is there any way to operate on such a large number of elements in c?
The error thrown is: error: size of array ‘arr’ is too large

For this they should be stored somewhere but in c it seems that an
array only takes 10^6 elements.
Not at all. I think you're allocating the array in a wrong way. Just writing
int myarray[big_number];
won't work, as it will try to allocate memory on the stack, which is very limited (several MB in size, often, so 10^6 is a good rule of thumb). A better way is to dynamically allocate:
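#include <stdio.h>
#include <stdlib.h>

#define big_number 1000000000  /* example size; adjust as needed */

int *myarray;

int main(void) {
    // Allocate the memory (size_t arithmetic avoids int overflow)
    myarray = malloc((size_t)big_number * sizeof(int));
    if (!myarray) {
        printf("Not enough space\n");
        return -1;
    }
    // ...
    // Free the allocated memory
    free(myarray);
    return 0;
}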
This allocates the memory (more precisely, big_number * sizeof(int) bytes, which is big_number * 4 on most current platforms) on the heap. Note: this might fail too, but it is limited mainly by the amount of free RAM, which is usually far larger than the stack. Keep in mind that 10^9 ints of 4 bytes each need about 4 GB.
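One detail worth guarding when the element count gets this large is the multiplication inside the malloc call, which can wrap around if the count is big enough. A minimal sketch, assuming a helper named alloc_ints (a hypothetical name, not from the answer above):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n ints on the heap.
   Returns NULL if n * sizeof(int) would overflow size_t, or if the
   allocator cannot satisfy the request. The caller frees the result. */
int *alloc_ints(size_t n)
{
    if (n > SIZE_MAX / sizeof(int)) {
        return NULL;                /* byte count would wrap around */
    }
    return malloc(n * sizeof(int));
}
```

A caller checks the result for NULL exactly as in the code above and releases the block with free() when done.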

An array occupies contiguous memory. If your memory is fragmented, you may not be able to get a block that large. In that case, use a different data structure, such as a linked list.
About linked lists:
Wikipedia definition - http://en.wikipedia.org/wiki/Linked_list
Implementation in C - http://www.macs.hw.ac.uk/~rjp/Coursewww/Cwww/linklist.html
On a side note, I tried on my computer, and while I can't create an int[1000000], a malloc(1000000*sizeof(int)) works.

Related

Declaring a 5d array of size 26^5 in C

My question is pretty straightforward.
I'm building a small program to analyse and simulate random text using Markov chains. My first MC had memory of size 2, working on the alphabet {a, b, ..., z}. Therefore, my transition matrix was of size 26 * 26 * 26.
But now, I'd like to enhance my simulation using a MC with memory of size 4. Therefore, I need to store my probabilities of transitions in a 5D array of size 26*26*26*26*26.
The problem is (I believe) that C doesn't allow me to declare and manipulate such an array, as it might be too big. In fact, I got a "Segmentation fault: 11" message when writing:
int count[26][26][26][26][26];
Is there a way to get around this restriction?
Thanks!
On a typical PC architecture with 32-bit integers, int count[26][26][26][26][26] creates an object of size 47525504 bytes, 47MB, which is manageable on most current computers, but is likely too large for automatic allocation (aka on the stack).
You can declare count as a global or a static variable, or you can allocate it from the heap and make count a pointer with this declaration:
/* calloc takes the element count first, then the element size */
int (*count)[26][26][26][26] = calloc(26, sizeof(*count));
if (count == NULL) {
    /* handle allocation failure gracefully */
    fprintf(stderr, "cannot allocate memory for 5D array\n");
    exit(1);
}
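Once the table lives on the heap, it is indexed exactly like the original five-dimensional array. The following is a rough sketch of how the counting step might look; the typedef slice_t and the function count_5grams are hypothetical names, and the code assumes a character set in which 'a' to 'z' are contiguous (as in ASCII):

```c
#include <stdlib.h>
#include <string.h>

/* One "row" of the 5D table: the last four dimensions. */
typedef int slice_t[26][26][26][26];

/* Hypothetical helper: count every window of five consecutive lowercase
   letters in text. Returns a zero-initialised heap table of 26^5 ints,
   or NULL on allocation failure. The caller releases it with free(). */
slice_t *count_5grams(const char *text)
{
    slice_t *count = calloc(26, sizeof *count);
    if (count == NULL) {
        return NULL;
    }
    size_t len = strlen(text);
    for (size_t i = 0; i + 5 <= len; i++) {
        const char *p = text + i;
        int all_lower = 1;
        for (int k = 0; k < 5; k++) {
            if (p[k] < 'a' || p[k] > 'z') {
                all_lower = 0;      /* skip windows with other characters */
            }
        }
        if (all_lower) {
            count[p[0] - 'a'][p[1] - 'a'][p[2] - 'a'][p[3] - 'a'][p[4] - 'a']++;
        }
    }
    return count;
}
```

Because the whole table is one allocation, a single free() releases it.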
Make it global [1], make it static, or dynamically allocate the same amount of memory. Dynamic allocation draws from a region of memory that is not subject to the tight size limit you ran into. Variables with automatic storage duration are stored on the stack in most implementations; dynamically allocated memory comes from the heap in most implementations.
You can do this (illustration):
int (*a)[26][26][26][26] = malloc(26 * sizeof *a);
if (!a) { perror("malloc"); exit(1); }
...
free(a);
[1] Static storage duration: all variables defined at file scope have static storage duration.
With this kind of array declaration, your data is stored on the stack. The stack is usually only 8 MB on Unix-like systems and 1 MB on Windows, but you need at least 4*26^5 bytes (about 47.5 MB).
The preferred solution is to allocate this array on the heap using malloc.
But you can also instruct compiler to increase the stack size...
Try this
#define max 11881376 // 26*26*26*26*26
int count[max]; // declare at file scope (or as static) so it is not placed on the stack

Malloc or normal array definition?

When should I use malloc instead of a normal array definition in C?
I can't understand the difference between:
int a[3]={1,2,3}
int array[sizeof(a)/sizeof(int)]
and:
array=(int *)malloc(sizeof(int)*sizeof(a));
In general, use malloc() when:
the array is too large to be placed on the stack
the lifetime of the array must outlive the scope where it is created
Otherwise, use a stack allocated array.
int a[3]={1,2,3}
int array[sizeof(a)/sizeof(int)]
If used as local variables, both a and array would be allocated on the stack. Stack allocation has its pros and cons:
pro: it is very fast - it only takes one register subtraction operation to create stack space and one register addition operation to reclaim it back
con: stack size is usually limited (and also fixed at link time on Windows)
In both cases the number of elements in each arrays is a compile-time constant: 3 is obviously a constant while sizeof(a)/sizeof(int) can be computed at compile time since both the size of a and the size of int are known at the time when array is declared.
When the number of elements is known only at run-time or when the size of the array is too large to safely fit into the stack space, then heap allocation is used:
array=(int *)malloc(sizeof(int)*sizeof(a));
As already pointed out, this should be malloc(sizeof(a)) since the size of a is already the number of bytes it takes and not the number of elements and thus additional multiplication by sizeof(int) is not necessary.
Heap allocation and deallocation are relatively expensive operations (compared to stack allocation), and this cost should be carefully weighed against the benefits, e.g. in code that is called many times in tight loops.
Modern C compilers support the C99 version of the C standard, which introduced variable-length arrays (VLAs), a feature that resembles ones available in other languages. A VLA's size is specified at run-time, as in this case:
void func(int n)
{
int array[n];
...
}
array is still allocated on the stack as if memory for the array has been allocated by a call to alloca(3).
You definitely have to use malloc() if you don't want your array to have a fixed size. Depending on what you are trying to do, you might not know in advance how much memory you are going to need for a given task, or you might need to resize your array at runtime, for example to enlarge it when more data comes in. The latter can be done using realloc() without data loss.
Instead of initializing an array as in your original post, you should just declare a pointer to int, like this:
int *array;      // will hold the address of the first int in the allocated block
int length = 5;  // how long you want your array to be
array = malloc(sizeof(int) * length);  // allocate the block and point array at its start
int newLength = 10;
// Grow the array while leaving its contents intact. Use a temporary so the
// original block is not leaked if realloc fails.
int *tmp = realloc(array, sizeof(int) * newLength);
if (tmp != NULL) array = tmp;
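The "enlarge it when more data comes in" idea is often packaged as a small growable array. Below is a minimal sketch under that assumption; struct int_vec and int_vec_push are hypothetical names, and the capacity doubling is one common policy rather than anything the answer prescribes. The sketch does not guard against the capacity multiplication overflowing.

```c
#include <stdlib.h>

/* Hypothetical growable array of ints. Start with all fields zero. */
struct int_vec {
    int *data;      /* heap block, or NULL while empty */
    size_t len;     /* number of elements in use */
    size_t cap;     /* number of elements the block can hold */
};

/* Append value, doubling the capacity when the block is full.
   Returns 0 on success, -1 if realloc fails (the vector is unchanged). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL) {
            return -1;              /* old block is still valid */
        }
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

When the vector is no longer needed, free(v.data) releases the block.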
Your code is very strange.
The answer to the question in the title is probably something like "use automatically allocated arrays when you need quite small amounts of data that is short-lived, heap allocations using malloc() for anything else". But it's hard to pin down an exact answer, it depends a lot on the situation.
Not sure why you are showing first an array, then another array that tries to compute its length from the first one, and finally a malloc() call which tries to do the same.
Normally you have an idea of the number of desired elements, rather than an existing array whose size you want to mimic.
The second line is better as:
int array[sizeof a / sizeof *a];
No need to repeat a dependency on the type of a, the above will define array as an array of int with the same number of elements as the array a. Note that this only works if a is indeed an array.
Also, the third line should probably be:
array = malloc(sizeof a);
No need to get too clever (especially since you got it wrong) about the sizeof argument, and no need to cast malloc()'s return value.
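Putting the two corrections together, a heap copy of an existing array could be sketched as follows. The function name clone_ints is hypothetical; it is only meant to illustrate sizing the allocation in bytes and leaving malloc()'s return value uncast.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of the first count ints of src,
   or NULL if the allocation fails. The caller frees the result. */
int *clone_ints(const int *src, size_t count)
{
    int *copy = malloc(count * sizeof *copy);   /* size in bytes, no cast */
    if (copy != NULL) {
        memcpy(copy, src, count * sizeof *copy);
    }
    return copy;
}
```

For the array in the question, the call would be clone_ints(a, sizeof a / sizeof *a), which only works where a is a true array and not a pointer.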

A suitable replacement for *****double

I have this big data structure which is a list of lists of lists of lists of lists of doubles. Clearly it's extremely inefficient to handle. Around 70% of time spent to run my application is used to write zeros in the doubles at the end of the lists. I need a faster replacement which satisfies two constraints:
1)All the memory must be allocated continuously (that is, a huge chunk of memory)
2)I must access this chunk using the usual A[][][][][] syntax
As for now, I thought of using a *double to hold the entire chunk and reuse my list of lists of... to store pointers to the appropriate areas in the chunk.
Any better ideas?
One example of how to achieve this with a 2D array (I'm too lazy to do the 5D case) is
double **a;
a = malloc(n * sizeof(double *));
a[0] = malloc(n * m * sizeof(double));
for (int i = 1; i < n; ++i)
    a[i] = a[0] + i * m;  // row i starts i*m elements into the block
This way you can index it either as a[0][i*m + j] or as a[i][j]. The memory is contiguous, and you get away with only two allocations. Of course, this also requires a free n*m*sizeof(double) block in memory, but since you demand that the memory be allocated contiguously I expect this to be satisfied. This also means that you will have to delete it correctly with:
free(a[0]);
free(a);
so I would make a create5Darray (n,m,k,l,t) and a delete5Darray function to make this easier.
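An alternative that avoids the pointer tables altogether is a pointer to a variable-length array type, which keeps the A[][][][][] syntax over a single block. The sketch below assumes a compiler with C99 VLA support and all dimensions greater than zero. The names create5Darray and delete5Darray follow the suggestion above, zero5Darray is a hypothetical addition, and treating all-zero bytes as 0.0 assumes IEEE 754 doubles. Overflow of the size computation is not checked.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate one contiguous, zero-filled block holding n*m*k*l*t doubles. */
void *create5Darray(size_t n, size_t m, size_t k, size_t l, size_t t)
{
    return calloc(n, sizeof(double[m][k][l][t]));
}

/* Reset every element to zero with a single memset over the block. */
void zero5Darray(void *a, size_t n, size_t m, size_t k, size_t l, size_t t)
{
    memset(a, 0, n * sizeof(double[m][k][l][t]));
}

/* Release the block; one allocation means one free. */
void delete5Darray(void *a)
{
    free(a);
}
```

A caller would declare double (*A)[m][k][l][t] = create5Darray(n, m, k, l, t); and then index A[i][j][p][q][r] as usual. Clearing the whole structure becomes one memset rather than a walk over nested lists.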

Seg fault in case of large 2D array

I am writing a program to do some analysis on DNA sequences.
Everything works fine except for this thing.
I want to declare a 2D array of size m*n where m and n are read from an input file.
Now the issue is that if m and n get too large, for example m = 200 and n = 50000,
then I get a seg fault at the line where I declare my array.
array[m][n];
Any ideas how to overcome this. I do need such an array as my entire logic depends on how to process this array.
Probably you are running out of stack space.
Can you not allocate the array dynamically on heap using malloc?
You may want to have a look at this answer if you do not know how to do that.
As others have said it is not a good idea to allocate a large VLA (variable length array) on the stack. Allocate it with malloc:
double (*array)[n] = malloc(sizeof(double[m][n]));
and you have an object as before, that is that the compiler perfectly knows how to address individual elements array[i][j] and the allocation still gives you one consecutive blob in memory.
Just don't forget to do
free(array);
at the end of your scope.
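For illustration, here is a small self-contained sketch of that pattern with the dimensions supplied at run time. The function fill_and_sum is a hypothetical name, int is assumed as the element type, and the code needs a compiler with C99 VLA support:

```c
#include <stdlib.h>

/* Hypothetical demo: build an m-by-n table on the heap, store i + j in
   each cell, and return the sum of all cells (or -1 if malloc fails). */
long long fill_and_sum(size_t m, size_t n)
{
    int (*array)[n] = malloc(sizeof(int[m][n]));   /* one contiguous block */
    if (array == NULL) {
        return -1;
    }
    long long total = 0;
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            array[i][j] = (int)(i + j);            /* ordinary 2D indexing */
            total += array[i][j];
        }
    }
    free(array);
    return total;
}
```

With m = 200 and n = 50000 this asks the heap for about 40 MB, a size that would not fit on a typical stack.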
Not sure what type you're using but for the following code I've assumed int.
Rather than doing this:
int array[200][50000];
Try doing this:
int **array = malloc(200 * sizeof *array);  // room for 200 row pointers
for (int i = 0; i < 200; i++)
{
    array[i] = malloc(50000 * sizeof *array[i]);  // room for 50000 ints per row
}
This will allocate "heap" memory rather than "stack" memory. You are asking for about 40 MB (if you're using a 32-bit type), so you probably don't have that much "stack" memory.
Make sure to cleanup after you're done with the array with:
for (int i = 0; i < 200; i++)
{
free(array[i]);
}
free(array);
Feel free to use m and n instead of the constants I used above!
Edit: I originally wrote this in C++, and converted to C. I am a little more rusty with C memory allocation/deallocation, but I believe I got it right.
You are likely running out of stack space.
Windows for instance gives each thread 1MB stack. Assuming the array contains integers and you are creating it on the stack you are creating a 40MB stack variable.
You should instead dynamically allocate it on the heap.
The array (if local) is allocated in the stack. There is certain limits imposed on the stack size for a process/thread. If the stack is overgrown it will cause issues.
But you can allocate the array in heap using malloc . Typical heap size could be 4GB (this can be more or less depending on OS/Architecture). Check the return value of malloc to make sure that memory for the array is correctly allocated.

how is dynamic memory allocation better than array?

int *numbers;
numbers = malloc ( sizeof(int) * 10 );
I want to know how this is dynamic memory allocation if I can store just 10 int items in the memory block. I could just use an array and store elements dynamically using an index. Why is the above approach better?
I am new to C, and this is my 2nd day and I may sound stupid, so please bear with me.
In this case you could replace 10 with a variable that is assigned at run time. That way you can decide how much memory space you need. But with arrays, you have to specify an integer constant in the declaration. So you cannot know whether the user will actually need as many locations as were declared, or, even worse, whether they will be enough.
With a dynamic allocation like this, you could assign a larger memory location and copy the contents of the first location to the new one to give the impression that the array has grown as needed.
This helps to ensure optimum memory utilization.
The main reason why malloc() is useful is not because the size of the array can be determined at runtime - modern versions of C allow that with normal arrays too. There are two reasons:
Objects allocated with malloc() have flexible lifetimes;
That is, you get runtime control over when to create the object, and when to destroy it. The array allocated with malloc() exists from the time of the malloc() call until the corresponding free() call; in contrast, declared arrays either exist until the function they're declared in exits, or until the program finishes.
malloc() reports failure, allowing the program to handle it in a graceful way.
On a failure to allocate the requested memory, malloc() can return NULL, which allows your program to detect and handle the condition. There is no such mechanism for declared arrays - on a failure to allocate sufficient space, either the program crashes at runtime, or fails to load altogether.
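Both points can be seen in a few lines. In this sketch (make_squares is a hypothetical name), the array is created inside the function, survives the return, and an allocation failure reaches the caller as NULL instead of crashing the program:

```c
#include <stdlib.h>

/* Hypothetical example: build an array of the first n squares on the heap.
   Returns NULL if the allocation fails. The array stays valid until the
   caller passes it to free(). */
int *make_squares(size_t n)
{
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL) {
        return NULL;                /* failure is reported, not fatal */
    }
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;                 /* outlives this function call */
}
```

The caller decides when the object's lifetime ends by calling free() on the returned pointer.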
There is a difference with where the memory is allocated. Using the array syntax, the memory is allocated on the stack (assuming you are in a function), while malloc'ed arrays/bytes are allocated on the heap.
/* Allocates 4*1000 bytes on the stack (which might be a bit much depending on your system) */
int a[1000];
/* Allocates 4*1000 bytes on the heap */
int *b = malloc(1000 * sizeof(int));
Stack allocations are fast - and often preferred when:
"Small" amount of memory is required
Pointer to the array is not to be returned from the function
Heap allocations are slower, but have these advantages:
Available heap memory is (normally) much larger than available stack memory
You can freely pass the pointer to the allocated bytes around, e.g. returning it from a function -- just remember to free it at some point.
A third option is to use statically initialized arrays if you have some common task, that always requires an array of some max size. Given you can spare the memory statically consumed by the array, you avoid the hit for heap memory allocation, gain the flexibility to pass the pointer around, and avoid having to keep track of ownership of the pointer to ensure the memory is freed.
Edit: If you are using C99 (the default with the GNU C compiler, I think?), you can create variable-length stack arrays like
int a = 4;
int b[a*a];
In the example you gave
int *numbers;
numbers = malloc ( sizeof(int) * 10 );
there are no explicit benefits. Though, imagine 10 is a value that changes at runtime (e.g. user input), and that you need to return this array from a function. E.g.
int *aFunction(size_t howMany, ...)
{
int *r = malloc(sizeof(int)*howMany);
// do something, fill the array...
return r;
}
The malloc takes room from the heap, while something like
int *aFunction(size_t howMany, ...)
{
int r[howMany];
// do something, fill the array...
// you can't return r unless you make it static, but this is in general
// not good
return somethingElse;
}
would consume the stack that is not so big as the whole heap available.
More complex examples exist. E.g. if you have to build a binary tree that grows according to some computation done at runtime, you basically have no other choice but to use dynamic memory allocation.
Array size is defined at compilation time whereas dynamic allocation is done at run time.
Thus, in your case, you can use your pointer as an array : numbers[5] is valid.
If you don't know the size of your array when writing the program, runtime allocation is not optional; you have to use it. Otherwise, you're free to use an array, which might be simpler (less risk of forgetting to free memory, for example).
Example:
to store a 3-D position, you might want to use an array as it's always 3 coordinates
to create a sieve to calculate prime numbers, you might want to use a parameter to give the max value and thus use dynamic allocation to create the memory area
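The sieve case might be sketched like this, with the upper bound arriving as a parameter so the flag array has to be sized at run time. The function name count_primes is hypothetical, and the sketch assumes max is smaller than SIZE_MAX:

```c
#include <stdlib.h>

/* Hypothetical example: count the primes <= max with a sieve of
   Eratosthenes. Returns -1 if the flag array cannot be allocated. */
long count_primes(size_t max)
{
    if (max < 2) {
        return 0;
    }
    /* One flag per number in 0..max, zero-initialised: 0 means "maybe prime". */
    unsigned char *composite = calloc(max + 1, 1);
    if (composite == NULL) {
        return -1;
    }
    long count = 0;
    for (size_t i = 2; i <= max; i++) {
        if (composite[i]) {
            continue;
        }
        count++;
        if (i <= max / i) {         /* i * i <= max, without overflow */
            for (size_t j = i * i; j <= max; j += i) {
                composite[j] = 1;
            }
        }
    }
    free(composite);
    return count;
}
```

The 3-D position, by contrast, can stay a plain double[3] because its size never changes.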
Array is used to allocate memory statically and in one go.
To allocate memory dynamically malloc is required.
e.g. int numbers[10];
This will allocate memory statically and it will be contiguous memory.
If you are not aware of the count of the numbers then use variable like count.
int count;
int *numbers;
scanf("%d", &count);
numbers = malloc ( sizeof(int) * count );
This is not possible in case of arrays.
"Dynamic" does not refer to the access; what is dynamic is the size passed to malloc. If you just use a constant, like 10 in your example, it is no better than an array. The advantage comes when you don't know in advance how big it must be, e.g. because the user enters the size at runtime. Then you can allocate with a variable, e.g. malloc(sizeof(int) * userEnteredNumber). This is not possible with an array, as there you have to know the (maximum) size at compile time.

Resources