Problem with array declaration of large size in C [duplicate]

I'm implementing a sequential sorting program, something like quicksort, and I would like to test its performance on a huge array of 1 or 10 billion integers.
The problem is that I get a segmentation fault because of the size of the array.
Sample code declaring this array:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000000

int main(int argc, char **argv)
{
    int list[N], i;
    srand(time(NULL));
    for (i = 0; i < N; i++)
        list[i] = rand() % 1000;
    return 0;
}
Someone suggested using the mmap function, but I don't know how to use it. Can anybody help me use it?
I'm working on Ubuntu 10.04 64-bit, gcc version 4.4.3.
Thanks for your replies.

Michael is right, you can't fit that much on the stack. However, you can make it global (or static) if you don't want to malloc it.
#include <stdlib.h>
#include <time.h>

#define N 1000000000

static int list[N];

int main(int argc, char **argv)
{
    size_t i;
    srand(time(NULL));
    for (i = 0; i < N; i++)
        list[i] = rand() % 1000;
    return 0;
}

You must use malloc for this sort of allocation. That much on the stack will fail nearly every time.
int *list;
list = malloc(N * sizeof(int));
This puts the allocation on the heap where there is a lot more memory available.
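For example, a minimal sketch of that approach (the error check and the cleanup are my additions, not part of the original answer):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000000UL

int main(void)
{
    int *list = malloc(N * sizeof *list);   /* about 4 GB on the heap */
    if (list == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    srand(time(NULL));
    for (size_t i = 0; i < N; i++)
        list[i] = rand() % 1000;
    /* ... sort list here ... */
    free(list);
    return 0;
}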

You probably shouldn't create so large an array, and if you do, you certainly shouldn't create it on the stack; the stack just isn't that big.
If you have a 32-bit address space and a 4-byte int, then you can't create an array with a billion ints; there just won't be enough contiguous space in memory for that large an object (there probably won't be enough contiguous space for an object a fraction of that size). If you have a 64-bit address space, you might get away with allocating that much space.
If you really want to try, you'll need either to create it statically (i.e., declare the array at file scope or with the static qualifier in the function) or dynamically (using malloc).

On Linux systems, malloc of very large chunks just does an mmap under the hood, so it is probably not worth the tedium of calling mmap yourself.
Be careful that you have neither overflow (signed integers) nor silent wraparound (unsigned integers) in your array bounds and indices. Use size_t as the type for those; since you are on a 64-bit machine, this should then work.
But as a habit you should definitely check your bounds against SIZE_MAX, something like assert(N*sizeof(data[0]) <= SIZE_MAX), to be sure.
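Since the question explicitly asks about mmap, here is a minimal sketch of calling it directly on 64-bit Linux; an anonymous private mapping is roughly what glibc malloc would request internally for an allocation this large anyway.

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define N 1000000000UL

int main(void)
{
    size_t bytes = N * sizeof(int);
    /* ask the kernel for an anonymous, private, read-write mapping */
    int *list = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (list == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    /* ... fill and sort list[0 .. N-1] here ... */
    munmap(list, bytes);
    return 0;
}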

The stack allocation makes it break. N = 1 billion ints => 4 GB of memory (with both a 32-bit and a 64-bit compiler). But
if you want to measure the performance of quicksort, or a similar algorithm of yours, this is not the way to go about it.
Try instead to run multiple quicksorts in sequence on prepared samples of a large size:
- create a large random sample, no more than half your available memory. Make sure it doesn't fill your RAM! If it does, all measuring efforts are in vain. 500 M elements is more than enough on a 4 GB system.
- decide on a test size (e.g. N = 100 000 elements)
- start the timer
- run the algorithm on (*start = i*N, *end = (i+1)*N), then rinse and repeat for the next i until the large random sample is depleted
- stop the timer
Now you have a very precise answer to how much time your algorithm has consumed. Run it a few times to get a feel for "how precise" (use a new srand(seed) seed each time), and change N for further inspection.
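A minimal sketch of that scheme, with a hypothetical my_quicksort(start, end) as the algorithm under test (qsort stands in for it here so the snippet is self-contained); the sample and chunk sizes are arbitrary illustrative choices:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLE 500000000UL   /* elements in the prepared random sample */
#define CHUNK  100000UL      /* elements sorted per run */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Stand-in for the algorithm under test; replace with your own quicksort. */
static void my_quicksort(int *start, int *end)
{
    qsort(start, (size_t)(end - start), sizeof *start, cmp_int);
}

int main(void)
{
    int *sample = malloc(SAMPLE * sizeof *sample);   /* about 2 GB */
    if (sample == NULL)
        return EXIT_FAILURE;

    srand(time(NULL));
    for (size_t i = 0; i < SAMPLE; i++)
        sample[i] = rand() % 1000;

    clock_t start = clock();
    for (size_t i = 0; (i + 1) * CHUNK <= SAMPLE; i++)
        my_quicksort(sample + i * CHUNK, sample + (i + 1) * CHUNK);
    clock_t end = clock();

    printf("%f seconds\n", (double)(end - start) / CLOCKS_PER_SEC);
    free(sample);
    return 0;
}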

Another option is to dynamically allocate a linked list of smaller arrays. You'll have to wrap them with accessor functions, but it's far more likely that you can grab sixteen 256 MB chunks of memory than a single 4 GB chunk.
typedef struct node_s node, *node_ptr;
struct node_s
{
    int data[N/NUM_NODES];
    node_ptr next;
};
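A rough sketch of the accessor idea, assuming NUM_NODES evenly divides N and that the nodes have already been malloc'ed and linked; element() walks the list to the chunk that holds index i:

#define N         1000000000
#define NUM_NODES 16
#define CHUNK_LEN (N / NUM_NODES)

typedef struct node_s node, *node_ptr;
struct node_s
{
    int data[CHUNK_LEN];
    node_ptr next;
};

/* Return a pointer to logical element i of the segmented array. */
static int *element(node_ptr head, size_t i)
{
    node_ptr n = head;
    for (size_t skip = i / CHUNK_LEN; skip > 0; skip--)
        n = n->next;
    return &n->data[i % CHUNK_LEN];
}

/* Usage: *element(head, 123456789) = rand() % 1000; */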

Related

What is the maximum size of static arrays allowed in C?

In my algorithm I now work with static arrays, not dynamic ones. But I sometimes
reach the limit of the stack. Am I right that static arrays are stored on the stack?
Which parameters affect my maximum stack size for one C program?
Are there many system parameters which affect the maximal array size? Does the maximum number of elements depend on the array type? Does it depend on the total system RAM? Or does every C program have a static maximum stack size?
Am I right that static arrays are stored on the stack?
No, static arrays are stored in the static storage area. The automatic ones (i.e. ones declared inside functions, with no static storage specifier) are allocated on the stack.
Which parameters affect my maximum stack size for one C program?
This is system-dependent. On some operating systems you can change stack size programmatically.
Running out of stack space due to automatic storage allocation is a clear sign that you need to reconsider your memory strategy: you should either allocate the buffer in the static storage area if re-entrancy is not an issue, or use dynamic allocation for the largest of your arrays.
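For the "change stack size programmatically" part, a hedged sketch for POSIX systems using getrlimit/setrlimit (the exact limits, and whether a raised limit fully applies to the already-running stack, are system-dependent):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("current stack limit: %ld bytes\n", (long)rl.rlim_cur);

    rl.rlim_cur = 64L * 1024 * 1024;   /* request 64 MB, capped at the hard limit */
    if (rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        perror("setrlimit");
    return 0;
}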
Actually, it depends on the C compiler for the platform you use.
As an example, there are even systems which don't have a real stack so recursion won't work.
A static array is compiled as a contiguous memory area accessed through pointers. The pointers might be two or four bytes in size (or maybe even only one on exotic platforms).
There are platforms which use memory pages and have "near" and "far" pointers, which differ in size (and speed, of course). So it could be the case that the pointers representing the array and the objects need to fit into the same memory page.
On embedded systems, static data usually is collected in the memory area which will later be represented by the read-only memory. So your array will have to fit in there.
On platforms which run arbitrary applications, RAM is the limiting factor if none of the above applies.
Most of your questions have been answered, but just to give an answer that made my life a lot easier:
Qualitatively, the maximum size of a non-dynamically allocated array depends on the amount of RAM that you have. It also depends on the type of the array: an int may be 4 bytes while a double may be 8 bytes (sizes are also system-dependent), so you will be able to have an array with twice as many elements if you use int instead of double.
Having said that, and keeping in mind that sometimes numbers are indeed important, here is a very noobish code snippet to help you find the maximum on your system.
#include <stdio.h>
#include <stdlib.h>

#define UPPER_LIMIT 10000000000000LL // a very big number

int main(int argc, const char *argv[])
{
    long int_size = sizeof(int);
    for (long i = 1; i < UPPER_LIMIT; i++)   /* long, so the counter itself can't overflow first */
    {
        int c[i];
        for (long j = 0; j < i; j++)
        {
            c[j] = j;
        }
        printf("You can set the array size at %d, which means %ld bytes.\n",
               c[i - 1], int_size * c[i - 1]);
    }
    return 0;
}
P.S.: It may take a while until you reach your system's maximum and produce the expected Segmentation Fault, so you may want to change the initial value of i to something closer to your system's RAM, expressed in bytes.

Segmentation Fault on Small(ish) 2d array

I keep getting a segmentation fault with the following code. Changing the 4000 to 1000 makes the code run fine. I would think that I have enough memory here... How can I fix this?
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 4000

void initialize_mx(float mx[][MAXLEN])
{
    int i, j;
    float c = 0;
    for (i = 0; i < MAXLEN; i++) {
        for (j = 0; j < MAXLEN; j++) mx[i][j] = c;
    }
}

int main(int ac, char *av[])
{
    int i, j;
    float confmx[MAXLEN][MAXLEN];
    initialize_mx(confmx);
    return 0;
}
The problem is that you're overflowing the stack.
When main() runs, it allocates stack space for its local variables (confmx in your case). This space, which is limited by your OS (check ulimit if you're on Linux), can get overflowed if local variables are too big.
Basically you can:
- declare confmx as a global variable, as cnicutar suggests, or
- allocate memory for your array dynamically and pass a pointer to initialize_mx() (see the sketch below).
EDIT: Just realized you must still allocate the memory if you pass a pointer, so you have those two options :)
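A minimal sketch of the second option, assuming the MAXLEN and initialize_mx() from the question: allocate the whole matrix as one heap block typed as a pointer to rows, so the existing mx[i][j] indexing keeps working.

#include <stdio.h>
#include <stdlib.h>

#define MAXLEN 4000

void initialize_mx(float mx[][MAXLEN])
{
    for (int i = 0; i < MAXLEN; i++)
        for (int j = 0; j < MAXLEN; j++)
            mx[i][j] = 0.0f;
}

int main(void)
{
    /* pointer to an array of MAXLEN floats: one malloc for the whole matrix */
    float (*confmx)[MAXLEN] = malloc(MAXLEN * sizeof *confmx);
    if (confmx == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    initialize_mx(confmx);
    free(confmx);
    return 0;
}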
You are using 4000*4000*4 bytes on your stack, if I didn't make any calculation errors, that's 61MB, which is a lot. It works with 1000 because in that case you are only using nearly 4MB on your stack.
4000*4000*sizeof(float)==64000000. I suspect your operating system may have a limit on the stack size between 4 and 64 MB.
As others have noted, smallish isn't small for automatic variables, which are allocated on the stack.
Depending on your needs, you could
static float confmx[MAXLEN][MAXLEN];
which would allocate the storage in the BSS. You might want to consider a different storage system as one often only needs a sparse matrix and there are more efficient ways to store and access matrices where many of the cells are zero.
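A very small sketch of the sparse idea mentioned above (the struct names and growth policy are illustrative, not from the answer): store only the nonzero cells as (row, col, value) triples instead of a full MAXLEN x MAXLEN block.

#include <stdlib.h>

typedef struct {
    int   row, col;
    float value;
} cell;

typedef struct {
    cell  *cells;      /* only the nonzero entries */
    size_t count;
    size_t capacity;
} sparse_mx;

/* Append a nonzero entry; lookup and in-place update are omitted for brevity. */
static int sparse_append(sparse_mx *m, int row, int col, float value)
{
    if (m->count == m->capacity) {
        size_t new_cap = m->capacity ? 2 * m->capacity : 1024;
        cell *p = realloc(m->cells, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        m->cells = p;
        m->capacity = new_cap;
    }
    m->cells[m->count++] = (cell){ row, col, value };
    return 0;
}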

Array of size 2^25

I am trying to create an array of size 2^25 in C and then perform some elementary operations on it (a memsweep function). The C code is:
#include <stdio.h>
#include <time.h>

#define S (8191*4096)

int main(void)
{
    clock_t start = clock();
    unsigned i;
    volatile char large[S];
    for (i = 0; i < 10*S; i++)
        large[(4096*i + i) % S] = 1 + large[i % S];
    printf("%f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
}
I am able to compile it but on execution it gives segmentation fault.
That might be bigger than your stack. You can:
- make large global
- use malloc
The array is too big to fit on your stack. Use the heap with char *large = malloc(S) instead.
You don't have enough stack space to allocate an array that big ... on Linux, for instance, the default stack size is typically 8192 KB (8 MB). You've definitely exceeded that.
The best option would be to allocate the memory on the heap using malloc(). So you would write char* large = malloc(S);. You can still access the array using the [] notation.
Optionally, if you're on Linux, you could raise the limit with ulimit -s X on the command line, where X is some number large enough for your array to fit on the stack ... but I'd generally discourage that solution.
Large is being allocated on the stack and you are overflowing it.
Try using char *large = malloc(S)
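A minimal sketch of the malloc-based fix for the code in this question: the same sweep, with the buffer on the heap instead of the stack (the NULL check is my addition).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define S (8191*4096)

int main(void)
{
    clock_t start = clock();
    volatile char *large = malloc(S);
    if (large == NULL)
        return EXIT_FAILURE;

    for (unsigned i = 0; i < 10*S; i++)
        large[(4096*i + i) % S] = 1 + large[i % S];

    printf("%f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
    free((void *)large);
    return 0;
}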

Basic array usage in C?

Is this how you get the size of an array in ANSI C99? Seems kind of, um, clunky coming from a higher-level language.
int tests[7];
for (int i = 0; i < sizeof(tests)/sizeof(int); i++) {
    tests[i] = rand();
}
Also, this segmentation faults:
int r = 10000000;
printf ("r: %i\n", r);
int tests[r];
run it:
r: 10000000
Segmentation fault
10000000 seg faults, but 1000000 works.
How do I get more info out of this? What should I be checking and how would I debug something like this? Is there a limit on C arrays? What's a segmentation fault?
Getting the size of an array in C is easy. This will give you the size of the array in bytes:
sizeof(x)
But I guess what you require is number of elements, in that case it would be:
sizeof(x) / sizeof(x[0])
You can write a simple macro for this:
#define NumElements(x) (sizeof(x) / sizeof(x[0]))
For example:
int a[10];
int size_a = sizeof(a); /* size in bytes */
int numElm = NumElements(a); /* number of elements, here 10 */
Why calculate the size?
Define a constant containing the size and use that when declaring the array. Reference the constant whenever you want the size of the array.
As a primarily C++ programmer, I'll say that historically the constant was often defined as an enum value or a #define. In C, that may be current rather than historic, though - I don't know how current C handles "const".
If you really want to calculate the size, define a macro to do it. There may even be a standard one.
The reason for the segfault is most likely because the array you're trying to declare is about 40 megabytes worth, and is declared as a local variable. Most operating systems limit the size of the stack. Keep your array on the heap or in global memory, and 40 megabytes for one variable will probably be OK for most systems, though some embedded systems may still cry foul. In a language like Java, all objects are on the heap, and only references are kept on the stack. This is a simple and flexible system, but often much less efficient than storing data on the stack (heap allocation overheads, avoidable heap fragmentation, indirect access overheads...).
Arrays in C don't know how big they are, so yes, you have to do the sizeof array / sizeof array[0] trick to get the number of elements in an array.
As for the segfault issue, I'm guessing that you exceeded your stack size by attempting to allocate 10000000 * sizeof int bytes. A rule of thumb is that if you need more than a few hundred bytes, allocate it dynamically using malloc or calloc instead of trying to create a large auto variable:
int r = 10000000;
int *tests = malloc(sizeof *tests * r);
Note that you can treat tests as though it were an array type in most circumstances (i.e., you can subscript it, you can pass it to any function that expects an array, etc.), but it is not an array type; it is a pointer type, so the sizeof tests / sizeof tests[0] trick won't work.
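A quick illustration of that point; the printed values assume a typical 64-bit system with 4-byte int and 8-byte pointers, which the standard does not guarantee.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int arr[10];
    int *ptr = malloc(10 * sizeof *ptr);

    printf("%zu\n", sizeof arr / sizeof arr[0]); /* 10: arr is an array type */
    printf("%zu\n", sizeof ptr / sizeof ptr[0]); /* 2: ptr is just a pointer */

    free(ptr);
    return 0;
}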
Traditionally, an array has a static size. So we can do
#define LEN 10
int arr[LEN];
but not
int len;
scanf("%d", &len);
int arr[len]; // bad!
Since we know the size of an array at compile time, getting the size of an array tends to be trivial. We don't need sizeof because we can figure out the size by looking at our declaration.
C++ provides heap arrays, as in
int len;
scanf("%d", &len);
int *arr = new int[len];
but since this involves pointers instead of stack arrays, we have to store the size in a variable which we pass around manually.
I suspect that it is because of integer overflow. Try printing the value using a printf:
printf("%d", 10000000);
If it prints a negative number - that is the issue.
Stack Overflow! Try allocating on the heap instead of the stack.

