Large dynamic 2D array allocation fails - C language

I am trying to define a large, dynamically sized 2D array in C. When the array is not very big (e.g. 20,000 x 20,000), memory allocation works perfectly. However, when the array is around 50K x 50K or larger, memory allocation fails. Here is the sample code:
long int i = 0;
int **Cf = calloc(30000, sizeof *Cf);
for (i = 0; i < 30000; i++) {
    Cf[i] = calloc(30000, sizeof *(Cf[i]));
}
The post "What is the maximum size of buffers memcpy/memset etc. can handle?" explains why this issue occurs. However, it does not say what to do when you need large dynamic 2D arrays in your program. I am sure there should be an efficient way of doing this, since many scientific applications need this feature. Any suggestions?
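One common workaround (not part of the original question, just a hedged sketch) is to allocate the whole matrix as one contiguous block and index it through a pointer to a variable-length array; this assumes a 64-bit build with enough memory for the roughly 10 GB a 50,000 x 50,000 int matrix needs:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 50000;                        /* size_t so n * n does not overflow */
    int (*Cf)[n] = calloc(n, sizeof *Cf);    /* one block of n * n ints, indexed Cf[i][j] */
    if (Cf == NULL) {
        fprintf(stderr, "allocation of %zu bytes failed\n", n * n * sizeof(int));
        return 1;
    }
    Cf[n - 1][n - 1] = 42;                   /* use it like an ordinary 2D array */
    free(Cf);
    return 0;
}
A single allocation also fails cleanly with one NULL check instead of tens of thousands, and keeps the data contiguous, which the contiguous-allocation discussions further down this page recommend for cache reasons.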

Related

Size of Array not showing the expectation

The goal is to get the number of even numbers and that of the odd numbers.
The output should be approximately 50% each. I have tried:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int arr[10] = {0}, random_number, i, odd_saver[] = {0}, even_saver[] = {0};
    srand(time(NULL));
    for (i = 0; i < 10000; i++) {
        random_number = (10 * rand()) / (RAND_MAX + 1);
        arr[random_number] += 1;
        if (arr[random_number] % 2 == 0) {
            even_saver[random_number] += 1;
        } else {
            odd_saver[random_number] += 1;
        }
    }
    printf("\n");
    size_t even_size = sizeof(even_saver[random_number]) / sizeof(even_saver[0]);
    size_t odd_size = sizeof(odd_saver[random_number]) / sizeof(odd_saver[0]);
    printf("%d %.2f%%\n", (int)even_size, (double)(even_size * 100) / 10000);
    printf("%d %.2f%%\n", (int)odd_size, (double)(odd_size * 100) / 10000);
    return 0;
}
but the output does not match my expectation.
I need help; an explanation of what I am doing wrong would be highly appreciated.
The output:
1 0.01%
1 0.01%
As H.R. Emon has implied, in C you cannot create odd_saver[]={0}, even_saver[]={0}; as arrays of size 1 and later try to increase their size by adding elements to them.
You intend the indexes used to access all arrays in your code to be in the range 0..9, which matches the array arr of size 10. (Though the method of calculating the random numbers could be discussed...)
With that assumption, you can create all of your arrays the same way:
int arr[10]={0}, random_number, i, odd_saver[10]={0}, even_saver[10]={0};
I think your goal is to output the number of different even and different odd numbers (i.e. not counting multiple occurrences of the same number).
For that you cannot use the size of the even/odd arrays. For one thing, because in C there are no dynamically growing arrays (as H.R. Emon has pointed out). But also because once an 8 or 9 occurs, incrementing that index in the arrays would (if such growing arrays existed in C) falsely give you too high a size.
You will simply have to count the non-zeros in your even/odd arrays.
(By the way, it should be possible to use even/odd arrays of half the size, by dividing the index by 2 and using appropriate offsets.)
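Based on that suggestion, here is a minimal sketch of the counting step (assuming odd_saver[10] and even_saver[10] as declared above; it replaces the sizeof lines in the question):
int even_count = 0, odd_count = 0;
for (i = 0; i < 10; i++) {
    if (even_saver[i] != 0) even_count++;   /* this slot was incremented at least once */
    if (odd_saver[i] != 0)  odd_count++;
}
printf("%d even, %d odd\n", even_count, odd_count);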
I am afraid you are trying to do things that C doesn't allow. You are trying to use a dynamically growing array, but C doesn't support dynamic arrays, so writing past the declared size leads to undefined behavior. If you need dynamic growth, in C++ you can use std::vector, a standard library container that allocates memory dynamically for you.
vector details

create two-dimensional array/matrix using C

I need to read some kind of a matrix from a CSV file (the number of matrix columns and rows may be different every time) using C.
The file will be something like that:
#,#,#,#,#,#,.,#,.,.,.$
#,.,#,.,.,#,.,#,#,#,#$
#,.,#,.,.,.,.,.,.,#,#$
#,.,#,.,.,#,#,#,#,#,#$
#,.,.,#,.,.,.,.,.,.,#$
#,.,.,.,#,.,#,#,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,#,.,.,.,.,#$
#,.,.,.,.,.,.,.,.,.,#$
#,#,#,#,#,#,#,#,#,.,#$
I need to read the file and save it to a two-dimensional array to be able to iterate through it and find the path out of the labyrinth using Lee algorithm.
So I want to do something like:
int fd = open(argv[i], O_RDONLY);
while (read(fd, &ch, 1)) {
    /* here should be some for loops to find the number of columns and rows */
}
Unfortunately, I don't know how to do that if the height and width of the matrix are unknown.
I was trying to do that:
while (read(fd, &ch, 1)) {
    for (int i = 0; arr[i] != '\0'; i++) {
        for (int j = 0; j != '\n'; j++) {
            /* somehow save the values, number of columns and rows */
        }
    }
}
However, the number of rows could be greater than the number of columns.
Any help will be appreciated.
If the size isn't known but has to be determined as you parse the file, then a simple but somewhat naive idea would be to use char** rows = malloc(n * sizeof *rows);, where n is a sufficiently large number to cover most normal use-cases. realloc if you go past n rows.
Then for each row you read, store it inside rows[i] through another malloc followed by strcpy/memcpy.
A smarter version of the same would be to first read the first row, find the row length, and then assume that all rows in the file have that size. You can do a char (*rows)[row_length+1] = malloc(n * sizeof *rows); to allocate a true 2D array. This has advantages over the char**, since you get a proper cache-friendly 2D array with faster access, faster allocation and less heap fragmentation. See Correctly allocating multi-dimensional arrays for details about that.
Another big advantage of the pointer-to-array form is that if you know the number of rows n in advance, you can actually read/fread the whole file in one go, which would be a significant performance boost, since file I/O will be the bottleneck in this program.
If you don't know n in advance, you would still have to realloc in case you end up reading more than n rows. So a third option would be to use a linked list, which is probably the worst option since it is slow and adds complexity. The only advantage is that a linked list lets you swiftly add/remove rows on the fly.
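A rough sketch of that "smarter version" (stdio is used instead of open/read for brevity; the file name, the 4096-byte line buffer and the initial guess of 64 rows are all assumptions, and error handling is kept minimal):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("maze.csv", "r");           /* hypothetical file name */
    char line[4096];
    if (f == NULL || fgets(line, sizeof line, f) == NULL)
        return 1;

    size_t row_length = strcspn(line, "\n");    /* row length taken from the first line */
    size_t n = 64, nrows = 0;                   /* initial guess for the row count */
    char (*rows)[row_length + 1] = malloc(n * sizeof *rows);

    do {
        if (nrows == n) {                       /* guess too small: grow the block */
            n *= 2;
            rows = realloc(rows, n * sizeof *rows);
        }
        memcpy(rows[nrows], line, row_length);  /* assumes every line has row_length chars */
        rows[nrows][row_length] = '\0';
        nrows++;
    } while (fgets(line, sizeof line, f) != NULL);

    printf("%zu rows of %zu characters each\n", nrows, row_length);
    free(rows);
    fclose(f);
    return 0;
}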

copying 2d array of type (double **2darray) to GPU using cuda [duplicate]

I am looking into how to copy a 2D array of variable width for each row into the GPU.
int rows = 1000;
int cols;
int** host_matrix = malloc(sizeof(*host_matrix) * rows);
int *d_array;
int *length;
...
Each host_matrix[i] might have a different length, which I know as length[i], and that is where the problem starts. I would like to avoid copying dummy data. Is there a better way of doing it?
According to this thread, that won't be a clever way of doing it:
cudaMalloc((void **)&d_array, rows * sizeof(int *));
for (int i = 0; i < rows; i++) {
    cudaMalloc((void **)&d_array[i], length[i] * sizeof(int));
}
But I cannot think of any other method. Is there any other smarter way of doing it?
Can it be improved using cudaMallocPitch and cudaMemcpy2D?
The correct way to allocate an array of pointers for the GPU in CUDA is something like this:
int **hd_array, **d_array;
hd_array = (int **)malloc(nrows * sizeof(int *));
cudaMalloc((void **)&d_array, nrows * sizeof(int *));
for (int i = 0; i < nrows; i++) {
    cudaMalloc((void **)&hd_array[i], length[i] * sizeof(int));
}
cudaMemcpy(d_array, hd_array, nrows * sizeof(int *), cudaMemcpyHostToDevice);
(disclaimer: written in browser, never compiled, never tested, use at own risk)
The idea is that you assemble a copy of the array of device pointers in host memory first, then copy that to the device. For your hypothetical case with 1000 rows, that means 1001 calls to cudaMalloc and then 1001 calls to cudaMemcpy just to set up the device memory allocations and copy data into the device. That is an enormous overhead penalty, and I would counsel against trying it; the performance will be truly terrible.
If you have very jagged data and need to store it on the device, might I suggest taking a cue from the mother of all jagged data problems - large, unstructured sparse matrices - and borrowing one of the sparse matrix formats for your data instead. Using the classic compressed sparse row format as a model, you could do something like this:
int *data, *rows, *lengths;
cudaMalloc((void **)&rows, nrows * sizeof(int));
cudaMalloc((void **)&lengths, nrows * sizeof(int));
cudaMalloc((void **)&data, N * sizeof(int));
In this scheme, you store all the data in a single, linear memory allocation, data. The i-th row of the jagged array starts at data[rows[i]], and each row has a length of lengths[i]. This means you only need three memory allocations and three copy operations to transfer any amount of data to the device, rather than nrows in your current scheme, i.e. it reduces the overhead from O(N) to O(1).
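A hedged sketch of that scheme, reusing the question's host_matrix and length variables and the nrows count from above; the h_rows/h_data names for the host-side staging arrays are made up, and, like the answer above, this is written in the browser and untested:
/* Build the row-offset table and pack the jagged rows end to end on the host. */
int N = 0;
int *h_rows = (int *)malloc(nrows * sizeof(int));
for (int i = 0; i < nrows; i++) {
    h_rows[i] = N;                 /* row i starts at offset h_rows[i] in the packed data */
    N += length[i];
}
int *h_data = (int *)malloc(N * sizeof(int));
for (int i = 0; i < nrows; i++)
    memcpy(h_data + h_rows[i], host_matrix[i], length[i] * sizeof(int));

/* Three allocations and three copies, independent of the number of rows. */
int *d_data, *d_rows, *d_lengths;
cudaMalloc((void **)&d_data, N * sizeof(int));
cudaMalloc((void **)&d_rows, nrows * sizeof(int));
cudaMalloc((void **)&d_lengths, nrows * sizeof(int));
cudaMemcpy(d_data, h_data, N * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_rows, h_rows, nrows * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_lengths, length, nrows * sizeof(int), cudaMemcpyHostToDevice);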
I would put all the data into one array. Then compose another array with the row lengths, so that A[i] = length[i] (A[0] is the length of row 0, and so on).
Then you just need to allocate two arrays on the card and call cudaMemcpy twice.
Of course it's a little bit of extra work, but I think performance-wise it will be an improvement (depending, of course, on how you use the data on the card).

Benefits of contiguous memory allocation

In terms of performance, what are the benefits of allocating a contiguous memory block versus separate memory blocks for a matrix? I.e., instead of writing code like this:
char **matrix = malloc(sizeof(char *) * 50);
for (i = 0; i < 50; i++)
    matrix[i] = malloc(50);
giving me 50 disparate blocks of 50 bytes each and one block of 50 pointers, if I were to instead write:
char **matrix = malloc(sizeof(char *) * 50 + 50 * 50);
char *data = (char *)(matrix + 50);
for (i = 0; i < 50; i++) {
    matrix[i] = data;
    data += 50;
}
giving me one contiguous block of data, what would the benefits be? Avoiding cache misses is the only thing I can think of, and even that's only for small amounts of data (small enough to fit on the cache), right? I've tested this on a small application and have noticed a small speed-up and was wondering why.
It's complicated - you need to measure.
Using an intermediate pointer instead of calculating addresses in a two-dimensional array is most likely a loss on current processors, and both of your examples do that.
Next, everything fitting into L1 cache is a big win. malloc() most likely rounds allocations up to multiples of 64 bytes. 180 x 180 = 32,400 bytes might fit into L1 cache, while individual mallocs might allocate 180 x 192 = 34,560 bytes, which might not fit, especially if you add another 180 pointers.
One contiguous array means you know how the data fits into cache lines, and you know you'll have the minimum number of page table lookups in the hardware. With hundreds of mallocs, no guarantee.
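For comparison, a minimal sketch (not from the original answer) of what a single contiguous allocation with real 2D indexing can look like in C99, using the pointer-to-array form mentioned in the CSV answer above:
char (*matrix)[50] = malloc(50 * sizeof *matrix);   /* one 2500-byte block */
if (matrix != NULL) {
    matrix[10][20] = 'x';                           /* indexed like a plain 2D array */
    free(matrix);
}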
Watch Scott Meyers' "CPU Caches and Why You Care" presentation on Youtube. The performance gains can be entire orders of magnitude.
https://www.youtube.com/watch?v=WDIkqP4JbkE
As for the discussion above, the intermediate-pointer argument died a long time ago. Compilers optimize them away. An N-dimensional array is allocated as a flat 1D vector, ALWAYS. If you use std::vector<std::vector<T>>, THEN you might get the equivalent of an ordered forward list of vectors, but for raw arrays they're always allocated as one long, contiguous strip in a flat manner, and multi-dimensional access reduces to pointer arithmetic the same way one-dimensional access does.
To access array[i][j][k] (assume width, height, depth of {A, B, C}), you add i*B*C + j*C + k to the address at the front of the array. You'd have to do this math manually in a 1D representation anyway.
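As an illustration of that index arithmetic (with hypothetical dimensions A, B and C), the flat layout and the manual offset look like this:
enum { A = 4, B = 5, C = 6 };
int flat[A * B * C];                        /* one contiguous strip, as for int array[A][B][C] */

/* element (i, j, k) sits at the same offset a compiler would compute for array[i][j][k] */
#define AT(i, j, k) flat[(i) * B * C + (j) * C + (k)]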

Optimising C for performance vs memory optimisation using multidimensional arrays

I am struggling to decide between two optimisations for building a numerical solver for the Poisson equation.
Essentially, I have a two-dimensional array, of which I require n doubles in the first row, n/2 in the second, n/4 in the third, and so on...
Now my difficulty is deciding whether or not to use a contiguous 2d array grid[m][n], which for a large n would have many unused zeroes but would probably reduce the chance of a cache miss. The other, and more memory efficient method, would be to dynamically allocate an array of pointers to arrays of decreasing size. This is considerably more efficient in terms of memory storage but would it potentially hinder performance?
I don't think I clearly understand the trade-offs in this situation. Could anybody help?
For reference, I made a nice plot of the memory requirements in each case.
There is no hard and fast answer to this one. If your algorithm needs more memory than you expect to be given, then you need to find one which is possibly slower but fits within your constraints.
Beyond that, the only option is to implement both and then compare their performance. If saving memory results in a 10% slowdown, is that acceptable for your use? If the version using more memory is 50% faster but only runs on the biggest computers, will it be used? These are the questions we have to grapple with in computer science. But you can only answer them once you have numbers; otherwise you are just guessing, and a fair amount of the time our intuition about optimizations is not correct.
Build a custom array that follows the rules you have set.
The implementation will use a simple 1D contiguous array. You will need a function that returns the start of a row given its index. Something like this:
int *Get(int *array, int n, int row)   // might contain logical errors
{
    int pos = 0;
    while (row--) {
        pos += n;
        n /= 2;
    }
    return array + pos;
}
Where n is the same n you described and is rounded down on every iteration.
You will have to call this function only once per entire row.
This function will never take more than O(log n) time, but if you want you can replace it with a single expression: http://en.wikipedia.org/wiki/Geometric_series#Formula
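For example, if n is a power of two (so the repeated halving stays exact), the geometric series gives a closed form for the same offset; this is a hypothetical variant, not part of the original answer:
int *GetClosedForm(int *array, int n, int row)
{
    /* assumes n is a power of two and row <= log2(n) + 1 */
    int pos = (row == 0) ? 0 : 2 * n - (n >> (row - 1));   /* n + n/2 + ... + n/2^(row-1) */
    return array + pos;
}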
You could use a single array and just calculate your offset yourself
size_t get_offset(int n, int row, int column) {
    size_t offset = column;
    while (row--) {
        offset += n;
        n >>= 1;   /* each row is half the size of the previous one */
    }
    return offset;
}

double *array = calloc(get_offset(n, 64, 0), sizeof(double));   /* 64 rows assumed */
access via
array[get_offset(n, row, column)]
