The following code is from pg. 93 of Parallel and High Performance Computing and is a single contiguous memory allocation for a 2D array:
double **malloc_2D(int nrows, int ncols) {
    double **x = (double **)malloc(
        nrows*sizeof(double*)
        + nrows*ncols*sizeof(double)); // L1
    x[0] = (double *)x + nrows;        // L2
    for (int j = 1; j < nrows; j++) {  // L3
        x[j] = x[j-1] + ncols;
    }
    return x;
}
The book states that this improves memory allocation and cache efficiency. Is there any reason w.r.t efficiency to prefer the first code to something like the below code? It seems like the below code is more readable, and it's also easily usable with MPI (I only mention this because the book also covers MPI later).
double *malloc_2D(int nrows, int ncols) {
    double *M = (double *)malloc(nrows * ncols * sizeof(double));
    return M;
}
I include the below image to make sure that my mental model of the first code is correct. If it is not, please mention that in the answer. The image is the result of calling the first function to create a 5 x 2 matrix. Note that I just write the indices in the boxes in the below image for clarity, of course the values stored at these memory locations will not be 0 through 14. Also note that L# refers to lines in the first code.
The book states that this improves memory allocation and cache efficiency.
The book’s code improves efficiency relative to a too-often seen method of allocating pointers separately, as in:
double **x = malloc(nrows * sizeof *x);
for (size_t i = 0; i < nrows; ++i)
    x[i] = malloc(ncols * sizeof *x[i]);
(Note that all methods should test the malloc result and handle allocation failures. This is elided for the discussion here.)
That method allocates each row separately (from other rows and from the pointers). The book’s method has some benefit that only one allocation is done and that the memory for the array is contiguous. Also, the relationships between elements in different rows are known, and that may allow programmers to take advantage of the relationships in designing algorithms that work well with cache and memory access.
Is there any reason w.r.t efficiency to prefer the first code to something like the below code?
Not for efficiency, no. Both the book’s method and the method above have the disadvantage that they generally require a pointer lookup for every array access (aside from the base pointer, x). Before the processor can get an element from the memory of a row, it has to get the address of the row from memory.
With the method you show, this additional lookup is unnecessary. Further, the processor and/or the compiler may be able to predict some things about the accesses. For example, with your method, the compiler may be able to see that M[(i+1)*ncols + j] is a different element from M[(i+2)*ncols + j], whereas with x[i+1][j] and x[i+2][j], it generally cannot know the two pointers x[i+1] and x[i+2] are different.
The book’s code is also defective. The number of bytes it allocates is nrows*sizeof(double*) + nrows*ncols*sizeof(double). Let’s say r is nrows, c is ncols, p is sizeof(double*), and d is sizeof(double). Then the code allocates rp + rcd bytes.

The code then sets x[0] to (double *)x + nrows. Because it casts to double *, the addition of nrows is done in units of the pointed-to type, double, so this adds rd bytes to the starting address. After that, it expects to have all the elements of the array, which is rcd bytes. So the code is using rd + rcd bytes even though it allocated rp + rcd. If p > d, some elements at the end of the array will be outside of the allocated memory.

In current ordinary C implementations, the size of double * is less than or equal to the size of double, but this should not be relied on. Instead of setting x[0] to (double *)x + nrows, it should calculate x plus the size of nrows elements of type double * plus enough padding to reach the alignment requirement of double, and it should include that padding in the allocation.
If we cannot use variable length arrays, then the array indexing can be provided by a macro, as by defining a macro that replaces x(i, j) with x[i*ncols+j], such as #define x(i, j) x[(i)*ncols + (j)].
So I am learning how to program in C, and am starting to learn about dynamic memory allocation. What I know is that your program will not always know how much memory it needs until run time.
I have this code:
#include <stdio.h>
#include <stdlib.h>   /* for rand() */

int main() {
    int r, c, i, j;
    printf("Rows?\n");
    scanf("%d", &r);
    printf("Columns?\n");
    scanf("%d", &c);
    int array[r][c];
    for (i = 0; i < r; i++)
        for (j = 0; j < c; j++)
            array[i][j] = rand() % 100 + 1;
    return 0;
}
So if I wanted to create a 2D array, I can just declare one and put numbers in the brackets. But here in this code, I am asking the user how many rows and columns they would like, then declaring an array with those variables, and then filling up the rows and columns with random integers.
So my question is: Why don't I have to use something like malloc here? My code doesn't know how many rows and columns I am going to put in at run time, so why do I have access to that array with my current code?
So my question is: why don't I have to use something like malloc here?
My code doesn't know how many rows and columns I am going to put in at
run time, so why do I have access to that array with my current code?
You are using a C feature called "variable-length arrays". It was introduced in C99 as a mandatory feature, but support for it is optional in C11 and C18. This alternative to dynamic allocation carries several limitations with it, among them:
because the feature is optional, code that unconditionally relies on it is not portable to implementations that do not support the feature
implementations that support VLAs typically store local VLAs on the stack, which is prone to producing stack overflows if at runtime the array dimension is large. (Dynamically-allocated space is usually much less sensitive to such issues. Large, fixed-size automatic arrays can be an issue too, but the potential for trouble with these is obvious in the source code, and it is less likely to evade detection during testing.)
the program still needs to know the dimensions of your array before its declaration, and the dimensions at the point of the declaration are fixed for the lifetime of the array. Unlike dynamically-allocated space, VLAs cannot be resized.
there are contexts that accommodate ordinary, fixed length arrays, but not VLAs, such as file-scope variables.
Your array is allocated on the stack, so when the function (in your case, main()) exits, the array vanishes into thin air. Had you allocated it with malloc(), the memory would be allocated on the heap and would stay allocated until you free() it (or the program exits). The size of the array IS known at run time (but not at compile time).
In your program, the array is allocated with automatic storage, aka on the stack, it will be released automatically when leaving the scope of definition, which is the body of the function main. This method, passing a variable expression as the size of an array in a definition, introduced in C99, is known as variable length array or VLA.
If the size is too large, or negative, the definition will have undefined behavior, for example causing a stack overflow.
To avoid such potential problems, you could check the values of the dimensions and use malloc or calloc:
#include <stdio.h>
#include <stdlib.h>

int main() {
    int r, c, i, j;
    printf("Rows?\n");
    if (scanf("%d", &r) != 1)
        return 1;
    printf("Columns?\n");
    if (scanf("%d", &c) != 1)
        return 1;
    if (r <= 0 || c <= 0) {
        printf("invalid matrix size: %dx%d\n", r, c);
        return 1;
    }
    int (*array)[c] = calloc(r, sizeof(*array));
    if (array == NULL) {
        printf("cannot allocate memory for %dx%d matrix\n", r, c);
        return 1;
    }
    for (i = 0; i < r; i++) {
        for (j = 0; j < c; j++) {
            array[i][j] = rand() % 100 + 1;
        }
    }
    free(array);
    return 0;
}
Note that int (*array)[c] = calloc(r, sizeof(*array)); is also a variable length array definition: array is a pointer to arrays of c ints. sizeof(*array) is sizeof(int[c]), which evaluates at run time to (sizeof(int) * c), so the space allocated for the matrix is sizeof(int) * c * r as expected.
The point of dynamic memory allocation (malloc()) is not that it allows for supplying the size at run time, even though that is also one of its important features. The point of dynamic memory allocation is, that it survives the function return.
In object oriented code, you might see functions like this:
Object* makeObject() {
    Object* result = malloc(sizeof(*result));
    result->someMember = ...;
    return result;
}
This creator function allocates memory of a fixed size (sizeof is evaluated at compile time!), initializes it, and returns the allocation to its caller. The caller is free to store the returned pointer wherever it wants, and some time later, another function
void destroyObject(Object* object) {
    ... //some cleanup
    free(object);
}
is called.
This is not possible with automatic allocations: If you did
Object* makeObject() {
    Object result;
    result.someMember = ...;
    return &result; //Wrong! Don't do this!
}
the variable result ceases to exist when the function returns to its caller, and the returned pointer will be dangling. When the caller uses that pointer, your program exhibits undefined behavior, and pink elephants may appear.
Also note that space on the call stack is typically rather limited. You can ask malloc() for a gigabyte of memory, but if you try to allocate the same amount as an automatic array, your program will most likely segfault. That is the second raison d'être for malloc(): to provide a means to allocate large memory objects.
The classic way of handling a 2D array in 'C' where the dimensions might change is to declare it as a sufficiently sized one dimensional array and then have a routine / macro / calculation that calculates the element number of that 1D array given the specified row, column, element size, and number of columns in that array.
So, let's say you want to calculate the address offset in a table for 'specifiedRow' and 'specifiedCol' and the array elements are of 'tableElemSize' size and the table has 'tableCols' columns. That offset could be calculated as such:
addrOffset = specifiedRow * tableCols * tableElemSize + (specifiedCol * tableElemSize);
You could then add this to the address of the start of the table to get a pointer to the element desired.
This is assuming that you are indexing an array of bytes. If the element type is larger than a byte and you use a correctly typed pointer, pointer arithmetic supplies the element size, so 'tableElemSize' is not needed in the index expression. It depends upon how you want to lay it out in memory.
I do not think that the way that you are doing it is something that is going to be portable across a lot of compilers and would suggest against it. If you need a two dimensional array where the dimensions can be dynamically changed, you might want to consider something like the MATRIX 'object' that I posted in a previous thread.
How I can merge two 2D arrays according to row in c++
Another solution would be dynamically allocated array of dynamically allocated arrays. This takes up a bit more memory than a 2D array that is allocated at compile time and the elements in the array are not contiguous (which might matter for some endeavors), but it will still give you the 'x[i][j]' type of notation that you would normally get with a 2D array defined at compile time. For example, the following code creates a 2D array of integers (error checking left out to make it more readable):
int **x;
int i, j;
int count;
int rows, cols;

rows = /* read a value from user or file */
cols = /* read a value from user or file */

x = calloc(rows, sizeof(int *));
for (i = 0; i < rows; i++)
    x[i] = calloc(cols, sizeof(int));
/* Initialize the 2D array */
count = 0;
for (i = 0; i < rows; i++) {
    for (j = 0; j < cols; j++) {
        count++;
        x[i][j] = count;
    }
}
One thing that you need to remember here is that because we are using an array of arrays, we cannot guarantee that each row array will land in the next block of memory, especially if other allocations and frees happen in between (as can easily occur if your code is multithreaded). Even without that, the memory is not going to be contiguous from one row array to the next (although the elements within each row will be). There is overhead associated with each memory allocation, and that shows up if you look at the addresses of the 2D array and the 1D arrays that make up the rows. You can see this by printing the address of the 2D array and each of the 1D arrays like this:
printf("Main Array: %p\n", (void *)x);
for (i = 0; i < rows; i++)
    printf("  %p [%04td]\n", (void *)x[i], (char *)x[i] - (char *)x);
When I tested this with a 2D array with 4 columns, I found that each row took up 24 bytes even though it only needs 16 bytes for the 4 integers in the columns.
This is more than one question. I need to deal with an NxN matrix A of integers in C. How can I allocate the memory in the heap? Is this correct?
int **A = malloc(N * sizeof(int *));
for (int i = 0; i < N; i++)
    *(A + i) = malloc(N * sizeof(int));
I am not absolutely sure if the second line of the above code should be there to initiate the memory.
Next, suppose I want to access the element A[i, j] where i and j are the row and column indices starting from zero. Is it possible to do it via dereferencing the pointer **A somehow? For example, something like (A+ni+j)? I know I have some conceptual gap here and some help will be appreciated.
not absolutely sure if the second line of the above code should be there to initiate the memory.
It needs to be there, as it actually allocates the space for the N rows, each carrying the N ints you need.
The 1st allocation only allocates the row-indexing pointers.
to access the element A[i, j] where i and j are the row and column indices starting from zero. Is it possible to do it via dereferencing the pointer **
Sure, just do
A[1][1]
to access the 2nd element of the 2nd row.
This is identical to
*(*(A + 1) + 1)
Unrelated to your question:
Although the code you show is correct, a more robust way to code this would be:
int **A = malloc(N * sizeof *A);
for (size_t i = 0; i < N; i++)
{
    A[i] = malloc(N * sizeof *A[i]);
}
size_t is the type of choice for indexing, as it is guaranteed to be large enough to hold any index value possible on the system the code is compiled for.
Also, you want to add error checking to the two calls of malloc(), as malloc might return NULL in case of failure to allocate the amount of memory requested.
The declaration is correct, but the matrix won't occupy contiguous memory. It is an array of pointers, where each pointer can point to a different location returned by malloc. For that reason, addressing like (A+ni+j) does not make sense.

Assuming the compiler has support for VLAs (which became optional in C11), the idiomatic way to define a contiguous matrix would be:
int (*matrixA)[N] = malloc(N * sizeof *matrixA);
In general, the syntax of matrix with N rows and M columns is as follows:
int (*matrix)[M] = malloc(N * sizeof *matrix);
Notice that neither M nor N has to be given as a constant expression (thanks to VLA pointers). That is, they can be ordinary (e.g. automatic) variables.
Then, to access elements, you can use ordinary index syntax like:
matrixA[0][0] = 100;
Finally, to release the memory for such matrices, use a single free, e.g.:
free(matrixA);
free(matrix);
You need to understand that 2D and higher arrays do not work well in C89. Beginner books usually introduce 2D arrays in a very early chapter, just after 1D arrays, which leads people to assume that the natural way to represent 2-dimensional data is via a 2D array. In fact they have many tricky characteristics and should be considered an advanced feature.
If you don't know array dimensions at compile time, or if the array is large, it's almost always easier to allocate a 1D array and access via the logic
array[y*width+x];
so in your case, just call
int *A;

A = malloc(N * N * sizeof(int));
A[3*N+2] = 123; /* set element (3,2); the A[3][2] syntax isn't available */
It's important to note that the suggestion to use a flat array is just a suggestion, not everyone will agree with it, and 2D array handling is better in later versions of C. However I think you'll find that this method works best.
I've read many people here and in other websites saying that if a declare something like this:
double a[5][2];
it will be allocated in memory like a contiguous block like:
a[0][0] | a[0][1] | a[1][0] | a[1][1] | ....etc
But is this always a rule?
I would like to create a function to multiply matrices of variable sizes but in pure C I won't be able to pass matrices by parameters without knowing at least one dimension. So I've made this:
void MatMult(double* m1, double* m2, double* res, int h, int w, int l)
{
    int i, j, k;
    for (i = 0; i < h; i++)
    {
        for (j = 0; j < w; j++)
        {
            double p_res = 0;
            for (k = 0; k < l; k++)
            {
                p_res += (*(m1+i*l+k))*(*(m2+k*w+j));
            }
            *(res+i*w+j) = p_res;
        }
    }
}
with call:
double m1[2][3], m2[3][1], m3[2][1];
...
MatMult(&(m1[0][0]),&(m2[0][0]),&(m3[0][0]),2,1,3);
And it worked. But will this always work or there are exceptions that I should be aware of like memory aligment or something like this?
To pass 2D arrays to functions you'd have to change your interface
void MatMult(size_t h, size_t w, size_t l, double m1[h][l], double m2[l][w], double res[h][w]);
or similar:
have the sizes first
then use them to declare the dimensions
Also, I have used size_t here, since this is the correct type for all index calculations.
This should work for all compilers that implement C99. (Basically all do but Microsoft.)
Yes, it will always work. Arrays declared like you did are guaranteed to be "contiguous", which means that their items will be tightly packed. So, if you declare double[55], you know that the [50]-th element will always come right after the [49]-th element with no perceivable gaps.
I think, but I'm not perfectly sure, that for some very uncommon data types (like unbalanced bitfields), "alignment" can still kick in and offset something. But even if the compiler adds some alignment padding (see Jens' comment), it will do so either inside a single data element or at the boundary between elements, and in both cases the compiler will know about it. So it will apply all required corrections at every [], ->, . operation, as long as it still has the required type information (array-of-doubles). If you erase the type information and start accessing the array through "untyped" (or wrong-typed) pointers, for example:
double array[50];
char* p = (char*)array;
int size = sizeof(double);
for (int i = 0; i < 50; ++i)
    .. *(double*)(p + i*size) ..
then of course the compiler will not have the type information and will be unable to apply proper alignment offsets. But if you do things as above, you probably know the risks already.
Next thing: there is no such thing as a two-dimensional array. Neither in C nor in C++.
An array defined as double[5][2] is an array(5) of array(2) of double. CMIIW, I could swap them. Anyways, the point is that 'double' is a datatype and is an element of a higher-level 1D array. Then, double[2] is a datatype and an element of a higher-level 1D array, and so on.
Now, remember the 'sequential+contiguous' layout of arrays:
double -> DD
double[2] -> [DD | DD]
double[5][2] -> [ {DD:DD} | {DD:DD} | {DD:DD} | {DD:DD} | {DD:DD} ]
Since an array has to be sequential and contiguous, double[2] must lay out its elements as above - obvious.

However, since double[5][2] is an array of 5 elements, and its elements are double[2], it must lay out its elements in just the same way: first whole element first, then second whole element, and so on.

Just like double[2] can't "split" its doubles into scattered 1-byte chunks, double[5][2] can't split its array[2].
By using double a[5][2], you are creating a two-dimensional array on the stack, which will always be a contiguous block of memory. On the other hand, if you create a 2-D array on the heap as an array of separately malloc'd rows, it is not guaranteed that you will get a contiguous block of memory, hence you might not be able to traverse your allocated memory in a linear way.
I want to find out what is the best representation of a m x n real matrix in C programming language.
What are advantages of matrix representation as a single pointer:
double* A;
With this representation you could allocate memory:
A = (double* )malloc(m * n * sizeof(double));
In such representation matrix access requires an extra multiplication:
aij = A[i * n + j];
What are disadvantages of matrix representation as a double pointer:
double** B;
Memory allocation requires a loop:
double** B = (double **) malloc(m * sizeof(double*));
for (i = 0; i < m; i++)
    B[i] = (double *) malloc(n * sizeof(double));
In such representation you could use intuitive double indexing bij = B[i][j], but is there some drawback that would affect performance? I would want to know which is the best representation in terms of performance.
These matrices should be used in numerical algorithms such as singular value decomposition. I need to define a function:
void svd(Matrix A, Matrix U, Matrix Sigma, Matrix V);
and I am looking for the best way to represent Matrix. If there is any other efficient way to represent a matrix in C, please, let me know.
I have seen that most people use the single-pointer representation. I would like to know if there are some performance benefits as opposed to the double-pointer representation?
Look at the memory accesses required.
For the single-pointer case, you have:
read a pointer (the base address), probably from a register
read the four integers, probably from registers or hard-coded into the instruction stream. For array[i*m+j], the 4 values are i, m, j and sizeof(array[0]).
multiply and add
access the memory address
For the double-pointer case, you have:
read a pointer (the base address), probably from a register
read an index, probably from a register
multiply the index by the size of a pointer and add.
fetch the base address from memory (unlikely to be a register, might be in cache with luck).
read another index, probably from a register
multiply by the size of the object and add
access the memory address
The fact that you have to access two memory locations probably makes the double-pointer solution quite a bit slower than the single-pointer solution. Clearly, caching will be critical; that's one reason why it is important to access arrays so that the accesses are cache-friendly (so you access adjacent memory locations as often as possible).
You can nit-pick about details in my outline, and some 'multiplication' operations may be shift operations, etc, but the general concept remains: the double-pointer requires two memory accesses versus one for the single-pointer solution, and that will be slower.
Here are a couple of articles about row major format.
http://en.wikipedia.org/wiki/Row-major_order
http://fgiesen.wordpress.com/2011/05/04/row-major-vs-column-major-and-gl-es/
These are common constructs in CUDA programming; hence my interest.
In this piece of code, I noticed the malloc() reserves memory sized for both (double *) pointers and doubles. So when the rest of the code is storing values in DataArray, how does the compiler know where to store those values in memory?
int rows = 100;
int columns = 100;
double **DataArray, *DataRow;

DataArray = (double **)malloc(rows * sizeof(double *) + rows * columns * sizeof(double));
for (i = 0, DataRow = (double *)(DataArray + rows); i < rows; i++, DataRow += columns)
    DataArray[i] = DataRow;
Thank You!
The compiler does not know how to organize the elements of a two-dimensional array in this way.
That memory is only one-dimensional as far as the compiler is concerned.
The code you show is explicitly compensating for that by allocating the space for a pointer to each row and enough space for all the rows and columns.
The for loop you posted is initializing all the row pointers, allowing later code to do a two-dimensional lookup by indexing, one-dimensionally, into the row pointers and then indexing again into the desired column.
When the array operations ([]) are used, they generate the needed offsets to access/index the data pointed to by DataArray automatically.
Ex:
DataArray[0] is located at (char *)DataArray + (sizeof(double *) * 0) bytes
DataArray[1] is located at (char *)DataArray + (sizeof(double *) * 1) bytes
DataArray[2] is located at (char *)DataArray + (sizeof(double *) * 2) bytes
Note, the previous pseudocode should not be confused with array arithmetic operations:
DataArray+0
is located at DataArray, but
DataArray+1
is located at DataArray+sizeof(double*)
malloc takes an argument of type size_t (basically an integer type matching the 32- or 64-bit architecture you're compiling for). It doesn't "know" what you are allocating; the expression rows * sizeof(double *) + rows * columns * sizeof(double) is there to compute the space you'll need to store rows pointers followed by the row arrays.
Note that to store a matrix, you don't need such a complicated and error-prone system. Remove the first part of DataArray containing pointers and access the (row, col) element using DataArray[row*columns + col]. The integer multiplication is not that big a deal compared to the savings in memory (especially on 64-bit systems). But then again, you decide how to balance between CPU and memory.