Is using realloc() on a dynamically allocated 2D array a good idea? - c

I am mainly interested in the viability of shrinking such an array.
I'm working on a project where I have used single malloc() calls to each create individual, moderately large 2D arrays. (Each is only a few tens of MiB, at the largest.) The thing is that over the life of one of the arrays, its content dramatically shrinks in size (by more than half). Obviously, I could just leave the array size alone for the life of the program. (It's only x MiB on a system with GiB of RAM available.) But we are talking about more than half of the allocated space falling into disuse well before the program terminates, and, due to the nature of how I am using the array, all the surviving data is kept in a contiguous set of rows (at the beginning of the block). It seems like a waste to hold on to all that RAM if I really don't need it.
While I know realloc() can be used to shrink dynamically created arrays, a 2D array is more complex. I think I understand the memory layout of it (as I implemented the function that structures it), but this is pushing the limits of my understanding of the language and the workings of its compilers. Obviously, I would have to work with rows (and deal with the row pointers), not merely bytes, but I don't know how predictable the outcome of all this would be.
And, yes, I need to create the array with a single malloc(). The object in question has several million rows. I tried using a loop to malloc() each row separately, but the program always froze at around 100,000 malloc()s.
For background, the source I'm using to construct these arrays is as follows:
char ** alloc_2d_arr(int cnum, int rnum) {
/* ((bytes for row pointers + (bytes for data)) */
char **mtx = malloc(rnum * sizeof (char *) + rnum * cnum * sizeof (char));
/* Initialize each row pointer to the first cell of its row */
char *p = (char *) (mtx + rnum);
for (int i = 0; i < rnum; i++) {
mtx[i] = p + i * cnum;
}
return mtx;
}

Using multidimensional arrays, this can be done with or without pointers to variable length arrays. Since you probably don't want to allocate any additional memory, this will be done in place.
First allocate a 20 by 10 array:
int ( *array )[10] = malloc( sizeof(int ) * 20 * 10 );
for( size_t i = 0 ; i < 20 ; i++ )
for( size_t j = 0 ; j < 10 ; j++ )
array[i][j] = i * 100 + j;
If you want to change the number of rows, no elements have to be moved, only
a realloc is needed. Changing the row count to 15 is trivial:
array = realloc( array , sizeof( int ) * 15 * 10 );
If you want to change the column count, then the elements will have to be moved. Since the first row is already in place, the copying starts at the second row. Function memmove is used to guard against overlapping source and destination; overlap cannot happen in this case, but it could if the new column count were larger. It also avoids aliasing problems. Note that this code is defined only because we are using allocated memory. Let's change the column count to 3:
int (*newarray)[3] = ( int(*)[3] )array;
for( size_t j = 1 ; j < 15 ; j++ )
memmove( newarray[j] , array[j] , sizeof( int ) * 3 );
newarray = realloc( array , sizeof( int ) * 15 * 3 );
Working example: https://ideone.com/JMdJO0
If the new column count happens to be larger than the old one, then the memory will have to be reallocated first (simply to get more space), and then the row copying will take place, starting at the last row instead.
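The answer above addresses an array reached through a pointer to rows. For the layout in the question, where a table of row pointers sits in front of the data in the same block, a shrink could look like the sketch below. shrink_2d_arr is a hypothetical name, not part of the original code; it assumes the block was built by alloc_2d_arr (repeated here with a NULL check added) and that the surviving rows are the first new_rnum ones.

```c
#include <stdlib.h>
#include <string.h>

/* Allocator from the question, with a NULL check added. */
char **alloc_2d_arr(int cnum, int rnum) {
    char **mtx = malloc(rnum * sizeof (char *) + (size_t)rnum * cnum);
    if (mtx == NULL)
        return NULL;
    char *p = (char *)(mtx + rnum);
    for (int i = 0; i < rnum; i++)
        mtx[i] = p + (size_t)i * cnum;
    return mtx;
}

/* Hypothetical helper: keep only the first new_rnum rows.
   Returns the (possibly moved) block; the caller must use the return value. */
char **shrink_2d_arr(char **mtx, size_t cnum, size_t old_rnum, size_t new_rnum) {
    if (mtx == NULL || new_rnum == 0 || new_rnum > old_rnum)
        return mtx;                      /* nothing sensible to do */

    /* Slide the surviving data down so that it directly follows the
       shorter pointer table. Source and destination may overlap. */
    memmove(mtx + new_rnum, mtx + old_rnum, new_rnum * cnum);

    char **tmp = realloc(mtx, new_rnum * sizeof *mtx + new_rnum * cnum);
    if (tmp == NULL)
        tmp = mtx;                       /* shrink failed: old block is still valid */

    /* The data moved (and the block may have moved), so rebuild every row pointer. */
    char *p = (char *)(tmp + new_rnum);
    for (size_t i = 0; i < new_rnum; i++)
        tmp[i] = p + i * cnum;
    return tmp;
}
```

A call such as mtx = shrink_2d_arr(mtx, cnum, old_rnum, new_rnum); replaces the old pointer. Any row pointer copied out of the table before the call must be treated as stale afterwards.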

Related

Possible to allocate 2D array in a single malloc and still get to use [ ][ ] syntax? [duplicate]

Normally, when I create a dynamic 2D array I will first malloc the row pointers, then loop through rows and malloc each of rows. So, for example:
array = malloc( row_count * sizeof( int* ) );
for( int x = 0; x < row_count; x++ ){
array[x] = malloc( column_count * sizeof( int ) );
}
Once this is done I can use a syntax like:
array[3][5] = 52;
to set or get the values. The problem with this is that many mallocs are done, which is both CPU-intensive and results in many fragments of memory, all of which have to be deallocated individually. The alternative is to allocate the array as a single block of memory. However, if I do this, I can no longer use the [ ][ ] syntax to refer to elements in the array and instead I have to do something like this:
array[ row_index * column_count + column_index ] = 52;
manually calculating the correct offset into the contiguous block. Is there a way to allocate a single block of memory for an array, yet still use the [ ][ ] syntax?
If variable length array types are available (since C99; made optional in C11, but widely available), you can use something like:
size_t rows, cols;
/* calculate rows and cols based on user input, etc. */
int (*data)[cols] = malloc(sizeof *data * rows);
to allocate contiguous space that can be indexed as a 2d array.
If the number of rows is unknown, but the number of columns is known at compile time, you can use, even without VLAs:
#define COLS 10 /* or some known compile-time value */
size_t rows;
/* calculate rows based on user input, etc. */
int (*data)[COLS] = malloc(sizeof *data * rows);
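As a minimal sketch of the VLA form in use, assuming a compiler that supports variable length array types: fill_and_sum is a hypothetical demonstration function, not part of the answer. It allocates the block, indexes it with [ ][ ], and releases it with a single free.

```c
#include <stdlib.h>

/* Hypothetical demo: fill a rows x cols table with 0, 1, 2, ... and
   return the sum of its elements, or -1 if the allocation fails. */
long fill_and_sum(size_t rows, size_t cols) {
    /* data points to rows consecutive arrays of cols ints each */
    int (*data)[cols] = malloc(sizeof *data * rows);
    if (data == NULL)
        return -1;

    long sum = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            data[r][c] = (int)(r * cols + c);
            sum += data[r][c];
        }
    }
    free(data);   /* one allocation, one free */
    return sum;
}
```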

How can I use a dynamic 2D array in C

I tried to make a dynamic 5x5 int array
int **data=malloc(5*5);
But I get segmentation fault on trying to access it.
You need to allocate memory for the 2d-array you want to make (which I think you understand). But first, you will have to allocate the space for pointers where you will store the rows of the 2D-array.
int **data=(int**)malloc(sizeof(*data)*5); //Here 5 is the number of rows
Now you can allocate space for each row.
for(int r=0;r<5;r++){
data[r]=(int*)malloc(sizeof(**data)*5);//here 5 is the width of the array
}
If you want a contiguous block of memory for the whole array, you can allocate a one-dimensional array of size 25 and access it like data[r*5+c].
PS: Instead of sizeof(*data) and sizeof(**data), you can use sizeof(int*) and sizeof(int) to avoid confusion with the * operator.
PS: If you are not using C++, removing the casts from return value of malloc is better (see comments).
If you want a single contiguous memory block to hold 5x5=25 integers :
int *data = malloc(5*5*sizeof(*data));
If you want a 2d array with size 5x5
int **data = malloc(5*sizeof(*data));
for (int i=0; i<5; ++i)
data[i] = malloc(5*sizeof(**data));
There are two possibilities. The first one is indeed to allocate a two-dimensional array:
int ( *data )[5] = malloc( 5 * 5 * sizeof( int ) );
In this case one contiguous extent is allocated for the array.
The second one is to allocate at first a one-dimensional array of pointers and then allocate one-dimensional arrays pointed to by the already allocated pointers.
For example
int **data = malloc( 5 * sizeof( int * ) );
for ( size_t i = 0; i < 5; i++ )
{
data[i] = malloc( 5 * sizeof( int ) );
}
In this case, 6 extents of memory are in fact allocated: one for the array of pointers and 5 more for the arrays of integers.
To free the allocated memory in the first example it is enough to write
free( data );
and in the second example you need to write the following
for ( size_t i = 0; i < 5; i++ ) free( data[i] );
free( data );
If you want to treat the array as a 2D array (a[i][j]) and you want all the array elements to be contiguous in memory, do the following:
int (*data)[5] = malloc( sizeof *data * 5 );
If you also want to be able to determine the size of the array at run time and your compiler supports variable-length arrays (see note 1):
size_t rows, cols;
...
int (*data)[cols] = malloc( sizeof *data * rows ); /* see note 2 */
If your compiler does not support VLAs and you still want to determine the array size at runtime, you would do:
size_t rows, cols;
...
int **data = malloc( sizeof *data * rows );
if ( data )
{
for ( size_t i = 0; i < rows; i++ )
{
data[i] = malloc( sizeof *data[i] * cols );
}
}
The downside of this approach is that the rows of the array are not guaranteed to be contiguous in memory (they most likely won't be). Elements within a single row will be contiguous, but rows will not be contiguous with each other.
If you want to determine the array size at runtime and have all the array elements be contiguous in memory but your compiler does not support variable-length arrays, you would need to allocate a 1D array and manually compute your indices (a[i * cols + j]):
int *data = malloc( sizeof *data * rows * cols );
1. VLAs were introduced with C99, but then made optional in C2011. A post-C99 compiler that does not define the macro __STDC_NO_VLA__ should support VLAs.
2. Caution - there is some question whether sizeof *data is well-defined in this example; the sizeof expression is normally evaluated at compile time, but when the operand is a VLA the expression is evaluated at run time. data doesn't point to anything yet, and attempting to dereference an invalid pointer leads to undefined behavior. All I can say is that I've used this idiom a lot and never had an issue, but that may be due more to bad luck than design.
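The manual index computation for the 1D fallback can be sketched as follows. In a row-major block, element (i, j) sits at offset i * cols + j. The names idx and make_table are hypothetical helpers introduced for illustration only.

```c
#include <stdlib.h>

/* Hypothetical helper: offset of element (i, j) in a row-major block
   that has cols elements per row. */
size_t idx(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Hypothetical demo: allocate rows * cols ints in one block and store
   i * 100 + j in element (i, j). Returns NULL if allocation fails. */
int *make_table(size_t rows, size_t cols) {
    int *data = malloc(sizeof *data * rows * cols);
    if (data == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            data[idx(i, j, cols)] = (int)(i * 100 + j);
    return data;
}
```

Wrapping the arithmetic in one helper keeps the multiplier (the column count, not the row count) in a single place.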
Here is the answer:
int ** squaredMatrix;
int szMatrix=10;
squaredMatrix= (int**)malloc(szMatrix*sizeof(int*));
To make a 2D array this way, think of it as one array in which every element is itself an array.
For example, in the picture above, the blue blocks form the main array (one column), and each blue block points to an array of its own (every 4 green blocks in a row).

Allocating memory for global multidimensional array [duplicate]

I want to read in a file whose first line gives me the dimensions for the array. For example:
4 3
Then I want my program to assign rows = 4 and columns = 3. What is the proper way to allocate memory for this multidimensional array assuming that it holds integers?
Since the size of an integer is 4 bytes, am I right in assuming that it is:
int** multiArray;
... code to read in first line of file and assign value to rows and columns
multiArray = malloc(sizeof(int) * rows * columns);
Or in other words, is it correct to allocate 48 bytes of memory for my [4][3] integer array?
One way could be:
int * array = ( int * ) malloc ( rows * columns * sizeof ( int ) );
int ** multiarray = ( int ** ) malloc ( rows * sizeof ( int * ) );
for ( int i = 0; i < rows; i++ )
multiarray[i] = array + i * columns; /* pointer arithmetic already scales by sizeof ( int ) */
There's two main options:
Allocate each row separately
Allocate a single block containing all rows
If you want to sometimes extend individual rows then you have to use the first option. The second option forces you to have all rows the same length.
The only problem with the second option is that you cannot then use double-dereference syntax, since there is only a single dereference happening. You'd have to access in a different way, for example:
// global
int *multiArray;
size_t multiArray_num_columns;
#define MARRAY(row, col) multiArray[(row) * multiArray_num_columns + (col)]
// in a function
multiArray_num_columns = columns;
multiArray = malloc(rows * columns * sizeof *multiArray);
MARRAY(3, 2) = 20;
There is a third option (which I don't like but some people do):
Allocate a single block, and put a table of pointers at the start of it
This gains you the double-dereference syntax but has no other benefits; its downside is that it's a lot more coding, and it may incur a runtime penalty (two dereferences can cost more than one dereference).
Update: here is the code for allocating each row separately (this gets asked a lot but I couldn't find a good duplicate!)
// global
int **multiArray;
// in function
multiArray = malloc( rows * sizeof *multiArray );
for (size_t row = 0; row < rows; ++row)
multiArray[row] = malloc( columns * sizeof **multiArray );
You should check all of these results against NULL and exit if they failed.
Using the pattern ptr = malloc( NUM * sizeof *ptr ); guarantees to allocate the right amount of memory even if you later change the type of ptr.
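The NULL checks for the row-by-row version might be sketched like this. alloc_rows and free_rows are hypothetical names. Instead of exiting on failure, this variant frees whatever was already allocated and returns NULL so that the caller can decide what to do.

```c
#include <stdlib.h>

/* Hypothetical helper: free the first `rows` rows, then the pointer array. */
void free_rows(int **m, size_t rows) {
    if (m == NULL)
        return;
    for (size_t r = 0; r < rows; ++r)
        free(m[r]);
    free(m);
}

/* Hypothetical helper: allocate each row separately, checking every result. */
int **alloc_rows(size_t rows, size_t columns) {
    int **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t r = 0; r < rows; ++r) {
        m[r] = malloc(columns * sizeof **m);
        if (m[r] == NULL) {
            free_rows(m, r);   /* only rows 0 .. r-1 were allocated */
            return NULL;
        }
    }
    return m;
}
```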

Difference between allocating memory for a 2D array in one go and row by row

What would be the impact on array access or on memory allocation in these two cases:
1.
int **arr;
arr = malloc( sizeof(int) * row * column );
2.
int **arr;
arr = malloc( sizeof(*arr) * row);
for(i=0; i<row; i++)
arr[i] = malloc( sizeof( **arr) * column);
Firstly, the "impact" is that your first method is broken. It will not work through an int ** pointer.
In order to allocate a 2D array in one shot as you are trying to do it in your first method, you actually have to allocate a 1D array of sufficient size
int *arr = malloc( row * column * sizeof *arr );
// Note: `int *`, not `int **`
and perform access by manual index re-calculation, e.g. instead of doing arr[i][j] you have to do arr[i * column + j].
Trying to store the allocated pointer in int **arr and then access your array as arr[i][j] will simply lead to crashes.
Secondly, your second method is OK. It is just that in the second method you are not really required to allocate the second-level memory by multiple independent malloc calls. You can allocate the whole second-level memory in one shot
int **arr = malloc( row * sizeof *arr );
int *arr_data = malloc( row * column * sizeof *arr_data );
and then just distribute that pre-allocated second-level memory between the rows
for (i = 0; i < row; i++)
arr[i] = arr_data + i * column;
(Of course, you can allocate the rows independently, if you wish so. It will also work. The reason I wanted to allocate them in one shot is to better illustrate the similarity between the first and the second approach, as commented below.)
Now, by looking at these two methods you can easily see that both of them essentially do the same thing. The only difference is that in the first method you find the beginning of the row on-the-fly by calculating arr + i * column every time (note that arr[i * column + j] is equivalent to (arr + i * column)[j]). In the second method you pre-calculate all row beginnings in advance by using the same arr_data + i * column formula, and store them for further usage in a separate "row index" array arr.
So, it basically boils down to trade-off between memory usage (first method requires less memory) and speed (the second method is potentially, but not necessarily, faster). At the same time the second method supports the "natural" syntax for 2D array access - arr[i][j], while in the first method you have to use more convoluted 1D access syntax with index recalculation.
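Put together, the two-allocation variant (one block for the row index, one for the data) could be sketched as follows. alloc_indexed and free_indexed are hypothetical names, and the sketch assumes at least one row and one column so that arr[0] always points at the start of the data block.

```c
#include <stdlib.h>

/* Hypothetical helper: row index in one block, all elements in another. */
int **alloc_indexed(size_t row, size_t column) {
    if (row == 0 || column == 0)
        return NULL;                    /* assumption: empty arrays are not supported */

    int **arr = malloc(row * sizeof *arr);
    int *arr_data = malloc(row * column * sizeof *arr_data);
    if (arr == NULL || arr_data == NULL) {
        free(arr);                      /* free(NULL) is a no-op */
        free(arr_data);
        return NULL;
    }
    /* Pre-calculate the beginning of every row. */
    for (size_t i = 0; i < row; i++)
        arr[i] = arr_data + i * column;
    return arr;
}

/* arr[0] is the start of the data block, so two frees release everything. */
void free_indexed(int **arr) {
    if (arr != NULL) {
        free(arr[0]);
        free(arr);
    }
}
```

Because the data is one contiguous block, arr[i][j] and arr[0][i * column + j] name the same element.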

realloc fails (in C) for a pointer to an array

I'm trying to dynamically allocate memory for (what is essentially) a 2-dimensional array of chars - i.e - an array of strings.
My code is as follows:
typedef char LineType[MAX_CHARS+1];
LineType* lines;
int c = 0;
int N = 2;
lines = (LineType *) malloc (N * sizeof( LineType) );
do {
if (c > N ) {
N *=2;
lines = (LineType*) realloc (lines, N * sizeof( LineType));
}
.
.
.
c++;
} while ( . . . )
This compiles fine but fails at runtime, giving a warning about possible HEAP CORRUPTION and breaking at dbgheap.c (in : _CrtIsValidHeapPointer)
What am I doing wrong? I figured it's probably due to the mix of a fixed/dynamic dimensions in the data structure... But what is then the best way to declare and then dynamically allocate (and reallocate) memory for an array (of varying size) of strings (each of which is of a fixed size)?
Thanks a lot in advance
UPDATE 26/8/2012
I changed the code a bit to adjust it to people's comments and suggestions. The problem still persists...
Assuming c is used to index into lines, you need to test for c >= N, not c > N.
As an aside, I suggest using a typedef to make your code more readable. I would also avoid the redundant allocation code:
typedef char fixed_string[MAX_CHARS + 1];
int c = 0;
int N = 0;
fixed_string *lines = NULL;
do {
if (c >= N ) {
N = N ? 2*N : 2;
lines = (fixed_string*) realloc (lines, N * sizeof(fixed_string));
}
⋮
c++;
} while (…);
As a further aside, be careful when using a growth factor of 2. It leaves behind holes that can never be reused by the same array. A factor of 1.5 (3*N/2) is safer.
EDIT: I note from other comments that you experience the crash at the point of reallocation. This is consistent with writing past the end of the array. A debug memory allocator will fill the space immediately surrounding an allocated block of memory with special bytes and check that those bytes are preserved the next time it does something with that block of memory. The HEAP CORRUPTION message signals that you have corrupted those surrounding bytes by writing outside the memory you were given.
Make things more readable:
Instead of
char (*lines) [MAX_CHARS +1];
do a
typedef char LineType[MAX_CHARS+1];
LineType* lines;
In a similar fashion,
lines = (char (*) [MAX_CHARS +1]) calloc (N, sizeof( char (*) [MAX_CHARS +1]));
...
lines = (char (*) [MAX_CHARS + 1]) realloc (lines, N * sizeof( char (*) [MAX_CHARS +1]));
turns into
lines = malloc(N*sizeof(LineType));
...
lines = realloc(lines, N * sizeof(LineType));
Note: I replaced the calloc with malloc, simply because I never use calloc, so I'm not sure whether it tries to play alignment tricks.
Either way, a small typedef can improve readability a lot. Readable code is easier to get straight.
This is hugely wrong
lines = (LineType *) malloc (N, sizeof( LineType *) );
as sizeof(LineType*) is the size of a single pointer, not of a whole string, so the request cannot cover N strings.
(It also passes two arguments to malloc, which takes only one; the count-and-size form belongs to calloc. With <stdlib.h> included, a compiler should reject the call. The comma between function arguments is a separator, not the comma operator.)
A better allocation would be
lines = malloc(N * sizeof(LineType));
where we allocate space for N objects of type LineType.
There is a similar problem with the realloc, where you allocate space for pointers instead of whole objects.
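Pulling the points from these answers together (test c >= N, size by whole LineType objects, and do not overwrite the only pointer with an unchecked realloc result), a growth step might be sketched as below. append_line is a hypothetical helper, and the value of MAX_CHARS is an assumption made only so the example is complete.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CHARS 80                    /* assumed value, not from the question */
typedef char LineType[MAX_CHARS + 1];

/* Hypothetical helper: append a copy of text, growing the array when it is full.
   Returns 0 on success, -1 if realloc fails (the old array stays valid). */
int append_line(LineType **lines, size_t *count, size_t *capacity, const char *text) {
    if (*count >= *capacity) {          /* >=, not >: index count must fit */
        size_t new_cap = *capacity ? *capacity * 2 : 2;
        LineType *tmp = realloc(*lines, new_cap * sizeof **lines);
        if (tmp == NULL)
            return -1;
        *lines = tmp;
        *capacity = new_cap;
    }
    strncpy((*lines)[*count], text, MAX_CHARS);
    (*lines)[*count][MAX_CHARS] = '\0'; /* strncpy may leave the string unterminated */
    ++*count;
    return 0;
}
```

Starting from LineType *lines = NULL with count and capacity both zero, the first call allocates room for two lines, since realloc on a null pointer behaves like malloc.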

Resources