I have a variable N. I need a 6xNxN array.
Something like this:
int arr[6][N][N];
But, obviously, that doesn't work.
I'm not sure how I'd go about allocating this so that I can access, e.g. arr[5][4][4] if N is 5, and arr[5][23][23] if N is 24.
Note that N will never change, so I'll never have to reallocate arr.
What should I do? Will int ***arr = malloc(6 * N * N * sizeof(int)); work?
You can allocate your 3-dimensional array on the heap as
int (*arr)[N][N] = malloc(sizeof(int[6][N][N]));
After use, you can free as
free(arr);
Another way of writing the same as suggested by #StoryTeller is -
int (*arr)[N][N] = malloc(6u * sizeof(*arr));
But here you need to be careful about the u after 6 to prevent signed arithmetic overflow.
Also, there can still be issues on platforms where size_t is narrower than int, as suggested by #chqrlie, but that won't be the case on most commonly used platforms, so you are fine using it.
int arr[6][N][N]; will work just fine. You merely need to update your compiler and C knowledge to the year 1999 or later, when variable-length arrays (VLA) were introduced to the language.
(If you have an older version of GCC than 5.0, you must explicitly tell it to not use an ancient version of the C standard, by passing -std=c99 or -std=c11.)
Alternatively if you need heap allocation, you can do:
int (*arrptr)[Y][Z] = malloc( sizeof(int[X][Y][Z]) );
You cannot do int ***arr = malloc(6 * N * N * sizeof(int)); since an int*** cannot point at a 3D array. In general, more than two levels of indirection is a certain sign that your program design is completely flawed.
Detailed info here: Correctly allocating multi-dimensional arrays.
What you want can't work directly. For indexing a multi-dimensional array, all but the very first dimension need to be part of the type and here's why:
The indexing operator operates on pointers by first adding an index to the pointer and then dereferencing it. The identifier of an array evaluates to a pointer to its first element (except when e.g. used with sizeof, _Alignof and &), so indexing on arrays works as you would expect.
It's very simple in the case of a single-dimension array. With
int a[42];
a evaluates to a pointer of type int * and indexing works the following way: a[18] => *(a + 18).
Now in a 2-dimensional array, all the elements are stored contiguously ("row" after "row" if you want to understand it as a matrix), and what's making the indexing "magic" work is the types involved. Take for example:
int a[16][42];
Here, the elements of a have the type int [42] (42-element array of int). According to the rules above, evaluating an expression of this type in most contexts again yields an int * pointer. But what about a itself? Well, it's an array of int [42], so a will evaluate to a pointer to a 42-element array of int: int (*)[42]. Then let's have a look at what the indexing operator does:
a[3][18] => *(*(a + 3) + 18)
With a evaluating to the address of a with type int (*)[42], this inner addition of 3 can properly advance by 3 * 42 * sizeof(int) bytes. This would be impossible if the second dimension weren't known in the type.
I guess it's simple to deduce the example for the n-dimensional case.
In your case, you have two possibilities to achieve something similar to what you want.
Use a dynamically allocated flat array with size 6*N*N. You can calculate the indices yourself if you save N somewhere.
Somewhat less efficient, but yielding better readable code, you could use an array of pointers to arrays of pointers to int (multiple indirection). You could e.g. do
int ***a = malloc(6 * sizeof *a);
for (size_t i = 0; i < 6; ++i)
{
    a[i] = malloc(N * sizeof *(a[i]));
    for (size_t j = 0; j < N; ++j)
    {
        a[i][j] = malloc(N * sizeof *(a[i][j]));
    }
}
// add error checking to malloc calls!
Then your accesses will look just like those to a normal 3d array, but it's stored internally as many arrays with pointers to the other arrays instead of in a big contiguous block.
I don't think it's worth using this many indirections, just to avoid writing e.g. a[2*N*N+5*N+4] to access the element at 2,5,4, so my recommendation would be the first method.
Making a simple change to the declaration on this line and keeping the malloc can easily solve your problem.
int ***arr = malloc(6 * N * N * sizeof(int));
However, int *** is unnecessary (and wrong). Use a flat array, which is easy to allocate:
int *flatarr = malloc(6 * N * N * sizeof(int));
This works for three dimensions, and instead of accessing arr[X][Y][Z] as in the question, you access flatarr[(X*N*N) + (Y*N) + Z]. In fact, you could even write a handy macro:
#define arr(X,Y,Z) flatarr[((X)*N*N) + ((Y)*N) + (Z)]
This is basically what I've done in my language Cubically to allow for multiple-size cubes. Thanks to Programming Puzzles & Code Golf user Dennis for giving me this idea.
Related
So, I am learning C right now, and wanted some clarification on some things.
I've learned that if we wanted to create a dynamic array we could use the following line of code:
int *arr = malloc(10 * sizeof(int));
I understand that, in this case, arr is a pointer being allocated the equivalent of an array of 10 ints in terms of bytes. I also understand that you can treat arr as an array (from arr[0] to arr[9]).
Does that mean all pointers that are allocated memory can be treated as an array?
Like could this be treated as an array?
int *single = malloc(sizeof(int));
Or could this be treated as an array?
int *half = malloc(sizeof(int) * 1.5);
Ignoring the array size, yes, all pointers can be used as arrays (meaning you can index them).
The number of elements should be an integer, with truncation for valid access (i.e., 1.5 means 1 item).
You request a number of bytes from malloc, so it makes sense for that number to be a multiple of the item size.
You should read about pointer arithmetic.
Array names can also be used as pointers (e.g., *array) but you can't assign to them or modify them (e.g., ++array).
Like could this be treated as an array?
int *single = malloc(sizeof(int));
Sure, it can be treated as an array int single[1]
Or could this be treated as an array?
int *half = malloc(sizeof(int) * 1.5);
Yes, but it will have the same effect as the previous snippet; you will just waste a few extra bytes (2 when sizeof(int) is 4). Only half[0] is valid: if you try to write to half[1] or beyond, you can corrupt some memory.
I want to create 5*5 2D Matrix. I usually use the following way of memory allocation:
int **M = malloc(5 * sizeof(int *));
for (i = 0; i < 5; i++)
{
M[i] = malloc(5 * sizeof(int));
}
While I was reading a blog, I found also another way to do that:
int **M = malloc(5 * sizeof(int*));
M[0] = malloc((5*5) * sizeof(int));
My question is: What is the difference between both methods? Which one is more efficient?
For the second code, note that you need to initialize the other array members for it to work correctly:
for (int i = 1; i < 5; i++) {
M[i] = M[0] + i * 5;
}
So in the second code the array members (across all rows) are contiguous. It makes no difference for access (e.g., you can still use the M[i][j] syntax). Its advantage over the first code is that it requires only two malloc calls and, as mentioned in the comments, it favors caching, which can greatly improve access performance.
But if you plan to dynamically allocate large arrays, it is better to use the first method because of memory fragmentation (a large contiguous block may not be available, or allocating it can exacerbate fragmentation).
A similar example of this kind of dynamic allocation of arrays of arrays can be found in the c-faq: http://c-faq.com/aryptr/dynmuldimary.html
After seeing ouah's answer and seeing the example in the C FAQ, I now understand where the second technique comes from, although I personally wouldn't use it where I could help it.
The main problem with the first approach you show is that the rows in the array are not guaranteed to be adjacent in memory; IOW, the object immediately following M[0][4] is not necessarily M[1][0]. If two rows are allocated from different pages, that could degrade runtime performance.
The second approach guarantees that all the rows will be allocated contiguously, but you have to manually assign M[1] through M[4] to get the normal M[i][j] subscripting to work, as in
for ( size_t i = 1; i < 5; i++ )
    M[i] = M[i-1] + 5;
IMO it's a clumsy approach compared to the following:
int (*M)[5] = malloc( sizeof *M * 5 );
This also guarantees that the memory is allocated contiguously, and the M[i][j] subscripting works without any further effort.
However, there is a drawback; on compilers that don't support variable-length arrays, the array size must be known at compile time. Unless your compiler supports VLAs, you can't do something like
size_t cols;
...
int (*M)[cols] = malloc( sizeof *M * rows );
In that case, the M[0] = malloc( rows * cols * sizeof *M[0]) followed by manually assigning M[1] through M[rows - 1] would be a reasonable substitute.
I hope I'm not missing something here but here's my attempt to answer the question "What is the difference...". If I am completely off base, forgive me and I will correct my answer but here goes:
I tried drawing out what is happening in your two mallocs so what I have to say is tied to the picture included which I drew by hand (hand crafted answers?)
First option:
For the first option, you allocate a memory block the size of 5 int*s. M, which is an int** points to the start of that memory block.
Then, you go over each of the memory blocks (the size of int*) and in each block you put in the address of a memory block the size of 5 ints. Note that these are located in some random portion of your memory (the heap) that has enough space to take the size of 5 ints.
This is the key - it's a noncontiguous block of memory. So if you think about memory as an array, you are pointing at different start locations in the array.
Second Option
Your second option allocates the int** in exactly the same way. But instead, it allocates the size of 25 ints and places the address of that array in the memory block M[0]. Note: you've never placed any address in the memory locations M[1] - M[4].
So, what happens? You have a contiguous block of 25 ints with an address that can be found in M[0]. What happens when you try getting M[1]? You guessed it - it's uninitialized and contains junk values. Even more, it's a value that most likely does not point to an allocated memory space, so dereferencing it is undefined behavior and will typically segfault.
If you want to allocate a 5x5 array in contiguous memory, the correct approach would be
int rows = 5;
int cols = 5;
int (*M)[cols] = malloc(rows * sizeof(*M));
You can then access the array with normal array indexing, e.g.
M[3][2] = 6;
int **M = malloc(5 * sizeof(int *)); allocates memory for 5 pointers to int. M[i] = malloc(5 * sizeof(int)); allocates memory for 5 ints.
Maybe this will help you understand what is going on:
int **M = malloc(5 * sizeof(void *));
/* on most platforms, 'void *' and 'int *' have the same size */
for (i = 0; i < 5; i++)
{
M[i] = malloc(5 * sizeof(int));
}
Another little difference when using malloc((5*5) * sizeof(int));. Certainly a side issue to what OP is looking for, but still a concern.
Both calls below are equivalent: whichever order the two operands come in, the product is computed using size_t math.
#define N 5
malloc(N * sizeof(int));
malloc(sizeof(int) * N);
Consider:
#define N some_large_value
malloc((N*N) * sizeof(int));
The result of sizeof has type size_t, an unsigned integer type for which SIZE_MAX >= INT_MAX holds on virtually every platform, and which is possibly far larger. Here N*N is computed in int math first and may overflow. To avoid an int overflow in a case that would not overflow size_t math, use
malloc(sizeof(int) * N * N);
I am coding a 3D array using triple pointers with malloc. I replaced *ptrdate in (a), *ptrdate[i], and *ptrdate[i] with *ptrdate in the code below, since they are all basically pointers of type Date, just accessed at different dimensions. I got the same results both ways.
Question: what's the difference when used as the operand of sizeof?
#include <stdlib.h>

typedef struct {
    int day;
} Date;

int main(void) {
    int i, j, k, count=0;
    int row=3, col=4, dep=5;
    Date ***ptrdate = malloc(row * sizeof *ptrdate); //(a)
    for (i=0; i<row; i++) {
        ptrdate[i] = malloc(col * sizeof *ptrdate[i]); //(b)
        for (j=0; j<col; j++) {
            ptrdate[i][j] = malloc(dep * sizeof *ptrdate[i][j]); //(c)
        }
    }
    /* ... */
    return 0;
}
I am coding a 3D array using triple pointers with malloc.
First of all, there is no need for any array to be allocated using more than one call to malloc. In fact, it is incorrect to do so, as the word "array" is considered to denote a single block of contiguous memory, i.e. one allocation. I'll get to that later, but first, your question:
Question: what's the difference when used as the operand of sizeof?
The answer, though obvious, is often misunderstood. They're different pointer types, which coincidentally have the same size and representation on your system... but they might have different sizes and representations on other systems. It is important to keep that possibility in mind, so that you can be sure your code is as portable as possible.
Given size_t row=3, col=4, dep=5;, you can declare an array like so: Date array[row][col][dep];. I know you have no use for such a declaration in this question... Bear with me for a moment. If we printf("%zu\n", sizeof array);, it'll print row * col * dep * sizeof (Date). It knows the full size of the array, including all of the dimensions... and this is exactly how many bytes are required when allocating such an array.
printf("%zu\n", sizeof ptrDate); with ptrDate declared as in your code will produce something entirely different, though... It'll produce the size of a pointer (to pointer to pointer to Date, not to be confused with pointer to Date or pointer to pointer to Date) on your system. All of the size information regarding the dimensions (e.g. the row * col * dep multiplication) is lost, because we haven't told our pointers to maintain that size information. We can still find sizeof (Date) by using sizeof ***ptrDate, though, because the pointed-to type remains part of the pointer type.
What if we could tell our pointers to maintain the other size information (the dimensions), though? What if we could write ptrDate = malloc(row * sizeof *ptrDate);, and have sizeof *ptrDate equal to col * dep * sizeof (Date)? This would simplify allocation, wouldn't it?
This brings us back to my introduction: There is a way to perform all of this allocation using one single malloc. It's a simple pattern to remember, but a difficult pattern to understand (and probably appropriate to ask another question about):
Date (*ptrDate)[col][dep] = malloc(row * sizeof *ptrDate);
Suffice it to say, usage is still mostly the same. You can still use this like ptrDate[x][y][z]... There is one thing that doesn't seem quite right, though, and that is that sizeof ptrDate still yields the size of a pointer (to array[col][dep] of Date) and sizeof *ptrDate doesn't contain the row dimension (hence the multiplication in the malloc above). I'll leave it as an exercise to you to work out whether a solution is necessary for that...
free(ptrDate); // Ooops! I must remember to free the memory I have allocated!
int *ptr declares a pointer that stores the address of an int variable, and int **ptr declares a pointer that stores the address of such a pointer.
int **arrayPtr;
arrayPtr = malloc(sizeof(int) * rows *cols + sizeof(int *) * rows);
In the above code, we are trying to allocate a 2D array in a single malloc call.
malloc takes a number of bytes and allocates memory for that many bytes,
but in the above case, how does malloc know that it first has to allocate an array of pointers, each of which points to a one-dimensional array?
How does malloc work internally in this particular case?
2D arrays aren't the same as arrays of pointers to arrays.
int **arrayPtr doesn't define a 2D array. 2D arrays look like this:
int array[2][3]
And a pointer to the first element of this array would look like:
int (*array)[3]
which you can point to a block of memory:
int (*array)[3] = malloc(sizeof(int)*5*3);
Note how that's indexed:
array[x] would expand to *(array+x), so "x arrays of 3 ints forward".
array[x][y] would expand to *( *(array+x) + y), so "then y ints forward".
There's no intermediate array of pointers involved here, only one contiguous block of memory.
If you'd have an array of pointers to arrays (not the same as a 2D array; often done using int** ptr and a series of per-row mallocs), it would go like:
ptr[x] would expand to *(ptr+x), so "x pointers forward".
ptr[x][y] would expand to *( *(ptr+x) + y), so "follow that pointer, then y ints forward".
Mind the difference. Both are indexed with [x][y], but they are represented in a different way in memory and the indexing happens in a different manner.
how does malloc know that first it has to allocate a array of pointers, each of which pointer points to a one-dimensional array?
It doesn't; malloc simply allocates the number of bytes you specify, it has no working knowledge of how those bytes are structured into an aggregate data type.
If you're trying to dynamically allocate a multidimensional array, you have several choices.
If you're using a C99 or C2011 compiler that supports variable length arrays, you could simply declare the array as
int rows;
int cols;
...
rows = ...;
cols = ...;
...
int array[rows][cols];
There are a number of issues with VLAs, though; they don't work for very large arrays, they can't be declared at file scope, etc.
A secondary approach is to do something like the following:
int rows;
int cols;
...
rows = ...;
cols = ...;
...
int (*arrayPtr)[cols] = malloc(sizeof *arrayPtr * rows);
In this case, arrayPtr is declared as a pointer to an array of int with cols elements, so we're allocating rows arrays of cols elements each. Note that you can access each element simply by writing arrayPtr[i][j]; the rules of pointer arithmetic work the same way as for a regular 2D array.
If you aren't working with a C compiler that supports VLAs, you'll have to take a different approach.
You can allocate everything as a single chunk, but you'll have to access it as a 1-d array, computing the offsets like so:
int *arrayPtr = malloc(sizeof *arrayPtr * rows * cols);
...
arrayPtr[i * cols + j] = ...;
Or you can allocate it in two steps:
int **arrayPtr = malloc(sizeof *arrayPtr * rows);
if (arrayPtr)
{
int i;
for (i = 0; i < rows; i++)
{
arrayPtr[i] = malloc(sizeof *arrayPtr[i] * cols);
if (arrayPtr[i])
{
int j;
for (j = 0; j < cols; j++)
{
arrayPtr[i][j] = some_initial_value();
}
}
}
}
malloc() does not know that it needs to allocate an array of pointers to arrays. It simply returns a chunk of memory of the requested size. You can certainly do the allocation this way, but you'll need to initialize the first "row" (or last, or even a column instead of a row - however you want to do it) that are to be used as pointers so that they point to the appropriate area within that chunk.
It would be better and more efficient to just do:
int *arrayPtr = malloc(sizeof(int)*rows*cols);
The downside to that is that you have to calculate the proper index on every use, but you could write a simple helper function to do that. You wouldn't have the "convenience" of using [] to reference an element, but you could have e.g. element(arrayPtr, x, y).
I would re-direct your attention rather to question of "what does [] operator do?".
If you plan to access elements in your array via [] operator, then you need to realize that it can only do off-setting based on element's size, unless some array geometry info is supplied.
malloc has no provision for dimension info, and calloc is explicitly 1D.
On the other hand, declared arrays (arr[3][4]) explicitly specify the dimensions to the compiler.
So to access dynamically alloc'ed multi-D arrays in a fashion arr[i][j], you in fact allocate the series of 1D-arrays of the target dimension size. You will need to loop to do that.
malloc returns a plain pointer to heap memory, with no information about geometry or data type. Thus [][] won't work; you'll need to do the offsetting manually.
So it's your call whether []-indexing is your priority, or the bulk allocation.
int **arrayPtr; does not point to a 2D array. It points to an array of pointers to int. If you want to create a 2D array, use:
int (*arrayPtr)[cols] = calloc(rows, sizeof *arrayPtr);
I have a question about how C / C++ internally stores multidimensional arrays declared using the notation foo[m][n]. I am not questioning pure pointers to pointers etc... I am asking because of speed reasons...
Correct me if I am wrong, but syntactically foo is an array of pointers, which themselves point to an array
int foo[5][4]
*(foo + i) // returns a memory address
*( *(foo + i) + j) // returns an int
I have heard from many places that the C/C++ compiler converts foo[m][n] to a one dimensional array behind the scenes (calculating the required one dimension index with i * width + j). However if this was true then the following would hold
*(foo + 1) // should return element foo[0][1]
Thus my question:
Is it true that foo[m][n] is (always?) stored in memory as a flat one dimensional array? If so, why does the above code work as shown?
A two-dimensional array:
int foo[5][4];
is nothing more or less than an array of arrays:
typedef int row[4]; /* type "row" is an array of 4 ints */
row foo[5]; /* the object "foo" is an array of 5 rows */
There are no pointer objects here, either explicit or implicit.
Arrays are not pointers. Pointers are not arrays.
What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn't apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array's first element), but there is no pointer object stored in memory.
The array object foo is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguous int elements; the whole thing is therefore stored as 20 contiguous int objects.
The indexing operator is defined in terms of pointer arithmetic; x[y] is equivalent to *(x + y). Typically the left operand is going to be either a pointer expression or an array expression; if it's an array expression, the array is implicitly converted to a pointer.
So foo[x][y] is equivalent to *(foo[x] + y), which in turn is equivalent to *(*(foo + x) + y). (Note that no casts are necessary.) Fortunately, you don't have to write it that way, and foo[x][y] is a lot easier to understand.
Note that you can create a data structure that can be accessed with the same foo[x][y] syntax, but where foo really is a pointer to pointer to int. (In that case, the prefix of each [] operator is already a pointer expression, and doesn't need to be converted.) But to do that, you'd have to declare foo as a pointer-to-pointer-to-int:
int **foo;
and then allocate and initialize all the necessary memory. This is more flexible than int foo[5][4], since you can determine the number of rows and the size (or even existence) of each row dynamically.
Section 6 of the comp.lang.c FAQ explains this very well.
EDIT:
In response to Arrakis's comment, it's important to keep in mind the distinction between type and representation.
For example, these two types:
struct pair { int x; int y;};
typedef int arr2[2];
very likely have the same representation in memory (two consecutive int objects), but the syntax to access the elements is quite different.
Similarly, the types int[5][4] and int[20] have the same memory layout (20 consecutive int objects), but the syntax to access the elements is different.
You can access foo[2][2] as ((int*)foo)[10] (treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it's useful to do so, but strictly speaking the behavior is undefined. You can likely get away with it because most C implementations don't do array bounds-checking. On the other hand, optimizing compilers can assume that your code's behavior is defined, and generate arbitrary code if it isn't.
Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element foo[0][1], the following code will work:
*((int *)foo+1)=5;
The explicit cast is necessary, because foo+1, is the same as &foo[1] which is not at all the same thing as foo[0][1]. *(foo+1) is a pointer to the fifth element in the flat memory area. In other words, *(foo+1) is basically foo[1] and **(foo+1) is foo[1][0]. Here is how the memory is laid out for some of your two dimensional array:
C arrays - even multi-dimensional ones - are contiguous, ie an array of type int [4][5] is structurally equivalent to an array of type int [20].
However, these types are still incompatible according to C language semantics. In particular, the following code is in violation of the C standard:
int foo[4][5] = { { 0 } };
int *p = &foo[0][0];
int x = p[12]; // undefined behaviour - can't treat foo as int [20]
The reason for this is that the C standard is (probably intentionally) worded in a way which makes bounds-checking implementations possible: As p is derived from foo[0], which has type int [5], valid indices must be in range 0..5 (resp. 0..4 if you actually access the element).
Many other programming languages (Java, Perl, Python, JavaScript, ...) use jagged arrays to implement multi-dimensional arrays. This is also possible in C by using an array of pointers:
int *bar[4] = { NULL };
bar[0] = (int [3]){ 0 };
bar[1] = (int [5]){ 1, 2, 3, 4 };
int y = bar[1][2]; // y == 3
However, jagged arrays are not contiguous, and the pointed-to arrays need not be of uniform size.
Because of implicit conversion of array expressions into pointer expressions, indexing jagged and non-jagged arrays looks identical, but the actual address calculations will be quite different:
&foo[1] == (int (*)[5])((char *)&foo + 1 * sizeof (int [5]))
&bar[1] == (int **)((char *)&bar + 1 * sizeof (int *))
&foo[1][2] == (int *)((char *)&foo[1] + 2 * sizeof (int))
== (int *)((char *)&foo + 1 * sizeof (int [5]) + 2 * sizeof (int))
&bar[1][2] == (int *)((char *)bar[1] + 2 * sizeof (int)) // no & before bar!
== (int *)((char *)*(int **)((char *)&bar + 1 * sizeof (int *))
+ 2 * sizeof (int))
int foo[5][4];
foo is not an array of pointers; it's an array of arrays. Below image will help.