Column-major array storage in C compilers

Are there any C compilers that have extensions to store an array in column-major order instead of the standard row-major order?

Short answer is "No".
The long answer is that storing an array in column-major order would break the one-to-one correspondence between array indexing and pointer arithmetic, and the way an N-dimensional array is sliced into (N-1)-dimensional arrays.
Consider a 10x20 array stored in column-major order. This means that cells in adjacent rows of the same column, such as a[1][5] and a[2][5], would be next to each other in memory. On the other hand, taking a pointer to the element at i, j and incrementing it must work like this:
int *p=&a[1][5];
int *q=&a[1][6];
p++;
The standard requires that p equals q, because the two pointers point to adjacent elements of the row a[1]. This would not be possible if array a were stored in column-major order.
In C you would have to write your own set of functions to work with such arrays. If you code in C++, however, you have the option to implement your own multidimensional array class and overload the function-call operator () to work in column-major order.

Related

Why is it mandatory to specify number of columns while declaring/initializing multi-dimensional arrays?

While declaring a 2-D array in C, I see that the following ways of declarations are valid:
int a[2][3] = {1,2,3,4,5,6};
int a[2][3] = {{1,2,3}, {4,5,6}};
int a[][3] = {1,2,3,4,5,6};
But, the following way of array declaration is invalid in C:
int a[2][] = {1,2,3,4,5,6};
Why is it mandatory to mention the number of columns, regardless of whether the number of rows is mentioned, when declaring a 2-D array in C?
There are multiple possible answers, but most start from the fact that C n-dimensional arrays are 1-dimensional arrays of (n-1)-dimensional arrays. From here, we can go in several directions, such as:
the declared type of an array's elements must be a complete type, and array types having an unspecified dimension are not complete types.
that's closely related to the fact that C's array indexing and the equivalent pointer arithmetic depend on knowing the size of the elements of the array / pointed-to type, and (being an incomplete type) the size of an array type with an unspecified dimension is not known.
Coming from another direction, a declaration such as your example ...
int a[2][] = {1,2,3,4,5,6};
... does not suffice to establish the second dimension of the array. That same initializer would be valid in all of these, among unboundedly-many others:
int a3[2][3] = {1,2,3,4,5,6}; // equivalent to {{1,2,3},{4,5,6}}
int a4[2][4] = {1,2,3,4,5,6}; // equivalent to {{1,2,3,4},{5,6}}
int a5[2][5] = {1,2,3,4,5,6}; // equivalent to {{1,2,3,4,5},{6}}
int a6[2][6] = {1,2,3,4,5,6}; // equivalent to {{1,2,3,4,5,6}}
int a10[2][10] = {1,2,3,4,5,6}; // equivalent to {{1,2,3,4,5,6}}
Note in particular that C does not require an explicit initializer element for every element in any array being initialized.
The ultimate answer, though, is simply that the language designers chose that it should be as it is. Their choice seems logical and internally consistent to me, but that does not mean that they couldn't have chosen differently. Ultimately, there is no other "why?" that matters.
To answer this question well, we first need to get familiar with two concepts: row-major implementation and column-major implementation.
In a row-major implementation, the elements of each row are placed right after the elements of the previous row, so all elements of a row occupy contiguous memory addresses. In a column-major implementation, by contrast, the elements of each column are placed next to each other in contiguous memory locations.
For more explanation: https://en.wikipedia.org/wiki/Row-_and_column-major_order
Now, coming back: C and C++ store arrays in row-major order. Thus, even if we don't give the number of rows while declaring an array, the number of columns provided in the declaration is sufficient for the compiler to decide which row each element belongs to.
For example, in the following way of declaration and initialization:
int a[][3] = {1,2,3,4,5,6};
Though the number of rows is not mentioned, the number of columns tells the compiler that the elements 1,2,3 belong to the first row of the matrix and 4,5,6 to the second row, since each row contains only 3 columns. So this declaration and initialization is valid.
On the other hand, in the following example,
int a[2][] = {1,2,3,4,5,6};
it is only given that there should be 2 rows. Since C is a row-major language, knowing that there are 2 rows does not tell the compiler how many columns each row has, so it cannot answer the question: "from which element should the second row start?".
In row-major order the second row is filled only after the first row is complete, and since the number of columns is unknown, the compiler can never answer that question, so it doesn't know which elements go into the next row.
And hence, regardless of the number of rows being mentioned or not, it is mandatory to mention the number of columns during array declaration in C.

C two dimensional arrays: Is the first 'level' an array of pointers?

In C we have two dimensional arrays, i.e. a[m][n].
In one dimensional arrays a is a pointer to the start of the array.
What about two dimensional arrays? Does a[i] hold a pointer to the start of the i row in an array? And thus a[i] is an array of pointers that is passed to a function in the following manner: function(int **a, m, n)?
Does a[i] hold a pointer to the start of the i row in an array?
No. The data of a 2D array in C is a contiguous block of elements plus some clever indexing access. But a 2D array is an array of arrays, not an array of pointers.
Formally, a[i] is a 1D array. It may decay to a pointer to the first element of the ith row in certain contexts, but its type is still T[n], for some type T that you have not specified.
In one dimensional arrays a is a pointer to the start of the array.
Not correct. a is an array. When you use a in an expression, it "decays" into a pointer to the first element. To better understand this, read this chapter of the C FAQ, particularly this one.
What about two dimensional arrays? Does a[i] hold a pointer to the start of the i row in an array?
No. In a 2D array, a[i] is an array, while int a[x][y]; is an array of arrays. There are no pointers anywhere.
You might be confused because C allows this syntax: int a[][N] = ...;, but that syntax merely means that the size of the array of arrays depends on the number of items in the initialization list.

Matrixes in contiguous position of memory

I usually store all my matrices in a single vector, because my book says a single vector is faster and that accessing a matrix is slower.
If I have a code like this one:
#include <stdio.h>
int main(int argc, char **argv)
{
    int mat[10][10], i;
    for (i = 0; i < 10; i++)
        mat[i][0] = 99;
    int *ptr = &mat[0][0];
    for (i = 0; i < 10; i++)
    {
        printf("%d\n", *ptr);
        ptr += 10;
    }
    return 0;
}
I ran it four or five times, and every time it printed 99 ten times.
So are matrices also stored in contiguous memory? Always?
If so, why is accessing a vector faster?
If by 'matrix' you mean two-dimensional array, then yes they're in contiguous memory. 2D arrays in C are just arrays of arrays (row major). If by vector you mean 1D array, then there's no reason it should be faster than accessing a 2D array.
Well, arrays in C are stored in contiguous memory, and since your mat is an array of arrays, it is also stored in contiguous memory. I think that dereferencing with one index (when you have separate 1D arrays) may be a little faster than dereferencing with two indexes (in a matrix), but the difference is too small to worry about.
C has no multidimensional arrays in the sense other languages do; it calls them multidimensional, but they are really arrays of arrays.
And C arrays are contiguous.
(C99, 6.2.5p20) "An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type."

How an n-dimensional (n>=2) array is represented in memory?

Can anyone provide me with a formula so that I can understand the memory representation of an n-dimensional(n>=2) array like this "How_are_two-dimensional_arrays_represented_in_memory"?
That calculation applies to 2D arrays only.
How to calculate, suppose, a 5D array?
Ok....
I think I found the answer: Array_data_structure#Two-dimensional_arrays
A 2-dimensional array in C is nothing more or less than an array of arrays. A 3-dimensional array is an array of arrays of arrays. And so on.
The relevant section from the C99 standard is 6.5.2.1, "Array subscripting":
Successive subscript operators designate an element of a
multidimensional array object. If E is an n-dimensional array
(n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n −
1)-dimensional array with dimensions j × . . . × k. If the unary *
operator is applied to this pointer explicitly, or implicitly as a
result of subscripting, the result is the pointed-to (n −
1)-dimensional array, which itself is converted into a pointer if used
as other than an lvalue. It follows from this that arrays are stored
in row-major order (last subscript varies fastest).
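Applied repeatedly, this rule gives the general formula asked for above. For an array declared T E[d_1][d_2]...[d_n], the element E[i_1][i_2]...[i_n] sits this many elements from the start of E (multiply by sizeof(T) for bytes):

```latex
\mathrm{offset}(i_1,\dots,i_n) = \sum_{k=1}^{n} i_k \prod_{m=k+1}^{n} d_m
  = \Bigl(\cdots\bigl((i_1 d_2 + i_2)\, d_3 + i_3\bigr)\cdots\Bigr)\, d_n + i_n
```

For n = 2 this is i_1 d_2 + i_2; for a 5-D array it is i_1 d_2 d_3 d_4 d_5 + i_2 d_3 d_4 d_5 + i_3 d_4 d_5 + i_4 d_5 + i_5. The first dimension d_1 never appears; it only bounds i_1.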
Some confusion is caused by the fact that the indexing operator is defined in terms of pointer arithmetic. This does not imply that arrays are "really pointers" -- and in fact they very definitely are not. Declaring an array object does not create any pointer objects at all (unless of course it's an array of pointers). But an expression that refers to the array usually (but not always) "decays" to a pointer to the array's first element (that's a pointer value, not a pointer object).
Now simple array objects, of however many dimensions, are quite inflexible. Prior to C99, all array objects had to be of a fixed size determined at compile time. C99 introduced variable-length arrays (VLAs), but even so a VLA's size is fixed when it's declared (and not all compilers support VLAs, even 12 years after the C99 standard was issued).
If you need something more flexible, a common approach is to declare a pointer to the element type, and then allocate an array using malloc() and have the pointer point to the array's first element:
int *ptr = malloc(N * sizeof *ptr);
if (ptr == NULL) { /* handle allocation failure */ }
This lets you refer to elements of the heap-allocated array using the same syntax you'd use for a declared fixed-size array object, but for a declared array arr, in arr[i] the expression arr decays to a pointer, whereas in ptr[i], ptr is already a pointer.
The same thing can be extended to higher dimensions. You can allocate an array of pointers, and then initialize each pointer to point to the beginning of an allocated array of whatever.
This gives you something that acts very much like a 2-dimensional (or more) array, but you have to manage the memory yourself; that's the price of the greater flexibility.
Strictly speaking, this is not a 2-dimensional array. A 2-dimensional array, as I said above, is only an array of arrays. It's probably not entirely unreasonable to think of it as a 2-D array, but that conflicts with the usage in the C Standard; it's similar to referring to a linked list as a 1-D array.
The comp.lang.c FAQ is a good resource; section 6, which covers arrays and pointers, is particularly excellent.
A 2-dimensional array built from pointers (the dynamically allocated kind, as opposed to a declared array such as int a[i][j], which contains no pointers) is an array of pointers to arrays. Such an array of integers with i rows and j columns takes up i*sizeof(int*) for the array of pointers, and i*j*sizeof(int) for the rows themselves.
A pointer-based 3-D array with sizes i1, i2, i3 is an array of pointers to arrays of pointers to arrays. The first level of arrays contains i1 pointers, the second level contains i1*i2 pointers, and the third level contains i1*i2*i3 integers.
In general, a pointer-based N-dimensional array with sizes i1..iN will have N-1 levels of arrays of pointers and 1 level of arrays of ints. The arrays in level k have length ik, and there are i1*...*i(k-1) of them in that level.
So, a 5-D array:
1 array, length i1, of pointers
i1 arrays, length i2, of pointers
i1*i2 arrays, length i3, of pointers
i1*i2*i3 arrays, length i4, of pointers
i1*i2*i3*i4 arrays, length i5, of ints
Hope that helps (and I hope I got the indices right).
That Wikipedia link you posted refers to a /different kind of multidimensional array/: the contiguous kind, which is also how declared C arrays such as int a[N1][N2] are laid out. The pointer-based structure I just described is what you get when you build the array yourself from separate allocations. You can also manage a single-dimensional array and compute the indices yourself. This saves memory and makes the entire array contiguous, but it makes accessing elements somewhat more complex. For the 5-D example:
// WARNING I AM CHANGING NOTATION. N1..N5 are the lengths in each direction.
// i1..i5 are the indices.
int* bigarray = malloc(sizeof(int)*N1*N2*N3*N4*N5);
// now instead of bigarray[i1][i2][i3][i4][i5], write this:
*(bigarray + i1*N2*N3*N4*N5 + i2*N3*N4*N5 + i3*N4*N5 + i4*N5 + i5);
Each term there is an index multiplied by the number of elements that one step in that dimension skips. For example, to advance by one in the first dimension we need to traverse the four remaining dimensions once to 'wrap around', if you will.
The C standard does specify that arrays are stored in row-major order (C99 6.5.2.1, quoted above). For more information about arrays, and how they might be laid out in memory, see the following two links:
http://webster.cs.ucr.edu/AoA/Windows/HTML/Arraysa2.html
http://publications.gbdirect.co.uk/c_book/chapter5/arrays.html
The first link is more general and discusses different ways of storing arrays, while the second discusses the most likely way a C array is laid out in memory.

How does C allocate data items in a multidimensional array?

I'd like to find out how C allocates the data items of a multidimensional array, and whether their allocation is consistent across machines.
I know that, at the lowest level, the data items are neighbours, but I don't know how they're arranged further up.
For example, if I allocate a 3D array as int threeD[10][5][6], can I assume that &(threeD[4][2][5]) + 1 == &(threeD[4][3][0])? On all machines?
Thanks in advance for your help.
Yes, arrays are stored in row major order across all implementations of C compilers.
The Standard says (I applied some reformatting):
6.5.2.1 Array subscripting
Semantics
3 Successive subscript operators designate an element of a multidimensional
array object.
If E is an n-dimensional array (n >= 2) with dimensions i * j * . . . * k,
then E (used as other than an lvalue) is converted to a pointer to an
(n - 1)-dimensional array with dimensions j * . . . * k.
If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the pointed-to
(n - 1)-dimensional array, which itself is converted into a pointer if
used as other than an lvalue. It follows from this that arrays are stored
in row-major order (last subscript varies fastest).
The C standard is very specific in equating array subscripting with pointer arithmetic, and specifies that arrays are stored in row major order.
Consider the array object defined by the declaration
int x[3][5];
Here x is a 3 x 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.
The elements are stored in row-major order, so elements along the last dimension are contiguous. However, when the array is built from separately allocated rows rather than declared as a single array, elements in different rows (as indicated by your example) aren't guaranteed to be contiguous. It depends on how the memory has been allocated.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
// only elements in a single row are guaranteed to be
// contiguous because of the multiple mallocs
static void separate_rows(void)
{
    // 3 rows, 4 columns
    int *a[3];
    for (int row = 0; row < 3; row++)
        a[row] = malloc(4 * sizeof(int));
    for (int row = 0; row < 3; row++)
        free(a[row]);
}
// all elements are guaranteed to be contiguous
// in row-major order, because they share one buffer
static void single_buffer(void)
{
    // 3 rows, 4 columns
    int *a[3];
    int *buf = malloc(3 * 4 * sizeof(int));
    if (buf == NULL)
        return;
    for (int row = 0; row < 3; row++)
        a[row] = buf + 4 * row;
    assert(&a[1][3] + 1 == &a[2][0]);
    free(buf);
}
int main(void)
{
    separate_rows();
    single_buffer();
    return 0;
}
Firstly, in C, address arithmetic is only defined within the boundaries of a given array. (I wanted to say "single-dimensional (SD) array", but technically all arrays in C are SD. Multi-dimensional arrays are built as SD arrays of SD arrays, and this view of arrays is the most appropriate for this topic.) In C you can start from a pointer to the beginning of an array and move back and forth within that array using additive operations. You are not allowed to cross the boundaries of the array you started from, except that it is legal to form a pointer to an imaginary element that follows the last element. However, when it comes to accessing elements (reading and writing), you are only allowed to access the real, existing elements of the array you started from.
Secondly, in your example '&threeD[4][2][5] + 1' you are forming a pointer to the imaginary "past-the-last" element of array 'threeD[4][2]'. This by itself is legal. However, the language specification does not guarantee that this pointer is equal to the address of '&threeD[4][3][0]'. The only thing that it says is that it might be equal to it. It is true, that the other requirements imposed on arrays by the language specification pretty much "force" this relationship to hold. But it is not formally guaranteed. Some pedantic (to the point of being malicious) implementation is perfectly allowed to use some kind of compiler magic to break this relationship.
Thirdly, actually accessing '*(&threeD[4][2][5] + 1)' is always illegal. Even if the pointer is pointing into the next array, the compiler is allowed to perform the necessary run-time checks and generate a segmentation fault, since you are using pointer arithmetic on the 'threeD[4][2]' array and trying to access something outside its boundaries.
Fourthly, forming '&threeD[4][2][5] + 2', '... + 3' etc. is always illegal for similar reasons (remember: one past the end is OK, but 2, 3 or more is illegal).
And finally, fifthly: yes I know that in many (if not most) (if not all) practical cases interpreting a 'T A[2][3][4]' array as a flat 'T A[2*3*4]' array will work. But, again, from the formal language point of view this is illegal. And don't be surprised if this perfectly working code will one day trigger a huge amount of warnings from some static or dynamic code analysis tool, if not from the compiler itself.

Resources