Why would C have "fake arrays"? [closed] - c

I'm reading The Unix haters handbook and in chapter 9 there's something I don't really understand:
C doesn’t really have arrays either. It has something that looks like an array
but is really a pointer to a memory location.
I can't really imagine any way to store an array in memory other than using pointers to index memory locations. How does C implement these "fake" arrays, anyway? And is there any truth to this claim?

I think the author’s point is that C arrays are really just a thin veneer on pointer arithmetic. The subscript operator is defined simply as a[b] == *(a + b), so you can easily say 5[a] instead of a[5] and do other horrible things like access the array past the last index.
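For illustration, a minimal, self-contained example of that equivalence:
#include <stdio.h>
int main(void) {
    int a[6] = {10, 20, 30, 40, 50, 60};
    /* a[5] is defined as *(a + 5), and 5[a] as *(5 + a): the same thing */
    printf("%d %d\n", a[5], 5[a]); /* prints: 60 60 */
    return 0;
}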
Compared to that, a “true array” would be one that knows its own size and doesn’t let you do pointer arithmetic on it, access past the last index without an error, or reinterpret its contents as a different item type. In other words, a “true array” is a tight abstraction that doesn’t tie you to a single representation – it could be a linked list instead, for example.
PS. To spare myself some trouble: I don’t really have an opinion on this, I’m just explaining the quote from the book.

There is a difference between C arrays and pointers, and it can be seen by the output of sizeof() expressions. For example:
void sample1(const char * ptr)
{
/* s1 depends on pointer size of architecture */
size_t s1 = sizeof(ptr);
}
size_t sample2(const char arr[])
{
/* s2 also depends on pointer size of architecture, because arr decays to pointer */
size_t s2 = sizeof(arr);
return s2;
}
void sample3(void)
{
const char arr[3];
/* s3 = 3 * sizeof(char) = 3 */
size_t s3 = sizeof(arr);
}
void sample4(void)
{
const char arr[3];
/* s4 = output of sample2(arr) which... depends on pointer size of architecture, because arr decays to pointer */
size_t s4 = sample2(arr);
}
sample2 and sample4 in particular are probably why people tend to conflate C arrays with C pointers: in other languages you can simply pass an array as an argument to a function and have it work 'just the same' as it did in the caller. Similarly, because of how C works, you can pass a pointer where an array is expected and this is 'valid', whereas in languages with a clearer distinction between arrays and pointers it would not be.
You could also view the sizeof() output as a consequence of C's pass-by-value semantics (since C arrays decay to pointers).
Also, since C99 the language supports this syntax (how thoroughly compilers check it varies):
void foo(const char arr[static 2])
{
/* arr must be **at least** 2 elements in size, cannot pass NULL */
}
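For illustration, here is roughly how a compiler that checks the qualifier reacts (diagnostics vary by compiler; clang, for example, warns on the calls below, and violating the contract is undefined behavior either way):
#include <stddef.h>
void foo(const char arr[static 2]);
void caller(void) {
    const char two[2] = "a"; /* 'a' plus the terminator: exactly 2 elements */
    const char one[1] = ""; /* just the terminator: 1 element */
    foo(two); /* fine: at least 2 elements */
    foo(one); /* may be diagnosed: fewer than 2 elements */
    foo(NULL); /* may be diagnosed: NULL is never valid here */
}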

The statement you quoted is factually incorrect. Arrays in C are not pointers.
The idea of implementing arrays as pointers comes from the B and BCPL languages (ancestors of C), but it did not survive the transition to C. In the early days of C, "backward compatibility" with B and BCPL was considered somewhat important, which is why C arrays closely emulate the behavior of B and BCPL arrays (i.e. C arrays easily "decay" to pointers). Nevertheless, C arrays are not "pointers to a memory location".
The book quote is completely bogus. This misconception is rather widespread among C newbies. But how it managed to get into a book is beyond me.

The author probably means that arrays are constrained in ways which make them feel like second-class citizens from the programmer's point of view. For example, of these two functions, one is fine and the other is not:
int finefunction() {
int ret = 5;
return ret;
}
int[] wtffunction() { /* invalid: a C function cannot return an array */
int ret[1] = { 5 };
return ret;
}
You can work around this a bit by wrapping arrays in structs, but it just sort of emphasizes that arrays are different, they're not like other types.
struct int1 {
    int a[1];
};
struct int1 finefunction2() {
    struct int1 ret = { { 5 } };
    return ret;
}
Another effect is that you can't get the size of an array at run time through a function parameter:
#include <stdio.h>
size_t my_sizeof(int a[]) {
    size_t size = sizeof(a); /* a has decayed to int *, so this is sizeof(int *) */
    return size;
}
int main() {
    int arr[5];
    // prints e.g. 20 8 (or 20 4 with 32-bit pointers), not 20 20
    // as it would if arrays were 1st class things
    printf("%zu %zu\n", sizeof(arr), my_sizeof(arr));
}
Another way to say what the author says is that in C (and C++) terminology, "array" means something different than in most other languages.
So, to your title question of how a "true array" would be stored in memory: there is no one single kind of "true array". If you wanted true arrays in C, you have basically two options:
Use calloc to allocate a buffer, and store the pointer and item count together:
struct intarrayref {
    size_t count;
    int *data;
};
This struct is essentially a reference to an array, and you can pass it around nicely to functions etc. You will want to write functions that operate on it, such as one that creates a copy of the actual data.
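For instance, such helpers might look like this (a sketch; the function names here are made up):
#include <stdlib.h>
#include <string.h>
/* hypothetical helper: allocate an array of n ints, zero-initialized */
struct intarrayref intarray_create(size_t n) {
    struct intarrayref a = { 0, NULL };
    a.data = calloc(n, sizeof *a.data);
    if (a.data)
        a.count = n;
    return a;
}
/* hypothetical helper: copy the actual data, not just the reference */
struct intarrayref intarray_copy(struct intarrayref src) {
    struct intarrayref dst = intarray_create(src.count);
    if (dst.data)
        memcpy(dst.data, src.data, src.count * sizeof *src.data);
    return dst;
}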
Use a flexible array member, and allocate the whole struct with a single calloc:
struct intarrayobject {
    size_t count;
    int data[];
};
In this case you allocate both the metadata (count) and the space for the array data in one go, but the price is that you can no longer pass this struct around by value, because that would leave the extra data behind. You have to pass a pointer to this struct to functions etc. So it is a matter of opinion whether one would consider this a "true array" or just a slightly enhanced normal C array.
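For illustration, the single-allocation version might be created like this (a sketch; the helper name is made up):
#include <stdlib.h>
struct intarrayobject *intarrayobject_create(size_t n) {
    /* one calloc covers both the count and the n elements */
    struct intarrayobject *a = calloc(1, sizeof *a + n * sizeof a->data[0]);
    if (a)
        a->count = n;
    return a;
}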

Like the entire book, it's a case of trolling: specifically, the type of trolling that states something almost-true-but-wrong in order to solicit angry responses about why it's wrong. C most certainly does have actual arrays/array types, as evidenced by the way pointer-to-array types (and multi-dimensional arrays) work.

Related

Is creating an array with a built-in length common in C?

For an experiment I created a function to initialize an array that has a built-in length, like in Java:
int *create_arr(int len) {
void *ptr = malloc(sizeof(int[len + 1]));
int *arr = ptr + sizeof(int);
arr[-1] = len;
return arr;
}
that can later be used like this:
int *arr = create_arr(12);
and allows finding the length at arr[-1]. I was asking myself if this is a common practice or not, and if there is an error in what I did.
First of all, your code has some bugs, mainly that in standard C you can't do arithmetic on void pointers (as commented by MikeCAT). Probably a more typical way to write it would be:
int *create_arr(int len) {
int *ptr = malloc((len + 1) * sizeof(int));
if (ptr == NULL) {
// handle allocation failure
}
ptr[0] = len;
return ptr + 1;
}
This is legal but no, it's not common. It's more idiomatic to keep track of the length in a separate variable, not as part of the array itself. An exception is functions that try to reproduce the effect of malloc, where the caller will later pass back the pointer to the array but not the size.
One other issue with this approach is that it limits your array length to the maximum value of an int. On, let's say, a 64-bit system with 32-bit ints, you could conceivably want an array whose length did not fit in an int. Normally you'd use size_t for array lengths instead, but that won't work if you need to fit the length in an element of the array itself. (And of course this limitation would be much more severe if you wanted an array of short or char or bool :-) )
Note that, as Andrew Henle comments, the pointer returned by your function could be used for an array of int, but would not be safe to use for other arbitrary types as you have destroyed the alignment promised by malloc. So if you're trying to make a general wrapper or replacement for malloc, this doesn't do it.
Apart from the small mistakes that have already been pointed out in the comments, this is not common, because C programmers are used to handling arrays as an initial pointer plus a size. I have mainly seen it in mixed programming environments, for example in Windows COM/DCOM, where C++ programs exchange data with VB programs.
Your array with built-in size is close to the Win32 BSTR: an array of 16-bit wide chars where the allocated size is stored just before the character data. So there is nothing really bad about it.
But in the general case, you could have an alignment problem. malloc returns a pointer with suitable alignment for any type, and you should make sure that index 0 of your returned array also has suitable alignment. If int does not have the strictest alignment requirement on the platform, this could fail...
Furthermore, as the pointer is not at the beginning of the allocated memory, the array requires a special function for its deallocation. This should probably be documented in a red flashing font, because it would be very uncommon for most C programmers.
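For example, a matching deallocation function (a sketch, assuming the corrected create_arr above; the name is made up) has to step back to the pointer that malloc actually returned:
#include <stdlib.h>
void destroy_arr(int *arr) {
    if (arr != NULL)
        free(arr - 1); /* the allocation started one int before the data */
}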
This technique is not as uncommon as people expect. For example, the stb collection of header-only libraries uses this method to implement a type-safe, vector-like container in C. See https://github.com/nothings/stb/blob/master/stretchy_buffer.h
It would be more idiomatic to do something like:
struct array {
int *d;
size_t s;
};
struct array *
create_arr(size_t len)
{
struct array *a = malloc(sizeof *a);
if( a ){
a->d = malloc(len * sizeof *a->d);
a->s = a->d ? len : 0;
}
return a;
}
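A matching cleanup function (the name is made up here) would then free both allocations:
void
destroy_arr(struct array *a)
{
    if( a ){
        free(a->d);
        free(a);
    }
}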

Is it good programming practice in C to use first array element as array length?

Because in C the array length has to be stated when the array is defined, would it be acceptable practice to use the first element as the length, e.g.
int arr[9]={9,0,1,2,3,4,5,6,7};
Then use a function such as this to process the array:
void printarr(int *ARR) {
    for (int i=1; i<ARR[0]; i++) {
        printf("%d ", ARR[i]);
    }
}
I can see no problem with this but would prefer to check with experienced C programmers first. I would be the only one using the code.
Well, it's bad in the sense that you have an array where the elements do not all mean the same thing. Storing metadata with the data is not a good thing. Just to extrapolate your idea a little bit: we could use the first element to denote the element size and the second for the length. Try writing a function utilizing both ;)
It's also worth noting that with this method, you will have problems if the array is bigger than the maximum value an element can hold, which for char arrays is a very significant limitation. Sure, you can solve that by using the first two elements. And you can also use casts if you have floating point arrays. But I can guarantee you that you will run into hard-to-trace bugs because of this. Among other things, endianness could cause a lot of issues.
And it would certainly confuse virtually every seasoned C programmer. This is not really a logical argument against the idea as such, but rather a pragmatic one. Even if this were a good idea (which it is not), you would have to have a long conversation with EVERY programmer who will have anything to do with your code.
A reasonable way of achieving the same thing is using a struct.
struct container {
int *arr;
size_t size;
};
int arr[10];
struct container c = { .arr = arr, .size = sizeof arr/sizeof *arr };
But in any situation where I would use something like above, I would probably NOT use arrays. I would use dynamic allocation instead:
const size_t size = 10;
int *arr = malloc(sizeof *arr * size);
if(!arr) { /* Error handling */ }
struct container c = { .arr = arr, .size = size };
However, do be aware that the sizeof arr / sizeof *arr trick only works on actual arrays: if you initialize the struct this way from a pointer instead of an array, you're in for "interesting" results.
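To make the "interesting" part concrete, a small sketch of how the trick misfires on a pointer:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int arr[10];
    int *p = malloc(10 * sizeof *p);
    printf("%zu\n", sizeof arr / sizeof *arr); /* 10: a real array */
    printf("%zu\n", sizeof p / sizeof *p); /* sizeof(int *) / sizeof(int), e.g. 2: just the pointer */
    free(p);
    return 0;
}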
You can also use flexible arrays, as Andreas wrote in his answer
In C you can use flexible array members. That is, you can write
struct intarray {
size_t count;
int data[]; // flexible array member needs to be last
};
You allocate with
size_t count = 100;
struct intarray *arr = malloc( sizeof(struct intarray) + sizeof(int)*count );
arr->count = count;
That can be done for all types of data.
It makes the use of C-arrays a bit safer (not as safe as the C++ containers, but safer than plain C arrays).
Unfortunately, C++ does not support this idiom in the standard.
Many C++ compilers provide it as an extension, but it is not guaranteed.
On the other hand, this C FAM idiom may be more explicit and perhaps more efficient than C++ containers, as it does not use an extra indirection and/or need two allocations (think of new vector<int>).
If you stick to C, I think this is a very explicit and readable way of handling variable length arrays with an integrated size.
The only drawback is that the C++ guys do not like it and prefer C++ containers.
It is not bad (I mean it will not invoke undefined behavior or cause other portability issues) when the elements of the array are integers, but instead of writing the magic number 9 directly you should have the compiler calculate the length of the array, to avoid typos:
#include <stdio.h>
int main(void) {
int arr[9]={sizeof(arr)/sizeof(*arr),0,1,2,3,4,5,6,7};
for (int i=1; i<arr[0]; i++) {
printf("%d ", arr[i]);
}
return 0;
}
Only a few datatypes are suitable for that kind of hack. Therefore, I would advise against it, as this will lead to inconsistent implementation styles across different types of arrays.
A similar approach is used very often with character buffers, where the buffer's actual length is stored at its beginning.
Dynamic memory allocation in C is also commonly implemented this way: the allocated block is prefixed with an integer that keeps the size of the allocation.
However, in general this approach is not suitable for arrays. For example, a character array can be much larger than the maximum positive value (127) that can be stored in an object of type char. Moreover, it is difficult to pass a sub-array of such an array to a function; most functions designed to deal with arrays will not work in such a case.
The general approach for declaring a function that deals with an array is to declare two parameters: the first, of pointer type, specifies the initial element of the array or sub-array, and the second specifies the number of elements in it.
C also allows declaring functions that accept variable-length arrays, whose sizes are specified at run time.
It is suitable in rather limited circumstances. There are better solutions to the problem it solves.
One problem with it is that if it is not universally applied, you end up with a mix of arrays that use the convention and arrays that don't, and no way of telling which is which. For arrays used to carry strings, for example, you would have to continually pass &arr[1] in calls to the standard string library, or define a new string library that uses "Pascal string" rather than "ASCIIZ string" conventions (such a library would, as it happens, be more efficient).
In the case of a true array, rather than simply a pointer to memory, sizeof(arr) / sizeof(*arr) will yield the number of elements without having to store it in the array at all.
It also only really works for integer-type arrays; for char arrays it would limit the length to a rather small maximum, and it is not practical for arrays of other object types or data structures.
A better solution would be to use a structure:
typedef struct
{
size_t length ;
int* data ;
} intarray_t ;
Then:
int data[9] ;
intarray_t array = { sizeof(data) / sizeof(*data), data } ;
Now you have an array object that can be passed to functions and retains its size information, and the data member can be accessed directly for use with third-party or standard library interfaces that do not accept intarray_t. Moreover, the type of the data member can be anything.
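For instance, a function consuming the wrapper might look like this (a sketch; the function name is made up):
#include <stdio.h>
void print_intarray( const intarray_t* array )
{
    for( size_t i = 0; i < array->length; i++ )
    {
        printf( "%d ", array->data[i] ) ;
    }
}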
Obviously NO is the answer.
Most programming languages store the length along with the array and expose it through a predefined count/length operation. Why not use the same idea? In your case it is more suitable to query a count/length value than to test the first element; an if clause can also sometimes take more time than a predefined length lookup.
It looks OK at first to store the counter, but imagine you have to update the array: you then need two operations, one to insert the element and another to update the counter, so two variables have to change.
For static arrays it might be acceptable to keep the counter with the list, but for dynamic ones NO NO NO.
On the other hand, if you read up on basic programming concepts you will find this idea to be a bad one that does not comply with programming principles.

Why is the need of pointer to an array? [duplicate]

This question goes out to the C gurus out there:
In C, it is possible to declare a pointer as follows:
char (* p)[10];
.. which basically states that this pointer points to an array of 10 chars. The neat thing about declaring a pointer like this is that you will get a compile-time error if you try to assign a pointer to an array of a different size to p. It will also give you a compile-time error if you try to assign the value of a simple char pointer to p. I tried this with gcc and it seems to work with ANSI, C89 and C99.
It looks to me like declaring a pointer like this would be very useful - particularly, when passing a pointer to a function. Usually, people would write the prototype of such a function like this:
void foo(char * p, int plen);
If you were expecting a buffer of a specific size, you would simply test the value of plen. However, you cannot be guaranteed that the person who passes p to you will really give you plen valid memory locations in that buffer. You have to trust that the caller of this function is doing the right thing. On the other hand:
void foo(char (*p)[10]);
..would force the caller to give you a buffer of the specified size.
This seems very useful, but I have never seen a pointer declared like this in any code I have ever run across.
My question is: Is there any reason why people do not declare pointers like this? Am I not seeing some obvious pitfall?
What you are saying in your post is absolutely correct. I'd say that every C developer comes to exactly the same discovery and exactly the same conclusion when (if) they reach a certain level of proficiency with the C language.
When the specifics of your application area call for an array of specific fixed size (array size is a compile-time constant), the only proper way to pass such an array to a function is by using a pointer-to-array parameter
void foo(char (*p)[10]);
(in C++ language this is also done with references
void foo(char (&p)[10]);
).
This will enable language-level type checking, which will make sure that the array of exactly correct size is supplied as an argument. In fact, in many cases people use this technique implicitly, without even realizing it, hiding the array type behind a typedef name
typedef int Vector3d[3];
void transform(Vector3d *vector);
/* equivalent to `void transform(int (*vector)[3])` */
...
Vector3d vec;
...
transform(&vec);
Note additionally that the above code is invariant with relation to Vector3d type being an array or a struct. You can switch the definition of Vector3d at any time from an array to a struct and back, and you won't have to change the function declaration. In either case the functions will receive an aggregate object "by reference" (there are exceptions to this, but within the context of this discussion this is true).
However, you won't see this method of array passing used explicitly too often, simply because too many people get confused by a rather convoluted syntax and are simply not comfortable enough with such features of C language to use them properly. For this reason, in average real life, passing an array as a pointer to its first element is a more popular approach. It just looks "simpler".
But in reality, using the pointer to the first element for array passing is a very niche technique, a trick, which serves a very specific purpose: its one and only purpose is to facilitate passing arrays of different size (i.e. run-time size). If you really need to be able to process arrays of run-time size, then the proper way to pass such an array is by a pointer to its first element with the concrete size supplied by an additional parameter
void foo(char p[], unsigned plen);
Actually, in many cases it is very useful to be able to process arrays of run-time size, which also contributes to the popularity of the method. Many C developers simply never encounter (or never recognize) the need to process a fixed-size array, thus remaining oblivious to the proper fixed-size technique.
Nevertheless, if the array size is fixed, passing it as a pointer to an element
void foo(char p[])
is a major technique-level error, which unfortunately is rather widespread these days. A pointer-to-array technique is a much better approach in such cases.
Another reason that might hinder the adoption of the fixed-size array passing technique is the dominance of a naive approach to typing dynamically allocated arrays. For example, if the program calls for fixed arrays of type char[10] (as in your example), an average developer will malloc such arrays as
char *p = malloc(10 * sizeof *p);
This array cannot be passed to a function declared as
void foo(char (*p)[10]);
which confuses the average developer and makes them abandon the fixed-size parameter declaration without giving it a further thought. In reality though, the root of the problem lies in the naive malloc approach. The malloc format shown above should be reserved for arrays of run-time size. If the array type has compile-time size, a better way to malloc it would look as follows
char (*p)[10] = malloc(sizeof *p);
This, of course, can be easily passed to the above declared foo
foo(p);
and the compiler will perform the proper type checking. But again, this is overly confusing to an unprepared C developer, which is why you won't see it too often in the "typical" average everyday code.
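To make the type checking concrete, a sketch of what the compiler accepts and rejects:
void foo(char (*p)[10]);
void demo(void)
{
    char right[10];
    char wrong[20];
    (void)wrong;
    foo(&right); /* OK: &right has type char (*)[10] */
    /* foo(&wrong); */ /* would not compile: char (*)[20] is a different type */
    /* foo(right); */ /* would not compile: right decays to char *, not char (*)[10] */
}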
I would like to add to AndreyT's answer (in case anyone stumbles upon this page looking for more info on this topic):
As I begin to play more with these declarations, I realize that there is a major handicap associated with them in C (though apparently not in C++). It is fairly common to have a situation where you would like to give a caller a const pointer to a buffer you have written into. Unfortunately, this is not possible when declaring a pointer like this in C. In other words, the C standard (6.7.3, paragraph 8) is at odds with something like this:
int array[9];
const int (* p2)[9] = &array; /* Not legal unless array is const as well */
This constraint does not seem to be present in C++, making these types of declarations far more useful. But in the case of C, it is necessary to fall back to a regular pointer declaration whenever you want a const pointer to the fixed-size buffer (unless the buffer itself was declared const to begin with).
This is a severe constraint in my opinion, and it could be one of the main reasons why people do not usually declare pointers like this in C. The other is the fact that most people do not even know that you can declare a pointer like this, as AndreyT has pointed out.
The obvious reason is that this code doesn't compile:
extern void foo(char (*p)[10]);
void bar() {
char p[10];
foo(p);
}
In an expression, an array decays to a pointer to its first element (char * here), not to a pointer to the whole array.
Also see this question, using foo(&p) should work.
I also want to use this syntax to enable more type checking.
But I also agree that the syntax and mental model of using pointers is simpler, and easier to remember.
Here are some more obstacles I have come across.
Accessing the array requires using (*p)[]:
void foo(char (*p)[10])
{
char c = (*p)[3];
(*p)[0] = 1;
}
It is tempting to use a local pointer-to-char instead:
void foo(char (*p)[10])
{
char *cp = (char *)p;
char c = cp[3];
cp[0] = 1;
}
But this would partially defeat the purpose of using the correct type.
One has to remember to use the address-of operator when assigning an array's address to a pointer-to-array:
char a[10];
char (*p)[10] = &a;
The address-of operator gets the address of the whole array in &a, with the correct type to assign it to p. Without the operator, a is automatically converted to the address of the first element of the array, same as in &a[0], which has a different type.
Since this automatic conversion is already taking place, I am always puzzled that the & is necessary. It is consistent with the use of & on variables of other types, but I have to remember that an array is special and that I need the & to get the correct type of address, even though the address value is the same.
One reason for my problem may be that I learned K&R C back in the 80s, which did not allow using the & operator on whole arrays yet (although some compilers ignored that or tolerated the syntax). Which, by the way, may be another reason why pointers-to-arrays have a hard time to get adopted: they only work properly since ANSI C, and the & operator limitation may have been another reason to deem them too awkward.
When typedef is not used to create a type for the pointer-to-array (in a common header file), then a global pointer-to-array needs a more complicated extern declaration to share it across files:
fileA:
char (*p)[10];
fileB:
extern char (*p)[10];
Well, simply put, C doesn't do things that way. An array of type T is passed around as a pointer to the first T in the array, and that's all you get.
This allows for some cool and elegant algorithms, such as looping through the array with expressions like
*dst++ = *src++
The downside is that management of the size is up to you. Unfortunately, failure to do this conscientiously has also led to millions of bugs in C coding, and/or opportunities for malevolent exploitation.
What comes close to what you ask for in C is to pass around a struct (by value) or a pointer to one (by reference). As long as the same struct type is used on both sides of this operation, both the code that hands out the reference and the code that uses it agree about the size of the data being handled.
Your struct can contain whatever data you want; it could contain your array of a well-defined size.
Still, nothing prevents you or an incompetent or malevolent coder from using casts to fool the compiler into treating your struct as one of a different size. The almost unshackled ability to do this kind of thing is a part of C's design.
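For example, a sketch of the struct-passing idea with a well-defined array size (the struct and function names here are made up):
struct buf10 {
    char data[10];
};
/* the whole 10-byte payload is copied in and copied back out */
struct buf10 fill(struct buf10 b)
{
    for (int i = 0; i < 10; i++)
        b.data[i] = (char)('0' + i);
    return b;
}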
You can declare an array of characters a number of ways:
char p[10];
char* p = (char*)malloc(10 * sizeof(char));
The prototype of a function that takes an array (which decays to a pointer) is:
void foo(char* p); //cannot modify the caller's pointer
or, passing the pointer itself by reference:
void foo(char** p); //can modify the caller's pointer, dereference as *p[0] = 'f';
or with array syntax:
void foo(char p[]); //same as char*
I would not recommend this solution
typedef int Vector3d[3];
since it obscures the fact that Vector3d is an array type that you
must know about. Programmers usually don't expect variables of the
same type to have different sizes. Consider:
void foo(Vector3d a) {
Vector3d b;
}
where sizeof a != sizeof b
Maybe I'm missing something, but... since arrays are constant pointers, basically that means that there's no point in passing around pointers to them.
Couldn't you just use void foo(char p[10], int plen); ?
type (*)[];
// pointer to an array, e.g.
int (*ptr)[5];
// points to an array of 5 ints;
// it holds the address of the whole array
type *[];
// array of pointers, e.g.
int* ptr[5];
// an array of five int pointers,
// i.e. 5 addresses
On my compiler (VS2008), char (*p)[10] is treated as an array of character pointers, as if there were no parentheses, even if I compile it as a C file. Is compiler support for this "variable"? If so, that is a major reason not to use it.

Triple pointers in C: is it a matter of style?

I feel like triple pointers in C are looked at as "bad". For me, it makes sense to use them at times.
Starting from the basics, the single pointer has two purposes: to create an array, and to allow a function to change its contents (pass by reference):
char *a;
a = malloc...
or
void foo (char *c) //means I'm going to modify the parameter in foo.
{ *c = 'f'; }
char a;
foo(&a);
The double pointer can be a 2D array (or array of arrays, since each "column" or "row" need not be the same length). I personally like to use it when I need to pass a 1D array:
void foo (char **c) //means I'm going to modify the elements of an array in foo.
{ (*c)[0] = 'f'; }
char *a;
a = malloc...
foo(&a);
To me, that helps describe what foo is doing. However, it is not necessary:
void foo (char *c) //am I modifying a char or just passing a char array?
{ c[0] = 'f'; }
char *a;
a = malloc...
foo(a);
will also work.
According to the first answer to this question, if foo were to modify the size of the array, a double pointer would be required.
One can clearly see how a triple pointer (and beyond, really) would be required. In my case if I were passing an array of pointers (or array of arrays), I would use it. Evidently it would be required if you are passing into a function that is changing the size of the multi-dimensional array. Certainly an array of arrays of arrays is not too common, but the other cases are.
So what are some of the conventions out there? Is this really just a question of style/readability combined with the fact that many people have a hard time wrapping their heads around pointers?
Using triple+ pointers harms both readability and maintainability.
Let's suppose you have a little function declaration here:
void fun(int***);
Hmmm. Is the argument a three-dimensional jagged array, or a pointer to a two-dimensional jagged array, or a pointer to a pointer to an array (as in, the function allocates an array and assigns a pointer to int within the function)?
Let's compare this to:
void fun(IntMatrix*);
Surely you can use triple pointers to int to operate on matrices. But that's not what they are. The fact that they're implemented here as triple pointers is irrelevant to the user.
Complicated data structures should be encapsulated. This is one of the manifest ideas of object-oriented programming. Even in C, you can apply this principle to some extent: wrap the data structure in a struct (or, very commonly in C, use "handles", that is, pointers to an incomplete type - this idiom is explained later in the answer).
Let's suppose that you implemented the matrices as jagged arrays of double. Compared to contiguous 2D arrays, they are worse to iterate over (as they don't occupy a single block of contiguous memory), but they allow access with array notation and each row can have a different size.
The problem is that you then can't change the representation, because the use of pointers is hard-wired all over the user code, and you're stuck with the inferior implementation.
This wouldn't even be a problem if you had encapsulated it in a struct.
typedef struct Matrix_
{
double** data;
} Matrix;
double get_element(Matrix* m, int i, int j)
{
return m->data[i][j];
}
simply gets changed to
typedef struct Matrix_
{
int width;
double data[]; //C99 flexible array member
} Matrix;
double get_element(Matrix* m, int i, int j)
{
return m->data[i*m->width+j];
}
The handle technique works like this: in the header file, you declare an incomplete struct and all the functions that work on a pointer to the struct:
// struct declaration with no body.
struct Matrix_;
// optional: allow people to declare the matrix with Matrix* instead of struct Matrix*
typedef struct Matrix_ Matrix;
Matrix* create_matrix(int w, int h);
void destroy_matrix(Matrix* m);
double get_element(Matrix* m, int i, int j);
double set_element(Matrix* m, double value, int i, int j);
in the source file you define the actual struct and all the functions:
struct Matrix_
{
    int width;
    double data[]; //C99 flexible array member
}; /* no typedef needed here: the header already provides it */
double get_element(Matrix* m, int i, int j)
{
return m->data[i*m->width+j];
}
/* definition of the rest of the functions */
The rest of the world doesn't know what struct Matrix_ contains, and it doesn't know its size. This means users can't declare the values directly, but can only use a pointer to Matrix and the create_matrix function. However, the fact that the user doesn't know the size means the user doesn't depend on it - which means we can remove or add members to struct Matrix_ at will.
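Usage from the caller's side then looks like this (a sketch, using only the functions declared in the header above):
void demo(void)
{
    Matrix* m = create_matrix(3, 3); /* the only way to obtain a Matrix */
    set_element(m, 1.0, 0, 0);
    double d = get_element(m, 0, 0); /* no knowledge of the internal layout needed */
    (void)d;
    destroy_matrix(m);
}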
Most of the time, the use of 3 levels of indirection is a symptom of bad design decisions made elsewhere in the program. Therefore it is regarded as bad practice, and there are jokes about "three star programmers" where, unlike the rating for restaurants, more stars mean worse quality.
The need for 3 levels of indirection often originates from the confusion about how to properly allocate multi-dimensional arrays dynamically. This is often taught incorrectly even in programming books, partially because doing it correctly was burdensome before the C99 standard. My Q&A post Correctly allocating multi-dimensional arrays addresses that very issue and also illustrates how multiple levels of indirection will make the code increasingly hard to read and maintain.
Though as that post explains, there are some situations where a type** might make sense. A variable table of strings with variable length is such an example. And when that need for type** arises, you might soon be tempted to use type***, because you need to return your type** through a function parameter.
Most often this need arises when you are designing some manner of complex ADT. For example, let's say that we are coding a hash table where each index is a 'chained' linked list, and each node in the linked list is an array. The proper solution then is to re-design the program to use structs instead of multiple levels of indirection. The hash table, linked list and array should be distinct, autonomous types without any awareness of each other.
So by using proper design, we will avoid the multiple stars automatically.
But as with every rule of good programming practice, there are always exceptions. It is perfectly possible to have a situation like:
Must implement an array of strings.
The number of strings is variable and may change in run-time.
The length of the strings is variable.
You can implement the above as an ADT, but there may also be valid reasons to keep things simple and just use a char* [n]. You then have two options to allocate this dynamically:
char* (*arr_ptr)[n] = malloc( sizeof(char*[n]) );
or
char** ptr_ptr = malloc( sizeof(char*[n]) );
The former is more formally correct, but also cumbersome, because it has to be used as (*arr_ptr)[i] = "string";, while the alternative can be used as ptr_ptr[i] = "string";.
Now suppose we have to place the malloc call inside a function and the return type is reserved for an error code, as is custom with C APIs. The two alternatives will then look like this:
err_t alloc_arr_ptr (size_t n, char* (**arr)[n])
{
*arr = malloc( sizeof(char*[n]) );
return *arr == NULL ? ERR_ALLOC : OK;
}
or
err_t alloc_ptr_ptr (size_t n, char*** arr)
{
*arr = malloc( sizeof(char*[n]) );
return *arr == NULL ? ERR_ALLOC : OK;
}
It is quite hard to argue that the former is more readable, and it also imposes cumbersome access on the caller. The three-star alternative is actually more elegant in this very specific case.
So it does us no good to dismiss 3 levels of indirection dogmatically. But the choice to use them must be well-informed, with an awareness that they may create ugly code and that there are other alternatives.
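The caller-side difference looks like this (a sketch, assuming the err_t declarations above):
void demo(size_t n)
{
    char* (*arr)[n]; /* pointer to a VLA of char* */
    char** pp;
    if( alloc_arr_ptr(n, &arr) == OK ){
        (*arr)[0] = "first"; /* cumbersome access */
        free(arr);
    }
    if( alloc_ptr_ptr(n, &pp) == OK ){
        pp[0] = "first"; /* plain access */
        free(pp);
    }
}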
So what are some of the conventions out there? Is this really just a question of style/readability combined with the fact that many people have a hard time wrapping their heads around pointers?
Multiple indirection is not bad style, nor black magic, and if you're dealing with high-dimension data then you're going to be dealing with high levels of indirection; if you're really dealing with a pointer to a pointer to a pointer to T, then don't be afraid to write T ***p;. Don't hide pointers behind typedefs unless whoever is using the type doesn't have to worry about its "pointer-ness". For example, if you're providing the type as a "handle" that gets passed around in an API, such as:
typedef ... *Handle;
Handle h = NewHandle();
DoSomethingWith( h, some_data );
DoSomethingElseWith( h, more_data );
ReleaseHandle( h );
then sure, typedef away. But if h is ever meant to be dereferenced, such as
printf( "Handle value is %d\n", *h );
then don't typedef it. If your user has to know that h is a pointer to int1 in order to use it properly, then that information should not be hidden behind a typedef.
I will say that in my experience I haven't had to deal with higher levels of indirection; triple indirection has been the highest, and I haven't had to use it more than a couple of times. If you regularly find yourself dealing with >3-dimensional data, then you'll see high levels of indirection, but if you understand how pointer expressions and indirection work it shouldn't be an issue.
1. Or a pointer to pointer to int, or pointer to pointer to pointer to pointer to struct grdlphmp, or whatever.
After two levels of indirection, comprehension becomes difficult. Moreover, if the reason you're passing these triple (or more) pointers into your functions is so that they can re-allocate and re-set some pointed-to memory, that gets away from the concept of functions that just return values and don't affect state. This, too, negatively affects comprehension and maintainability beyond some point.
But more fundamentally, you've hit upon one of the main stylistic objections to the triple pointer right here:
One can clearly see how a triple pointer (and beyond, really) would be required.
It's the "and beyond" that is the issue here: once you get to three levels, where do you stop? Surely it's possible to have an aribitrary number of levels of indirection. But it's better to just have a customary limit someplace where comprehensibility is still good but flexibility is adequate. Two's a good number. "Three star programming", as it's sometimes called, is controversial at best; it's either brilliant, or a headache for those who need to maintain the code later.
Unfortunately you have misunderstood the concept of pointers and arrays in C. Remember that arrays are not pointers.
Starting from the basics, the single pointer has two purposes: to create an array, and to allow a function to change its contents (pass by reference):
When you declare a pointer, you need to initialize it before using it in the program. That can be done either by assigning it the address of a variable or by dynamic memory allocation.
In the latter case, the pointer can be indexed like an array (but it is still not an array).
The double pointer can be a 2D array (or array of arrays, since each "column" or "row" need not be the same length). I personally like to use it when I need to pass a 1D array:
Again wrong. Arrays are not pointers and vice versa. A pointer to a pointer is not a 2D array.
I would suggest you read the c-faq, section 6: Arrays and Pointers.

