I'm preparing some slides for an introductory C class, and I'm trying to present good examples (and motivation) for using pointer arithmetic over array subscripting.
A lot of the examples I see in books are essentially equivalent. For example, many books show how to reverse the case of every character in a string, but apart from replacing a[i] with *p the code is identical.
I am looking for a good (and short) example with single-dimensional arrays where pointer arithmetic can produce significantly more elegant code. Any ideas?
Getting a pointer again instead of a value:
One usually uses pointer arithmetic when the result you want is a pointer. To get a pointer while using an array index, you are 1) calculating the pointer offset, 2) getting the value at that memory location, and then 3) using & to get the address again. That's more typing and less clean syntax.
Example 1: Let's say you need a pointer to the 512th byte in a buffer
char buffer[1024];
char *p = buffer + 512;
Is cleaner than:
char buffer[1024];
char *p = &buffer[512];
Example 2: A more efficient alternative to strcat
char buffer[1024];
strcpy(buffer, "hello ");
strcpy(buffer + 6, "world!");
This is cleaner than:
char buffer[1024];
strcpy(buffer, "hello ");
strcpy(&buffer[6], "world!");
Using pointer arithmetic ++ as an iterator:
Incrementing a pointer with ++ (and decrementing with --) is useful when iterating over each element in an array. It is cleaner than keeping a separate variable just to track the offset, as in the sketch below.
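For example, a minimal sketch of the iterator style (the names and values are mine, not from the original answer):

#include <stdio.h>

int main(void)
{
    int values[] = { 3, 1, 4, 1, 5 };
    int *end = values + sizeof values / sizeof *values;

    /* The pointer itself is the loop variable; no separate index needed. */
    for (int *p = values; p != end; p++)
        printf("%d ", *p);

    return 0;
}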
Pointer subtraction:
Pointer subtraction is also part of pointer arithmetic. It can be useful in some cases to get the element before the one you are pointing to. The same can be done with array subscripts, but it looks bad and confusing, especially to a Python programmer, for whom a negative subscript indexes from the end of the list. A sketch follows.
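A minimal sketch of both uses, assuming a string that contains at least one space (the names are mine):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "hello world";
    char *space = strchr(s, ' ');   /* points at the space */

    if (space != NULL) {
        /* Subtracting two pointers into the same array yields the
           element count between them: here, the first word's length. */
        printf("first word is %td chars long\n", space - s);

        /* The element just before the one being pointed to. */
        printf("char before the space: %c\n", *(space - 1));
    }
    return 0;
}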
char *my_strcpy(const char *s, char *t) {
char *u = t;
while (*t++ = *s++);
return u;
}
Why would you want to spoil such a beauty with an index? (See K&R, and how they build up to this style.) There is a reason I used the above signature the way it is. Stop editing without asking for clarification first. For those who think they know, look up the present signature -- you missed a few restrict qualifications.
Structure alignment testing and the offsetof macro implementation.
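For instance, a hedged sketch of the classic null-pointer trick behind many historical offsetof implementations (this hand-rolled form is formally undefined behavior; real compilers use a builtin, and <stddef.h>'s offsetof should be preferred):

#include <stddef.h>
#include <stdio.h>

struct point { char tag; double x; };

/* Pretend a struct lives at address 0 and read off the member's address;
   pointer arithmetic does the alignment math for us. */
#define my_offsetof(type, member) ((size_t)&((type *)0)->member)

int main(void)
{
    printf("my_offsetof(x) = %zu\n", my_offsetof(struct point, x));
    printf("offsetof(x)    = %zu\n", offsetof(struct point, x));
    return 0;
}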
Pointer arithmetic may look fancy and "hackerish", but I have never encountered a case where it was FASTER than standard indexing. Just the opposite: I have often encountered cases where it slowed the code down by a large factor.
For example, typical sequential looping through an array with a pointer may be less efficient than looping with a classic index on modern processors that support SSE extensions. Pointer arithmetic in a loop can prevent compilers from vectorizing it, and loop vectorization typically yields a 2x-4x performance boost. Additionally, using pointers instead of simple integer variables may result in needless memory store operations due to pointer aliasing.
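To make the claim concrete, these are the two loop shapes being contrasted (a sketch; whether either form actually vectorizes depends on the compiler and flags):

/* Index form: the trip count is explicit, which compilers analyze easily. */
void scale_indexed(float *a, int n, float k)
{
    for (int i = 0; i < n; i++)
        a[i] *= k;
}

/* Pointer form: same meaning, but the answer above argues that some
   compilers vectorize this form less readily. */
void scale_pointer(float *a, int n, float k)
{
    for (float *p = a, *end = a + n; p != end; p++)
        *p *= k;
}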
So, generally pointer arithmetic instead of standard indexed access should NEVER be recommended.
Iterating through a 2-dimensional array where the position of a datum does not really matter:
If you don't use pointers, you have to keep track of two subscripts.
With pointers, you can point to the top of your array and zip through the whole thing with a single loop, as sketched below.
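A minimal sketch of the idea (the flat walk relies on the rows being laid out contiguously; the names are mine):

#include <stdio.h>

int main(void)
{
    int grid[3][4];

    /* One pointer, one loop: visit all 12 ints without two subscripts. */
    for (int *p = &grid[0][0]; p < &grid[0][0] + 3 * 4; p++)
        *p = 7;

    printf("%d\n", grid[2][3]);   /* prints 7 */
    return 0;
}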
If you were using an old compiler, or some kind of specialist embedded systems compiler, there might be slight performance differences, but most modern compilers would probably optimize these (tiny) differences out.
The following article might be something you could draw on - depends on the level of your students:
http://geeks.netindonesia.net/blogs/risman/archive/2007/06/25/Pointer-Arithmetic-and-Array-Indexing.aspx
You're asking about C specifically, but C++ builds upon this as well:
Most pointer arithmetic naturally generalizes to the Forward Iterator concept. Walking through memory with *p++ can be used for any sequenced container (linked list, skip list, vector, binary tree, B tree, etc), thanks to operator overloading.
Something fun I hope you never have to deal with: pointers can alias, whereas arrays cannot. Aliasing can cause all sorts of non-ideal code generation, the most common of which is using a pointer as an out parameter to another function. Basically, the compiler cannot assume that the pointer used by the function doesn't alias itself or anything else in that stack frame, so it has to reload the value from the pointer every time it's used. Or rather, to be safe it does.
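A small sketch of the reload problem described above, and the usual remedy, C99's restrict qualifier (the function names are mine):

/* Without restrict, the compiler must assume out may point into in,
   so it stores and reloads *out on every iteration. */
void sum_into(int *out, const int *in, int n)
{
    *out = 0;
    for (int i = 0; i < n; i++)
        *out += in[i];
}

/* restrict promises no overlap, letting the compiler keep the
   accumulator in a register and write it back once. */
void sum_into_fast(int * restrict out, const int * restrict in, int n)
{
    *out = 0;
    for (int i = 0; i < n; i++)
        *out += in[i];
}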
Often the choice is just one of style - one looks or feels more natural than the other for a particular case.
There is also the argument that using indexes can force the compiler to repeatedly recalculate offsets inside a loop. I'm not sure how often this is the case (other than in non-optimized builds); I imagine it happens, but it's probably rarely a problem.
One area that I think is important in the long run (which might not apply to an introductory C class - but learn 'em early, I say) is that using pointer arithmetic applies to the idioms used in the C++ STL. If you get them to understand pointer arithmetic and use it, then when they move on to the STL, they'll have a leg up on how to properly use iterators.
#include <ctype.h>
#include <stdio.h>
void skip_spaces( const char **ppsz )
{
    const char *psz = *ppsz;
    while( isspace((unsigned char)*psz) )
        psz++;
    *ppsz = psz;
}
void fn(void)
{
    char a[] = " Hello World!";
    const char *psz = a;
    skip_spaces( &psz );
    printf("\n%s", psz);
}
Related
Because in C the array length has to be stated when the array is defined, would it be acceptable practice to use the first element to store the length, e.g.
int arr[9]={9,0,1,2,3,4,5,6,7};
Then use a function such as this to process the array:
void printarr(int *ARR) {
    for (int i = 1; i < ARR[0]; i++) {
        printf("%d ", ARR[i]);
    }
}
I can see no problem with this but would prefer to check with experienced C programmers first. I would be the only one using the code.
Well, it's bad in the sense that you have an array whose elements don't all mean the same thing. Storing metadata with the data is not a good thing. Just to extrapolate your idea a little: we could use the first element to denote the element size and the second for the length. Try writing a function utilizing both ;)
It's also worth noting that with this method you will have problems if the array is bigger than the maximum value an element can hold, which for char arrays is a very significant limitation. Sure, you can solve that by using the first two elements, and you can use casts if you have floating-point arrays. But I can guarantee that you will run into hard-to-trace bugs because of this. Among other things, endianness could cause a lot of issues.
And it would certainly confuse virtually every seasoned C programmer. This is not really a logical argument against the idea as such, but rather a pragmatic one. Even if this was a good idea (which it is not) you would have to have a long conversation with EVERY programmer who will have anything to do with your code.
A reasonable way of achieving the same thing is using a struct.
struct container {
int *arr;
size_t size;
};
int arr[10];
struct container c = { .arr = arr, .size = sizeof arr/sizeof *arr };
But in any situation where I would use something like above, I would probably NOT use arrays. I would use dynamic allocation instead:
const size_t size = 10;
int *arr = malloc(sizeof *arr * size);
if(!arr) { /* Error handling */ }
struct container c = { .arr = arr, .size = size };
However, do be aware that if you initialize .size with the sizeof trick on a pointer instead of a true array, you're in for "interesting" results.
You can also use flexible arrays, as Andreas wrote in his answer
In C you can use flexible array members. That is you can write
struct intarray {
size_t count;
int data[]; // flexible array member needs to be last
};
You allocate with
size_t count = 100;
struct intarray *arr = malloc( sizeof(struct intarray) + sizeof(int)*count );
arr->count = count;
That can be done for all types of data.
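A brief usage sketch, assuming the allocation above succeeded (error handling omitted):

/* Size and data travel together through one pointer. */
for (size_t i = 0; i < arr->count; i++)
    arr->data[i] = (int)i;

free(arr);   /* one allocation, one free */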
It makes the use of C-arrays a bit safer (not as safe as the C++ containers, but safer than plain C arrays).
Unfortunately, C++ does not support this idiom in the standard.
Many C++ compilers provide it as an extension, but it is not guaranteed.
On the other hand this C FLA idiom may be more explicit and perhaps more efficient than C++ containers as it does not use an extra indirection and/or need two allocations (think of new vector<int>).
If you stick to C, I think this is a very explicit and readable way of handling variable length arrays with an integrated size.
The only drawback is that the C++ guys do not like it and prefer C++ containers.
It is not bad (I mean it will not invoke undefined behavior or cause other portability issues) when the elements of the array are integers, but instead of writing the magic number 9 directly you should have the program calculate the length of the array, to avoid typos.
#include <stdio.h>
int main(void) {
int arr[9]={sizeof(arr)/sizeof(*arr),0,1,2,3,4,5,6,7};
for (int i=1; i<arr[0]; i++) {
printf("%d ", arr[i]);
}
return 0;
}
Only a few datatypes are suitable for that kind of hack. Therefore, I would advise against it, as this will lead to inconsistent implementation styles across different types of arrays.
A similar approach is used very often with character buffers, where the buffer's actual length is stored at its beginning.
Many implementations of dynamic memory allocation in C use this approach as well: the allocated block is prefixed with an integer that records its size.
However, in general this approach is not suitable for arrays. For example, a character array can be much larger than the maximum positive value (127) that can be stored in an object of type char. Moreover, it is difficult to pass a sub-array of such an array to a function; most functions designed to deal with arrays will not work in such a case.
A general approach is to declare a function that takes two parameters: the first has pointer type and designates the initial element of the array or sub-array, and the second specifies the number of elements in it.
C also allows you to declare functions that accept variable length arrays, whose sizes can be specified at run time; a sketch follows.
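A sketch of the variable-length-array parameter form (C99; note that the size parameter must be declared before the array that uses it):

#include <stdio.h>

void print_all(size_t n, const int arr[n])
{
    for (size_t i = 0; i < n; i++)
        printf("%d ", arr[i]);
}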
It is suitable in rather limited circumstances. There are better solutions to the problem it solves.
One problem with it is that if it is not universally applied, you end up with a mix of arrays that use the convention and arrays that don't, with no way of telling which is which. For arrays used to carry strings, for example, you would have to continually pass &arr[1] in calls to the standard string library, or define a new string library that uses "Pascal string" rather than "ASCIZ string" conventions (such a library would, as it happens, be more efficient).
In the case of a true array, rather than simply a pointer to memory, sizeof(arr) / sizeof(*arr) will yield the number of elements without having to store it in the array at all.
It only really works for integer-type arrays, and for char arrays it would limit the length to a rather small value. It is not practical for arrays of other object types or data structures.
A better solution would be to use a structure:
typedef struct
{
size_t length ;
int* data ;
} intarray_t ;
Then:
int data[9] ;
intarray_t array = { sizeof(data) / sizeof(*data), data } ;
Now you have an array object that can be passed to functions and retains the size information, and the data member can be accessed directly for use with third-party or standard library interfaces that do not accept the intarray_t. Moreover, the type of the data member can be anything.
Obviously NO is the answer.
Most programming languages keep the length stored alongside the data, or provide a predefined way to query it. Why not use that?
In your case it is more suitable to read a count/length field than to test the first value.
An if clause sometimes takes more time than a predefined function.
At first glance it seems fine to store the counter, but imagine you have to update the array: you now need two operations, one to insert the element and one to update the counter. So two operations means two variables to be changed.
For static arrays it might be OK to store the counter with the list, but for dynamic ones: no, no, no.
On the other hand, if you review basic programming concepts you will find that this idea does not comply with common programming principles.
When I want to pass an array by reference to a function, I don't know which form to choose.
void myFunction(int* data);
Is there a difference, or a preferred coding style, between these two calls:
myFunction(&data[0]);
Or
myFunction(data);
There is no difference. Arrays ("proper" arrays) automatically decay to pointers to their first element.
For example, let's say you have
int my_array[10];
then using plain my_array will automatically decay to a pointer to its first element, which is &my_array[0].
It is this array-to-pointer decay that allows you to use both pointer arithmetic and array indexing for both arrays and pointers. For the array above my_array[i] is exactly equal to *(my_array + i). This equivalence also exists for pointers:
int *my_pointer = my_array; // Make my_pointer point to the first element of my_array
Then my_pointer[i] is also exactly equal to *(my_pointer + i).
As a curiosity (and something you should never do in real programs): thanks to the commutative property of addition, an expression such as *(my_array + i) is also equal to *(i + my_array), which in turn equals i[my_array].
An array, when passed as a parameter to a function, automatically decays to a pointer to its first element. So passing either data or &data[0] to the function are exactly equivalent.
From a readability standpoint I would opt for passing data. It makes it clear to the reader that the function potentially operates on the entire array and not just on one element.
Apart from the obvious (data being shorter than &data[0]; therefore easier to write and to read), there's no difference.
Think about what &data[0] means:
It's a pointer to data[0].
And data[0] just means *(data+0), i.e. *data.
A pointer to *data is simply data.
data is a pointer to the beginning of the array.
&data[0] is an address of the first element of an array.
When reading code, the first option is more readable to most people, and I suppose it is the one most programmers will, and should, choose.
There isn't any difference, as both point to the same starting location of the array, i.e. &data[0].
I would just use my_function(data) because why make it more confusing than it has to be?
If for some reason you needed to find the memory address of a single element somewhere in the middle of data, then my_function(&data[17]) might possibly be warranted, but there are probably better ways to handle that case too.
In general if you have to manually and specifically pick out single pieces of data like that by hand, you are probably not doing it in a very good way.
There are rare cases where it can make sense (like when you are parsing data from some other source and you always, 100% of the time, only care about the 17th byte), but that's not usually the case.
Consider the following:
As your code evolves and you make changes, you will probably also slightly change data structures. data[17] might no longer be the magical byte you need; now it might be data[18]. If you hard-coded data[17] in 100 or 1000 different places in your code, you will have to change them all by hand and hope that doesn't cause any new bugs. Also... portability issues.
Instead, design functions that can find and return whatever data you need from your data structures without needing any hard-coded addresses. They will still work (if designed properly) as your code evolves, and will be 1000 times more portable.
No difference. When coerced into a pointer, an array (data) decays into a pointer to its first element (&data[0]).
Remember that data[0] simply means *(data+0), so &data[0] is equivalent to &*(data+0), which simplifies to data (because &* cancels out).
Demo:
#include <stdio.h>
int main(void) {
int data[2];
printf("%p\n", (void*)data);
printf("%p\n", (void*)&*(data+0));
printf("%p\n", (void*)&data[0]);
return 0;
}
Output:
$ gcc -Wall -Wextra -pedantic a.c -o a && ./a
0x3c2180f4fa0
0x3c2180f4fa0
0x3c2180f4fa0
I always advise using a general approach.
Just consider the function
void myFunction( char* data);
where the parameter has the type char * instead of int *.
And now let's assume that you want to pass to the function a string literal.
It can be done either like
myFunction( "Hello" );
or like
myFunction( &"Hello"[0] );
It is evident that the first approach is clearer and more readable.
So I prefer to use the first approach. :)
In fact such an expression
&data[i];
is syntactically redundant. In fact it looks like
&( *( data + i ) )
that is equivalent to just
data + i
When i is equal to 0 then you have
data + 0
that in expressions is equivalent (leaving aside, for example, the sizeof operator) to
data
So use data instead of &data[0].
This question goes out to the C gurus out there:
In C, it is possible to declare a pointer as follows:
char (* p)[10];
.. which basically states that this pointer points to an array of 10 chars. The neat thing about declaring a pointer like this is that you will get a compile-time error if you try to assign a pointer to an array of a different size to p. It will also give you a compile-time error if you try to assign the value of a simple char pointer to p. I tried this with gcc and it seems to work with ANSI, C89 and C99.
It looks to me like declaring a pointer like this would be very useful - particularly, when passing a pointer to a function. Usually, people would write the prototype of such a function like this:
void foo(char * p, int plen);
If you were expecting a buffer of a specific size, you would simply test the value of plen. However, you have no guarantee that the person who passes p to you will really give you plen valid memory locations in that buffer. You have to trust that the caller is doing the right thing. On the other hand:
void foo(char (*p)[10]);
..would force the caller to give you a buffer of the specified size.
This seems very useful, yet I have never seen a pointer declared like this in any code I have ever run across.
My question is: Is there any reason why people do not declare pointers like this? Am I not seeing some obvious pitfall?
What you are saying in your post is absolutely correct. I'd say that every C developer comes to exactly the same discovery and exactly the same conclusion when (if) they reach a certain level of proficiency with the C language.
When the specifics of your application area call for an array of specific fixed size (array size is a compile-time constant), the only proper way to pass such an array to a function is by using a pointer-to-array parameter
void foo(char (*p)[10]);
(in C++ language this is also done with references
void foo(char (&p)[10]);
).
This will enable language-level type checking, which will make sure that an array of exactly the correct size is supplied as an argument. In fact, in many cases people use this technique implicitly, without even realizing it, hiding the array type behind a typedef name
typedef int Vector3d[3];
void transform(Vector3d *vector);
/* equivalent to `void transform(int (*vector)[3])` */
...
Vector3d vec;
...
transform(&vec);
Note additionally that the above code is invariant with respect to Vector3d being an array or a struct. You can switch the definition of Vector3d at any time from an array to a struct and back, and you won't have to change the function declaration. In either case the functions will receive an aggregate object "by reference" (there are exceptions to this, but within the context of this discussion this is true).
However, you won't see this method of array passing used explicitly too often, simply because too many people are confused by the rather convoluted syntax and are not comfortable enough with such features of the C language to use them properly. For this reason, in everyday code, passing an array as a pointer to its first element is the more popular approach. It just looks "simpler".
But in reality, using the pointer to the first element for array passing is a very niche technique, a trick, which serves a very specific purpose: its one and only purpose is to facilitate passing arrays of different size (i.e. run-time size). If you really need to be able to process arrays of run-time size, then the proper way to pass such an array is by a pointer to its first element with the concrete size supplied by an additional parameter
void foo(char p[], unsigned plen);
Actually, in many cases it is very useful to be able to process arrays of run-time size, which also contributes to the popularity of the method. Many C developers simply never encounter (or never recognize) the need to process a fixed-size array, thus remaining oblivious to the proper fixed-size technique.
Nevertheless, if the array size is fixed, passing it as a pointer to an element
void foo(char p[])
is a major technique-level error, which unfortunately is rather widespread these days. A pointer-to-array technique is a much better approach in such cases.
Another reason that might hinder the adoption of the fixed-size array passing technique is the dominance of a naive approach to typing dynamically allocated arrays. For example, if the program calls for fixed arrays of type char[10] (as in your example), an average developer will malloc such arrays as
char *p = malloc(10 * sizeof *p);
This array cannot be passed to a function declared as
void foo(char (*p)[10]);
which confuses the average developer and makes them abandon the fixed-size parameter declaration without giving it a further thought. In reality though, the root of the problem lies in the naive malloc approach. The malloc format shown above should be reserved for arrays of run-time size. If the array type has compile-time size, a better way to malloc it would look as follows
char (*p)[10] = malloc(sizeof *p);
This, of course, can be easily passed to the above declared foo
foo(p);
and the compiler will perform the proper type checking. But again, this is overly confusing to an unprepared C developer, which is why you won't see it too often in "typical" average everyday code.
I would like to add to AndreyT's answer (in case anyone stumbles upon this page looking for more info on this topic):
As I began to play more with these declarations, I realized that there is a major handicap associated with them in C (though apparently not in C++). It is fairly common to have a situation where you would like to give a caller a const pointer to a buffer you have written into. Unfortunately, this is not possible when declaring a pointer like this in C. In other words, the C standard (6.7.3, paragraph 8) is at odds with something like this:
int array[9];
const int (* p2)[9] = &array; /* Not legal unless array is const as well */
This constraint does not seem to be present in C++, making these types of declarations far more useful. But in the case of C, it is necessary to fall back to a regular pointer declaration whenever you want a const pointer to the fixed-size buffer (unless the buffer itself was declared const to begin with).
This is a severe constraint in my opinion and it could be one of the main reasons why people do not usually declare pointers like this in C. The other being the fact that most people do not even know that you can declare a pointer like this as AndreyT has pointed out.
The obvious reason is that this code doesn't compile:
extern void foo(char (*p)[10]);
void bar() {
char p[10];
foo(p);
}
The default promotion of an array is to an unqualified pointer.
Also see this question; using foo(&p) should work.
I also want to use this syntax to enable more type checking.
But I also agree that the syntax and mental model of using pointers is simpler, and easier to remember.
Here are some more obstacles I have come across.
Accessing the array requires using (*p)[]:
void foo(char (*p)[10])
{
char c = (*p)[3];
(*p)[0] = 1;
}
It is tempting to use a local pointer-to-char instead:
void foo(char (*p)[10])
{
char *cp = (char *)p;
char c = cp[3];
cp[0] = 1;
}
But this would partially defeat the purpose of using the correct type.
One has to remember to use the address-of operator when assigning an array's address to a pointer-to-array:
char a[10];
char (*p)[10] = &a;
The address-of operator gets the address of the whole array in &a, with the correct type to assign it to p. Without the operator, a is automatically converted to the address of the first element of the array, same as in &a[0], which has a different type.
Since this automatic conversion is already taking place, I am always puzzled that the & is necessary. It is consistent with the use of & on variables of other types, but I have to remember that an array is special and that I need the & to get the correct type of address, even though the address value is the same.
One reason for my problem may be that I learned K&R C back in the 80s, which did not yet allow using the & operator on whole arrays (although some compilers ignored that or tolerated the syntax). This, by the way, may be another reason why pointers-to-arrays have a hard time getting adopted: they only work properly since ANSI C, and the & operator limitation may have been another reason to deem them too awkward.
When typedef is not used to create a type for the pointer-to-array (in a common header file), then a global pointer-to-array needs a more complicated extern declaration to share it across files:
fileA:
char (*p)[10];
fileB:
extern char (*p)[10];
Well, simply put, C doesn't do things that way. An array of type T is passed around as a pointer to the first T in the array, and that's all you get.
This allows for some cool and elegant algorithms, such as looping through the array with expressions like
*dst++ = *src++
The downside is that management of the size is up to you. Unfortunately, failure to do this conscientiously has also led to millions of bugs in C coding, and/or opportunities for malevolent exploitation.
What comes close to what you ask in C is to pass around a struct (by value) or a pointer to one (by reference). As long as the same struct type is used on both sides of this operation, both the code that hands out the reference and the code that uses it agree about the size of the data being handled.
Your struct can contain whatever data you want; it could contain your array of a well-defined size.
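A minimal sketch of that idea (the type and function names are mine):

struct buf10 { char data[10]; };

/* Passing the struct by value copies all 10 bytes; passing its address
   passes it by reference. Either way, both sides agree on the size. */
void consume(struct buf10 b);
void fill(struct buf10 *b);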
Still, nothing prevents you or an incompetent or malevolent coder from using casts to fool the compiler into treating your struct as one of a different size. The almost unshackled ability to do this kind of thing is a part of C's design.
You can declare an array of characters a number of ways:
char p[10];
char* p = (char*)malloc(10 * sizeof(char));
The prototype of a function that takes an array (which decays to a pointer) is:
void foo(char* p); //cannot modify the caller's pointer
or by reference:
void foo(char** p); //can modify p, dereference by *p[0] = 'f';
or by array syntax:
void foo(char p[]); //same as char*
I would not recommend this solution
typedef int Vector3d[3];
since it obscures the fact that Vector3d is really an array type that you
must know about. Programmers usually don't expect variables of the
same type to have different sizes. Consider:
void foo(Vector3d a) {
Vector3d b;
}
where sizeof a != sizeof b
Maybe I'm missing something, but... since arrays are constant pointers, basically that means that there's no point in passing around pointers to them.
Couldn't you just use void foo(char p[10], int plen); ?
type (*)[];
// pointer to an array, e.g.
int (*ptr)[5];
// pointer to an array of 5 integers;
// initialized by taking the address of the array
type *[];
// array of pointers, e.g.
int* ptr[5];
// an array of five integer pointers,
// i.e. five addresses.
On my compiler (VS2008), char (*p)[10] is treated as an array of character pointers, as if the parentheses were not there, even when I compile it as a C file. Is compiler support for this construct inconsistent? If so, that is a major reason not to use it.
I'm really wondering why there's no function in C like strcpy(), memcpy(), etc. that automatically checks the size of the buffer. Something that behaves like this:
#define strcpy2(X, Y) strncpy(X, Y, sizeof(X))
Some people tell me: "Because it's an old language." But C is not a dead language. ISO can fix the standard, and new functions like strncpy have been added.
Others tell me: "It causes performance issues." But I argue that if such a function existed, you could still use the old function in situations where performance is important, and in every other situation you could use the new one and expect a security improvement.
Still others tell me: "So, there's a function like strncpy()", or "C is designed for professional developers who consider this problem". But strncpy() does not do the check automatically; developers must determine the size of the buffer themselves, and even large programs like Chrome, written by professional developers, have buffer overflow vulnerabilities.
I want to know a technical reason why such a function cannot be made.
*English is not my native language, so I guess there are some mistakes... sorry about that. (Edit (cmaster): Should be fixed now. Hope you like the new wording.)
If X is a pointer, and it usually is, then sizeof X tells you nothing about the size of the array to which X points. The size must be passed as a parameter.
To really understand the reason why C functions cannot do what you want, you need to understand about the difference between arrays and pointers, and what it means that an array decays to a pointer. Just to give you an idea what I'm talking about:
int array[7]; //define an array
int* pointer = array; //define a pointer that points to the same memory, array decays into a pointer to the first int
//Now the following two expressions are precisely equivalent, since array decays to a pointer again:
pointer[3];
array[3];
//However, the sizeof of the two is not the same:
assert(sizeof(array) == 7*sizeof(int)); //this is what you used in your define
assert(sizeof(pointer) == sizeof(int*)); //probably not what you expected
//Now the thing gets nasty: Array declarations in function arguments truly decay into pointers!
void foo(int bar[9]) {
assert(sizeof(bar) == sizeof(int*)); //I bet, you didn't expect this!
}
//This is, because the definition of foo() is truly equivalent to this definition:
void foo(int* bar) {
assert(sizeof(bar) == sizeof(int*));
}
//Transfering this to your #define, this will definitely not do what you want:
void baz(char aBuffer[BUFFER_SIZE], const char* source) {
strcpy2(aBuffer, source); //This will copy only the first four or eight bytes (depending on the size of a pointer on your system), no matter how big you make BUFFER_SIZE!
}
I hope I have enticed you to google for array-pointer decay now...
The truth is, that the C language relies heavily on the fact that no array size is required to correctly access an array element, only the surrounding loops need to know the size. As such, arrays decay to pure pointers in many places, and once they are decayed, there is no bringing back the size of the array. This brings a great deal of flexibility and simplicity to the language (very easy handling of subarrays!), but it also makes a function that behaves like your #define impossible.
The technical reason is that in C the buffer size cannot be checked automatically, because it is not managed by the language. Functions like strcpy operate on pointers, and though pointers point to buffers, there is no way for the strcpy implementation to know how long a buffer is. Your suggestion of using sizeof does not work, since sizeof returns the size of the pointer object, not the size of the buffer it points to. (In your example it would always return the same number, most probably 4 or 8.)
The C language makes the programmer responsible for managing buffer sizes, so one can use functions like strncpy and pass the buffer size explicitly. But it will never be possible to implement a safe version of strcpy in C, since that would require fundamental changes in the way the language treats pointers.
All of this applies to C descendants like C++ or Objective-C too.
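As an illustration of passing the size explicitly, a hedged sketch using snprintf (one option among several; the function name is mine):

#include <stdio.h>

void copy_checked(char *dst, size_t dstsize, const char *src)
{
    /* The caller supplies the size; snprintf truncates safely and
       always NUL-terminates the result when dstsize > 0. */
    int n = snprintf(dst, dstsize, "%s", src);
    if (n < 0 || (size_t)n >= dstsize) {
        /* handle encoding error or truncation here */
    }
}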
#define _GNU_SOURCE   /* asprintf is a common GNU/BSD extension, not standard C */
#include <stdio.h>
#include <stdlib.h>
char* x;
if (asprintf(&x, "%s", y) == -1) {
    perror("asprintf");
    exit(1);
}
// from here, x contains a copy of the content of y
Under the assumption that y is NUL-terminated, this works safely.
(Written on a tablet, so forgive any silly errors, please.)