Pointer to 2D arrays in C

I know there are several questions about this with good (and working) solutions, but none, IMHO, that says clearly which is the best way to achieve this.
So, suppose we have some 2D array:
int tab1[100][280];
We want to make a pointer that points to this 2D array.
To achieve this, we can do:
int (*pointer)[280]; // pointer creation
pointer = tab1; // assignment
pointer[5][12] = 517; // use
int myint = pointer[5][12]; // use
or, alternatively:
int (*pointer)[100][280]; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // use
int myint = (*pointer)[5][12]; // use
OK, both seem to work well. Now I would like to know:
what is the best way, the 1st or the 2nd?
are both equal for the compiler? (speed, perf...)
is one of these solutions eating more memory than the other ?
what is the more frequently used by developers ?

//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to a 2D array of 100*280 integers
Using pointer2 or pointer3 produces the same binary, except for pointer arithmetic such as ++pointer2, as pointed out by WhozCraig.
I recommend using a typedef (it produces the same binary code as pointer3 above):
typedef int myType[100][280];
myType *pointer3;
Note: in C++11 and later (but not in C), you can also use the keyword using instead of typedef:
using myType = int[100][280];
myType *pointer3;
In your example:
myType *pointer; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
Note: If the array tab1 is defined inside a function body, it is placed on the call stack. The stack size is limited, so an array larger than the available stack space causes a stack overflow crash.
The full snippet can be compiled online at gcc.godbolt.org:
#include <assert.h> // provides static_assert in C11 (not needed in C++)
int main()
{
//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
static_assert( sizeof(pointer1) == 2240, "" ); // holds with 8-byte pointers
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to a 2D array of 100*280 integers
static_assert( sizeof(pointer2) == 8, "" ); // holds with 8-byte pointers
static_assert( sizeof(pointer3) == 8, "" );
// Use 'typedef' (or 'using' if you use a modern C++ compiler)
typedef int myType[100][280];
//using myType = int[100][280];
int tab1[100][280];
myType *pointer; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
return myint;
}

Both your examples are equivalent. However, the first one is less obvious and more "hacky", while the second one clearly states your intention.
int (*pointer)[280];
pointer = tab1;
pointer points to a 1D array of 280 integers. In your assignment, you actually assign the address of the first row of tab1. This works because an array is implicitly converted to a pointer to its first element, and the first element of tab1 is itself an array of 280 integers.
When you use pointer[5][12], C treats pointer as an array of arrays (pointer[5] is of type int[280]), so there is another implicit conversion here (at least semantically).
In your second example, you explicitly create a pointer to a 2D array:
int (*pointer)[100][280];
pointer = &tab1;
The semantics are clearer here: *pointer is a 2D array, so you need to access it using (*pointer)[i][j].
Both solutions use the same amount of memory (1 pointer) and will most likely run equally fast. Under the hood, both pointers will even point to the same memory location (the first element of the tab1 array), and it is possible that your compiler will even generate the same code.
The first solution is "more advanced" since one needs quite a deep understanding of how arrays and pointers work in C to understand what is going on. The second one is more explicit.
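To make the equivalence described above concrete, here is a minimal sketch. The helper names (read_via_row_pointer, read_via_array_pointer, same_address) are invented for this illustration, and tab1 is given static storage here only to keep a roughly 112 KB array off the stack.

```c
#include <stddef.h>

static int tab1[100][280];   /* static storage: assumption made for this sketch */

/* Variant 1: pointer to a row (an array of 280 int). */
int read_via_row_pointer(size_t i, size_t j)
{
    int (*rows)[280] = tab1;          /* tab1 converts to &tab1[0] */
    return rows[i][j];
}

/* Variant 2: pointer to the whole 2D array. */
int read_via_array_pointer(size_t i, size_t j)
{
    int (*whole)[100][280] = &tab1;
    return (*whole)[i][j];
}

/* Returns 1 when both pointers hold the same address. */
int same_address(void)
{
    int (*rows)[280] = tab1;
    int (*whole)[100][280] = &tab1;
    return (void *)rows == (void *)whole;
}
```

Both readers return the same element for the same indices; only the declared type, and therefore the access syntax, differs.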

int *pointer[280]; // Creates an array of 280 pointers to int.
On a 32-bit OS, each pointer takes 4 bytes, so 4 * 280 = 1120 bytes.
int (*pointer)[100][280]; // Creates only one pointer, which points to an array of [100][280] ints.
Here only 4 bytes are used.
Coming to your question, int (*pointer)[280]; and int (*pointer)[100][280]; are different types, even though both can point to the same 2D array of [100][280].
If int (*pointer)[280]; is incremented, it points to the next 1D array (the next row), whereas incrementing int (*pointer)[100][280]; skips the whole 2D array and points to the first byte after it. Accessing that memory is undefined behaviour and may crash if it doesn't belong to your process.
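The difference in pointer arithmetic can be measured directly. This is a small sketch; the function names are hypothetical, and tab1 has static storage only to avoid a large stack frame.

```c
#include <stddef.h>

static int tab1[100][280];

/* Bytes skipped when a pointer to one row is incremented. */
size_t row_pointer_stride(void)
{
    int (*rows)[280] = tab1;
    return (size_t)((char *)(rows + 1) - (char *)rows);
}

/* Bytes skipped when a pointer to the whole 2D array is incremented. */
size_t array_pointer_stride(void)
{
    int (*whole)[100][280] = &tab1;
    /* whole + 1 is a one-past-the-end pointer: fine to compute, not to dereference */
    return (size_t)((char *)(whole + 1) - (char *)whole);
}
```

The first stride is 280 * sizeof(int) (one row) and the second is 100 * 280 * sizeof(int) (the entire array).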

OK, these are actually four different questions. I'll address them one by one:
are both equal for the compiler? (speed, perf...)
Yes. The pointer dereference and the decay from type int (*)[100][280] to int (*)[280] is always a no-op for your CPU. I wouldn't put it past a bad compiler to generate bogus code anyway, but a good optimizing compiler should compile both examples to the exact same code.
is one of these solutions eating more memory than the other?
As a corollary to my first answer, no.
what is the more frequently used by developers?
Definitely the variant without the extra (*pointer) dereference. For C programmers it is second nature to assume that any pointer may actually be a pointer to the first element of an array.
what is the best way, the 1st or the 2nd?
That depends on what you optimize for:
Idiomatic code uses variant 1. The declaration is missing the outer dimension, but all uses are exactly as a C programmer expects them to be.
If you want to make it explicit that you are pointing to an array, you can use variant 2. However, many seasoned C programmers will think that there's a third dimension hidden behind the innermost *. Having no array dimension there will feel weird to most programmers.

Related

is it possible to have a union of arrays in c

I wish to have a type which can be used as two different array structures - depending on context. They are not to be used interchangeably whilst the program is executing, rather when the program is executed with a particular start-up flag the type will be addressed as one of the array types
(for example):
array1[2][100]
or
array2[200];
I am not interested in how the data is organised (well I am but it is not relevant to what I wish to achieve)
union m_arrays
{
uint16_t array1[2][100];
uint16_t array2[200];
};
or do I have to use a pointer and alloc it at runtime?
uint16_t * array;
array = malloc(200 * sizeof(uint16_t));
uint16_t m_value =100;
*(array + 199) = m_value;
//equivalent uint16_t array1[1][99] == *(array + 199);
//equivalent uint16_t array2[199] == *(array + 199);
I haven't tried anything as yet
A union holds only one of its members at a time. That is, only one member can be "active" at any moment (this is just an abstraction, since C itself does not track which member is active).
In general, the size of the union is at least the size of its largest member, in bytes.
Let me give an example:
#include <stdio.h>
typedef union m_arrays
{
int array1[2][100];
int array2[400];
} a;
int main()
{
printf("%zu", sizeof(a));
return 0;
}
In this example, this would print 1600 (assuming int is 4 bytes long; ultimately it depends on the architecture), which is the size in bytes of the largest member. So, yes, you can have a union of arrays in C.
Yes, this does work, and it's actually precisely because of how arrays are different from pointers. I'm sure you've heard that arrays in C are really just pointers, but the truth is that there are some important differences.
First, an array is storage that the compiler sets aside for you: on the stack for a local array, or in static storage for a global or static one. malloc does not create an object of array type; it returns a pointer to a heap block that you can then index like an array. A pointer can point anywhere, and you can even set it from an arbitrary integer if you want (though there's no guarantee you can access the memory it points to).
Second, because arrays are fixed length, the compiler can and does allocate them for you when you declare them. Importantly, this comes with the guarantee that the whole array is in one contiguous memory block. So if you declare int arr[2][100], you'll have 200 int slots allocated in a row. That means the elements of a multidimensional array are laid out exactly like those of a single-dimensional array, e.g. arr[y][x] sits at the same address that arr[0][y*100+x] would refer to. You could also do something like int* arr2 = arr and then treat arr2 as a regular array, even though arr actually decays to an int (*)[100] rather than an int* (you'll get a warning for the mismatched pointer type, and indexing past the end of a row is not strictly sanctioned by the standard; my point is that the layout is what makes it work in practice).
The third, and probably most important difference, is a consequence of the second. When you have an array in a struct or union, the struct/union isn't just holding a pointer to the first element. It holds the entire array. This is often used for copying arrays or returning them from functions. What this means for you is that what you want to do works despite what someone who's heard that arrays are pointers might initially think. If arrays were just an address and they were initialized by allocating at that address, there would be two different arrays initialized at two different places, and having the pointers to them in a union would mean one gets overwritten and now you have an array somewhere that you can't access.
So when this all comes together, your union of arrays basically has one array with two different ways of accessing the data (which is what you want if I'm not mistaken). A little example:
#include <stdio.h>
int main(void) {
union {
int arr1[4];
int arr2[2][2];
} u;
u.arr1[0] = 1;
u.arr1[1] = 2;
u.arr1[2] = 3;
u.arr1[3] = 4;
printf("%d %d\n%d %d\n", u.arr2[0][0], u.arr2[0][1], u.arr2[1][0], u.arr2[1][1]);
return 0;
}
Output:
1 2
3 4
We can also quickly walk through why this wouldn't work with pure pointers. Let's say we instead had a union like this:
union {
int* arr1;
int** arr2;
} u;
Then we might initialize with u.arr1 = (int*) malloc(4 * sizeof (int));. Then we could use arr1 like a normal array. But what happens when we try to use arr2? Well, arr2[y][x] is of course syntactic sugar for *(*(arr2+y)+x). Once it's dereferenced that first time, what we actually read is the bits of an int, since the address points to an int, and C now treats those bits as an address. So when we add x to it and try to dereference again, we're effectively trying to dereference an int. C will try to do it, and if you're very unlucky it will succeed; I say unlucky because then you'll be messing with arbitrary memory. What's more likely is a segfault, because whatever int is there is most likely not an address your program has access to.
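Applied to the types in the question, the idea might look like the following sketch. The accessor m_store and its use_2d flag are hypothetical stand-ins for the start-up flag mentioned in the question.

```c
#include <stdint.h>

union m_arrays {
    uint16_t array1[2][100];
    uint16_t array2[200];
};

/* Hypothetical accessor: write through whichever view the flag selects.
 * index runs from 0 to 199 in both views. */
void m_store(union m_arrays *m, int use_2d, unsigned index, uint16_t value)
{
    if (use_2d)
        m->array1[index / 100][index % 100] = value;
    else
        m->array2[index] = value;
}
```

Because both members share the same storage and element type, a value written through one view is visible through the other at the corresponding position.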

Misunderstanding in a particular use case of pointers and double pointers

I'm dealing with pointers, double pointers and arrays, and I think I'm getting a bit mixed up. I've been reading about it, but my particular use case is confusing me, and I'd appreciate it if someone could clear things up. This is a small piece of code I've built to show my misunderstanding:
#include <stdio.h>
#include <stdint.h>
void fnFindValue_vo(uint8_t *vF_pu8Msg, uint8_t vF_u8Length, uint8_t **vF_ppu8Match, uint8_t vF_u8Value)
{
for(int i=0; i<vF_u8Length; i++)
{
if(vF_u8Value == vF_pu8Msg[i])
{
*vF_ppu8Match = &vF_pu8Msg[i];
break;
}
}
}
int main()
{
uint8_t u8Array[]={0,0,0,1,2,4,8,16,32,64};
uint8_t *pu8Reference = &u8Array[3];
/*
* Purpose: Find the index of a value in u8Array from a reference
* Reference: First non-zero value
* Condition: using the function with those input arguments
*/
// WAY 1
uint8_t *pu8P2 = &u8Array[0];
uint8_t **ppu8P2 = &pu8P2;
fnFindValue_vo(u8Array,10,ppu8P2,16); // Should be diff=4
uint8_t u8Diff1 = *ppu8P2 - pu8Reference;
printf("Diff1: %u\n", u8Diff1);
// WAY 2
uint8_t* ppu8Pos; // Why this does not need to be initialized and ppu8P2 yes
fnFindValue_vo(u8Array,10,&ppu8Pos,64); // Should be diff=6
uint8_t u8Diff2 = ppu8Pos - pu8Reference;
printf("Diff2: %u\n", u8Diff2);
}
Suppose the function fnFindValue_vo and its arguments cannot be changed. So my purpose is to find the relative index of a value in the array taking as reference the first non-zero value (no need to find it, can be hard-coded).
In the first way, I've done it following my logic and understanding of pointers. So I have pu8P2, which contains the address of the first member of u8Array, and ppu8P2, which contains the address of pu8P2. So after calling the function, I just need to subtract the pointers 'pointing' to u8Array to get the relative index.
Anyway, I tried another method. I just created a pointer and passed its address, without initializing the pointer, to the function. So later I just need to subtract those two pointers and I also get the relative index.
My confusion comes with this second method.
Why does ppu8Pos not have to be initialized, while ppu8P2 does? I.e. why couldn't I declare it as uint8_t **ppu8P2;? (it gives me a segmentation fault).
Which of the two methods is more practical/better practice for coding?
Why is it possible to give the address to a pointer when the function's argument is a double pointer?
Why ppu8Pos does not have to be initialized, and ppu8P2 yes
You are not using the value of ppu8Pos right away. Instead, you pass its address to another function, where it gets assigned by reference. ppu8P2, on the other hand, plays the role of that address: its own value is passed to the function and dereferenced there, so it must be initialised to point at a valid uint8_t * (here, pu8P2).
Which of the two methods is more practical/better practice for coding
They are identical for all intents and purposes, for exactly the same reason these two fragments are identical:
// 1
double t = sin(x)/cos(x);
// 2
double s = sin(x), c = cos(x);
double t = s/c;
In one case, you use a variable initialised to a value. In the other case, you use a value directly. The type of the value doesn't really matter. It could be a double, or a pointer, or a pointer to a pointer.
Why is it possible to give the address to a pointer when the function's argument is a double pointer?
These two things you mention, an address to a pointer and a double pointer, are one and the same thing. They are not two very similar things, or virtually indistinguishable, or any weak formulation like that. No, the two wordings mean exactly the same, to all digits after the decimal point.
The address of a pointer (like e.g. &pu8P2) is a pointer to a pointer.
The result of &pu8P2 is a pointer to the variable pu8P2.
And since pu8P2 is of the type uint8_t * then a pointer to such a type must be uint8_t **.
Regarding ppu8Pos, it doesn't need to be initialized, because that happens in the fnFindValue_vo function with the assignment *vF_ppu8Match = &vF_pu8Msg[i].
But there is a trap here: If the condition vF_u8Value == vF_pu8Msg[i] is never true then the assignment never happens and ppu8Pos will remain uninitialized. So that initialization of ppu8Pos is really needed after all.
The "practicality" of each solution is more an issue of personal opinion I believe, so I leave that unanswered.
For starters, the function fnFindValue_vo can lead to undefined behavior because it does not set the pointer *vF_ppu8Match when the target value is not found in the array.
Also, it is very strange that the size of the array is specified by an object of type uint8_t. This does not make sense.
The function should be declared at least in the following way:
void fnFindValue_vo( const uint8_t *vF_pu8Msg, size_t vF_u8Length, uint8_t **vF_ppu8Match, uint8_t vF_u8Value )
{
const uint8_t *p = vF_pu8Msg;
while ( p != vF_pu8Msg + vF_u8Length && *p != vF_u8Value ) ++p;
*vF_ppu8Match = ( uint8_t * )p;
}
The difference between the two approaches used in your question is that in the first code snippet, if the target element is not found, the pointer will still point to the first element of the array
uint8_t *pu8P2 = &u8Array[0];
And this expression
uint8_t u8Diff1 = *ppu8P2 - pu8Reference;
will yield a confusing positive value (due to the type uint8_t) because the difference *ppu8P2 - pu8Reference will be negative.
In the second code snippet in this case you will get undefined behavior due to this statement
uint8_t u8Diff2 = ppu8Pos - pu8Reference;
because the pointer ppu8Pos was not initialized.
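One way to avoid both problems without changing fnFindValue_vo is to initialise the output pointer to NULL and check it afterwards. The sketch below repeats the function from the question unchanged; find_relative_index is a hypothetical wrapper added only for illustration, and it stores the difference in a ptrdiff_t so that negative results stay negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Same signature and behaviour as the function in the question. */
void fnFindValue_vo(uint8_t *vF_pu8Msg, uint8_t vF_u8Length,
                    uint8_t **vF_ppu8Match, uint8_t vF_u8Value)
{
    for (int i = 0; i < vF_u8Length; i++) {
        if (vF_u8Value == vF_pu8Msg[i]) {
            *vF_ppu8Match = &vF_pu8Msg[i];
            break;
        }
    }
}

/* Hypothetical wrapper: returns 1 and stores the index of value relative
 * to ref in *out when value is found; returns 0 and leaves *out alone otherwise. */
int find_relative_index(uint8_t *msg, uint8_t len, const uint8_t *ref,
                        uint8_t value, ptrdiff_t *out)
{
    uint8_t *match = NULL;              /* known state if nothing is found */
    fnFindValue_vo(msg, len, &match, value);
    if (match == NULL)
        return 0;
    *out = match - ref;                 /* signed difference, may be negative */
    return 1;
}
```

With the array from the question and &u8Array[3] as the reference, searching for 16 stores 4, searching for 64 stores 6, and searching for a value that is absent returns 0.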
Honestly, I haven't tried to understand your code completely, but my advice is: do not overcomplicate it.
I would start with one fact which helped me untangle this:
if you have int a[10]; then a behaves like a pointer in most expressions (it converts to a pointer to its first element); in fact
int x = a[2] is exactly the same as int x = *(a+2) - you can try it.
So let's have
int a[10]; //this is an array
//in most expressions, a converts to a pointer to the beginning of the array
a[2] is an int, and it is the third value in that array, stored at memory location a plus the size of two ints;
&a[2] is a pointer to that third value
*(a) is the first value in the array a
*(a+1) is the same as a[1] and it is the second int value in array a
and finally
**a is the same as *(*a): *a takes the first int value in the array a (the same as above), and the second asterisk would mean "take that int, pretend it is a pointer, and read the value at that location". For int a[10] the compiler rejects this, because an int cannot be dereferenced; force it with a cast and you will most likely read garbage.
https://stackoverflow.com/questions/42118190/dereferencing-a-double-pointer
Only when you have int a[5][5]; is a[0] the first row (which converts to a pointer to its first element) and a[1] the second row, and **(a) is then the same as a[0][0].
https://beginnersbook.com/2014/01/2d-arrays-in-c-example/
Drawing it on paper, as suggested in the comments, helps, but what helped me a lot was learning to use a debugger and breakpoints. Put a breakpoint at the first line and then go through the program step by step. In the "watches", put all variants like
pu8P2,&pu8P2,*pu8P2,**pu8P2 and see what is going on.

Better way of declaring an array?

I'm writing in C and compiling with GCC.
Is there a better way of declaring points? I was surprised to see that points was an array. Is there some way of declaring points so it looks more like an array?
typedef struct Span
{
unsigned long lo;
unsigned long hi;
} Span;
typedef struct Series
{
unsigned long *points;
unsigned long count;
unsigned long limit;
} Series;
void SetSpanSeries(Series *self, const Span *src)
{
unsigned long *points;
if (src->lo < src->hi )
{
// Overlays second item in series.
points = self->points; // a pointer in self structure
points[0] = src->lo;
points[1] = src->hi;
self->count = 1;
}
}
Now let's say that points points to an array of structures.
typedef struct Span
{
unsigned long lo;
unsigned long hi;
} Span;
span *points[4];
Now how do I write these lines of code? Did I get this right?
points = self->points; // a pointer in self structure
points[0].lo = src->lo;
points[0].hi = src->hi;
With the declaration unsigned long *points, points is a pointer. It points to the beginning of an array. arr[x] is the same as *(arr + x), so whether arr is an array (in which case, it takes the address of the array, adds x, and dereferences the 'pointer') or a pointer (in which case, it takes the pointer value, adds x, and dereferences the pointer), arr[0] still gets the same array access.
In this case, you can't declare points as an array because you're not using it as an array: you're using it as a pointer that points to an array. A pointer is a shallow copy, so if you change the data it points to, you change the original data. A regular array would have to be a deep copy, and your changes to it would no longer reach the array in self, which is ultimately what you want to modify.
In fact, you could rewrite the whole thing without points:
void SetSpanSeries(Series *self, const Span *src)
{
if (src->lo < src->hi )
{
self->points[0] = src->lo;
self->points[1] = src->hi;
self->count = 1;
}
}
As to your second example, yes, points[0].lo is correct. points->lo would also be correct, so long as you're only accessing points[0]. (Or self->points[0].lo if you take out points entirely.)
The ability to treat a pointer as an array definitely confuses most C beginners. Arrays even decay to pointers when passed as arguments to functions, giving the impression that arrays and pointers are completely interchangeable -- they aren't. An excellent description is in Expert C Programming: Deep C Secrets. (This is one of my favorite books; it's strongly recommended if you intend to understand C.)
Anyway, writing pointer[2] is the same as *(pointer+2) -- the array syntax is far easier for most people to read (and write).
Since you are using this *points variable to provide easier access to another block of memory (the pointer points in the struct Series), you cannot use an array for your local variable because you cannot re-assign the base of an array to something else. Consider the following illegal code:
int foo[10];
int *bar;
int wrong[10];
bar = foo; /* fine */
wrong = foo; /* compile error -- cannot assign to the array 'wrong' */
Another option for re-writing this code is to remove the temporary variable:
if (src->lo < src->hi) {
self->points[0] = src->lo;
self->points[1] = src->hi;
self->count = 1;
}
I'm not sure the temporary variable helps with legibility -- it just saved typing a few characters at the expense of adding a lot of characters. (And a confusing variable, too.)
In the middle section you say points is an array 4 of pointer to struct span. In the third section you are assigning points from self->points (meaning the previous value of points, that array, has been lost). You then dereference points as if it were an array of struct Span and not an array of pointers to struct Span.
In other words, this cannot compile because you are mixing types, and even if you were not, you would be overwriting the memory allocated by your definition of the points variable.
Providing the definition of Series might help explain what is going on.
But certainly in the first example, points should probably be a Span *points but without seeing Series we cannot tell for sure.
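For the second part of the question, one possible shape is sketched below. SpanSeries is a hypothetical variant of Series whose storage is an array of Span rather than of unsigned long, and the limit check is an added assumption that the original code does not make.

```c
typedef struct Span
{
    unsigned long lo;
    unsigned long hi;
} Span;

/* Hypothetical variant of Series that stores Span elements. */
typedef struct SpanSeries
{
    Span *points;          /* points to the first element of a Span array */
    unsigned long count;   /* number of spans currently stored */
    unsigned long limit;   /* capacity of the array points refers to */
} SpanSeries;

void SetSpanSeries(SpanSeries *self, const Span *src)
{
    /* assumption: refuse to write when there is no room for one span */
    if (self->limit >= 1 && src->lo < src->hi)
    {
        self->points[0] = *src;   /* struct assignment copies lo and hi */
        self->count = 1;
    }
}
```

Here self->points[0].lo and self->points[0].hi are set in one step by the struct assignment; writing the two members individually would be equivalent.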

Increasing The Size of Memory Allocated to a Struct via Malloc

I just learned that it's possible to increase the size of the memory you'll allocate to a struct when using the malloc function. For example, you can have a struct like this:
struct test{
char a;
int v[1];
char b;
};
Which clearly has space for only 2 chars and 1 int (pointer to an int in reality, but anyway). But you could call malloc in such a way to make the struct holds 2 chars and as many ints as you wanted (let's say 10):
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test)+sizeof(int)*9);
ptr->v[9]=50;
printf("%d\n",ptr->v[9]);
return 0;
}
The output here would be "50" printed on the screen, meaning that the array inside the struct was holding up to 10 ints.
My questions for the experienced C programmers out there:
What is happening behind the scenes here? Does the computer allocate 2+4 (2 chars + pointer to int) bytes for the standard "struct test", and then 4*9 more bytes of memory and let the pointer "ptr" put whatever kind of data it wants on those extra bytes?
Does this trick only works when there is an array inside the struct?
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
...Which clearly has space for only 2 chars and 1 int (pointer to an int in reality, but anyway)...
Already incorrect. Arrays are not pointers. Your struct holds space for 2 chars and 1 int. There's no pointer of any kind there. What you have declared is essentially equivalent to
struct test {
char a;
int v;
char b;
};
There's not much difference between an array of 1 element and an ordinary variable (there's a conceptual difference only, i.e. syntactic sugar).
...But you could call malloc in such a way to make it hold 1 char and as many ints as you wanted (let's say 10)...
Er... If you want it to hold 1 char, why did you declare your struct with 2 chars???
Anyway, in order to implement an array of flexible size as a member of a struct you have to place your array at the very end of the struct.
struct test {
char a;
char b;
int v[1];
};
Then you can allocate memory for your struct with some "extra" memory for the array at the end
struct test *ptr = malloc(offsetof(struct test, v) + sizeof(int) * 10);
(Note how offsetof, which is declared in <stddef.h>, is used to calculate the proper size.)
That way it will work, giving you an array of size 10 and 2 chars in the struct (as declared). It is called the "struct hack", and it depends critically on the array being the very last member of the struct.
The C99 version of the C language introduced dedicated support for the "struct hack". In C99 it can be done as
struct test {
char a;
char b;
int v[];
};
...
struct test *ptr = malloc(sizeof(struct test) + sizeof(int) * 10);
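Put together, the C99 form might be used as in this minimal sketch. The helper test_alloc is hypothetical; it allocates the struct plus room for n ints and zero-initialises everything.

```c
#include <stdlib.h>

struct test {
    char a;
    char b;
    int v[];               /* C99 flexible array member: must be last */
};

/* Hypothetical helper: allocate a struct test with room for n ints in v.
 * Returns NULL if the allocation fails; the caller frees the result. */
struct test *test_alloc(size_t n)
{
    struct test *p = malloc(sizeof *p + n * sizeof p->v[0]);
    if (p != NULL) {
        p->a = 0;
        p->b = 0;
        for (size_t i = 0; i < n; i++)
            p->v[i] = 0;
    }
    return p;
}
```

After struct test *ptr = test_alloc(10); the elements ptr->v[0] through ptr->v[9] are valid, and a single free(ptr) releases the whole block.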
What is happening behind the scenes here? Does the computer allocate 2+4 (2 chars + pointer to int) bytes for the standard "struct test", and then 4*9 more bytes of memory and let the pointer "ptr" put whatever kind of data it wants on those extra bytes?
malloc allocates as much memory as you ask it to allocate. It is just a single flat block of raw memory. Nothing else happens "behind the scenes". There's no "pointer to int" of any kind in your struct, so any questions that involve "pointer to int" make no sense at all.
Does this trick only works when there is an array inside the struct?
Well, that's the whole point: to access the extra memory as if it belongs to an array declared as the last member of the struct.
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
It doesn't manage anything. If the array is not the last member of the struct, then trying to work with the extra elements of the array will trash the members of the struct that are declared after the array. This is pretty useless, which is why the "flexible" array has to be the last member.
No, that does not work. You can't change the size of a struct (which is fixed at compile time, after all) by using malloc() at run time. But you can allocate a memory block, or change its size, such that it holds more than one struct:
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test) * 9);
}
That's just about all you can do with malloc ( ) in this context.
In addition to what others have told you (summary: arrays are not pointers, pointers are not arrays, read section 6 of the comp.lang.c FAQ), attempting to access array elements past the last element invokes undefined behavior.
Let's look at an example that doesn't involve dynamic allocation:
struct foo {
int arr1[1];
int arr2[1000];
};
struct foo obj;
The language guarantees that obj.arr1 will be allocated starting at offset 0, and that the offset of obj.arr2 will be sizeof (int) or more (the compiler may insert padding between struct members and after the last member, but not before the first one). So we know that there's enough room in obj for multiple int objects immediately following obj.arr1. That means that if you write obj.arr1[5] = 42, and then later access obj.arr1[5], you'll probably get back the value 42 that you stored there (and you'll probably have clobbered obj.arr2[4]).
The C language doesn't require array bounds checking, but it makes the behavior of accessing an array outside its declared bounds undefined. Anything could happen -- including having the code quietly behave just the way you want it to. In fact, C permits array bounds checking; it just doesn't provide a way to handle errors, and most compilers don't implement it.
For an example like this, you're most likely to run into visible problems in the presence of optimization. A compiler (particularly an optimizing compiler) is permitted to assume that your program's behavior is well-defined, and to rearrange the generated code to take advantage of that assumption. If you write
int index = 5;
obj.arr1[index] = 42;
the compiler is permitted to assume that the index operation doesn't go outside the declared bounds of the array. As Henry Spencer wrote, "If you lie to the compiler, it will get its revenge".
Strictly speaking, the struct hack probably involves undefined behavior (which is why C99 added a well-defined version of it), but it's been so widely used that most or all compilers will support it. This is covered in question 2.6 of the comp.lang.c FAQ.

sizeof array clarification

I am studying for a final tomorrow in C, and have a question regarding the sizeof operator.
Let's say the size of an int is 32 bits and a pointer is 64 bits.
If there were a function:
int
foo (int zap[])
{
int a = sizeof(zap);
return a;
}
Because zap is a pointer, foo would return 8, as that's how many bytes are needed to store this particular pointer. However, with the following code:
int zip[] = { 0, 1, 2, 3, 4, 5 };
int i = sizeof(zip);
i would be 6 * sizeof(int) = 6 * 4 = 24
Why is it that sizeof(zip) returns the number of elements times the size of each element, whereas sizeof(zap) returns the size of a pointer? Is it that the size of zap is unspecified, and zip is not? The compiler knows that zip is 6 elements, but doesn't have a clue as to how large zap may be.
This is sort of an asymmetry in the C syntax. In C it's not possible to pass an array to a function, so when you use the array syntax in a function declaration for one of the parameters the compiler instead reads it as a pointer.
In C in most cases when you use an array in an expression the array is implicitly converted to a pointer to its first element and that is exactly what happens for example when you call a function. In the following code:
int bar[] = {1,2,3,4};
foo(bar);
the array is converted to a pointer to the first element and that is what the function receives.
This rule of implicit conversion is not, however, always applied. As you discovered, for example, the sizeof operator works on the array itself, and the & (address-of) operator also works on the original array (i.e. sizeof(*&bar) == 4*sizeof(int)).
A function in C cannot receive an array as a parameter; it can only receive a pointer to the first element or a pointer to an array... or you must wrap the array in a structure.
Even if you put a number between the brackets in the function declaration...
void foo(int x[4])
{
...
}
that number is completely ignored by the compiler... that declaration for the compiler is totally equivalent to
void foo(int *x)
{
...
}
and for example even calling it passing an array with a different size will not trigger any error...
int tooshort[] = {1,2,3};
foo(tooshort); /* Legal, even if probably wrong */
(actually a compiler MAY give a warning, but the code is perfectly legal C and must be accepted if the compiler follows the standard)
If you think that this rule about arrays when in function arguments is strange then I agree, but this is how the C language is defined.
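The three alternatives mentioned above (pointer to the first element, pointer to an array, array wrapped in a structure) can be compared in a short sketch. All names here (bytes_via_pointer, bytes_via_array_pointer, struct six_ints, COUNT_OF) are invented for this illustration.

```c
#include <stddef.h>

/* Number of elements in a real array; must not be used on a pointer. */
#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Pointer to the first element: the length has to travel separately. */
size_t bytes_via_pointer(const int *first, size_t count)
{
    return count * sizeof *first;
}

/* Pointer to an array of exactly 6 int: the size is part of the type. */
size_t bytes_via_array_pointer(int (*arr)[6])
{
    return sizeof *arr;
}

/* Array wrapped in a struct: passed by value, size still known. */
struct six_ints { int v[6]; };

size_t bytes_via_struct(struct six_ints s)
{
    return sizeof s.v;
}
```

For int zip[] = { 0, 1, 2, 3, 4, 5 }; the calls bytes_via_pointer(zip, COUNT_OF(zip)) and bytes_via_array_pointer(&zip) both give sizeof zip, whereas sizeof applied to a plain array parameter would give only the size of a pointer.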
Because zip is an array and the compiler knows its size at compile time. It is just a case of using the same notation for two different things, something quite usual in C.
int
foo (int zap[])
is completely equivalent to
int
foo (int *zap)
The compiler doesn't have any idea how big zap could be (so it leaves the task of finding out to the programmer).
zip is a memory block of 6 * sizeof(int) so it has a size of 24 (on your architecture).
zap (it could be also written as int *zap in your function declaration) however can point to any memory address and the compiler has no way of knowing how much space starting at this (or even containing this) address has been allocated.
The size of zip is known at compile time and the size of zap is not. That is why you are getting the size of a pointer on sizeof(zap) and the size of the array on sizeof(zip).
There are some situations where arrays decay to pointers. Function calls are one of those.
sizeof(zip) gives the full size because zip has been statically initialized with 6 elements.

Resources