I am looking for an explanation of the following statement regarding array declarators in this book.
The concept of composite types (§6.1.2.6) was introduced to provide
for the accretion of information from incomplete declarations, such as
array declarations with missing size, and function declarations with
missing prototype (argument declarations). Type declarators
are therefore said to specify compatible types if they agree
except for the fact that one provides less information of this sort
than the other.
The declaration of 0-length arrays is invalid, under the general
principle of not providing for 0-length objects. The only common use
of this construct has been in the declaration of dynamically allocated
variable-size arrays, such as
struct segment {
short int count;
char c[N];
};
struct segment * new_segment( const int length ) {
struct segment * result;
result = malloc( sizeof segment + (length-N) );
result->count = length;
return result;
}
In such usage, N would be 0 and (length-N) would be written as length.
But this paradigm works just as well, as written, if N is 1.
Specifically, I am interested in the motivation for this paragraph and in understanding that code snippet. Where does the N come from in the new_segment function?
Where does the N come from in the new_segment function?
It is simply a placeholder in the text, not a name intended to appear in real code, as we can see from this sentence:
In such usage, N would be 0 and (length-N) would be written as length. But this paradigm works just as well, as written, if N is 1.
the text wishes to discuss two declarations of the c member, one with:
struct segment {
short int count;
char c[0];
};
and the other with:
struct segment {
short int count;
char c[1];
};
Writing them out requires more space, and the sample code for the new_segment function would also have to be repeated. Further, it might be a bit less clear how the value of N changes new_segment if it were written as two separate instances with different literal constants rather than with N showing where the change occurs (although the effect is minor in any case).
The text is saying it is fairly easy for a programmer to use either 0 or 1 as the array size; it merely requires a minor adjustment when allocating space.
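A minimal sketch of the N = 1 variant, written out as compilable C. The #define N 1, the NULL check and the cast are additions for illustration; the size arithmetic is the Rationale's. Indexing c past c[0] is the classic struct hack that C99 replaced with flexible array members, so the sketch only shows the allocation.

```c
#include <stdlib.h>

#define N 1  /* assumption: the portable pre-C99 choice; 0 needs a compiler extension */

struct segment {
    short int count;
    char c[N];
};

/* Allocates the header plus room for `length` chars in c (requires length >= N).
   sizeof(struct segment) already counts N elements of c, so only
   length - N more bytes are added. */
struct segment *new_segment(const int length)
{
    struct segment *result = malloc(sizeof(struct segment) + (length - N));
    if (result != NULL)
        result->count = (short)length;
    return result;
}
```

With N = 0 (where a compiler accepts it), the same expression reduces to sizeof(struct segment) + length, which is the "(length-N) would be written as length" case.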
Related
Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.
Sample Program
#include <stdio.h>
int main() {
int arr[0];
return 0;
}
Clarification
I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.
This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];
I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.
Because I was mainly wondering about the error, I am marking Lundin's answer as correct (Nawaz's was first, but it wasn't as complete). The others, which pointed out its actual use for tail-padded structures, were relevant but not exactly what I was looking for.
An array cannot have zero size.
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
The above text is a constraint that applies to a plain array (paragraph 1), so a conforming compiler must issue a diagnostic when it is violated. For a VLA (variable length array), the behavior is undefined if the expression's value is less than or equal to zero (paragraph 5). This is normative text in the C standard. A compiler is not allowed to implement it differently.
gcc -std=c99 -pedantic gives a warning for the non-VLA case.
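As a sketch of what the VLA rule means in practice (sum_below is a hypothetical helper), code that sizes a VLA from a run-time value can check that value first, because a size of zero or less is undefined behavior rather than a diagnosed error:

```c
/* Returns 0 + 1 + ... + (n - 1), storing the terms in a VLA.
   A VLA whose size evaluates to <= 0 is undefined behavior,
   so n is checked before the array is declared. */
long sum_below(int n)
{
    if (n <= 0)
        return 0;          /* never reach int vla[n] with n <= 0 */
    int vla[n];
    long total = 0;
    for (int i = 0; i < n; ++i) {
        vla[i] = i;
        total += vla[i];
    }
    return total;
}
```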
As per the standard, it is not allowed.
However, it has been common practice in C compilers to treat such declarations as a flexible array member (FAM) declaration:
C99 6.7.2.1, §16: As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
The standard syntax of a FAM is:
struct Array {
size_t size;
int content[];
};
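The idea is that you would then allocate it like so:
#include <stdlib.h>

void foo(size_t x) {
    /* sizeof *array covers size plus any padding before content */
    struct Array *array = malloc(sizeof *array + x * sizeof(int));
    if (array == NULL)
        return;
    array->size = x;
    for (size_t i = 0; i != x; ++i) {
        array->content[i] = 0;
    }
    free(array);
}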
You might also initialize it statically (a GCC extension; the object must have static storage duration):
static struct Array a = { 3, { 1, 2, 3 } };
This technique is also known as tail-padded structures (a term that predates the publication of the C99 Standard) or the struct hack (thanks to Joe Wreschnig for pointing it out).
However, this syntax was standardized (and its effects guaranteed) only recently, in C99. Before that, a constant size was necessary.
1 was the portable way to go, though it was rather strange.
0 was better at indicating intent, but not legal as far as the Standard was concerned and supported as an extension by some compilers (including gcc).
The tail-padding practice, however, relies on enough storage being available past the end of the struct (a careful malloc), so it is not suited to stack usage in general.
In Standard C and C++, a zero-size array is not allowed.
If you're using GCC, compile with the -pedantic option. It will give a warning such as:
zero.c:3:6: warning: ISO C forbids zero-size array 'a' [-pedantic]
In C++, it gives a similar warning.
It's totally illegal, and always has been, but a lot of compilers
neglect to signal the error. I'm not sure why you want to do this.
The one use I know of is to trigger a compile time error from a boolean:
char someCondition[ condition ];
If condition is false, I get a compile-time error. Because some compilers do accept a zero size, however, I've taken to using:
char someCondition[ 2 * condition - 1 ];
This gives a size of either 1 or -1, and I've never found a compiler
which would accept a size of -1.
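Since C11, the same compile-time check can be written directly with _Static_assert. A minimal sketch showing both forms; STATIC_CHECK and int_is_at_least_16_bits are hypothetical names, and the conditions are chosen because the standard guarantees them:

```c
#include <limits.h>

/* C11 and later: the condition and the message are spelled out directly. */
_Static_assert(CHAR_BIT >= 8, "a byte must have at least 8 bits");

/* Pre-C11 idiom in the style of the answer above: the array size is 1 when
   cond is true and -1 when it is false, and a negative size must be rejected. */
#define STATIC_CHECK(cond, name) typedef char name[2 * !!(cond) - 1]

STATIC_CHECK(INT_MAX >= 32767, int_is_at_least_16_bits);
```

The !! normalizes any nonzero condition to 1, so the size is always exactly 1 or -1, never 0.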
Another use of zero-length arrays is for making variable-length objects (pre-C99). Zero-length arrays are different from flexible array members, which are written [] without the 0.
Quoted from the GCC documentation:
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied.
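For comparison, a minimal sketch of the documentation's struct line rewritten with a C99 flexible array member. The helper new_line and its text/length parameters are hypothetical; the allocation follows the same sizeof (struct line) + this_length pattern, with one extra byte reserved for a terminating '\0'.

```c
#include <stdlib.h>
#include <string.h>

struct line {
    int length;
    char contents[];   /* incomplete type: sizeof(struct line) excludes it */
};

/* Copies `length` bytes of text plus a '\0' into a new struct line. */
struct line *new_line(const char *text, int length)
{
    struct line *thisline = malloc(sizeof(struct line) + (size_t)length + 1);
    if (thisline == NULL)
        return NULL;
    thisline->length = length;
    memcpy(thisline->contents, text, (size_t)length);
    thisline->contents[length] = '\0';
    return thisline;
}
```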
A real-world example is zero-length arrays of struct kdbus_item in kdbus.h (a Linux kernel module).
I'll add that there is a whole page of the online GCC documentation on this topic.
Some quotes:
Zero-length arrays are allowed in GNU C.
In ISO C90, you would have to give contents a length of 1
and
GCC versions before 3.0 allowed zero-length arrays to be statically initialized, as if they were flexible arrays. In addition to those cases that were useful, it also allowed initializations in situations that would corrupt later data
so you could
int arr[0] = { 1 };
and boom :-)
Zero-size array declarations within structs would be useful if they were allowed, and if the semantics were such that (1) they would force alignment but otherwise not allocate any space, and (2) indexing the array would be considered defined behavior in the case where the resulting pointer would be within the same block of memory as the struct. Such behavior was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets.
The struct hack, as commonly implemented using an array of size 1, is dodgy and I don't think there's any requirement that compilers refrain from breaking it. For example, I would expect that if a compiler sees int a[1], it would be within its rights to regard a[i] as a[0]. If someone tries to work around the alignment issues of the struct hack via something like
#include <stdint.h>
typedef struct {
    uint32_t size;
    uint8_t data[4]; // Use four, to avoid having padding throw off the size of the struct
} MyStruct;
a compiler might get clever and assume the array size really is four:
// As written
foo = myStruct->data[i];
// As interpreted (assuming little-endian hardware)
foo = ((*(uint32_t*)myStruct->data) >> (i << 3)) & 0xFF;
Such an optimization might be reasonable, especially if myStruct->data could be loaded into a register in the same operation as myStruct->size. I know of nothing in the standard that would forbid such an optimization, though of course it would break any code that expects to access elements beyond the fourth.
You definitely can't have zero-sized arrays according to the standard, but most popular compilers let you do it anyway. So I will try to explain why it can be bad:
#include <cstdio>
int main() {
struct A {
A() {
printf("A()\n");
}
~A() {
printf("~A()\n");
}
int empty[0];
};
A vals[3];
}
As a human, I would expect this output:
A()
A()
A()
~A()
~A()
~A()
Clang prints this:
A()
~A()
GCC prints this:
A()
A()
A()
This is very strange, and it is a good reason to avoid empty arrays in C++ if you can.
There is also a GNU C extension that lets you create zero-length arrays in C, but as I understand it, the structure should have at least one member before the array; otherwise, you can get very strange results like the ones above when compiling as C++.
In C you can declare a variable that points to an array like this:
int int_arr[4] = {1,2,3,4};
int (*ptr_to_arr)[4] = &int_arr;
Although in practice it holds the same address as a plain pointer to int:
int *ptr_to_arr2 = int_arr;
its type is different: *ptr_to_arr is the whole int[4] array, not a single int.
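To make the difference concrete, a small sketch (the two helper functions are hypothetical) that measures how far each pointer moves when incremented: the pointer to the array steps over all four ints, the pointer to int over one.

```c
#include <stddef.h>

int int_arr[4] = {1, 2, 3, 4};
int (*ptr_to_arr)[4] = &int_arr;  /* points to the whole array */
int *ptr_to_arr2 = int_arr;       /* points to its first element */

/* Bytes between p and p + 1 for each kind of pointer. */
size_t step_of_array_ptr(void)
{
    return (size_t)((char *)(ptr_to_arr + 1) - (char *)ptr_to_arr);
}

size_t step_of_int_ptr(void)
{
    return (size_t)((char *)(ptr_to_arr2 + 1) - (char *)ptr_to_arr2);
}
```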
Now, what would a function that returns such a pointer to an array (of int, for example) look like?
A declaration of int is int foo;.
A declaration of an array of 4 int is int foo[4];.
A declaration of a pointer to an array of 4 int is int (*foo)[4];.
A declaration of a function returning a pointer to an array of 4 int is int (*foo())[4];. The () may be filled in with parameter declarations.
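Putting the last declarator to use, a minimal sketch of such a function, assuming the array it points to has static storage duration so the returned pointer stays valid after the call:

```c
/* The array must outlive the call, hence static storage duration. */
static int int_arr[4] = {1, 2, 3, 4};

/* foo takes no parameters and returns a pointer to an array of 4 int. */
int (*foo(void))[4]
{
    return &int_arr;
}
```

A caller receives it as int (*p)[4] = foo(); and reads elements as (*p)[i].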
As already mentioned, the correct syntax is int (*foo(void))[4]; And as you can tell, it is very hard to read.
Questionable solutions:
Use the syntax as C would have you write it. In my opinion this is something you should avoid, since it's incredibly hard to read, to the point where it is completely useless. It should simply be outlawed in your coding standard, just as any sensible coding standard requires function pointers to be used through a typedef.
Oh, so we just typedef this, like we do with function pointers? One might indeed be tempted to hide all this goo behind a typedef, but that's problematic as well. This is because both arrays and pointers are fundamental "building blocks" in C, with a specific syntax that the programmer expects to see whenever dealing with them. The absence of that syntax suggests an object that can be addressed, "lvalue accessed" and copied like any other variable. Hiding them behind a typedef might in the end create even more confusion than the original syntax.
Take this example:
typedef int(*arr)[4];
...
arr a = create(); // calls malloc etc
...
// somewhere later, let's make a hard copy! (or so we thought)
arr b = a;
...
cleanup(a);
...
print(b); // mysterious crash here
So this "hide behind typedef" system relies heavily on naming types somethingptr to indicate that they are pointers. Or let's say... LPWORD... and there it is: "Hungarian notation", the heavily criticized naming convention of the Windows API.
A slightly more sensible work-around is to return the array through one of the parameters. This isn't exactly pretty either, but at least somewhat easier to read since the strange syntax is centralized to one parameter:
void foo (int(**result)[4])
{
    static int arr[4] = {1, 2, 3, 4};  // must outlive the call
    *result = &arr;
}
That is: a pointer to a pointer-to-array of int[4].
If one is prepared to throw type safety out the window, then of course void* foo (void) solves all of these problems... but creates new ones. Very easy to read, but now the problem is type safety and uncertainty regarding what the function actually returns. Not good either.
So what to do then, if these versions are all problematic? There are a few perfectly sensible approaches.
Good solutions:
Leave allocation to the caller. This is by far the best method, if you have the option. Your function would become void foo (int arr[4]); which is both readable and type safe.
Old school C. Just return a pointer to the first item in the array and pass the size along separately. This may or may not be acceptable from case to case.
Wrap it in a struct. For example this could be a sensible implementation of some generic array type:
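#include <stdlib.h>

typedef struct
{
    size_t size;
    int arr[];
} array_t;

array_t* alloc (size_t items)
{
    array_t* result = malloc(sizeof *result + sizeof(int[items]));
    if (result != NULL)
        result->size = items;   /* record how many ints follow the header */
    return result;
}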
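The first two "good solutions" can be sketched as follows; fill4 and get_table are hypothetical names used only for illustration. Note that an int arr[4] parameter is still adjusted to int *, so the 4 documents intent rather than being enforced by the compiler.

```c
#include <stddef.h>

/* "Leave allocation to the caller": the caller owns the storage,
   the function only fills it. */
void fill4(int arr[4])
{
    for (int i = 0; i < 4; ++i)
        arr[i] = i + 1;
}

/* "Old school C": return a pointer to the first element and report
   the element count separately through an out-parameter. */
const int *get_table(size_t *count)
{
    static const int table[] = {10, 20, 30};
    *count = sizeof table / sizeof table[0];
    return table;
}
```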
The typedef keyword can make things a lot clearer/simpler in this case:
int int_arr[4] = { 1,2,3,4 };
typedef int(*arrptr)[4]; // Define a pointer to an array of 4 ints ...
arrptr func(void) // ... and use that for the function return type
{
return &int_arr;
}
Note: As pointed out in the comments and in Lundin's excellent answer, using a typedef to hide/bury a pointer is a practice that is frowned upon by (most of) the professional C programming community, and for very good reasons. There is a good discussion about it here.
However, although in your case you aren't defining an actual function pointer (which is an exception to the 'rule' that most programmers will accept), you are defining a complicated (i.e. difficult to read) function return type. The discussion at the end of the linked post delves into the "too complicated" issue, which is what I would use to justify the use of a typedef in a case like yours. But if you do choose this road, do so with caution.
I tried to run the following code with a C++ compiler:
#include <iostream>
#include <string>
using namespace std;
int MAX=10;
int list[MAX];
int main()
{
int sum =0;
for (int i = 0; i < MAX; ++i){
list[i]=i;
}
for (int i = 0; i < MAX; ++i){
sum=sum+list[i];
}
cout << sum << endl;
}
But received this error:
"integer array bound is not an integer constant before ‘]’ token"
I don't understand why this is an error because I have defined MAX as 10 right before
int list[MAX]
so shouldn't it work?
Appreciate any help
There is nothing mysterious here: the error means exactly what it says. You haven't put const before your int MAX declaration.
Capital letters and never changing the value of MAX don't make it a constant.
Note that some compilers accept a variable (i.e. int MAX = 10; instead of const int MAX = 10;) as the bound of a local array, as a non-standard extension. Don't rely on this, because standard C++ does not allow it.
If you want to use a run-time value as the size of an array, you need dynamic allocation through a pointer:
int size;
cin >> size;
int *list = new int[size];
// ... use list ...
delete[] list;
I don't understand why this is an error because I have defined MAX as 10 right before int list[MAX]
You defined MAX as 10 but you didn't define MAX as constant. It's an error because the compiler insists that (in this case) the array bound must be an integer constant.
One way to fix the error is to make MAX constant...
const int MAX = 10;
int list[MAX];
Another way to avoid the error is to allocate the array dynamically on the heap (since the bound of a heap array doesn't have to be constant) …
auto list = new int[MAX];
… however this changes the type of list from int[10] to int * and also forces you to become responsible for managing the lifetime of list by calling delete when appropriate (which can be a non-trivial challenge) …
delete [] list;
Not deleting list correctly can cause memory leaks.
You can avoid the error and avoid responsibility for managing the array by using a unique_ptr …
std::unique_ptr<int[]> list{ new int[MAX] };
However many well regarded authorities would argue that using a container like std::vector or std::array would be a better approach. For example, in Effective Modern C++ Scott Meyers says this...
The existence of std::unique_ptr for arrays should be of only intellectual interest to you, because std::array, std::vector, and std::string are virtually always better data structure choices than raw arrays. About the only situation I can conceive of when a std::unique_ptr would make sense would be when you’re using a C-like API that returns a raw pointer to a heap array that you assume ownership of.
At this point you may be wondering why MAX has to be constant in your original code. I'm not a language lawyer but I believe the short answer is because the C++ Standard says it must be so.
For insight into why the standard imposes that requirement you could read some of the answers to the question Why aren't variable-length arrays part of the C++ standard?
Here is what is written as the rationale for adding the fancy [*] syntax for declaring array types inside function prototypes, just for clarification before we get into the question:
A function prototype can have parameters that have variable length
array types (§6.7.5.2) using a special syntax as in
int minimum(int,int [*][*]);
This is consistent with other C prototypes where the name
of the parameter need not be specified.
But I'm pretty confident that we can get the same effect simply by using ordinary arrays of unspecified size, like this (rewriting the minimum example from the quote above with what I believe is exactly the same functionality, except that the first parameter is size_t instead of int, which isn't important here):
#include <stdio.h>
int minimum(size_t, int (*)[]);
int (main)()
{
size_t sz;
scanf("%zu", &sz);
int vla[sz];
for(size_t i = 0; i < sz; ++i)
vla[i] = i;
minimum(sizeof(vla) / sizeof(*vla), &vla);
int a[] = { 5, 4, 3, 2, 1, 0 };
minimum(sizeof(a) / sizeof(*a), &a);
}
int minimum(size_t a, int (*b)[a])
{
for(size_t i = 0; i < sizeof(*b) / sizeof(**b); ++i)
printf("%d ", (*b)[i]);
return printf("\n");
}
Because I'm pretty sure there was some place in the standard stating that two array types are compatible only if their sizes are equal, regardless of whether they are variable or not.
My point is also confirmed by the fact that the definition of minimum doesn't produce a "conflicting types" error, as it would if one of its parameters had an incompatible type (which I don't think is the case here, as both arrays have a size that is unspecified at compile time; I am referring to the second parameter of minimum).
OK besides - can you point me 1 single use-case for [*] that can not be replaced using ordinary unspecified size arrays?
The above code compiles without any warnings using both clang and gcc. It also produces the expected output.
For anyone who doesn't know C (or anyone who thinks that he/she knows it): a function parameter of array type is implicitly adjusted to "pointer to its element type". So this:
int minimum(int,int [*][*]);
Gets adjusted to:
int minimum(int,int (*)[*]);
And then I'm arguing that it could be also written as:
int minimum(int,int (*)[]);
This would have no consequences and the same behavior as the two forms above, making the [*] form obsolete.
OK besides - can you point me 1 single use-case for [*] that can not
be replaced using ordinary unspecified size arrays?
This is the case when you pass a three-dimensional VLA:
int minimum(size_t, int [*][*][*]);
This can be written as:
int minimum(size_t, int (*)[*][*]);
or even using an array of unspecified size:
int minimum(size_t, int (*)[][*]);
But there is no way to omit or work around the last dimension, so it has to stay as [*] in such a declaration.
[] can only be used as the leftmost "dimension specifier" of a multidimensional array, whereas [*] can be used anywhere.
In function parameter declarations, the leftmost (only!) [...] is adjusted to (*) anyway, so one could use (*) in that position at the expense of some clarity.
One can omit the dimension in the next-to-leftmost [...], leaving the empty brackets. This will leave the array element type incomplete. This is not a big deal, as one can complete it close to the point of use (e.g. in the function definition).
The next [...] needs a number or * inside which cannot be omitted. These declarations
int foo (int [*][*][*]);
int foo (int (*)[*][*]);
int foo (int (*)[ ][*]);
are all compatible, but there isn't one compatible with them that doesn't specify the third dimension as either * or a number. If the third dimension is indeed variable, * is the only option.
Thus, [*] is necessary at least for dimensions 3 and up.
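A sketch of the three-dimensional case (the three size parameters are an assumption; the Rationale's example is two-dimensional). The prototype names no parameters, so its inner dimensions must be spelled [*]; the definition can use the named sizes. Both parameter types adjust to a pointer to a two-dimensional VLA, so they are compatible.

```c
#include <stddef.h>

/* Prototype without parameter names: every variable dimension is [*]. */
int minimum(size_t, size_t, size_t, int [*][*][*]);

/* Definition: returns the smallest element of an n1 x n2 x n3 array
   (all dimensions assumed to be at least 1). */
int minimum(size_t n1, size_t n2, size_t n3, int a[n1][n2][n3])
{
    int best = a[0][0][0];
    for (size_t i = 0; i < n1; ++i)
        for (size_t j = 0; j < n2; ++j)
            for (size_t k = 0; k < n3; ++k)
                if (a[i][j][k] < best)
                    best = a[i][j][k];
    return best;
}
```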
Why does this work:
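#include <sys/types.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>   /* for atoi */
typedef struct x {
    int a;
    int b[128];
} x_t;
int function(int i)
{
    size_t a;
    a = offsetof(x_t, b[i]);   /* non-constant index: the point of the question */
    return a;
}
int main(int argc, char **argv)
{
    if (argc < 2)   /* argv[1] would be a null pointer */
        return 1;
    printf("%d\n", function(atoi(argv[1])));
    return 0;
}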
If I remember the definition of offsetof correctly, it's a compile time construct. Using 'i' as the array index results in a non-constant expression. I don't understand how the compiler can evaluate the expression at compile time.
Why isn't this flagged as an error?
The C standard does not require this to work, but it likely works in some C implementations because offsetof(type, member) expands to something like:
type t; // Declare an object of type "type".
char *start = (char *) &t; // Find starting address of object.
char *p = (char *) &t.member; // Find address of member.
p - start; // Evaluate offset from start to member.
I have separated the above into parts to display the essential logic. The actual implementation of offsetof would be different, possibly using implementation-dependent features, but the core idea is that the address of a fictitious or temporary object would be subtracted from the address of the member within the object, and this results in the offset. It is designed to work for members but, as an unintended effect, it also works (in some C implementations) for elements of arrays in structures.
It works for these elements simply because the construction used to find the address of a member also works to find the address of an element of an array member, and the subtraction of the pointers works in a natural way.
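If portability matters, the same number can be obtained without a non-constant index inside offsetof: take the standard offsetof of the array member and add the element offset by hand. A minimal sketch reusing the question's x_t (offset_of_b_elem is a hypothetical name):

```c
#include <stddef.h>

typedef struct x {
    int a;
    int b[128];
} x_t;

/* Offset in bytes of b[i] within x_t, using only a standard offsetof
   on the member itself plus ordinary arithmetic. */
size_t offset_of_b_elem(size_t i)
{
    return offsetof(x_t, b) + i * sizeof(int);
}
```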
it's a compile time construct
AFAICS, there are no such constraints. All the standard says is:
[C99, 7.17]:
The macro...
offsetof(type, member-designator)
...
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant.
offsetof (type,member)
Return member offset: This macro with functional form returns the offset value in bytes of member member in the data structure or union type type.
http://www.cplusplus.com/reference/cstddef/offsetof/
(C, C++98 and C++11 standards)
I think I understand this now.
The offsetof() macro does not evaluate to a constant, it evaluates to a run-time expression that returns the offset. Thus as long as type.member is valid syntax, the compiler doesn't care what it is. You can use arbitrary expressions for the array index. I had thought it was like sizeof and had to be constant at compile time.
There has been some confusion on what exactly is permitted as a member-designator. Here are two papers I am aware of:
DR 496
Offsetof for Pointers to Members
However, even quite old versions of GCC, clang, and ICC support computing the offsets of array elements with a non-constant index. Based on Raymond's blog, I guess that MSVC has long supported it too.
I believe it is based out of pragmatism. For those not familiar, the "struct hack" and flexible array members use variable-length data in the last member of a struct:
#include <stddef.h>
#include <stdlib.h>
typedef struct string {
    size_t size;
    const char data[];
} string;
This type is often allocated with something like this:
string *string_alloc(size_t size) {
    string *s = malloc(offsetof(string, data[size]));
    if (s != NULL)
        s->size = size;
    return s;
}
Admittedly, this latter part is just a theory. It's such a useful optimization that I imagine that initially it was permitted on purpose for such cases, or it was accidentally supported and then found to be useful for exactly such cases.
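As a sketch of the saving involved (both helpers are hypothetical), compare two ways of sizing the allocation for n bytes of data. For a char array, offsetof(struct string, data) + n is the strictly portable spelling of offsetof(string, data[size]); the sizeof-based figure can be larger because sizeof may include trailing padding.

```c
#include <stddef.h>

struct string {
    size_t size;
    const char data[];
};

/* Bytes needed for a struct string holding n chars, computed two ways. */
size_t bytes_via_sizeof(size_t n)   { return sizeof(struct string) + n; }
size_t bytes_via_offsetof(size_t n) { return offsetof(struct string, data) + n; }
```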