Global integer array with No dimension - c

What is the concept when we define a global array with no dimension
This shows output as 16.
#include <stdio.h>
#include <stdlib.h>
int arr[];
int main(int argc, char *argv[])
{
arr[1] = 16;
printf("%d\n",arr[1]);
system("PAUSE");
return 0;
}
And even sizeof(arr) doesn't work. Why?

int arr[]; is a tentative definition there.
Clause 6.9.2, paragraph 2 says:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
and example 2 in paragraph 5 of that clause clarifies:
If at the end of the translation unit containing
int i[];
the array i still has incomplete type, the implicit initializer causes it to have one element, which is set to zero on program startup.
So at the end of the translation unit, your array arr has type int[1]. Before the end, its type is incomplete, hence sizeof doesn't work, since in main, the array type is still incomplete.
Accessing arr[1] invokes undefined behaviour, since arr has only one element.

GCC assumes that arr should only have one element. The fact that you can access other elements than arr[0] without segfaulting is just a coincidence. For instance, on my machine I can access arr[1], arr[10] and arr[100] just fine, but arr[1000] causes a segfault. In general, accessing locations outside array bounds causes undefined behavior.

Related

Is it OK to access past the size of a structure via member address, with enough space allocated?

Specifically, is the following code, the line below the marker, OK?
struct S{
int a;
};
#include <stdlib.h>
int main(){
struct S *p;
p = malloc(sizeof(struct S) + 1000);
// This line:
*(&(p->a) + 1) = 0;
}
People have argued here, but no one has given a convincing explanation or reference.
Their arguments are on a slightly different base, yet essentially the same
typedef struct _pack{
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// This line, with similar intention:
strcpy((char*)&(p->c), str);
// ^^^^^^^
The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.
The member p->a is an object of type int. C11 6.5.6p7 says that
7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
Thus
&(p->a)
is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.
Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.
behavior [...], for which [The C11] Standard imposes no requirements
With the note saying that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kind of behaviour wouldn't be acceptable from the standard's point of view - the standard allows every imaginable and unimaginable outcome.
There has been claims that the C11 standard text has been written vaguely, and the intention of the committee should be that this indeed be allowed, and previously it would have been alright. It is not true. Read the part from the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].
Question 16
[...]
Response
For an array of arrays, the permitted pointer arithmetic in
subclause 6.3.6, page 47, lines 12-40 is to be understood by
interpreting the use of the word object as denoting the specific
object determined directly by the pointer's type and value, not other
objects related to that one by contiguity. Therefore, if an expression
exceeds these permissions, the behavior is undefined. For example, the
following code has undefined behavior:
int a[4][5];
a[1][7] = 0; /* undefined */
Some conforming implementations may
choose to diagnose an array bounds violation, while others may
choose to interpret such attempted accesses successfully with the
obvious extended semantics.
(bolded emphasis mine)
There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.
If you want to address the consecutive structs, you can always take the pointer to the first member and cast that as the pointer to the struct and advance that instead:
*(int *)((S *)&(p->a) + 1) = 0;
This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.
The correct way to achieve what you want, is to use an array without a size as the last struct member:
#include <stdlib.h>
typedef struct S {
int foo; //avoid flexible array being the only member
int a[];
} S;
int main(){
S *p = malloc(sizeof(*p) + 2*sizeof(int));
p->a[0] = 0;
p->a[1] = 42; //Perfectly legal.
}
C standard guarantees that
ยง6.7.2.1/15:
[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
&(p->a) is equivalent to (int *)p. &(p->a) + 1 will be address of the element of the second struct. In this case, only one element is there, there will not be any padding in the structure so this will work but where there will be padding this code will break and leads to undefined behaviour.

Array declaration at file scope versus function scope [duplicate]

What is the concept when we define a global array with no dimension
This shows output as 16.
#include <stdio.h>
#include <stdlib.h>
int arr[];
int main(int argc, char *argv[])
{
arr[1] = 16;
printf("%d\n",arr[1]);
system("PAUSE");
return 0;
}
And even sizeof(arr) doesn't work. Why?
int arr[]; is a tentative definition there.
Clause 6.9.2, paragraph 2 says:
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
and example 2 in paragraph 5 of that clause clarifies:
If at the end of the translation unit containing
int i[];
the array i still has incomplete type, the implicit initializer causes it to have one element, which is set to zero on program startup.
So at the end of the translation unit, your array arr has type int[1]. Before the end, its type is incomplete, hence sizeof doesn't work, since in main, the array type is still incomplete.
Accessing arr[1] invokes undefined behaviour, since arr has only one element.
GCC assumes that arr should only have one element. The fact that you can access other elements than arr[0] without segfaulting is just a coincidence. For instance, on my machine I can access arr[1], arr[10] and arr[100] just fine, but arr[1000] causes a segfault. In general, accessing locations outside array bounds causes undefined behavior.

Why is output of the following program how r u?

The following code is the basic implementation of the if - else conditional statements -
#include <stdio.h>
#include <stdlib.h>
int x;
int main()
{
if(x)
printf("hi");
else
printf("how r u \n");
return 0;
}
6.9.2 External object definitions
Semantics
1 If the declaration of an identifier for an object has file scope and an initializer, the
declaration is an external definition for the identifier.
2 A declaration of an identifier for an object that has file scope without an initializer, and
without a storage-class specifier or with the storage-class specifier static, constitutes a
tentative definition. If a translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for that identifier, then
the behavior is exactly as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation unit, with an initializer
equal to 0.
So, Global variables are never left uninitialized, they are always initialized to 0. Hence the output how r u.
According to the C standard, your code does not trigger undefined behaviour:
If an object that has static or thread storage duration is not initialized explicitly, then:
- if it has pointer type, it is initialized to a null pointer;
- if it has arithmetic type, it is initialized to (positive or unsigned) zero;
This is taken from the C11 standard, but C89, and C99 both defined this behaviour, too:
If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
Because you're declaring x as a global variable, it has static storage duration, and therefore x is guaranteed to be initialized to 0 (it being an int, it obviously has an arithmetic type).
Your main function, therefore reads like this:
int main(void)//use int main(void), not int main()
{
if(0)//x is 0, 0 is false
printf("hi");
else
printf("how r u \n");//because if (x) is false, this is executed
return 0;
}
That's why the output of your program will be "How r u".
//global
int x;
You have declared 'x' as global variable ... Therefore it's default value is 0 .

Casting pointer to memory buffer to pointer to VLA

in C, I believe the following program is valid: casting a pointer to an allocated memory buffer to an array like this:
#include <stdio.h>
#include <stdlib.h>
#define ARRSIZE 4
int *getPointer(int num){
return malloc(sizeof(int) * num);
}
int main(){
int *pointer = getPointer(ARRSIZE);
int (*arrPointer)[ARRSIZE] = (int(*)[ARRSIZE])pointer;
printf("%d\n", sizeof(*arrPointer) / sizeof((*arrPointer)[0]));
return 0;
}
(this outputs 4).
However, is it safe, in C99, to do this using VLAs?
int arrSize = 4;
int *pointer = getPointer(arrSize);
int (*arrPointer)[arrSize] = (int(*)[arrSize])pointer;
printf("%d\n", sizeof(*arrPointer) / sizeof((*arrPointer)[0]));
return 0;
(also outputs 4).
Is this legit, according to the C99 standard?
It'd be quite strange if it is legit, since this would mean that VLAs effectively enable dynamic type creation, for example, types of the kind type(*)[variable].
Yes, this is legit, and yes, the variably-modified type system is extremely useful. You can use natural array syntax to access a contiguous 2-D array both of whose dimensions were not known until runtime.
It could be called syntactic sugar as there's nothing you can do with these types that you couldn't do without them, but it makes for clean code (in my opinion).
I would say it is valid. The Final version of the C99 standard (cited on Wikipedia) says in paragraph 7.5.2 - Array declarators alinea 5 :
If the size is an expression that is not an integer constant expression: ...
each time it is evaluated it shall have a value greater than zero. The size of each instance
of a variable length array type does not change during its lifetime.
It even explicitely says that it can be used in a sizeof provided the size never changes : Where a size
expression is part of the operand of a sizeof operator and changing the value of the
size expression would not affect the result of the operator, it is unspecified whether or not
the size expression is evaluated.
But the standard also says that this is only allowed in a block scope declarator or a function prototype : An ordinary identifier (as defined in 6.2.3) that has a variably modified type shall have
either block scope and no linkage or function prototype scope. If an identifier is declared
to be an object with static storage duration, it shall not have a variable length array type.
And an example later explains that it cannot be used for member fields, even in a block scope :
...
void fvla(int m, int C[m][m]); // valid: VLA with prototype scope
void fvla(int m, int C[m][m]) // valid: adjusted to auto pointer to VLA
{
typedef int VLA[m][m]; // valid: block scope typedef VLA
struct tag {
int (*y)[n]; // invalid: y not ordinary identifier
int z[n]; // invalid: z not ordinary identifier
};
...

Is Incomplete array type guaranteed to be able to store one element?

Incomplete array types are used in the famous Struct hack and they are allowed since c99 standard. prior to c99 standard these were not allowed. I was looking at the standard and I am unable to conclude:
Are Incomplete array types allowed outside a structure?(All references I found in the standard C99: 6.7.2.1.15 talk about it as the last element in the structure).
So is the following program allowed to compile as per the standard?
int array[];
int main(){return 0;}
Second part of my questions is, If this is allowed is array guaranteed to be able to store atleast one element of they type int.
is the following program allowed to compile as per the standard?
Yes, as per:
(C99, 6.9.2p5) "EXAMPLE 2 If at the end of the translation unit containing
int i[];
the array i still has incomplete type, the implicit initializer
causes it to have one element, which is set to zero on program
startup."
So
int array[];
int main(){return 0;}
is valid and equivalent to:
int array[1];
int main(){return 0;}
Note that it is OK only if array has (like above) external linkage as:
(C99, 6.9.2p3) "If the declaration of an identifier for an object is a tentative definition and has internal linkage, the declared type shall not be an incomplete type."

Resources