Why is this statement producing a linker error with gcc?

I have this extremely trivial piece of C code:
static int arr[];
int main(void) {
*arr = 4;
return 0;
}
I understand that the first statement is illegal (I've declared a file-scope array with static storage duration and internal linkage but no specified size), but why does it result in a linker error?
/usr/bin/ld: /tmp/cch9lPwA.o: in function `main':
unit.c:(.text+0xd): undefined reference to `arr'
collect2: error: ld returned 1 exit status
Shouldn't the compiler be able to catch this before the linker?
It is also strange to me that, if I omit the static storage class, the compiler simply assumes the array has length 1 and produces no error beyond that:
int arr[];
int main(void) {
*arr = 4;
return 0;
}
Results in:
unit.c:5:5: warning: array 'arr' assumed to have one element
int arr[];
Why does omitting the storage class result in different behavior here and why does the first piece of code produce a linker error? Thanks.

Empty arrays static int arr[]; and zero-length arrays static int arr[0]; were gcc non-standard extensions.
The intention of these extensions was to act as a fix for the old "struct hack". Back in the C90 days, people wrote code such as this:
typedef struct
{
header stuff;
...
int data[1]; // the "struct hack"
} protocol;
where data would then be used as if it had variable size beyond the array depending on what's in the header part. Such code was buggy, wrote data to padding bytes and invoked array out-of-bounds undefined behavior in general.
gcc fixed this problem by adding empty/zero arrays as a compiler extension, making the code behave without bugs, although it was no longer portable.
The C standard committee recognized that this gcc feature was useful, so they added flexible array members to the C language in 1999. Since then, the gcc feature is to be regarded as obsolete, and the standard C flexible array member is preferred.
As recognized by the linked gcc documentation:
Declaring zero-length arrays in other contexts, including as interior members of structure objects or as non-member objects, is discouraged.
And this is what your code does.
Note that gcc with no compiler options passed defaults to -std=gnu90 (gcc < 5.0) or -std=gnu11 (gcc >= 5.0; newer releases default to later gnu standards). This enables all the non-standard extensions, so the program compiles but does not link.
If you want standard compliant behavior, you must compile as
gcc -std=c11 -pedantic-errors
The -pedantic-errors flag makes gcc reject extensions that ISO C forbids, and the linker error switches to a compiler error as expected. For an empty array as in your case, you get:
error: array size missing in 'arr'
And for a zero-length array you get:
error: ISO C forbids zero-size array 'arr' [-Wpedantic]
The reason int arr[] works is that it is a tentative definition of an array with external linkage (see C17 6.9.2). It is valid C and can be regarded as a forward declaration. It means that elsewhere in the code, the compiler (or rather the linker) should expect to find, for example, int arr[10], which then refers to the same variable. This way, arr can be used in the code before its size is known. (I wouldn't recommend using this language feature, as it is a form of "spaghetti programming".)
When you use static you block the possibility to have the array size specified elsewhere, by forcing the variable to have internal linkage instead.

One possible reason for this behavior is that the compiler treats the static variable as unused and optimizes it away, so the linker then complains about the missing symbol.
If it is not static, it cannot simply be ignored, because other modules might reference it - so the linker can at least find that symbol arr.

Related

C: Reading 8 bytes from a region of size 0 [-Wstringop-overread] [duplicate]

Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.
Sample Program
#include <stdio.h>
int main() {
int arr[0];
return 0;
}
Clarification
I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.
This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];
I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.
Because I was mainly wondering about the error, I am tagging Lundin's answer as correct (Nawaz's was first, but it wasn't as complete) -- the others were pointing out its actual use for tail-padded structures, while relevant, isn't exactly what I was looking for.

An array cannot have zero size.
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
The above text applies to a plain array (paragraph 1). For a VLA (variable length array), the behavior is undefined if the expression's value is less than or equal to zero (paragraph 5). This is normative text in the C standard. A compiler is not allowed to implement it differently.
gcc -std=c99 -pedantic gives a warning for the non-VLA case.

As per the standard, it is not allowed.
However it's been current practice in C compilers to treat those declarations as a flexible array member (FAM) declaration:
C99 6.7.2.1, §16: As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
The standard syntax of a FAM is:
struct Array {
size_t size;
int content[];
};
The idea is that you would then allocate it so:
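struct Array *foo(size_t x) {
    /* sizeof(struct Array) covers the header, including any padding before content */
    struct Array *array = malloc(sizeof(struct Array) + x * sizeof(int));
    if (array == NULL) {
        return NULL;
    }
    array->size = x;
    for (size_t i = 0; i != x; ++i) {
        array->content[i] = 0;
    }
    return array; /* the caller releases it with free() */
}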
You might also use it statically (gcc extension):
struct Array a = { 3, { 1, 2, 3 } };
This is also known as tail-padded structures (this term predates the publication of the C99 Standard) or struct hack (thanks to Joe Wreschnig for pointing it out).
However, this syntax was standardized (and its effects guaranteed) only in C99. Before that, a constant size was necessary.
1 was the portable way to go, though it was rather strange.
0 was better at indicating intent, but not legal as far as the Standard was concerned and supported as an extension by some compilers (including gcc).
The tail padding practice, however, relies on the fact that storage is available (careful malloc) so is not suited to stack usage in general.

In Standard C and C++, a zero-size array is not allowed.
If you're using GCC, compile with the -pedantic option. It will give a warning saying:
zero.c:3:6: warning: ISO C forbids zero-size array 'a' [-pedantic]
In the case of C++, it gives a similar warning.

It's totally illegal, and always has been, but a lot of compilers
neglect to signal the error. I'm not sure why you want to do this.
The one use I know of is to trigger a compile time error from a boolean:
char someCondition[ condition ];
If condition is false, then I get a compile time error. Because
compilers do allow this, however, I've taken to using:
char someCondition[ 2 * condition - 1 ];
This gives a size of either 1 or -1, and I've never found a compiler
which would accept a size of -1.

Another use of zero-length arrays is for making variable-length objects (pre-C99). Zero-length arrays are different from flexible array members, which are written [] without the 0.
Quoted from gcc doc:
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied.
A real-world example is zero-length arrays of struct kdbus_item in kdbus.h (a Linux kernel module).

I'll add that there is a whole page of the online gcc documentation on this topic.
Some quotes:
Zero-length arrays are allowed in GNU C.
In ISO C90, you would have to give contents a length of 1
and
GCC versions before 3.0 allowed zero-length arrays to be statically initialized, as if they were flexible arrays. In addition to those cases that were useful, it also allowed initializations in situations that would corrupt later data
so you could
int arr[0] = { 1 };
and boom :-)

Zero-size array declarations within structs would be useful if they were allowed, and if the semantics were such that (1) they would force alignment but otherwise not allocate any space, and (2) indexing the array would be considered defined behavior in the case where the resulting pointer would be within the same block of memory as the struct. Such behavior was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets.
The struct hack, as commonly implemented using an array of size 1, is dodgy and I don't think there's any requirement that compilers refrain from breaking it. For example, I would expect that if a compiler sees int a[1], it would be within its rights to regard a[i] as a[0]. If someone tries to work around the alignment issues of the struct hack via something like
typedef struct {
uint32_t size;
uint8_t data[4]; // Use four, to avoid having padding throw off the size of the struct
}
a compiler might get clever and assume the array size really is four:
// As written
foo = myStruct->data[i];
// As interpreted (assuming little-endian hardware)
foo = ((*(uint32_t*)myStruct->data) >> (i << 3)) & 0xFF;
Such an optimization might be reasonable, especially if myStruct->data could be loaded into a register in the same operation as myStruct->size. I know nothing in the standard that would forbid such optimization, though of course it would break any code which might expect to access stuff beyond the fourth element.

You definitely can't have zero-sized arrays according to the standard, but in practice most popular compilers let you do it. So I will try to explain why that can be bad:
#include <cstdio>
int main() {
struct A {
A() {
printf("A()\n");
}
~A() {
printf("~A()\n");
}
int empty[0];
};
A vals[3];
}
As a human, I would expect this output:
A()
A()
A()
~A()
~A()
~A()
Clang prints this:
A()
~A()
GCC prints this:
A()
A()
A()
It is totally strange, so it is a good reason not to use empty arrays in C++ if you can avoid them.
There is also an extension in GNU C that lets you create zero-length arrays in C, but as I understand it, there should be at least one other member before it in the structure, or you will get very strange results like the ones above if you use C++.

C- Basic syntax unexpected fail when forward declaring "warning: data definition has no type or storage class"

I ran into some unexpected problems compiling a piece of code, and after spending some time on it, I think I need help. It trims down to a really small minimal example. I feel really lost because I am not a total beginner at C or at computer science in general, and to me this should not only work, it also seems very basic.
#include <string.h>
char* tamere;
tamere=strdup("flute");
Nothing weird. strdup returns a pointer to char. tamere is declared as a pointer to char. The compiler is free to allocate space on the stack for tamere when it sees the declaration. Then, on the next line, it sets it to the result of strdup.
Gcc output
But this basic piece of code fails to compile on my Debian machine:
b.c:3:1: warning: data definition has no type or storage class
tamere=strdup("flute");
b.c:3:1: warning: type defaults to ‘int’ in declaration of ‘tamere’ [-Wimplicit-int]
b.c:3:1: error: conflicting types for ‘tamere’
b.c:2:7: note: previous declaration of ‘tamere’ was here
char* tamere;
^~~~~~
b.c:3:8: warning: initialization makes integer from pointer without a cast [-Wint-conversion]
tamere=strdup("flute");
^~~~~~
b.c:3:8: error: initializer element is not constant
So, the question is why? Why does this piece of C code not work? Why is my reasoning about the compiler seeing the declaration and being able to do its work wrong? What is the easiest way around it?
Gcc version
gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Thanks for your help :)
Edit: The current answer explains what is happening. However, one could imagine compilers being a bit smarter and replacing
char* tamere;
tamere = "tata";
(not compiling) into:
char* tamere = "tata";
An even smarter compiler (I thought compilers were smart enough to replace values known at compile time, such as 3*2, with their results) would even allow
char* tamere;
tamere=strdup("tata");
because strdup("tata") could be replaced by "tata" (a heap "tata").
Is there any reason why this is not a thing, and why, if you want to separate a static variable's declaration from its definition, you have to write a one-time setup function?

You may use only declarations in the file scope but not statements. It is functions that get the control and execute code in C. And the main entry point of a C program is the function main.
From the C Standard (5.1.2.2.1 Program startup)
1 The function called at program startup is named main. The
implementation declares no prototype for this function. It shall be
defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any
names may be used, as they are local to the function in which they are
declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
The compiler tries to reinterpret this statement
tamere=strdup("flute");
as a declaration with the implied type specifier int. And as a result it assumes that the variable tamere was redeclared with a different type specifier.
b.c:3:1: warning: data definition has no type or storage class
tamere=strdup("flute");
b.c:3:1: warning: type defaults to ‘int’ in declaration of ‘tamere’ [-Wimplicit-int]
b.c:3:1: error: conflicting types for ‘tamere’
You could use the expression in the statement as an initializer in the declaration
char * tamere = strdup("flute");
But the compiler expects a constant expression as an initializer of a variable with the static storage duration. So it again will issue an error.
From the C Standard (6.7.9 Initialization)
4 All the expressions in an initializer for an object that has static
or thread storage duration shall be constant expressions or string
literals.
So the only acceptable initialization can look like
char * tamere = "flute";
Otherwise you have to set the value of the pointer in some function for example in main.
Note that strdup is not a standard C function (it was only added in C23).

You cannot execute code outside functions in C. You can only initialize variables declared there to constant expressions. A function call is never a constant expression since it is resolved in run-time.
The historical reason why C is like this is probably efficiency. All plain variables at file scope have static storage duration and must be initialized before main() is called: initialized objects through the .data segment, and zero-initialized (or not explicitly initialized) objects through the .bss segment. This variable initialization already adds start-up overhead.
If you would in addition to that also execute functions, the start-up time of the program would become even slower. For example, this is a known performance problem in the C++ language, where objects with static storage duration have their constructors called at start-up.
Furthermore, the function strdup isn't standard C and might not exist inside string.h. If you compile the code in strict C mode, this function might become unavailable, since conforming compilers aren't allowed to declare non-standard functions inside standard headers such as string.h. I'd advise using malloc + strcpy/memcpy instead.

IAR compilation failure, CCS compilation works. Types Incompatibility

After developing new firmware (main and libraries) with CCS for my CC2538, all errors have been debugged and the device is now working fine.
Since I cannot flash the firmware permanently from CCS, I'm using IAR to do that.
In IAR, I have created the workspace and the project, and included all libraries and files needed to compile the firmware. But compilation fails due to incompatible-type errors.
Error[Pe144]: a value of type "int" cannot be used to initialize an
entity of type "signed short *"
int16_t *accData[3] = malloc(sizeof(int16_t));
Error[Pe513]: a value of type "int" cannot be assigned to an entity
of type "signed short *"
int16_t *accData[3] = malloc(sizeof(int16_t));
Error[Pe120]: return value type ("signed short **") does not match
the function type ("signed short*")
int16_t * lsm303d_readAccData(void)
{
int16_t *accData[3] = malloc(sizeof(int16_t));
...
return accData;
}
Which is the root cause of these errors?
Maybe a compiler option? Do I need to add a file? Or a prototype in the code?
KR!

Which is the root cause of these errors?
"a value of type "int"" is the root cause. There should be no int here! Just the signed short* (which is your int16_t*) and a void* from malloc.
This is because you are using a C90 compiler and forgot to #include <stdlib.h>. Upon finding a function with no prototype, C90 would implicitly assume you want a function returning int, which explains the compiler errors "a value of type "int"". But malloc actually returns a void*, so this is a severe bug. Solve this by including the header stdlib.h where malloc is found.
This undesired and irrational behavior of the language was fixed in C99, which removed implicit function declarations. Consider using a modern compiler instead, or configure your compiler to use the current C language standard (ISO 9899:2011).
That being said, this code doesn't make any sense either:
int16_t *accData[3] = malloc(sizeof(int16_t));
You probably meant
int16_t *accData = malloc( sizeof(int16_t[3]) );

The first error is somewhat misleading. It seems to indicate that you forgot to include <stdlib.h>, so malloc is undefined and the compiler assumes it returns int.
In any case, you are assigning a pointer to an array: this is incorrect.
Returning the address of a local automatic array is incorrect too.
You should define accData as a pointer instead of an array, and make it point to an allocated array of int16_t. You seem to want this array to hold 3 elements, otherwise modify the code accordingly:
#include <stdint.h>
#include <stdlib.h>
int16_t *lsm303d_readAccData(void) {
    int16_t *accData = malloc(sizeof(int16_t) * 3); /* room for 3 values */
    ...
    return accData;
}
You should configure the compiler to issue more warnings and refuse obsolete constructions such as implicit int. For gcc, add -std=c99 or -std=c11 and -Wall -Wextra -Werror.

Compiling C structs

This is my code:
#include <stdio.h>
typedef struct {
const char *description;
float value;
int age;
} swag;
typedef struct {
swag *swag;
const char *sequence;
} combination;
typedef struct {
combination numbers;
const char *make;
} safe;
int main(void)
{
swag gold = { "GOLD!", 100000.0 };
combination numbers = { &gold, "6503" };
safe s = { numbers, "RAMCON" };
printf("Contents = %s\n", s.numbers.swag->description);
getchar();
return 0;
}
Whenever I compile it with the VS developer console, I get this error: error C2440: 'initializing' : cannot convert from 'combination' to 'swag *'.
However if I use gcc the console just prints: "GOLD!". Don't understand what's going on here.

What you stumbled upon is an implementation-specific variant of a popular non-standard compiler extension used in various C89/90 compilers.
The strict rules of classic C89/90 prohibited the use of non-constant objects in {} initializers. This immediately meant that it was impossible to specify an entire struct object between the {} in the initializer, since that would violate the above requirement. Under that rule you could only use scalar constants between the {}.
However, many C89/90 compilers ignored that standard requirement and allowed users to specify non-constant values when writing {} initializers for local objects. Unfortunately, this immediately created an ambiguity if user specified a complex struct object inside the {} initializer, as in your
safe s = { numbers, "RAMCON" };
The language standard did not allow this, for which reason it was not clear what this numbers initializer should apply to. There are two ways to interpret this:
The existing rules of the language said that the compiler must automatically enter each level of struct nesting and apply sequential initializers from the {} to all sequential scalar fields found in that way (actually, it is a bit more complicated, but that's the general idea).
This is exactly what your compiler did. It took the first initializer numbers, it found the first scalar field s.numbers.swag and attempted to apply the former to the latter. This expectedly produced the error you observed.
Other compilers took a more elaborate approach to that extension. When the compiler saw that the next initializer from the {} list had the same type as the target field on the left-hand side, it did not "open" the target field and did not enter the next level of nesting, but rather used the whole initializer value to initialize the whole target field.
This latter behavior is what you expected in your example (and, if I am not mistaken, this is the behavior required by C99), but your C89/90 compiler behaved in accordance with the first approach.
In other words, when you are writing C89/90 code, it is generally OK to use that non-standard extension when you specify non-constant objects in local {} initializers. But it is a good idea to avoid using struct objects in such initializers and stick to scalar initializers only.

Looks like an issue with the initializers. If you use the proper options with gcc, it will tell you this:
$ gcc -Wall -ansi -pedantic x.c
x.c: In function ‘main’:
x.c:21: warning: initializer element is not computable at load time
x.c:22: warning: initializer element is not computable at load time
which is probably the same issue VS is trying to tell you about. You can make these go away if you declare gold and numbers static.

Linker doesn't show any error, weird

Suppose I have two C source files, A1.C and A2.C, with these contents:
A1.C
#include <stdio.h>
int x;
int main(){
void f(void);
x = 5;
f();
printf("%d", x);
return 0;
}
A2.C
int x;
void f() { x = 4; }
The linker doesn't give me any errors despite the missing "extern" keyword. I have two identical symbols. Can someone explain why?

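For gcc, you can use the -fno-common flag to turn this into an error (GCC 10 and later default to -fno-common; the documentation quoted below describes the older default).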
The gcc documentation explains what's happening
-fno-common
In C code, controls the placement of uninitialized global variables. Unix C compilers have traditionally permitted multiple
definitions of such variables in different compilation units by
placing the variables in a common block. This is the behavior
specified by -fcommon, and is the default for GCC on most targets. On
the other hand, this behavior is not required by ISO C, and on some
targets may carry a speed or code size penalty on variable references.
The -fno-common option specifies that the compiler should place
uninitialized global variables in the data section of the object file,
rather than generating them as common blocks. This has the effect that
if the same variable is declared (without extern) in two different
compilations, you get a multiple-definition error when you link them.
See also Tentative definitions in C99 and linking