What are identifiers in C exactly? - c

Every Google search explains them as just "names for your variables", but I have a feeling there is a distinction between the identifier and the identifier's name. Is an identifier more like an object with attributes like name, scope, linkage, and an underlying object? I ask this because I ran into some trouble trying to read through the C standard. For instance, the snippet
int main(){
int x;
extern int x;
}
fails to compile whereas
int main(){
int x;
if(1){extern int x;}
}
compiles successfully. In this question, the failure of the first snippet is explained from 6.2.2.6 in the C standard, which states that local variables have no linkage. However, in the second snippet, the local variable still has no linkage and yet there is no conflict. Now, 6.2.2.4 states
For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of that identifier is visible, if the prior declaration specifies internal or external linkage,
the linkage of the identifier at the later declaration is the same as the linkage specified at the prior
declaration. If no prior declaration is visible, or if the prior declaration specifies no linkage, then the
identifier has external linkage.
My explanation would have been that this rule is in effect in both snippets, but in the first one, the uniqueness of the underlying object of x triggers a constraint violation via 6.2.1.2 because the same identifier name is being used for two distinct objects with the same scope and name space. But this is not the explanation given in the answer to the question I linked earlier. In the second snippet, the linkage types are still conflicting, so does changing the scope of the extern declaration change the visibility of the local declaration? What is the best way to think about linkage from the abstract point of view of the C standard (without using actual implementations like gcc or clang as illustration)?

"identifier" is an element of the language grammar. After preprocessing, all tokens are one of the following: keyword, identifier, constant, string-literal or punctuator.
If a token starts with a letter (or underscore) it can only be a keyword or an identifier. If it's not in the table of keywords then it is an identifier. For more technical detail on this, see Annex A of the C Standard.
In your program x and main are identifiers, int, if and extern are keywords, 1 is a constant, and everything else is a punctuator.
Identifiers are used as names of entities. The same identifier can be used in different scopes to designate different entities (or the same entity). Linkage is the name of the process by which identifiers are associated with entities.
Sometimes the standard uses the word "identifier" to mean the entity identified by an identifier; this is covered in 6.2.1/5:
Unless explicitly stated otherwise, where this International Standard uses the term “identifier” to refer to some entity (as opposed to the syntactic construct), it refers to the entity in the relevant name space whose declaration is visible at the point the identifier occurs.
The first code is erroneous because of 6.7/3:
If an identifier has no linkage, there shall be no more than one declaration of the identifier (in a declarator or type specifier) with the same scope and in the same name space, except that: [...]
The int x; has no linkage, so there shall not be another declaration of x in the same scope. (The list of exceptions does not have anything relevant to this case.)
In the second code, 6.7/3 is not violated because the second declaration is not in the same scope as the first one. The text you quoted explains that extern int x; names a different entity than int x; did, which is fine.
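The second program declares an identifier with external linkage but never provides a definition. Because that x is not used in an expression, 6.9/5 does not require a definition here. If it were used, the program would have undefined behaviour (no diagnostic required), and you may or may not see an error message.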
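A minimal sketch of the scope point, assuming a file-scope definition of x is added (it is not in the original snippets) so that the extern declaration has an object to name. The function name observe is hypothetical. The local x has no linkage; the extern declaration sits in a nested scope, so 6.7/3 is not violated, and under 6.2.2/4 it has external linkage and designates the file-scope object:

```c
int x = 10;                  /* assumed addition: file scope, external linkage */

/* Stores the value of the local x and of the x named by the nested extern. */
void observe(int *local_value, int *extern_value)
{
    int x = 1;               /* block scope, no linkage; hides the file-scope x */
    *local_value = x;
    if (1) {
        extern int x;        /* new scope: external linkage, names the file-scope x */
        *extern_value = x;
    }
}
```

Calling observe(&a, &b) leaves a equal to 1 and b equal to 10: one identifier, two entities.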

Related

External definition of an object declared with external linkage

I'm a bit confused by the wording in 6.9 p5 of N2310 C18:
If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one. 164)
QUESTION: Is it obvious from this quote that the external definition somewhere in the program (if any) should also declare an identifier with external linkage?
As I emphasized, "somewhere in the entire program there shall be exactly one external definition for the identifier". It does not specify which linkage the definition should declare the identifier with. Example:
tu1.c:
int a = 10;
tu2.c:
static int a = 20;
Formally speaking we have one external definition for identifier a declared in tu1.c and another one in tu2.c so we could apply the quote I cited above to this example.
However, to denote the same entity, identifiers declared in different translation units should all be declared with external linkage, as specified in 6.2.2/2:
In the set of translation units and libraries that constitutes an
entire program, each declaration of a particular identifier with
external linkage denotes the same object or function.
Which is not the case here.
See C11 §6.2.2 Linkages of identifiers:
… There are three kinds of linkage: external, internal, and none.
In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function. Within one translation unit, each declaration of an identifier with internal linkage denotes the same object or function. Each declaration of an identifier with no linkage denotes a unique entity.
If the declaration of a file scope identifier for an object or a function contains the storage-class specifier static, the identifier has internal linkage.
Emphasis added.
If a file scope variable is specified with static, it has internal linkage and isn't relevant to a discussion of variables with external linkage.
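A minimal single-file sketch of the internal-linkage rule quoted above (the function names are hypothetical). It does not reproduce the two-file tu1.c/tu2.c case; it only illustrates that within one translation unit every declaration of a with internal linkage denotes the same object, including a block-scope extern declaration, which inherits the internal linkage under 6.2.2/4:

```c
static int a = 20;           /* file scope + static: internal linkage */

/* Address of a as seen from file scope. */
int *address_at_file_scope(void)
{
    return &a;
}

/* Address of a as seen through a block-scope extern declaration. */
int *address_via_block_extern(void)
{
    extern int a;            /* prior visible declaration is internal, so this one is too */
    return &a;
}
```

Both functions return the same address. An int a = 10; defined in another translation unit would be a different object with external linkage.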

Variable declaration and definition mismatch

I am using a C89 compiler (embedded systems).
I ran into some C code where one translation unit defines a variable as bool varName;, where bool is a typedef of unsigned char. Another translation unit forward declares the variable as follows: extern char varName;.
This is obviously a type mismatch, and is an error. My question is, what exact rule does this violate? My knee-jerk reaction was that it is an ODR violation, but there is a single definition so I'm not confident that this is an ODR violation.
6.2.7p2
All declarations that refer to the same object or function shall have
compatible type; otherwise, the behavior is undefined.
The C89 standard has the same paragraph.
Declarations referring to the same object are further explained in the paragraph on linkage:
An identifier declared in different scopes or in the same scope more
than once can be made to refer to the same object or function by a
process called linkage. There are three kinds of linkage: external,
internal, and none.
In the set of translation units and libraries that constitutes an
entire program, each instance of a particular identifier with external
linkage denotes the same object or function. Within one translation
unit, each instance of an identifier with internal linkage denotes the
same object or function. Identifiers with no linkage denote unique
entities.
Compatible types essentially means identical types, with some minor caveats (e.g., extern int foo[]; is compatible with extern int foo[3];).
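A small sketch of the array caveat, kept in one file so it compiles on its own (foo_count is a hypothetical helper). The incomplete type int[] in the declaration is compatible with int[3] in the definition, so both declarations can refer to the same object:

```c
#include <stddef.h>

extern int foo[];            /* declaration with incomplete array type */
int foo[3] = {1, 2, 3};      /* definition; int[3] is compatible with int[] */

/* Number of elements, computed from the completed type. */
size_t foo_count(void)
{
    return sizeof foo / sizeof foo[0];
}

/* Declaring "extern char foo[];" here instead would not be compatible with
   int[3]; within one file the compiler reports conflicting types. */
```

When the mismatched declarations live in different translation units, as in the question, the compiler sees each file separately and typically cannot diagnose the conflict, which is why the rule is phrased as undefined behavior.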

Declaration and definition in C programming with extern

This may seem simple, but this question has been nagging at me in many ways.
My question is about declaration and definition of variables in C.
There are actually many explanations on the internet regarding this, and they offer many conflicting viewpoints rather than one answer. I want a clear picture of what is really going on.
int a;
Is this a declaration or a definition? When I print it with printf, it has the value 0 and the address 2335860. But if this is a declaration, how come memory is allocated for it?
int a;
int a;
When I do this, it says "previous declaration of 'a' was here" and "redeclaration of 'a' with no linkage".
Some sources say redeclaration is permitted in C and some say it isn't. What is the truth?
int a; is this a declaration or a definition?
int a; if written at file scope is a tentative definition. This means that if no other definition is available in the current translation unit, it is treated as a definition; otherwise it is just a declaration.
From 6.9.2 External object definitions in C11 specs:
A declaration of an identifier for an object that has file scope
without an initializer, and without a storage-class specifier or with
the storage-class specifier static, constitutes a tentative
definition. If a translation unit contains one or more tentative
definitions for an identifier, and the translation unit contains no
external definition for that identifier, then the behavior is exactly
as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation
unit, with an initializer equal to 0.
int i4; // tentative definition, external linkage
static int i5; // tentative definition, internal linkage
So you are effectively making multiple declarations, but getting the address and value because of the tentative definition rule.
some sources say redeclaration is permitted in C and some say it isn't;
what is the truth?
Redeclaration is permitted in C. But redefinition is not.
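A short sketch of what "redeclaration is permitted" looks like at file scope (read_a is a hypothetical helper). All three lines below declare the same object; none of them has an initializer, so the tentative definition rule supplies the single definition with value 0:

```c
int a;                       /* tentative definition */
int a;                       /* second tentative definition of the same object: allowed */
extern int a;                /* declaration only: allowed */

/* With no external definition in the file, a behaves as if defined with "= 0". */
int read_a(void)
{
    return a;
}
```

The same pair int a; int a; inside a function body is a different matter: there a has no linkage and each declaration is a definition, which is the error the question reports.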
Related question: What is the difference between a definition and a declaration?
there are actually many explanations on the internet
Prefer a good book over the internet to get a hold of the language. You can choose a good book from:
The Definitive C Book Guide and List
int a is a definition and can be used in place of declaration. A variable can have many declarations but must have only one definition. In case of
int a;
int a;
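there are two definitions of a in the same scope (an error at block scope, where a has no linkage). At file scope, where a has linkage, the following pair is accepted, because the extern line is only a declaration: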
int a;
extern int a;

Why do we need the 'extern' keyword in C if file scope declarations have external linkage by default?

AFAIK, any declaration of a variable or a function in file scope has external linkage by default. static means "it has internal linkage", extern means "it may be defined elsewhere", not "it has external linkage".
If so, why do we need the extern keyword? In other words, what is the difference between int foo; and extern int foo; (file scope)?
The extern keyword is used primarily for variable declarations. When you forward-declare a function, the keyword is optional.
The keyword lets the compiler distinguish a forward declaration of a global variable from a definition of a variable:
extern double xyz; // Declares xyz without defining it
If you keep this declaration by itself and then use xyz in your code, you would trigger an "undefined symbol" error during the linking phase.
double xyz; // Declares and defines xyz
If you keep this declaration in a header file and use it from several C/C++ files, you would trigger a "multiple definitions" error during the linking phase.
The solution is to use extern in the header, and not use extern in exactly one C or C++ file.
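A minimal sketch of that layout, collapsed into one file so it can be compiled as-is. The header name shared.h and the function scaled_xyz are hypothetical; in a real project the first part would sit in the header and the second part in exactly one source file:

```c
/* ---- contents of a hypothetical shared.h ---- */
extern double xyz;           /* declaration only: no storage reserved */
double scaled_xyz(double factor);

/* ---- contents of exactly one .c file ---- */
double xyz = 1.5;            /* the single definition */

double scaled_xyz(double factor)
{
    return xyz * factor;
}
```

Every other source file that includes the header sees only the extern declaration, so the linker finds one definition of xyz.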
As an illustration, compile the following program: (using cc -c program.c , or the equivalent)
extern char bogus[0x12345678];
Now remove the "extern" keyword (and add an initializer), and compile again:
char bogus[0x12345678] = "1";
Run objdump (or the equivalent) on the two objects.
You will find that without the extern keyword space is actually allocated.
With the extern keyword the whole "bogus" thing is only a reference. You are saying to the compiler: "there must be a char bogus[xxx] somewhere, fix it up!"
Without the extern keyword you say: "I need space for a variable char bogus[xxx], give me that space!"
The confusing thing is that the actual allocation of memory for an object is postponed until link time: the compiler just adds a record to the object file, informing the linker that an object should (or should not) be allocated. In all cases, the compiler will at least add the name (and size) of the object, so the linker/loader can fix it up.
C99 standard
I am going to repeat what others said but by quoting and interpreting the C99 N1256 draft.
First I confirm your assertion that external linkage is the default for file scope 6.2.2/5 "Linkages of identifiers":
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
The point of confusion is that extern does not only alter the linkage, but also whether an object declaration is a definition or not. This matters because 6.9/5 "External definitions" says there can only be one external definition:
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
where "external definition" is defined by the grammar snippet:
translation-unit:
external-declaration
so it means a "file scope" top-level declaration.
Then 6.9.2/2 "External object definitions" says (object means "data of a variable"):
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So:
extern int i;
is not a definition, because it does have a storage-class specifier: extern.
However:
int i;
does not have a storage-class specifier, so it is a tentative definition. And if the translation unit contains no external definition for i, then the initializer = 0 is added implicitly:
int i = 0;
So if we had multiple int i; in different files, the linker should in theory blow up with multiple definitions.
GCC 4.8 does not comply however, and as an extension allows multiple int i; across different files as mentioned at: https://stackoverflow.com/a/3692486/895245 .
This is implemented in ELF with a common symbol, and this extension is so common that it is mentioned in the standard at J.5.11/5 Common extensions > Multiple external definitions:
There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
Another place where extern has an effect is in block-scope declarations, see: Can local and register variables be declared extern?
If there is an initializer for the object declaration, extern has no effect:
extern int i = 0;
equals
int i = 0;
Both are definitions.
For functions, extern seems to have no effect: Effects of the extern keyword on C functions as there is no analogous concept of tentative definition.
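A brief sketch of the function case (twice is a hypothetical function). The two declarations are equivalent, and only the declaration with a body is a definition:

```c
extern int twice(int n);     /* declaration with extern */
int twice(int n);            /* the same declaration without extern */

int twice(int n)             /* the one definition: it has a body */
{
    return 2 * n;
}
```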
You can only define a variable once.
If multiple files use the same variable then the variable must be redundantly declared in each file. If you do a simple "int foo;" you'll get a duplicate definition error. Use "extern" to avoid a duplicate definition error. Extern is like saying to the compiler "hey, this variable exists but don't create it. it's defined somewhere else".
The build process in C is not "smart". It won't search through all the files to see if a variable exists. You must explicitly say that the variable exists in the current file, but at the same time avoid creating it twice.
Even in the same file, the build process is not very smart. It goes top to bottom and it won't recognize a function name if it is defined below the point of use, so you must declare it higher up.
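A minimal sketch of the top-to-bottom rule inside one file (both function names are hypothetical). The declaration near the top is what lets sum_of_squares call square before the compiler has seen its body:

```c
static int square(int n);    /* declaration: tells the compiler the signature */

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);   /* used before its definition appears */
}

static int square(int n)     /* definition, further down the file */
{
    return n * n;
}
```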

Declaration or Definition in C

From External Variables Wiki:
If neither the extern keyword nor an
initialization value are present, the
statement can be either a declaration
or a definition. It is up to the
compiler to analyse the modules of the
program and decide.
I was not able to fully grasp the meaning of this statement with respect to C. For example, does it imply that:
int i;
is not necessarily a declaration (as I have been assuming until now), but could be a definition as well (by definition of Definition & Declaration on the same webpage, no puns intended)?
In a nutshell, is the above statement:
a. just a declaration, or
b. declaration + definition?
Reference: Variable declaration and definition
Summary of answers received:
Statement                   Declaration  Definition  Tentative definition  Initialized
int i; (inside a block)     Yes          Yes         No                    No
int i=5; (inside a block)   Yes          Yes         No                    Yes (to 5)
int i; (otherwise)          Yes          No          Yes                   Yes (to 0)
extern int i;               Yes          No          No                    No
All definitions are declarations but not vice-versa.
Assuming it's at file scope it's a 'tentative definition'. From 6.9.2/2 "External object definitions":
A declaration of an identifier for an object that has file scope without an initializer, and
without a storage-class specifier or with the storage-class specifier static, constitutes a
tentative definition. If a translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for that identifier, then
the behavior is exactly as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation unit, with an initializer
equal to 0.
This means that it would be valid to also have the following in the translation unit:
int i = 42;
since that declaration has an explicit initializer, it's the definition of the variable i.
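A short sketch of that combination in one translation unit (read_i is a hypothetical helper). The lines without an initializer are tentative definitions; the line with the initializer is the external definition, so the "initializer equal to 0" fallback never applies:

```c
int i;                       /* tentative definition */
int i = 42;                  /* external definition: has an initializer */
int i;                       /* another tentative definition: still allowed */

/* Reads the single object that all three declarations denote. */
int read_i(void)
{
    return i;
}
```

A second initialized line such as int i = 7; would be a redefinition and is rejected.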
As far as if the declaration is in a block scope, the standard says the following (6.2.2/2 "Linkages of identifiers"):
Each declaration of an identifier with
no linkage denotes a unique entity.
...
(paragraph 6) The following
identifiers have no linkage: ... a
block scope identifier for an object
declared without the storage-class
specifier extern.
So in block scope, the declaration would be a definition as well.
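A small sketch of the block-scope case (the function name is hypothetical). Each int i below has no linkage, so each one is a definition of its own object, and the two addresses differ:

```c
/* Returns 1 if the two block-scope declarations of i name different objects. */
int inner_i_is_distinct(void)
{
    int i = 1;               /* definition: storage is reserved for this i */
    int *outer = &i;
    {
        int i = 2;           /* a separate definition, a separate object */
        return outer != &i && *outer == 1 && i == 2;
    }
}
```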
The C standard says that
A definition of an identifier is a declaration for that identifier that: for an object, causes storage to be reserved for that object (…)
Definitions encompass declarations, i.e., every definition is necessarily a declaration, so it doesn’t make sense to say that
int i;
is not a declaration. It is a declaration which also happens to be a definition. Or, it is a definition, hence a declaration.
In the context of variables:
A declaration of a variable is a statement which describes what this variable looks like. So:
extern int x;
in global scope translates to: "somewhere in the code, there's a variable called x which has type int and external linkage". A declaration is necessary before you ever refer to x. (The same goes for function declarations.)
A definition is a statement which creates an instance of this variable. So:
int x;
in global scope creates a single variable of type int with external linkage. So if you put that line in a header, every translation unit including that header would try to create its own copy of x, which is undesirable. That's why we only have declarations in header files. The same goes for functions: if you provide the function body, it's a definition.
Also, formally, every definition is a kind of declaration, as it also has to specify what this variable/function looks like. So if a definition already exists in a given scope, you don't need any additional declarations to use it.
From the C99 spec:
A declaration of an identifier for an object that has file scope without an initializer, and
without a storage-class specifier or with the storage-class specifier static, constitutes a
tentative definition. If a translation unit contains one or more tentative definitions for an
identifier, and the translation unit contains no external definition for that identifier, then
the behavior is exactly as if the translation unit contains a file scope declaration of that
identifier, with the composite type as of the end of the translation unit, with an initializer
equal to 0.
So this is one case in which a simple declaration without an initializer can be a definition.
As C uses the terms:
A "definition" creates something (which occupies some sort of memory). It also describes something. This means a "definition" is also a "declaration".
A "declaration" just describes something. The idea is that the compiler needs to know how to build the code that uses the thing defined elsewhere. Later, the linker then links the use to the something.
Declarations allow you to compile code and link it (later) as a separate step.
