I know how namespaces work in C++, but I'm a little confused about how they work in C. So, I did a bit of research about name spaces in C.
First, the respective section in ISO/IEC 9899:2018 (C18), section 6.2.3:
6.2.3 Name spaces of identifiers
1 If more than one declaration of a particular identifier is visible at any point in a translation unit, the syntactic context disambiguates uses that refer to different entities. Thus, there are separate name spaces for various categories of identifiers, as follows:
— label names (disambiguated by the syntax of the label declaration and use);
— the tags of structures, unions, and enumerations (disambiguated by following any(32) of the keywords struct, union, or enum);
— the members of structures or unions; each structure or union has a separate name space for its members (disambiguated by the type of the expression used to access the member via the . or -> operator);
— all other identifiers, called ordinary identifiers (declared in ordinary declarators or as enumeration constants).
32) There is only one name space for tags even though three are possible.
So this gives me a bit more understanding of the term in C and seems to generally have the same kind of purpose as in C++. But unfortunately, there is nothing further said in the standard about how name spaces work in C.
Apparently, it has something to do with the distinction between entities that share the same identifier and, as opposed to C++, where we declare namespaces like:
namespace ctrl1
{
int max = 245;
}
and using namespaces, like:
using namespace ctrl1;
or
int a = ctrl1::max;
in C, the compiler is able to disambiguate a certain use of one object automatically if the respective identifier is used. Correct me if I'm wrong.
How does that work? How does the compiler know whether it should use one entity instead of the other in C?
I have read Name spaces in c++ and c, but that question is more focused on C++ and on the handling of a specific example.
I also read Name spaces in C, where again the question is more focused on a specific example, here the enum type.
My question is:
How do name spaces work in C?
in C, the compiler is able to disambiguate a certain use of one
object automatically if the respective identifier is used. Correct me
if I'm wrong.
How does that work? How does the compiler know whether it should use one
entity instead of the other in C?
The excerpt from the standard already addresses this (emphasis added):
— label names (disambiguated by the syntax of the label declaration and use);
A label declaration has the form of an identifier followed by a colon, which must be followed by a statement:
a_label:
do_something;
The only use of labels is in goto statements, and the identifier in a goto statement can be only a label:
goto a_label;
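As a minimal sketch of that rule (the function name count_to and the identifier done are made up for illustration), the same spelling can name a variable and a label inside one function, because a label can only appear before a colon or after goto:

```c
/* Hypothetical example: `done` is both an int variable and a label.
   Assumes limit is non-negative. */
int count_to(int limit)
{
    int done = 0;              /* ordinary identifier */

    for (;;) {
        if (done == limit)
            goto done;         /* after goto, only a label is possible */
        done++;                /* in an expression, the variable is meant */
    }

done:                          /* identifier followed by ':' declares the label */
    return done;               /* the variable again */
}
```

Calling count_to(3) walks the counter up to 3 and then jumps to the label; the compiler never has to guess which done is meant.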
— the tags of structures, unions, and enumerations (disambiguated by following any(32) of the keywords struct, union, or enum);
"Following any of the keywords struct, union, or enum" means exactly what it says:
struct a_tag
union another_tag
enum a_third_tag
Those forms can appear in type definitions, type declarations, and type uses. If one of the keywords struct, union, or enum immediately precedes an identifier then that identifier is a tag; otherwise it isn't.
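A small sketch of how that plays out (the names item and tag_demo are invented for this example): the identifier item is a tag whenever it follows struct and an ordinary identifier everywhere else.

```c
struct item { int value; };        /* `item` follows struct: it is a tag */

int tag_demo(void)
{
    struct item item = { 7 };      /* tag `item`, then an object also named `item` */
    struct item *p = &item;        /* `struct item` is the type, bare `item` is the object */
    return p->value;
}
```

Because all tags share a single name space, declaring a union item or an enum item in the same scope as struct item would be rejected.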
— the members of structures or unions; each structure or union has a
separate name space for its members (disambiguated by the type of the
expression used to access the member via the . or -> operator);
The appearance of an identifier as the right-hand operand of a . or -> operator distinguishes it as the identifier of a structure or union member. The type of the left-hand operand determines which structure or union type the member belongs to. C structure and union types cannot have static members, so there is never any need to access a structure or union member relative to the type itself, absent an object of that type.
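To illustrate, here is a sketch with two hypothetical structure types, width and height, that both have a member called value, plus a local variable with the same spelling. The operand to the left of . or -> selects the member name space; the bare identifier is the variable.

```c
struct width  { int value; };
struct height { int value; };      /* same member name, separate name space */

int area(const struct width *w, struct height h)
{
    int value;                     /* ordinary identifier, unrelated to the members */
    value = w->value * h.value;    /* left operand's type picks the member */
    return value;
}
```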
— all other identifiers, called ordinary identifiers
Anything not covered by one of the other three cases is covered by this one. That includes variable names, function names, function parameter names, typedef names, and enumeration constants. (Built-in type names such as int are keywords rather than identifiers, so they are not in any name space. I think that's a complete list, but I may have overlooked something.)
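Since all of these share one name space, two of them cannot coexist under the same name in the same scope; what C does allow is hiding an outer declaration in an inner scope. A sketch, with names (color, length, shadow_demo) chosen only for illustration:

```c
enum color { red, green };     /* enumeration constants are ordinary identifiers */
typedef int length;            /* so are typedef names */

int shadow_demo(void)
{
    length total = green;      /* typedef name and enum constant; green is 1 */
    {
        int green = 40;        /* inner scope: hides the enumeration constant */
        total += green;        /* the variable, so total becomes 41 */
    }
    return total + green;      /* enum constant is visible again: 41 + 1 */
}
```

Declaring int green at file scope next to the enum, by contrast, would be a redeclaration error, because both names would live in the same name space and scope.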
How do name spaces work in C?
The only other thing I can think of to clarify is that unlike in C++, C has only implicit declaration and use of namespaces. There is no namespace keyword in C, and no syntax for explicitly referring to an identifier relative to a chosen namespace. User-defined namespaces being limited to those associated with structure and union types, the simple, implicit approach satisfactorily covers all possible cases.
Given that the category of ordinary identifiers is very broad, however, it has become conventional for authors of reusable C libraries to minimize the likelihood of name collisions by prefixing the external identifiers exposed by their libraries with characteristic short prefixes. This ad hoc namespacing is quite outside the scope of the standard, but very common.
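A sketch of what that convention tends to look like, using a made-up library called "geo" (the prefix geo_ and every name below are assumptions for illustration, not taken from a real library):

```c
/* Hypothetical library "geo": every externally visible name starts with geo_. */
typedef struct geo_point {
    int x;
    int y;
} geo_point;

/* Manhattan distance between two points. */
int geo_manhattan(geo_point a, geo_point b)
{
    int dx = a.x > b.x ? a.x - b.x : b.x - a.x;
    int dy = a.y > b.y ? a.y - b.y : b.y - a.y;
    return dx + dy;
}
```

A program that links against this library can still define its own point or manhattan without clashing.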
From cppreference:
1) Label name space: all identifiers declared as labels.
2) Tag names: all identifiers declared as names of structs, unions and enumerated types.
3) Member names: all identifiers declared as members of any one struct or union. Every struct and union introduces its own name space of this kind.
4) All other identifiers, called ordinary identifiers to distinguish from (1-3) (function names, object names, typedef names, enumeration constants).
This allows for code like this (among other things):
struct Point { int x, y; };
struct Point Point;
This code seems somewhat unclear to me as Point can refer to both a type and an instance of a struct. What was the motivation behind having separate name spaces for tags and other identifiers?
The actual question posed is
What was the motivation behind having separate name spaces for tags and other identifiers?
This can be answered only by reference to the standard committee's rationale document, which in fact does address the matter, however briefly:
Pre-C89 implementations varied considerably in the number of separate name spaces maintained. The position adopted in the Standard is to permit as many separate name spaces as can be distinguished by context, except that all tags (struct, union, and enum) comprise a single name space.
(C99 rationale document,* section 6.2.3)
Thus, it is explicitly intentional that code such as
struct point { int point; } point = { .point = 0 };
goto point;
point:
return point.point;
is permitted. My interpretation of the rationale is that the intention was to be unrestrictive, though it remains unclear why the different kinds of tags were not given separate namespaces. This could not have been accidental, so one or more parties represented on the committee must have opposed separate tag namespaces, and they managed to prevail. Such opposition could very well have been for business instead of technical reasons.
*As far as I am aware, there is no rationale document for the C2011 standard. At least, not yet.
I've seen structs declared two different ways.
typedef struct _myStruct {
...
} myStruct;
and
typedef struct myStruct {
...
} myStruct;
Is there a reason for the leading underscore or is this just a stylistic thing? If there is not a difference, is one of these preferred over the other?
The former was used long ago, when some compiler(s) didn't allow the tag and the typedef to use the same identifier. The latter is currently preferred, and in fact, identifiers that start with an underscore are discouraged.
There are reasons not to use the leading underscore, notably that names starting with an underscore are basically reserved for use by the implementation. The details are a little more nuanced than that, but it is easier to remember.
ISO/IEC 9899:2011
7.1.3 Reserved identifiers
¶1 Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
…
Consequently, using the leading underscore is treading on thin ice. Usually, you'll get away with it. However, sometimes you won't, and when you don't, you have no recourse because you've been treading outside the limits of the namespace that the standard allows you to use.
If the structure tag and the type name are the same, you don't have to guess which structure tag goes with which type name (alias).
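For instance, a self-referential type can use one spelling throughout. This is only a sketch; node and list_sum are names invented here:

```c
#include <stddef.h>

typedef struct node node;     /* typedef name `node` for the type `struct node` */

struct node {
    int   value;
    node *next;               /* the typedef name works before the struct is complete */
};

/* Sum the values of a NULL-terminated list. */
int list_sum(const node *head)
{
    int sum = 0;
    for (const node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}
```

The tag node and the typedef name node live in different name spaces, so the two declarations do not conflict.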
Note that the Linux kernel coding standards reject typedefs for structures. You'll have to decide whether you want to follow that rule. Many systems do not follow it.
One other minor issue is that C++ performs the equivalent of typedef struct MyStruct MyStruct; automatically — after defining a class or struct (or union) with a tag name, you can use the tag name as a type name. It isn't identical — you can do the typedef yourself and it compiles cleanly.
Completely stylistic. Just visually differentiates the "synthetic type" from the declared variable of that type.
I tend to do:
typedef struct {
...
} myStruct;
I see occasional questions such as "what's the difference between a declaration and a definition":
What is the difference between a definition and a declaration?
The distinction is important and intellectually it achieves two important things:
It brings to the fore the difference between reference and referent
It's how C enables separation in time of the attachment between reference and referent.
So why is a C typedef declaration not called a typedef definition?
Firstly, it's obviously a definition. It defines an alias. The new name is to be taken as referring to the existing thing. But it certainly binds the reference to a specific referent and is without doubt a defining statement.
Secondly, wouldn't it be called a typedec if it were a declaration?
Thirdly, wouldn't it avoid all those confusing questions people ask when they try and make a forward declaration using a typedef?
A typedef declaration is a definition.
N1570 6.7p5:
A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that:
for an object, causes storage to be reserved for that object;
for a function, includes the function body;
for an enumeration constant, is the (only) declaration of the identifier;
for a typedef name, is the first (or only) declaration of the identifier.
In C99, the last two bullet points were combined; C11 introduced the ability to declare the same typedef twice.
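The four bullet points can be lined up against concrete declarations. The sketch below uses invented names (counter, twice, state, ident, decl_demo) and assumes a C11 or later compiler because of the repeated typedef:

```c
extern int counter;                  /* declaration only: no storage is reserved */
int counter = 3;                     /* definition: storage is reserved */

int twice(int x);                    /* declaration: no function body */
int twice(int x) { return 2 * x; }   /* definition: includes the body */

enum state { idle, busy };           /* the only declaration of idle and busy: defines them */

typedef unsigned long ident;         /* first declaration of ident: its definition */
typedef unsigned long ident;         /* C11 and later: repeated declaration, same type */

int decl_demo(void)
{
    ident n = (ident)twice(counter) + busy;   /* 6 + 1 */
    return (int)n;
}
```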
Note that only objects, functions, enumeration constants, and typedef names can have definitions. One might argue that given:
enum foo { zero, one};
it doesn't make much sense to consider this to be a definition of zero and one, but not of foo or enum foo. On the other hand, an enum, struct, or union declaration, though it creates a type that didn't previously exist, doesn't define an identifier that is that type's name; and for structs and unions, the tag name can be used (as an incomplete type) even before the type has been defined. Definitions define identifiers, not (necessarily) the entities to which they refer.
As for why it's not called a "definition" in the subsection that defines it, it's part of section 6.7 "Declarations", which covers all kinds of declarations (some of which are also definitions). The term definition is defined in the introductory part of 6.7.
As for the name typedef, it's caused a fair amount of confusion over the years since it doesn't really define a type. Perhaps typename would have been a better choice, or even typealias. But since it does define the identifier, typedef isn't entirely misleading.
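A short sketch of the "alias, not a new type" point, with made-up names (meters, seconds, mix, alias_demo) and assuming C11 for _Generic:

```c
typedef int meters;
typedef int seconds;

int mix(meters m, seconds s) { return m + s; }   /* both parameters are plain int */

int alias_demo(void)
{
    meters distance = 40;
    seconds elapsed = 2;
    /* _Generic (C11) selects on the type of its first operand. */
    int is_plain_int = _Generic(distance, int: 1, default: 0);
    return mix(distance, elapsed) + is_plain_int;   /* 42 + 1 */
}
```

The compiler accepts passing a meters value where seconds is expected, since both identifiers name the same type.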
What are examples of non-context-free languages in the C language? How do the following non-CFLs arise in C?
a) L1 = {wcw|w is {a,b}*}
b) L2 = {a^n b^m c^n d^m| n,m >=1}
The question is clumsily worded, so I'm reading between the lines, here. Still, it's a common homework/study question.
The various ambiguities [1] in the C grammar as normally presented do not render the language non-context-free. (Indeed, they don't even render the grammars non-context-free.) The general rule "if it looks like a declaration, it's a declaration regardless of other possible parses" can probably be codified in a very complex context-free grammar (although it's not 100% obvious that that is true, since context-free languages are not closed under intersection or difference), but it's easier to parse with a simpler CFG and then disambiguate according to the declaration rule.
Now, the important point about C (and most programming languages) is that the syntax of the language is quite a bit more complex than the BNF used for explanatory purposes. For example, a C program is not well-formed if a variable is used without being defined. That's a syntax error, but it's not detected by the CFG parser. The grammatical productions needed to define these cases are quite complicated, due to the complicated syntax of the language, but they're going to boil down to requiring that ids appear twice in a valid program. Hence L1 = {wcw|w is {a,b}+} (here w is the identifier, and c is way too complicated to spell out). In practice, checking this requirement is normally done with a symbol table, and the formal language rules, while precise, are not written in a logical formalism. Since L1 is not a context-free language, the formalism could not be context-free, but a context-sensitive grammar can recognize L1, so it's not totally impossible. (See, for example, Algol 68.)
The symbol table is also used to decide whether a particular identifier is to be reduced to typedef-name [2]. This is required to resolve a number of ambiguities in the grammar. (It also further restricts the set of strings in the language, because there are some cases where an identifier must be resolved as a typedef-name in order for the program to be valid.)
For another type of context-sensitivity, function calls need to match function declarations in the number of arguments; this sort of requirement is modelled by L2 = {a^n b^m c^n d^m| n,m >=1} where a and c represent the definition and use of some function, and b and d represent the definition and use of a different function. (Again, in a highly-simplified form.)
This second requirement is possibly less clearly a syntactic requirement. Other languages (Python, for example) allow function calls with any number of arguments, and report an argument/parameter count mismatch as a semantic error detected only at runtime. In the case of C, however, a mismatch is clearly a syntax error.
In short, the set of grammatically valid strings which constitute the C language is a proper subset of the set of strings recognized by the CFG presented in the C language definition; the set of valid parses is a proper subset of the set of derivations generated by the CFG, and the language itself is (a) unambiguous, and (b) not context-free.
Note 1: Most of these are not really ambiguities, because they depend upon how a given identifier is resolved (typedef name, function identifier, declared variable,...).
Note 2: It is not the case that an identifier must be resolved as a typedef-name if it happens to be one; that only happens in places where the reduction is possible. It is not a syntax error to use the same identifier for both a type and a variable, even in the same scope. (It's not a good idea, but it's valid.) The following example, adapted from an example in section 6.7.8 of the standard, shows the use of t as both a field name and a typedef:
typedef signed int t;
struct tag {
unsigned t:4; // field named 't' of type unsigned int
const t:5; // unnamed field of type 't' (signed int)
};
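To see the two roles of t side by side, the declarations can be dropped into a small function. This is only a sketch; field_demo and offset are names added for illustration:

```c
typedef signed int t;

struct tag {
    unsigned t:4;     /* member named t, 4 bits wide */
    const t:5;        /* unnamed 5-bit field whose type is the typedef t */
};

int field_demo(void)
{
    struct tag s = { 9 };   /* initializes member t; unnamed fields are skipped */
    t offset = -3;          /* outside the struct, t is the typedef name */
    return (int)s.t + offset;
}
```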
These things aren't context-free in C:
foo * bar; // foo multiplied by bar or declaration of bar pointing to foo?
foo(*bar); // foo called with *bar as param or declaration of bar pointing to foo?
foo bar[2] // is bar an array of foo or a pointer to foo?
foo (bar baz) // is foo a function or a pointer to a function?
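The first case can be sketched as compilable code. In the hypothetical functions below, the tokens T * p are a declaration in one and a multiplication in the other; which one applies depends on what T names at that point, which is information the parser takes from the symbol table:

```c
typedef int T;

int as_declaration(void)
{
    int x = 7;
    T * p;              /* T names a type: declares p as a pointer to int */
    p = &x;
    return *p;
}

int as_expression(void)
{
    int T = 6, p = 7;   /* these declarations hide the typedef name T */
    return T * p;       /* the same tokens now form a multiplication */
}
```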
Can we say that identifiers are aliases of variables?
Are identifiers and variables the same?
To say it another way, identifiers are the names given to things (such as variables and functions). They identify the thing which they are naming.
No.
int f() { }
f is an identifier. It is not a variable.
Identifier is the fancy term used to mean ‘name’. In C, identifiers are used to refer to a number of things: we've already seen them used to name variables and functions. They are also used to give names to some things we haven't seen yet, amongst which are labels and the ‘tags’ of structures, unions, and enums.
An identifier is used for any variable, function, data definition, etc. In the C programming language, an identifier is a combination of alphanumeric characters, the first being a letter of the alphabet or an underscore, and the remaining being any letter of the alphabet, any numeric digit, or the underscore. You already know about variables.
Please check C Tutorial - Chapter 1.
No, from C99 (6.2.1):
An identifier can denote an object; a
function; a tag or a member of a
structure, union, or enumeration; a
typedef name; a label name; a macro
name; or a macro parameter.
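A rough sketch that puts most of those kinds of identifier into one translation unit (every name below is invented for illustration); only total and p are variables:

```c
#define LIMIT 3                      /* macro name */
#define SQUARE(v) ((v) * (v))        /* macro name SQUARE, macro parameter v */

typedef int number;                  /* typedef name */

struct pair {                        /* tag: pair */
    number first;                    /* member */
    number second;                   /* member */
};

static number total;                 /* object (a variable) */

number sum_pair(void)                /* function */
{
    struct pair p = { LIMIT, SQUARE(LIMIT) };   /* object p: 3 and 9 */
    total = p.first + p.second;
    if (total > 0)
        goto out;                    /* use of a label name */
    total = 0;
out:                                 /* label name */
    return total;
}
```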