Why does C have keywords starting with an underscore?

Most keywords in C (or in any language, for that matter) start with a letter. But some keywords start with an underscore. These keywords are: _Alignas, _Alignof, _Atomic, _Bool, _Complex, _Generic, _Imaginary, _Noreturn, _Static_assert and _Thread_local.
I find this amazingly strange. If it were a hidden global constant or an internal function that is not really part of the API, I would understand it. But these are keywords.
I find it especially strange that C actually has macros called bool and static_assert, and that their definitions use the very keywords I just mentioned.
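For reference, here is a small sketch, assuming a C11 compiler, that uses several of these keywords in their reserved spellings. No header is needed for the keywords themselves (<stddef.h> is included only for size_t). The names aligned_buf, per_thread_counter, type_name, buf_alignment and bump_counter are illustrative, not part of any library.

```c
#include <stddef.h>

/* Compile-time check: the build fails with this message if the condition is false. */
_Static_assert(sizeof(int) >= 2, "int must have at least 16 bits");

/* One instance of this counter exists per thread. */
static _Thread_local int per_thread_counter;

/* Request 16-byte alignment for the buffer member. */
struct aligned_buf {
    _Alignas(16) unsigned char bytes[32];
};

size_t buf_alignment(void)
{
    return _Alignof(struct aligned_buf);
}

/* _Generic selects an expression based on the type of its controlling operand. */
#define type_name(x) _Generic((x), int: "int", double: "double", default: "other")

int bump_counter(void)
{
    return ++per_thread_counter;
}
```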

C was developed and became very popular before it was standardized by a committee. As a consequence, there was a lot of existing code.
When setting a C standard, or updating an old standard, an important goal is not to “break” old code. It is desirable that code that worked with previous compilers continue to work with new versions of the C language.
Introducing a new keyword (or any new definition or meaning of a word) can break old code, since, when compiling, the word will have its new keyword meaning and not the identifier meaning it had with the previous compilers. The code will have to be edited. In addition to the expense of paying people to edit the code, this has a risk of introducing bugs if any mistakes are made.
To deal with this, identifiers beginning with an underscore were reserved; in particular, those beginning with an underscore followed by an uppercase letter or a second underscore are reserved for any use. Making this rule did not break much old software, since most people writing software choose identifiers beginning with letters, not underscores. This rule gives the C standard a new ability: by using such names when adding new keywords or other new meanings for words, it can do so without breaking old code, as long as that old code obeyed the rule. For example, adding a new keyword, _Bool, for a Boolean type would not break any code that had not used identifiers beginning with an underscore.
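A minimal sketch of that coexistence, assuming a C99 or later compiler: the hypothetical legacy names LegacyBool and LEGACY_TRUE remain valid, while new code in the same file uses the _Bool keyword directly, without including any header.

```c
/* Legacy-style code with its own boolean convention (hypothetical names). */
typedef int LegacyBool;
#define LEGACY_TRUE 1

/* New code uses the reserved-namespace keyword _Bool; it cannot clash
   with the legacy names above. */
int normalize(LegacyBool legacy)
{
    _Bool b = legacy;   /* any nonzero value converts to 1 */
    return b;
}
```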
New versions of the C standard sometimes introduce new meanings for words that do not begin with an underscore, such as bool. However, these new meanings are generally not introduced in the core language. Rather, they are introduced only in new headers. In making a bool type, the C standard provided a new header, <stdbool.h>. Because old code could not include <stdbool.h> (it did not exist when that code was written), defining bool in <stdbool.h> would not break old code. At the same time, it lets programmers writing new code use the new bool feature by including <stdbool.h>, which defines bool as a macro that is replaced by _Bool.
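A short sketch assuming C99 through C17, where bool comes from <stdbool.h> as a macro for _Bool. C23 later made bool a keyword, but it still names the same type. The _Generic check needs C11 and confirms the type at compile time; is_even is a hypothetical helper.

```c
#include <stdbool.h>

/* bool and _Bool name the same type, so this selection yields 1. */
_Static_assert(_Generic((bool)0, _Bool: 1, default: 0), "bool is _Bool");

bool is_even(int n)
{
    return n % 2 == 0;
}
```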

In the standard, any name that begins with a double underscore or an underscore followed by an uppercase letter is reserved. This is useful because C lacks named namespaces. By reserving all such symbols, new and implementation-specific keywords can be introduced to the language without clashing with symbols defined in existing code.
The macros such as bool and static_assert are "convenience macros": they allow you to use the reserved keyword symbols without the underscore and capital letter, at the small risk of a name clash. However, they also provide a means to resolve a name clash, because unlike a keyword, a macro may be #undef'd, or the header that defines it may be left out and the underlying keyword used directly. Moreover, unmodified legacy code will not be broken, because by definition it will not include headers that did not exist when it was written.
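In C23, bool and static_assert became real keywords, so a portable illustration of the #undef escape hatch uses the complex macro from <complex.h> instead. It expands to _Complex, and C11 7.3.1 explicitly allows a program to undefine it. This is a sketch assuming an implementation that supports complex arithmetic (GCC and Clang on common platforms do); real_part_of and twice_real are hypothetical names.

```c
#include <complex.h>

/* While the macro is in effect, "complex" is shorthand for _Complex. */
double real_part_of(double re, double im)
{
    double complex z = re + im * I;
    return (double)z;   /* converting to a real type keeps only the real part */
}

/* Resolve a clash by removing the convenience macro. */
#undef complex

/* "complex" is now an ordinary identifier; complex types must be spelled
   with the reserved keyword _Complex. */
int twice_real(int complex)
{
    double _Complex w = complex;
    return 2 * (int)w;   /* the imaginary part is discarded on conversion */
}
```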
The unadorned keywords have been in the language since its first standard, C89 (with the exception of inline and restrict, added in C99), so they will not cause a conflict with legacy code symbols. All _Xxxx keywords were added in C99 or later.
Unlike many languages in common use today, C has been around since the 1970s and standardised since 1989. There is a huge amount of existing code that must remain compilable on modern compilers, while at the same time the language cannot remain unchanged; if it did, it might no longer be in such common use.

Eric and Clifford have provided good answers, but I'll add a quote from the C11 standard to support them.
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
https://port70.net/~nsz/c/c11/n1570.html#7.1.3
One might also consider this:
Typedef names beginning with int or uint and ending with _t may be added to the types defined in the stdint.h header. Macro names beginning with INT or UINT and ending with _MAX, _MIN, or _C may be added to the macros defined in the stdint.h header.
https://port70.net/~nsz/c/c11/n1570.html#7.31.10p1
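A brief sketch of consuming those names rather than defining them. The intN_t typedefs and the INTN_MAX and INTN_C macros come from <stdint.h>. The exact-width types are optional in the standard but available on mainstream platforms. scale_by_thousand is a hypothetical helper.

```c
#include <stdint.h>

/* Widen before multiplying so the product cannot overflow 32 bits. */
int64_t scale_by_thousand(int32_t x)
{
    return (int64_t)x * INT64_C(1000);
}
```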

Related

Does the ISO 9899 standard reserve any use of the _t suffix for identifiers?

I have read in many books and in other SO questions that the standard may expand the set of identifiers such as size_t or int32_t, so it reserves any use of the _t suffix for identifiers.
Is that true?
I could not find anything that discourages the use of this suffix in the ISO 9899:1999 standard, but that standard is hard to read :(
No, it's not true.
The standard reserves the right to add identifiers starting with int or uint and ending with _t to the stdint.h header (§7.31.10). Those identifiers are technically only reserved if that header is included, but since it almost always is, they should be treated as reserved.
In general, the standard reserves identifiers defined in standard headers, or mentioned in the future directions for standard headers (§7.31). Identifiers having external linkage (library functions) are reserved for that use (which doesn't stop you from using them as local or static variables, for example). If the library header is included, then its identifiers are reserved for use at file scope. Read §7.1.3 for details.
As that section indicates, the only identifiers unconditionally reserved are those which start with an underscore followed by a capital letter or a second underscore.
While reading the standard, it's important to understand the difference between contexts in which a name is reserved:
Reserved for any use (identifiers starting with an underscore followed by another underscore or a capital letter): these identifiers may be used by the implementation as macros or special symbols which are handled in some idiosyncratic way by the compiler. Never define one of these in your code, and use the documented ones only as the documentation indicates. Do not use such a symbol if it is not documented, even if you see it used in some standard library header or in someone else's code.
Reserved at file scope (other identifiers starting with an underscore, not part of any standard header): These identifiers will not be used as macros, and you must not define them as macros either. You may use them as local variables, labels, parameters, and struct or union members. Personally, I wouldn't do this, but it's permitted. I prefer to put an underscore at the end of an identifier which is used in some internal context.
Reserved at file scope and as a macro name (any identifier mentioned in an included standard header, including in the future directions clause): Again, since these identifiers may be macros, you should treat them as off limits if you #include the associated header. The standard does allow you to #undef an identifier used as a function name in the standard library, although you might find that performance suffers because the macro wraps a construct with equivalent semantics but optimised performance.
Reserved for use as an identifier with external linkage (any identifier defined in any standard library header as having external linkage, whether or not the header is included, including the identifier errno): The weakest reservation. If you don't include the associated header, you're free to use such an identifier, even at file scope, as long as it is not externally visible. So it could be a file-scope static or an enumeration member or the tag of a struct or union. The point of this clause is not to allow you to deliberately shadow the name of a standard library function. Rather, it is to protect you from future additions to the standard library which might export an external symbol you're currently using. Of course, if your current use is as an externally visible identifier, you're still going to have a future problem. But on the whole, externally visible symbols should be prefixed with a package name to avoid name collisions with other libraries.
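A sketch of the second and third contexts in practice, assuming a hosted implementation: a block-scope name made of an underscore plus a lowercase letter (permitted), a file-scope name using a trailing underscore instead of a leading one, and #undef of a possible library macro so that later calls reach the real function, which C11 7.1.4 permits. All function and variable names here are hypothetical.

```c
#include <ctype.h>

/* File scope: a leading underscore is reserved here, so use a trailing one. */
static int calls_ = 0;

int shout_first(const char *s)
{
    int _c = (unsigned char)s[0];   /* allowed: block scope, '_' + lowercase */
    ++calls_;
    return toupper(_c);
}

/* Remove any macro version of toupper; the calls below use the function. */
#undef toupper

int shout_second(const char *s)
{
    return toupper((unsigned char)s[1]);
}

int call_count(void)
{
    return calls_;
}
```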
Having said all that, it's unwise to use an identifier that looks like it might be a standard identifier. Posix includes a list of over a hundred patterns for identifier names that it might use in the future, including all identifiers ending in _t, so if you expect your code to be used in a Posix environment, you'll want to avoid those names. And while future C standard revisions might avoid adding new type names to existing headers (aside from the integer type names mentioned above), you don't really want to preclude using any such new types, since they may well be useful. (And, according to a comment by @JensGustedt, who knows a lot more about the workings of the C working group than I do, there will be a couple of new type names in existing headers in C2x.)
The _t suffix is not reserved by ISO 9899 as such. The future library directions of the C11 revision only say (C11 7.31.10):
Typedef names beginning with int or uint and ending with _t may be
added to the types defined in the <stdint.h> header. [...]
That said, there are a great many types with the _t suffix defined in C11:
char16_t, char32_t, clock_t, cnd_t, constraint_handler_t, div_t, double_t, errno_t, fenv_t, fexcept_t, float_t, fpos_t, imaxdiv_t,
int_fastN_t, int_leastN_t, intmax_t, intN_t, intptr_t, ldiv_t, lldiv_t, max_align_t, mbstate_t, mtx_t, ptrdiff_t, rsize_t,
sig_atomic_t, size_t, thrd_start_t, thrd_t, time_t, tss_dtor_t, tss_t, uint_fastN_t, uint_leastN_t,
uintmax_t, uintN_t, uintptr_t, wchar_t, wctrans_t, wctype_t, wint_t
POSIX, on the other hand, reserves the _t suffix for system use. The POSIX 1003.1 rationale has this excerpt:
To allow implementors to provide their own types, all conforming applications are required to avoid symbols ending in _t, which permits the implementor to provide additional types.
All in all, since chances are you will want to use your C code on a POSIX system now or later, steer away from using _t for your own types.
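A sketch of one naming style that follows this advice: a hypothetical geo prefix on public names and a capitalized type name in place of a _t suffix. This is one convention among many, not a requirement of either standard.

```c
/* Type name without the POSIX-reserved _t suffix. */
typedef struct GeoPoint {
    double x, y;
} GeoPoint;

/* Externally visible function carries the package prefix. */
double geo_dist2(GeoPoint a, GeoPoint b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return dx * dx + dy * dy;   /* squared Euclidean distance */
}
```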
Standard C allows you to use the _t suffix as long as you don't end up with a token that starts with an underscore followed by an uppercase letter or another underscore. (Note that C++ restricts this further: a double underscore is not allowed anywhere in the token, which is worth adhering to should you anticipate your code reaching C++.)
It's POSIX that reserves _t.

C variable name beginning with _ allowed, and (0x8) meaning? [duplicate]

I am trying to understand when a developer needs to define a C variable with a leading '_'. What is the reason for it?
For example:
uint32_t __xyz_ = 0;
Maybe this helps, from C99, 7.1.3 ("Reserved Identifiers"):
All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
Moral: For ordinary user code, it's probably best not to start identifiers with an underscore.
(On a related note, I think you should also stay clear of naming types with a trailing _t, which POSIX reserves for implementation types.)
It is a trick used in the header files of C implementations for global symbols, in order to prevent eventual conflicts with other symbols defined by the user.
Since C lacks a namespace feature, this is a rudimentary approach to avoid name collisions with the user.
Declaring such symbols in your own header and source files is not encouraged because it can introduce naming conflicts between your code and the C implementation. Even if that doesn't produce a conflict on your current implementation, you are still prone to strange conflicts across different/future implementations, since they are free to use other symbols prefixed with underscores.
Whether it's C or not, the leading underscore gives the programmer a status indication so he does not have to go look it up. In PHP, or any object-oriented language where we deal with tens of thousands of properties and methods written by thousands of authors, seeing an underscore prefix removes the need to dig through the class and look up whether a member is declared private, protected or public. That's an immense time saver. The practice started before C, I am sure...

macro expansion order with included files

Let's say I have a macro in an inclusion file:
// a.h
#define VALUE SUBSTITUTE
And another file that includes it:
// b.h
#define SUBSTITUTE 3
#include "a.h"
Is it the case that VALUE is now defined to SUBSTITUTE and will be macro expanded in two passes to 3, or is it the case that VALUE has been set to the macro expanded value of SUBSTITUTE (i.e. 3)?
I ask this question in the interest of trying to understand the Boost preprocessor library and how its BOOST_PP_SLOT defines work (edit: and I mean the underlying workings). Therefore, while I am asking the above question, I'd also be interested if anyone could explain that.
(and I guess I'd also like to know where the heck the 'painted blue' rules are written...)
VALUE is defined as SUBSTITUTE. The definition of VALUE is not aware at any point that SUBSTITUTE has also been defined. After VALUE is replaced, whatever it was replaced by will be scanned again, and potentially more replacements applied then. All defines exist in their own conceptual space, completely unaware of each other; they only interact with one another at the site of expansion in the main program text (defines are directives, and thus not part of the program proper).
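A minimal sketch confirming this in a single file, following the order in b.h. VALUE's replacement list stores the token SUBSTITUTE, not its value, so redefining SUBSTITUTE later changes what VALUE expands to. first_use and second_use are hypothetical names.

```c
/* Same order as b.h: SUBSTITUTE first, then a.h's VALUE. */
#define SUBSTITUTE 3
#define VALUE SUBSTITUTE   /* stored as the token SUBSTITUTE, not as 3 */

int first_use(void) { return VALUE; }    /* VALUE -> SUBSTITUTE -> 3 */

/* Redefine SUBSTITUTE; VALUE itself is untouched. */
#undef SUBSTITUTE
#define SUBSTITUTE 7

int second_use(void) { return VALUE; }   /* rescanned at this use: 7 */
```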
The rules for the preprocessor are specified alongside the rules for C proper in the language standard. The standard documents themselves cost money, but you can usually download the "final draft" for free; the latest (C11) can be found here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
For at-home use the draft is pretty much equivalent to the real thing. Most people who quote the standard are actually looking at copies of the draft. (Certainly it's closer to the actual standard than any real-world C compiler is...)
There's a more accessible description of the macro rules in the GCC manual: http://gcc.gnu.org/onlinedocs/cpp/Self_002dReferential-Macros.html
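A small sketch of the "painted blue" rule, modeled on the self-referential macro example in that GCC manual page (the name level is arbitrary). During rescanning, a macro's own name is not expanded again, so the inner occurrence is left as an ordinary identifier and refers to the variable.

```c
/* A variable and a macro sharing one name. */
int level = 10;
#define level (4 + level)

/* level expands to (4 + level); the inner level is not expanded again,
   so it names the variable above. */
int self_ref(void) { return level; }   /* (4 + 10) */
```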
Additionally... I couldn't tell you much about the Boost preprocessor library, not having used it, but there's a beautiful pair of libraries by the same authors called Order and Chaos that are very "clean" (as macro code goes) and easy to understand. They're more academic in tone and intended to be pure rather than portable, which might make them easier reading.
(Since I don't know Boost PP I don't know how relevant this is to your question, but) there's also a good introductory example of the kinds of techniques these libraries use for advanced metaprogramming constructs in this answer: Is the C99 preprocessor Turing complete?

What does double underscore ( __const) mean in C?

extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
I found the above function definition in /usr/include/netinet/ether.h on a Linux box.
Can someone explain what the double underscores mean in front of const (keyword), addr (identifier) and, finally, __THROW?
In C, symbols starting with an underscore followed by either an upper-case letter or another underscore are reserved for the implementation. You as a user of C should not create any symbols that start with the reserved sequences. In C++, the restriction is more stringent; you the user may not create a symbol containing a double-underscore.
Given:
extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
The __const notation is there to allow for the possibility (somewhat unlikely) that a compiler that this code is used with supports prototype notations but does not have a correct understanding of the C89 standard keyword const. The autoconf macros can still check whether the compiler has working support for const; this code could be used with a broken compiler that does not have that support.
The use of __hostname and __addr is a protection measure for you, the user of the header. If you compile with GCC and the -Wshadow option, the compiler will warn you when any local variables shadow a global variable. If the function used just hostname instead of __hostname, and if you had a function called hostname(), there'd be a shadowing. By using names reserved to the implementation, there is no conflict with your legitimate code.
The use of __THROW means that the code can, under some circumstances, be declared with some sort of 'throw specification'. This is not standard C; it is more like C++. But the code can be used with a C compiler as long as one of the headers (or the compiler itself) defines __THROW to empty, or to some compiler-specific extension of the standard C syntax.
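A sketch of that technique, with a hypothetical MYLIB_THROW standing in for __THROW so the example stays out of the reserved namespace (in glibc, the real __THROW is defined in <sys/cdefs.h>). Here it expands to noexcept when compiled as C++11 or later and to nothing in C.

```c
/* Hypothetical decoration macro: empty in C, an exception spec in C++. */
#ifdef __cplusplus
#  define MYLIB_THROW noexcept
#else
#  define MYLIB_THROW
#endif

/* Declaration and definition carry the same decoration. */
int mylib_add(int a, int b) MYLIB_THROW;

int mylib_add(int a, int b) MYLIB_THROW
{
    return a + b;
}
```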
Section 7.1.3 of the C standard (ISO 9899:1999) says:
7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any use or for use as file
scope identifiers.
— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.154)
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.
Footnote 154) The list of reserved identifiers with external linkage includes errno, math_errhandling,
setjmp, and va_end.
See also What are the rules about using an underscore in a C++ identifier; a lot of the same rules apply to both C and C++, though the embedded double-underscore rule is in C++ only, as mentioned at the top of this answer.
C99 Rationale
The C99 Rationale says:
7.1.3 Reserved identifiers
To give implementors maximum latitude in packing library functions into files, all external
identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has
the same specification. Thus, for instance, strtod may be defined in the same object module as printf, with no fear that link-time conflicts will occur. Equally, strtod may call printf, or printf may call strtod, for whatever reason, with no fear that the wrong function will be called.
Also reserved for the implementor are all external identifiers beginning with an underscore, and all other identifiers beginning with an underscore followed by a capital letter or an underscore. This gives a name space for writing the numerous behind-the-scenes non-external macros and functions a library needs to do its job properly.
With these exceptions, the Standard assures the programmer that all other identifiers are available, with no fear of unexpected collisions when moving programs from one
implementation to another5. Note, in particular, that part of the name space of internal identifiers beginning with underscore is available to the user: translator implementors have not been the only ones to find use for “hidden” names. C is such a portable language in many respects that the issue of “name space pollution” has been and is one of the principal barriers to writing completely portable code. Therefore the Standard assures that macro and typedef names are reserved only if the associated header is explicitly included.
5 See §6.2.1 for a discussion of some of the precautions an implementor should take to keep this promise. Note also that any implementation-defined member names in structures defined in <time.h> and <locale.h> must begin with an underscore, rather than following the pattern of other names in those structures.
And the relevant part of the rationale for §6.2.1 Scopes of identifiers is:
Although the scope of an identifier in a function prototype begins at its declaration and ends at the end of that function’s declarator, this scope is ignored by the preprocessor. Thus an identifier
in a prototype having the same name as that of an existing macro is treated as an invocation of that macro. For example:
#define status 23
void exit(int status);
generates an error, since the prototype after preprocessing becomes
void exit(int 23);
Perhaps more surprising is what happens if status is defined
#define status []
Then the resulting prototype is
void exit(int []);
which is syntactically correct but semantically quite different from the intent.
To protect an implementation’s header prototypes from such misinterpretation, the implementor must write them to avoid these surprises. Possible solutions include not using identifiers in prototypes, or using names in the reserved name space (such as __status or _Status).
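A sketch of the first remedy, not using identifiers in prototypes, seen from the user's side: a hypothetical report() whose prototype leaves the parameter unnamed still compiles after #define status 23, where a prototype naming its parameter status would not.

```c
/* A user macro that happens to share a common word. */
#define status 23

/* "int report(int status);" would become "int report(int 23);".
   An unnamed parameter gives the macro nothing to replace. */
int report(int);

/* The definition may use any parameter name that is not a macro. */
int report(int code)
{
    return code + status;   /* status expands to 23 here */
}
```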
See also P. J. Plauger, The Standard C Library (1992), for an extensive discussion of name space rules and library implementations. The book refers to C90 rather than any later version of the standard, but most of the implementation advice in it remains valid to this day.
Names with double leading underscores are reserved for use by the implementation. This does not necessarily mean they are internal per se, although they often are.
The idea is, you're not allowed to use any names starting with __, so the implementation is free to use them in places like macro expansions, or in the names of syntax extensions (e.g. __gcnew is not part of C++, but Microsoft can add it to C++/CLI confident that no existing code should have something like int __gcnew; in it that would stop compiling).
To find out what specific extensions such as __const mean, you'll need to consult the documentation for your specific compiler/platform. In this particular case, you should probably consider the prototype in the documentation (e.g. http://www.kernel.org/doc/man-pages/online/pages/man3/ether_aton.3.html) to be the function's interface and ignore the __const and __THROW decorations that appear in the actual header.
By convention in some libraries, this indicates that a particular symbol is for internal use and not intended to be part of the public API of the library.
The underscores in __const mean that this keyword is a compiler extension and using it is not portable (the const keyword was added to C in a later revision, C89).
The __THROW is also some kind of extension. I assume that it gets defined to some __attribute__(something) if gcc is used, but I'm not sure about that and too lazy to check.
The __addr can mean anything the programmer wanted it to mean; it's just a name.

Whatever happened to the 'entry' keyword?

While cruising through my white book the other day, I noticed that entry is one of the keywords in the list of C keywords.
It is reserved for future use. Thinking back to my Fortran days, there was a function of some sort that used an entry statement to make a second argument signature, or entry point into a function.
Is this what entry was originally intended to be used for? or something completely different?
What is the story on the entry keyword?
I had no idea, so I googled to find something about this. This is what I found.
First, it was included as a reserved keyword.
Q: What was the entry keyword mentioned in K&R1?
A: It was reserved to allow functions with multiple, differently-named entry points, but it has been withdrawn.
(From http://archives.devshed.com/forums/c-c-134/c-programming-faqs-371017.html.)
It was never standardized; some compilers used it, in a very personal way.
It was later declared obsolete, I guess.
In FORTRAN, "ENTRY" could declare a second entry point into a subroutine. It was a structured-programming nightmare, and fortunately C decided not to adopt it.
The entry keyword came from PL/I and allowed multiple entry points into a function. The keyword was implemented by some compilers but was never standardized.
To complement the accepted answer, 'entry' is mentioned in K&R1:
2.3 Keywords
The following identifiers are reserved for use as keywords, and may not be used otherwise:
int extern else
char register for
float typedef do
double static while
struct goto switch
union return case
long sizeof default
short break entry
unsigned continue
auto if
and here:
The entry keyword is not currently implemented by any compiler but is
reserved for future use. Some implementations also reserve the words 'fortran'
and 'asm'.
Then in the Rationale for the ANSI C language (C89) it is mentioned here:
3.1.1 Keyword
[...]
The keywords 'entry', 'fortran', and 'asm' have not been included since they were either never used, or are not portable. Uses of 'fortran' and 'asm' as keywords are noted as common extensions.
