What is the conventional way to set up your include guards? I usually write them as (for example.h):
#ifndef _EXAMPLE_H_
#define _EXAMPLE_H_
#include "example.h"
#endif
Does underscore convention matter? I've seen conflicting information when I googled this. Does the _EXAMPLE_H_ even have to match the name of the header?
Does underscore convention matter?
Yes. It matters.
Identifiers with a leading underscore followed by an uppercase letter are reserved for the implementation, so defining _EXAMPLE_H_ yourself results in undefined behaviour.
The following is the C standard's specification for naming the identifiers (C11 draft):
7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its
associated subclause, and optionally declares or defines identifiers
listed in its associated future library directions subclause and
identifiers which are always reserved either for any use or for use as
file scope identifiers.
— All identifiers that begin with an underscore and either an
uppercase letter or another underscore are always reserved for any
use.
— All identifiers that begin with an underscore are always reserved
for use as identifiers with file scope in both the ordinary and tag
name spaces.
— Each macro name in any of the following subclauses (including the
future library directions) is reserved for use as specified if any of
its associated headers is included; unless explicitly stated otherwise
(see 7.1.4).
— All identifiers with external linkage in any of the
following subclauses (including the future library directions) and
errno are always reserved for use as identifiers with external
linkage.
— Each identifier with file scope listed in any of the
following subclauses (including the future library directions) is
reserved for use as a macro name and as an identifier with file scope
in the same name space if any of its associated headers is included.
No other identifiers are reserved. If the program declares or defines
an identifier in a context in which it is reserved (other than as
allowed by 7.1.4), or defines a reserved identifier as a macro name,
the behavior is undefined.
If the program removes (with #undef) any macro definition of an
identifier in the first group listed above, the behavior is undefined.
As long as none of the above is violated, the include guard name can be anything and doesn't have to match the name of the header file. The usual convention I have seen and used, though, is to derive it from the header file name so that it doesn't cause unnecessary confusion.
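As a rough sketch of a guard that stays out of the reserved name space: the body of a hypothetical example.h is pasted twice below to mimic what the preprocessor sees when the header is included twice. The PROJ_ prefix, the struct and the function are assumptions made up for illustration. A prefix is used because names starting with E followed by an uppercase letter are set aside for future <errno.h> macros (C11 7.31.3), so a bare EXAMPLE_H is not entirely clean either.

```c
/* Contents of a hypothetical example.h, pasted twice to mimic a double #include. */

/* First inclusion: PROJ_EXAMPLE_H is not defined yet, so the body is compiled. */
#ifndef PROJ_EXAMPLE_H
#define PROJ_EXAMPLE_H
struct example_point { int x; int y; };
static inline int example_sum(struct example_point p) { return p.x + p.y; }
#endif /* PROJ_EXAMPLE_H */

/* Second inclusion: PROJ_EXAMPLE_H is already defined, so the body is skipped
   and struct example_point is not defined a second time. */
#ifndef PROJ_EXAMPLE_H
#define PROJ_EXAMPLE_H
struct example_point { int x; int y; };
static inline int example_sum(struct example_point p) { return p.x + p.y; }
#endif /* PROJ_EXAMPLE_H */
```

Without the guard, the second copy would be rejected as a redefinition of struct example_point.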
There is no absolute requirement as to how include guards are named. It does not have to match the header name. I have seen (and used myself) some that use a UUID, basically composed of a randomly generated hexadecimal string.
Technically as KingsIndian said, identifiers beginning with underscores are reserved:
The rules, paraphrased from ANSI Sec. 4.1.2.1, are:
1. All identifiers beginning with an underscore followed
by an upper-case letter or another underscore are always
reserved (all scopes, all namespaces).
2. All identifiers beginning with an underscore are reserved
for ordinary identifiers (functions, variables, typedefs, enumeration
constants) with file scope.
...
comp.lang.c FAQ list · Question 1.29
Maybe the new ISO C11 (?) standard relaxes these rules, but that's been the bottom line for a while.
Is this identifier non-problematic:
_var
C11, 7.1.3 Reserved identifiers, 1
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
Does it follow from this that user-defined identifiers beginning with a single underscore are non-problematic?
Yes, as long as the identifiers:
are declared at block scope (this includes enum/struct/union tags),
OR are struct/union members,
OR are function parameters,
AND the _ is followed by neither a capital letter nor another underscore.
E.g.
struct X { int _a; };
int main() { int _a; }
void foo(int _a);
No, there is a problem.
You forgot to include one more quote
All identifiers that begin with an underscore and either an uppercase
letter or another underscore are always reserved for any use.
So you may not declare an identifier beginning with one underscore followed by an uppercase letter.
In general, it is bad programming style to use identifiers starting with an underscore, because the reader of the code may think that the identifier is reserved by the implementation.
Does it follow from this that user-defined identifiers beginning with a single underscore are non-problematic?
No, that list item merely tells you certain things are problematic. It makes no statement that other things are non-problematic.
The same paragraph tells you that all identifiers listed in the header subclauses are reserved if the header that declares them is included, possibly for any use, so they are problematic too. There are additional issues listed in that paragraph.
C 2018 6.4.2.1 5 and 6 tell you that identifiers longer than the minimums listed in 5.2.4.1 (63 characters for internal identifiers, 31 for external) may be a problem; the behavior is not defined if two identifiers differ only beyond the limit of significant characters the implementation imposes.
C 2018 6.4.2 also allows identifiers with implementation-defined characters, so such identifiers may work in some implementations and not others.
Inside main or any other user-defined function, you can write declarations like the ones below.
You won't get any compile error.
int main()
{
int _var;
float _var1;
}
It's common in C (and other languages) to use prefixes and suffixes for names of variables and functions. Particularly, one occasionally sees the use of underscores, before or after a "proper" identifier, e.g. _x and _y variables, or _print etc. But then, there's also the common wisdom of avoiding names starting with underscore, so as to not clash with the C standard library implementation.
So, where is it OK to use underscores, and where isn't it?
Good-enough rule of thumb
Don't start your identifier with an underscore.
That's it. You might still have a conflict with some file-specific definitions (see below), but those will just get you an error message which you can take care of.
Safe, slightly restrictive, rule of thumb
Don't start your identifier with:
An underscore.
Any 1-3 letter prefix, followed by an underscore, which isn't a proper word (e.g. a_, st_)
memory_ or atomic_.
and don't end your identifier with either _MIN or _MAX.
These rules forbid a bit more than what is actually reserved, but are relatively easy to remember.
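A minimal sketch of names that stay within the rule of thumb above. The "acme" project prefix and every identifier below are hypothetical, chosen only to show the pattern: no leading underscore, a prefix longer than three letters, and no _MAX or _MIN suffix.

```c
#include <stddef.h>

/* "acme" is a hypothetical four-letter project prefix. */
enum { ACME_QUEUE_CAPACITY = 8 };   /* not ACME_QUEUE_MAX: avoids the _MAX suffix */

typedef struct acme_queue {
    int    items[ACME_QUEUE_CAPACITY];
    size_t count;                   /* number of slots currently in use */
} acme_queue;

/* Appends value to the queue; returns 1 on success, 0 if the queue is full. */
static int acme_queue_push(acme_queue *queue, int value)
{
    if (queue->count >= ACME_QUEUE_CAPACITY) {
        return 0;
    }
    queue->items[queue->count++] = value;
    return 1;
}
```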
More detailed rules
This is based on the C2x standard draft (and thus covers previous standards' reservations) and the glibc documentation.
Don't use:
The prefix __ (two underscores).
A prefix of one underscore followed by a capital letter (e.g. _D).
For identifiers visible at file scope - the prefix _.
The following prefixes with underscores, when followed by a lowercase letter: atomic_, memory_, memory_order_, cnd_, mtx_, thrd_, tss_
The following prefixes with underscores, when followed by an uppercase letter: LC_, SIG_, ATOMIC_, TIME_
The suffix _t (that's a POSIX restriction; for C proper, you can use this suffix unless your identifier begins with int or uint)
Additional restrictions are per-library-header-file rather than universal (some of these are POSIX restrictions):
If you use header file...    You can't use identifiers with...
dirent.h                     Prefix d_
fcntl.h                      Prefixes l_, F_, O_, and S_
grp.h                        Prefix gr_
limits.h                     Suffix _MAX (also probably _MIN)
pwd.h                        Prefix pw_
signal.h                     Prefixes sa_ and SA_
sys/stat.h                   Prefixes st_ and S_
sys/times.h                  Prefix tms_
termios.h                    Prefix c_
And there are additional restrictions not involving underscores of course.
The C standard, library chapter, reserves certain identifiers (emphasis mine):
C17 7.1.3 Reserved identifiers
— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) and errno are always reserved for use as identifiers with
external linkage.184)
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
Here, "reserved for any use" means reserved for the compiler/standard library; see What's the meaning of "reserved for any use"? "Reserved for the implementation" likewise means reserved for the compiler/standard library.
Furthermore, the future library directions in C17 7.31 reserve a lot of identifiers. It's a big chapter, so I'll only quote the most notable parts:
7.31.10 Integer types <stdint.h>
Typedef names beginning with int or uint and ending with _t may be added to the
types defined in the <stdint.h> header. Macro names beginning with INT or UINT
and ending with _MAX, _MIN, or _C may be added to the macros defined in the
<stdint.h> header.
7.31.12 General utilities <stdlib.h>
Function names that begin with str and a lowercase letter may be added to the
declarations in the <stdlib.h> header.
7.31.13 String handling <string.h>
Function names that begin with str, mem, or wcs and a lowercase letter may be added to the declarations in the <string.h> header.
To answer your question directly:
So, where is it OK to use underscores, and where isn't it?
Strictly speaking: nowhere. You should never declare identifiers starting with an underscore, since they may clash with the standard library, language keywords and so on. That said, as the "with file scope" wording in the second item quoted above hints, you may use one underscore followed by a lowercase letter at local (block) scope.
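To make the distinction concrete, here is a small sketch. All names are made up for illustration. The reserved forms appear only inside a comment, since actually declaring them would be undefined behaviour, while the permitted uses are real code.

```c
/* Reserved, so shown only as comments:
     int _counter;          file scope, ordinary name space
     #define _Limit 10      underscore + uppercase letter: reserved for any use
     int __total;           double underscore: reserved for any use          */

/* Struct members live in their own name space, not at file scope. */
struct sample_pair { int _first; int _second; };

static int sample_sum(struct sample_pair _p)   /* parameter: block scope */
{
    int _total = _p._first + _p._second;       /* local variable: block scope */
    return _total;
}
```

Even where it is permitted, many readers will assume a leading underscore marks an implementation name, so a trailing underscore or a plain name is usually the less surprising choice.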
I have read in many books and in other SO questions that the standard may expand the set of identifiers such as size_t or int32_t, so it reserves any use of the _t suffix for identifiers.
Is that true?
I could not find anything that discourages the use of this suffix in the ISO9899:1999 standard, but that standard is hard to read :(
No, it's not true.
The standard reserves the right to add identifiers starting with int or uint and ending _t to the stdint.h header (§7.31.10). Those identifiers are technically only reserved if that header is included but since it almost always is, they should be treated as reserved.
In general, the standard reserves identifiers defined in standard headers, or mentioned in the future directions for standard headers (§7.31). Identifiers having external linkage (library functions) are reserved for that use (which doesn't stop you from using them as local or static variables, for example). If the library header is included, then its identifiers are reserved for use at file scope. Read §7.1.3 for details.
As that section indicates, the only identifiers unconditionally reserved are those which start with an underscore followed by a capital letter or a second underscore.
While reading the standard, it's important to understand the difference between contexts in which a name is reserved:
Reserved for any use (identifiers starting with an underscore followed by another underscore or a capital letter): these identifiers may be used by the implementation as macros or special symbols which are handled in some idiosyncratic way by the compiler. Do not ever define one of these in your code, and use the ones which are documented only as indicated by the documentation. Do not use such a symbol if it is not documented, even if you see it used in some standard library header. Or someone else's code.
Reserved at file scope (other identifiers starting with an underscore, not part of any standard header): These identifiers will not be used as macros, and you must not define them as macros either. You may use them as local variables, labels, parameters, and struct or union members. Personally, I wouldn't do this, but it's permitted. I prefer to put an underscore at the end of an identifier which is used in some internal context.
Reserved at file scope and as a macro name (any identifier mentioned in an included standard header, including in the future directions clause): Again, since these identifiers may be macros, you should treat them as off limits if you #include the associated header. The standard does allow you to #undef an identifier used as a function name in the standard library, although you might find that performance suffers because the macro wraps a construct with equivalent semantics but optimised performance.
Reserved for use as an identifier with external linkage (any identifier defined in any standard library header as having external linkage, whether or not the header is included, including the identifier errno): The weakest reservation. If you don't include the associated header, you're free to use such an identifier, even at file scope, as long as it is not externally visible. So it could be a file-scope static or an enumeration member or the tag of a struct or union. The point of this clause is not to allow you to deliberately shadow the name of a standard library function. Rather, it is to protect you from future additions to the standard library which might export an external symbol you're currently using. Of course, if your current use is as an externally visible identifier, you're still going to have a future problem. But on the whole, externally visible symbols should be prefixed with a package name to avoid name collisions with other libraries.
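The third kind of reservation above mentions that a library function may also be provided as a macro and that you are allowed to bypass the macro. A minimal sketch of the two ways C 7.1.4 gives for doing so, using toupper as the example; the wrapper function names are hypothetical.

```c
#include <ctype.h>

/* Parenthesizing the name suppresses any function-like macro, so this
   always calls the actual library function. */
static int upper_via_parens(int c)
{
    return (toupper)((unsigned char)c);
}

/* Also allowed by 7.1.4: remove the macro definition, if there is one.
   From here on, toupper always names the real function. */
#undef toupper

static int upper_via_undef(int c)
{
    return toupper((unsigned char)c);
}
```

Both wrappers behave the same; the only difference is whether a possibly faster macro version is skipped for one call or for the rest of the translation unit.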
Having said all that, it's unwise to use an identifier that looks like it might be a standard identifier. Posix includes a list of over a hundred patterns for identifier names that it might use in the future, including all identifiers ending in _t, so if you expect your code to be used in a Posix environment, you'll want to avoid those names. And while future C standard revisions might avoid adding new type names to existing headers (aside from the integer typenames mentioned above), you don't really want to preclude using any such new types, since they may well be useful. (And, according to a comment by @JensGustedt, who knows a lot more about the workings of the C working group than I do, there will be a couple of new type names in existing headers in C2x.)
The _t suffix is not reserved by ISO 9899 as such. The future library directions of the C11 revision only say (C11 7.31.10):
Typedef names beginning with int or uint and ending with _t may be
added to the types defined in the <stdint.h> header. [...]
That said, there are a great many types with the _t suffix defined in C11:
char16_t, char32_t, clock_t, cnd_t, constraint_handler_t, div_t, double_t, errno_t, fenv_t, fexcept_t, float_t, fpos_t, imaxdiv_t,
int_fastN_t, int_leastN_t, intmax_t, intN_t, intptr_t, ldiv_t, lldiv_t, max_align_t, mbstate_t, mtx_t, ptrdiff_t, rsize_t,
sig_atomic_t, size_t, thrd_start_t, thrd_t, time_t, tss_dtor_t, tss_t, uint_fastN_t, uint_leastN_t,
uintmax_t, uintN_t, uintptr_t, wchar_t, wctrans_t, wctype_t, wint_t
POSIX, on the other hand, reserves the _t suffix for system use. The POSIX 1003.1 rationale has this excerpt:
To allow implementors to provide their own types, all conforming applications are required to avoid symbols ending in _t, which permits the implementor to provide additional types.
All in all, since chances are that you will want to use your C code on a POSIX system now or later, it is best to steer away from using _t for your own types.
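A brief sketch of what that looks like in practice. The "sensor" names are hypothetical; the point is only that a typedef can share its struct's tag name or use a descriptive name, with no _t suffix needed.

```c
/* Hypothetical types, named without the _t suffix that POSIX reserves. */
typedef unsigned long sensor_id;        /* rather than sensor_id_t */

typedef struct sensor_reading {
    sensor_id id;
    double    celsius;
} sensor_reading;                       /* rather than sensor_reading_t */

/* Converts the reading's temperature to degrees Fahrenheit. */
static double sensor_to_fahrenheit(sensor_reading reading)
{
    return reading.celsius * 9.0 / 5.0 + 32.0;
}
```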
Standard C allows you to use the _t suffix so long as you don't end up with a token that starts with a double underscore. (Note that C++ restricts this further in that a double underscore is not allowed anywhere in the token; worth adhering to should you anticipate your code reaching C++.)
It's POSIX that reserves _t.
In the standard library (glibc) I see functions defined with leading double underscores, such as __mmap in sys/mman.h. What is the purpose? And how can we still call a function mmap which doesn't seem to be declared anywhere. I mean we include sys/mman.h for that, but sys/mman.h doesn't declare mmap, it declares only __mmap.
From GNU's manual:
In addition to the names documented in this manual, reserved names
include all external identifiers (global functions and variables) that
begin with an underscore (‘_’) and all identifiers regardless of use
that begin with either two underscores or an underscore followed by a
capital letter are reserved names. This is so that the library and
header files can define functions, variables, and macros for internal
purposes without risk of conflict with names in user programs.
This is a convention which is also used by C and C++ vendors.
Names with leading double underscore are reserved for internal use by the implementation (compiler/standard library/etc.). They should never appear in your code. The purpose of this reserved namespace is to give the system headers names they can use without potentially clashing with names used in your program.
ISO 9899:2011
7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its
associated subclause, and optionally declares or defines identifiers
listed in its associated future library directions subclause and
identifiers which are always reserved either for any use or for use as
file scope identifiers.
— All identifiers that begin with an underscore and either an uppercase
letter or another underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name spaces.
extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
I found the above function declaration in /usr/include/netinet/ether.h on a Linux box.
Can someone explain what the double underscores mean in front of const (keyword), addr (identifier) and at last __THROW.
In C, symbols starting with an underscore followed by either an upper-case letter or another underscore are reserved for the implementation. You as a user of C should not create any symbols that start with the reserved sequences. In C++, the restriction is more stringent; you the user may not create a symbol containing a double-underscore.
Given:
extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
The __const notation is there to allow for the possibility (somewhat unlikely) that a compiler that this code is used with supports prototype notations but does not have a correct understanding of the C89 standard keyword const. The autoconf macros can still check whether the compiler has working support for const; this code could be used with a broken compiler that does not have that support.
The use of __hostname and __addr is a protection measure for you, the user of the header. If you compile with GCC and the -Wshadow option, the compiler will warn you when any local variables shadow a global variable. If the function used just hostname instead of __hostname, and if you had a function called hostname(), there'd be a shadowing. By using names reserved to the implementation, there is no conflict with your legitimate code.
The use of __THROW means that the code can, under some circumstances, be declared with some sort of 'throw specification'. This is not standard C; it is more like C++. But the code can be used with a C compiler as long as one of the headers (or the compiler itself) defines __THROW to empty, or to some compiler-specific extension of the standard C syntax.
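The mechanism behind a macro like __THROW can be sketched with a macro of your own. DEMO_NOTHROW and demo_length below are hypothetical names, and this is not how glibc actually defines __THROW; it only shows a macro that expands to a compiler-specific attribute where one is available and to nothing elsewhere.

```c
#include <stddef.h>

/* DEMO_NOTHROW is a made-up stand-in for a macro such as __THROW. */
#if defined(__GNUC__)
#  define DEMO_NOTHROW __attribute__((nothrow))  /* GCC/Clang extension */
#else
#  define DEMO_NOTHROW                           /* expands to nothing */
#endif

/* The declaration carries the decoration, as in the ether_hostton prototype. */
extern size_t demo_length(const char *text) DEMO_NOTHROW;

/* The definition is plain standard C: counts characters before the '\0'. */
size_t demo_length(const char *text)
{
    size_t n = 0;
    while (text[n] != '\0') {
        n++;
    }
    return n;
}
```

With a compiler that does not define __GNUC__, the declaration reduces to an ordinary prototype, which is why such headers remain usable from plain C.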
Section 7.1.3 of the C standard (ISO 9899:1999) says:
7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any use or for use as file
scope identifiers.
— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.154)
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.
Footnote 154) The list of reserved identifiers with external linkage includes errno, math_errhandling,
setjmp, and va_end.
See also What are the rules about using an underscore in a C++ identifier; a lot of the same rules apply to both C and C++, though the embedded double-underscore rule is in C++ only, as mentioned at the top of this answer.
C99 Rationale
The C99 Rationale says:
7.1.3 Reserved identifiers
To give implementors maximum latitude in packing library functions into files, all external identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has the same specification. Thus, for instance, strtod may be defined in the same object module as printf, with no fear that link-time conflicts will occur. Equally, strtod may call printf, or printf may call strtod, for whatever reason, with no fear that the wrong function will be called.
Also reserved for the implementor are all external identifiers beginning with an underscore, and all other identifiers beginning with an underscore followed by a capital letter or an underscore. This gives a name space for writing the numerous behind-the-scenes non-external macros and functions a library needs to do its job properly.
With these exceptions, the Standard assures the programmer that all other identifiers are available, with no fear of unexpected collisions when moving programs from one implementation to another. 5) Note, in particular, that part of the name space of internal identifiers beginning with underscore is available to the user: translator implementors have not been the only ones to find use for “hidden” names. C is such a portable language in many respects that the issue of “name space pollution” has been and is one of the principal barriers to writing completely portable code. Therefore the Standard assures that macro and typedef names are reserved only if the associated header is explicitly included.
Footnote 5) See §6.2.1 for a discussion of some of the precautions an implementor should take to keep this promise. Note also that any implementation-defined member names in structures defined in <time.h> and <locale.h> must begin with an underscore, rather than following the pattern of other names in those structures.
And the relevant part of the rationale for §6.2.1 Scopes of identifiers is:
Although the scope of an identifier in a function prototype begins at its declaration and ends at the end of that function’s declarator, this scope is ignored by the preprocessor. Thus an identifier in a prototype having the same name as that of an existing macro is treated as an invocation of that macro. For example:
#define status 23
void exit(int status);
generates an error, since the prototype after preprocessing becomes
void exit(int 23);
Perhaps more surprising is what happens if status is defined
#define status []
Then the resulting prototype is
void exit(int []);
which is syntactically correct but semantically quite different from the intent.
To protect an implementation’s header prototypes from such misinterpretation, the implementor must write them to avoid these surprises. Possible solutions include not using identifiers in prototypes, or using names in the reserved name space (such as __status or _Status).
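The first of those solutions, leaving parameter names out of the prototype, can be sketched in ordinary user code. The function demo_finish is hypothetical; the macro status is the one from the Rationale's example.

```c
/* A macro that a user program is entitled to define. */
#define status 23

/* Written as  int demo_finish(int status);  this prototype would expand to
   int demo_finish(int 23);  and fail to compile. With the parameter left
   unnamed there is nothing for the preprocessor to replace. */
int demo_finish(int);

/* The definition uses a parameter name that is not a macro. */
int demo_finish(int code)
{
    return code + status;   /* here status deliberately expands to 23 */
}
```

An implementation's headers cannot know which macros the user has defined, which is why they either omit parameter names like this or take them from the reserved name space.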
See also P J Plauger The Standard C Library (1992) for an extensive discussion of name space rules and library implementations. The book refers to C90 rather than any later version of the standard, but most of the implementation advice in it remains valid to this day.
Names with double leading underscores are reserved for use by the implementation. This does not necessarily mean they are internal per se, although they often are.
The idea is, you're not allowed to use any names starting with __, so the implementation is free to use them in places like macro expansions, or in the names of syntax extensions (e.g. __gcnew is not part of C++, but Microsoft can add it to C++/CLI confident that no existing code should have something like int __gcnew; in it that would stop compiling).
To find out what these specific extensions mean, i.e. __const you'll need to consult the documentation for your specific compiler/platform. In this particular case, you should probably consider the prototype in the documentation (e.g. http://www.kernel.org/doc/man-pages/online/pages/man3/ether_aton.3.html) to be the function's interface and ignore the __const and __THROW decorations that appear in the actual header.
By convention in some libraries, this indicates that a particular symbol is for internal use and not intended to be part of the public API of the library.
The double underscore in __const means that this keyword is a compiler extension and that using it is not portable (the const keyword itself was added to C in a later revision, C89 I think).
The __THROW is also some kind of extension. I assume that it gets defined to some __attribute__(something) if gcc is used, but I'm not sure about that and haven't checked.
The __addr can mean anything the programmer wanted it to mean; it's just a name.