I was reading K&R book. I read:
...name intended for use only by functions of the standard library
begin with _ so they are less likely to collide with the names in
the user program...
What exactly does this mean? Please explain it in a really simple and practical way.
What I understood is:
if I want to use sqrt defined in math.h, then
#include <math.h>
#define sqrt(x) x*x*x
main()
{
int x=4;
_sqrt(x); // That is from the header file math.h
sqrt(x); // my own defined macro
/*or its the reverse way _sqrt for my own defined macro so it won't collide with original sqrt i.e. without _ for sqrt from math.h */
return 0;
}
Now, I read some code on Stack Overflow that uses __. sys/syscall.h is not present on Windows, so we have to use
#if __linux
#include <sys/syscall.h>
#elif defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#endif
Where exactly is __ used, and what's the difference between __ and _?
Here's what the C standard says (section 7.1.3):
All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
(The section goes on to list specific identifiers and sets of identifiers reserved by certain standard headers.)
What this means is that for example, the implementation (either the compiler or a standard header) can use the name __FOO for anything it likes. If you define that identifier in your own code, your program's behavior is undefined. If you're "lucky", you'll be using an implementation that doesn't happen to define it, and your program will work as expected.
This means you simply should not define any such identifiers in your own code (unless your own code is part of a C implementation -- and if you have to ask, it isn't). There's no need to define such identifiers anyway; there's hardly any shortage of unreserved identifiers.
You can use an identifier like _foo as long as it's defined locally (not at file scope) -- but personally I find it much easier just to avoid using leading underscores at all.
Incidentally, your example of _sqrt doesn't necessarily illustrate the point. An implementation may define the name _sqrt in <math.h> (since anything defined there is at file scope), but there's no particular reason to expect that it will do so. When I compile your program, I get a warning:
c.c:7:1: warning: implicit declaration of function ‘_sqrt’ [-Wimplicit-function-declaration]
because <math.h> on my system doesn't define that identifier, and a link-time fatal error:
/tmp/cc1ixRmL.o: In function `main':
c.c:(.text+0x1a): undefined reference to `_sqrt'
because there's no such symbol in the library.
It's a naming convention. This means that violating the rule will not immediately and directly break your program, but it's a really, really, really [ + infinite times ] good idea to follow the convention.
The essence of the convention is to reserve:
names starting with _ for the language entities, which include the standard library
names starting with __ for the compiler internals
It's also a really platform-specific topic most of the time; many vendors respect this convention, but they also have their own naming conventions and guidelines.
You can find more by searching for "c double underscore naming convention".
tl;dr You got it backwards. Name your own stuff without leading underscores, unless you're writing a library for someone else. The standard library and compilers use the technique to signal that certain names are internal, and not to be used directly.
Underscores for Uniqueness
In C, there are no namespaces. In other words, all names included into a file can collide with each other. If foo.h and bar.h both define x, an error will occur when they are both included.
Now, x is a pretty common name. Collision is almost guaranteed, and the writers of foo.h and bar.h must realize that. So, in the interest of avoiding future problems for the programmers that will use their code, they change the name to _x.
Alternatives
Common names do occur. Before resorting to underscoring, try:
Separating private from public variables in .c and .h files. Most clashing names are private, and don't belong in the header.
Prefixing your code with the name of the module: foo_x and bar_x won't collide.
I'm new to C and have read that each function may only be defined once, but I can't seem to reconcile this with what I'm seeing in the console. For example, I am able to overwrite the definition of printf without an error or warning:
#include <stdio.h>
extern int printf(const char *__restrict__format, ...) {
putchar('a');
}
int main() {
printf("Hello, world!");
return 0;
}
So, I tried looking up the one-definition rule in the standard and found Section 6.9 (5) on page 155, which says (emphasis added):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression [...], somewhere in the entire program there shall be exactly one external definition for that identifier; otherwise, there shall be no more than one.
My understanding of linkage is very shaky, so I'm not sure if this is the relevant clause or what exactly is meant by "entire program". But if I take "entire program" to mean all the stuff in <stdio.h> + my source file, then shouldn't I be prohibited from redefining printf in my source file since it has already been defined earlier in the "entire program" (i.e. in the stdio bit of the program)?
My apologies if this question is a dupe, I couldn't find any existing answers.
The C standard does not define what happens if there is more than one definition of a function.
… shouldn't I be prohibited…
The C standard has no jurisdiction over what you do. It specifies how C programs are interpreted, not how humans may behave. Although some of its rules are written using “shall,” this is not a command to the programmer about what they may or may not do. It is a rhetorical device for specifying the semantics of C programs. C 2018 clause 4, paragraph 2 tells us what it actually means:
If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined…
So, when you provide a definition of printf and the standard C library provides a definition of printf, the C standard does not specify what happens. In common practice, several things may happen:
The linker uses your printf. The printf in the library is not used.
The compiler has built-in knowledge of printf and uses that in spite of your definition of printf.
If your printf is in a separate source module, and that module is compiled and inserted into a library, then which printf the program uses depends on the order the libraries are specified to the linker.
While the C standard does not define what happens if there are multiple definitions of a function (or an external symbol in general), linkers commonly do. Ordinarily, when a linker processes a library file, its behavior is:
Examine each module in the library. If the module defines a symbol that is referenced by a previously incorporated object module but not yet defined, then include that module in the output the linker is building. If the module does not define any such symbol, do not use it.
Thus, for ordinary functions, the behavior of multiple definitions that appear in library files is defined by the linker, even though it is not defined by the C standard. (There can be complications, though. Suppose a program uses cos and sin, and the linker has already included a module that defines cos when it finds a library module that defines both sin and cos. Because the linker has an unresolved reference to sin, it includes this library module, which brings in a second definition of cos, causing a multiple-definition error.)
Although the linker behavior may be well defined, this still leaves the issue that compilers have built-in knowledge about the standard library functions. Consider this example. Here, I added a second printf, so the program has:
printf("Hello, world!");
printf("Hello, world!\n");
The program output is “aHello, world!\n”. This shows the program used your definition for the first printf call but used the standard behavior for the second printf call. The program behaves as if there are two different printf definitions in the same program.
Looking at the assembly language shows what happens. For the second call, the compiler decided that, since printf("Hello, world!\n"); is printing a string with no conversion specifications and ending with a new-line character, it can use the more-efficient puts routine instead. So the assembly language has call puts for the second printf. The compiler cannot do this for the first printf because it does not end with a new-line character, which puts automatically adds.
Please be aware of the difference between a declaration and a definition. The terms are totally different.
stdio.h only provides the declaration. Therefore, when you declare or define the function in your file, it is accepted as long as the prototype is compatible.
In practice, most toolchains let you define it in your source file; if your definition is available, the final program will typically link to yours instead of the one in the library.
For macros, are there any name limitations other than it needs to be an identifier? For example, would something like the following be valid?
#define assert getchar
#include <stdio.h>
int main(void)
{
assert();
}
Code link: https://godbolt.org/z/ra63na.
main:
push rbp
mov rbp, rsp
mov eax, 0
call getchar
mov eax, 0
pop rbp
ret
And does the preprocessor have any knowledge of the C language? Or is it more like a find-and-replace program?
For macros, are there any name limitations other than it needs to be an identifier?
Yes, they are subject to the provisions of section 7.1.3 of the language specification ("Reserved Identifiers"), in particular:
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use
[including as macro names].
[...]
Each macro name in any of the [standard library specification] subclauses (including the future library directions) is reserved for
use as specified if any of its associated headers is included; unless
explicitly stated otherwise
[...]
Each identifier with file scope listed in any of the [standard library specification] subclauses (including the future library
directions) is reserved for use as a macro name and as an identifier
with file scope in the same name space if any of its associated
headers is included.
[...] If the program declares or defines an identifier in a context in
which it is reserved (other than as allowed by 7.1.4), or defines a
reserved identifier as a macro name, the behavior is undefined.
The second bullet point in particular would be relevant to your example code if it also included the assert.h header. The identifier assert would then be reserved for use as a macro name, and defining it as one yourself would trigger undefined behavior. That does not place any particular requirements on the implementation -- in fact that's exactly the meaning of "undefined behavior". It does not require the implementation to accept the code, nor to reject it, nor to emit any kind of diagnostic in either case. If it did accept it, the preprocessor would not be required to perform macro substitution on assert, nor would it be forbidden to do so, nor, in fact, would it be required to behave in a way that seems in any way rational or predictable.
A similar analysis would apply, based on the third bullet point, if you defined getchar as a macro name in code that includes stdio.h, as the example does. The code actually presented is OK, however.
You also ask,
And does the preprocessor have any knowledge of the C language? Or is
it more like a find-and-replace program?
A little. The C preprocessor is not a general-purpose macro language, and attempts to use it as one often go poorly. The preprocessor's input is a series of tokens, determined according to rules consistent with C syntax, and it uses the same syntax for identifiers that C does. Conditional inclusion directives recognize a subset of the arithmetic expressions of C, and they work in terms of one of the host implementation's integer data types. The preprocessor (or at least the tokenization stage preceding it) understands C string literals and character constants, so macro replacement does not affect the contents of these.
This is covered in section 7.1.2 and 7.1.3 of the standard (C11). Here is a selection of rules pertaining to macros:
If used, a header shall be included outside of any external declaration or definition, and it shall first be included before the first reference to any of the functions or objects it declares, or to any of the types or macros it defines.
The program shall not have any macros with names lexically identical to keywords currently defined prior to the inclusion of the header or when any macro defined in the header is expanded.
Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise.
Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
So the exact program you posted is correct, since <assert.h> has not been included. But it would be undefined behaviour if you did include that header.
It's really dumb. It understands enough to do token replacement, but not much more.
For example: #define test fail will replace test in test(...) but not tested or "test".
Since C has a very basic syntax writing a parser that can work through and identify tokens like that is actually not that hard. Making it understand the totality of C syntax is beyond the scope of that tool.
In other words, for an input program like:
#define test fail
int main() {
test(9, "test", tested());
return 0;
}
The C pre-processor breaks this up into tokens that end up something like:
[ "#", "define", "test", "fail" ]
[ "int", "main", "(", ")", "{" ]
[ "test", "(", "9", "\"test\"", "tested", "(", ")", ")", ";" ]
...
Where each of those is processed using the simple pre-processor grammar.
This is slightly more complicated because macros can include arguments, but you get the idea. The grammar used is a simple subset of the whole C grammar.
Yes, it is valid. No, the pre-processor is not language aware. The pre-processor does exactly what it is told - it includes content and replaces macros - and if that results in invalid syntax, the compiler must detect that.
Other than C symbol naming rules, there are no C language dependencies or reserved words. All pre-processor directives start with #, which is not valid in a C symbol name, so there is no need for reserved words.
The pre-processor can be run on its own - either by a command-line option to the compiler driver or, in the case of the GNU toolchain, as the standalone executable cpp - making it useful for purposes other than just C and C++ source pre-processing.
Most keywords in C (or in any language, for that matter) start with a letter, but some keywords start with an underscore. These keywords are: _Alignas, _Alignof, _Atomic, _Bool, _Complex, _Generic, _Imaginary, _Noreturn, _Static_assert and _Thread_local.
I find it amazingly strange. If it was a hidden global constant or internal function that's not really a part of the API, I would understand it. But these are keywords.
I find it extra strange that C actually has macros called bool and static_assert, and that their implementations use the very keywords I just mentioned.
C developed and became very popular before it was planned by a standards committee. As a consequence, there was a lot of existing code.
When setting a C standard, or updating an old standard, an important goal is not to “break” old code. It is desirable that code that worked with previous compilers continue to work with new versions of the C language.
Introducing a new keyword (or any new definition or meaning of a word) can break old code, since, when compiling, the word will have its new keyword meaning and not the identifier meaning it had with the previous compilers. The code will have to be edited. In addition to the expense of paying people to edit the code, this has a risk of introducing bugs if any mistakes are made.
To deal with this, a rule was made that identifiers starting with underscore were reserved. Making this rule did not break much old software, since most people writing software choose to use identifiers beginning with letters, not underscore. This rule gives the C standard a new ability: By using underscore when adding new keywords or other new meanings for words, it is able to do so without breaking old code, as long as that old code obeyed the rule. For example, adding a new keyword, _Bool for a Boolean type, would not break any code that had not used identifiers beginning with an underscore.
New versions of the C standard sometimes introduce new meanings for words that do not begin with an underscore, such as bool. However, these new meanings are generally not introduced in the core language. Rather, they are introduced only in new headers. In making a bool type, the C standard provided a new header, <stdbool.h>. Since old code could not be including <stdbool.h> since it did not exist when the code was written, defining bool in <stdbool.h> would not break old code. At the same time, it gives programmers writing new code the ability to use the new bool feature by including <stdbool.h>, which defines bool as a macro that is replaced by _Bool.
In the standard, any name that begins with a double underscore or an underscore followed by an uppercase letter is reserved. This is useful because C lacks named namespaces. By reserving all such symbols, new and implementation specific keywords can be introduced to the language without clashing with symbols defined in existing code.
The macros such as bool and static_assert are "convenience macros": they allow you to use the reserved keyword symbols without the underscores and capitals, at the small risk of a name clash. However, they provide a means to resolve a name clash because, unlike a keyword, a macro may be #undef'd, or the header that defines it may be left out and the internal keyword used directly. Moreover, unmodified legacy code will not be broken because, by definition, it will not include headers that did not exist at the time of writing.
The unadorned keywords have been defined in the language since the language's inception (with the exception of inline and restrict defined since C99), so will not cause a conflict with legacy code symbols. All _Xxxx keywords have been defined at or since C99.
Unlike many languages in common use today, C has been around since the 1970's and standardised since 1989 - there is a huge amount of existing code that must remain compilable on modern compilers while at the same time the language cannot remain unchanged - if it did it might no longer be in such common use.
Eric and Clifford have provided good answers, but I'll add a quote from the C11 standard to support them.
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
https://port70.net/~nsz/c/c11/n1570.html#7.1.3
One might also consider this:
Typedef names beginning with int or uint and ending with _t may be added to the types defined in the stdint.h header. Macro names beginning with INT or UINT and ending with _MAX, _MIN, or _C may be added to the macros defined in the stdint.h header.
https://port70.net/~nsz/c/c11/n1570.html#7.31.10p1
Going through the K&R ANSI C Programming Language book (second edition), on page 82 an example is given of a file/folder layout for a program.
What I don't understand is, while calc.h gets included in main (use of functions), getop.c (definition of getop) and stack.c (definition of push and pop), it does not get included into getch.c, even though getch and ungetch are defined there.
Although it's a good idea to include the header file, it's not required, because getch.c doesn't actually use any of the functions declared in calc.h. It could get by even if it did use some of them, as long as they were ones already defined earlier in getch.c.
The reason it's a good idea to include the header file anyway is that it provides some safety if you use modern-style prototypes and definitions: the compiler should complain if, for example, getop isn't defined in getop.c with the same signature as in calc.h.
calc.h contains the declaration of getch() and ungetch(). It is included by files that want to use these functions (and, therefore, need their signature).
getch.c, instead, contains the definitions of getch() and ungetch(). Therefore, there is no need to include their declarations (a definition also serves as a declaration).
The omission you have so aptly discovered can be a source of a real problem. In order to benefit fully from C's static type checking across a multi-translation-unit program (which is almost anything nontrivial), we must ensure that the site which defines an external name (such as a function) as well as all the sites which refer to the name, have the same declaration in scope, ideally from a single source: one header file where that name is declared.
If the definition doesn't have the declaration in scope, then it is possible to change the definition so that it no longer matches the declaration. The program will still translate and link, resulting in undefined behavior when the function is called or the object is used.
If you use the GNU compiler, you can guard against this problem using -Wmissing-prototypes. Straight from the gcc manual page:
-Wmissing-prototypes (C and Objective-C only)
Warn if a global function is defined without a previous prototype
declaration. This warning is issued even if the definition itself
provides a prototype. The aim is to detect global functions that
fail to be declared in header files.
Without diagnosis, this kind of thing, such as forgetting a header file, can happen to the best of us.
One possible reason why the header was forgotten is that the example project uses the "one big common header" convention. The "one big common header" approach lets the programmer forget all about headers. Everything just sees everything else and the #include "calc.h" which makes it work is just a tiny footnote that can get swallowed up in the amnesia. :)
The other aspect is that the authors had spent a lot of time programming in pre-ANSI "Classic" C without prototype declarations. In Classic C, header files are mainly for common type declarations and macros. The habit is that if a source file doesn't need some type or macros that are defined in some header, then it doesn't need to include that header. A resurgence of that habit could be what is going on here.
extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
I found the above function declaration in /usr/include/netinet/ether.h on a Linux box.
Can someone explain what the double underscores mean in front of const (a keyword) and addr (an identifier), and what __THROW at the end means?
In C, symbols starting with an underscore followed by either an upper-case letter or another underscore are reserved for the implementation. You as a user of C should not create any symbols that start with the reserved sequences. In C++, the restriction is more stringent; you the user may not create a symbol containing a double-underscore.
Given:
extern int ether_hostton (__const char *__hostname, struct ether_addr *__addr)
__THROW;
The __const notation is there to allow for the possibility (somewhat unlikely) that a compiler that this code is used with supports prototype notations but does not have a correct understanding of the C89 standard keyword const. The autoconf macros can still check whether the compiler has working support for const; this code could be used with a broken compiler that does not have that support.
The use of __hostname and __addr is a protection measure for you, the user of the header. If you compile with GCC and the -Wshadow option, the compiler will warn you when any local variables shadow a global variable. If the function used just hostname instead of __hostname, and if you had a function called hostname(), there'd be a shadowing. By using names reserved to the implementation, there is no conflict with your legitimate code.
The use of __THROW means that the code can, under some circumstances, be declared with some sort of 'throw specification'. This is not standard C; it is more like C++. But the code can be used with a C compiler as long as one of the headers (or the compiler itself) defines __THROW to empty, or to some compiler-specific extension of the standard C syntax.
Section 7.1.3 of the C standard (ISO 9899:1999) says:
7.1.3 Reserved identifiers
Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any use or for use as file
scope identifiers.
— All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
— All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
— Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise (see 7.1.4).
— All identifiers with external linkage in any of the following subclauses (including the
future library directions) are always reserved for use as identifiers with external
linkage.154)
— Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
No other identifiers are reserved. If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.
Footnote 154) The list of reserved identifiers with external linkage includes errno, math_errhandling,
setjmp, and va_end.
See also What are the rules about using an underscore in a C++ identifier; a lot of the same rules apply to both C and C++, though the embedded double-underscore rule is in C++ only, as mentioned at the top of this answer.
C99 Rationale
The C99 Rationale says:
7.1.3 Reserved identifiers
To give implementors maximum latitude in packing library functions into files, all external identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names, not even if the user function has the same specification. Thus, for instance, strtod may be defined in the same object module as printf, with no fear that link-time conflicts will occur. Equally, strtod may call printf, or printf may call strtod, for whatever reason, with no fear that the wrong function will be called.
Also reserved for the implementor are all external identifiers beginning with an underscore, and all other identifiers beginning with an underscore followed by a capital letter or an underscore. This gives a name space for writing the numerous behind-the-scenes non-external macros and functions a library needs to do its job properly.
With these exceptions, the Standard assures the programmer that all other identifiers are available, with no fear of unexpected collisions when moving programs from one implementation to another.5) Note, in particular, that part of the name space of internal identifiers beginning with underscore is available to the user: translator implementors have not been the only ones to find use for “hidden” names. C is such a portable language in many respects that the issue of “name space pollution” has been and is one of the principal barriers to writing completely portable code. Therefore the Standard assures that macro and typedef names are reserved only if the associated header is explicitly included.
Footnote 5) See §6.2.1 for a discussion of some of the precautions an implementor should take to keep this promise. Note also that any implementation-defined member names in structures defined in <time.h> and <locale.h> must begin with an underscore, rather than following the pattern of other names in those structures.
And the relevant part of the rationale for §6.2.1 Scopes of identifiers is:
Although the scope of an identifier in a function prototype begins at its declaration and ends at the end of that function’s declarator, this scope is ignored by the preprocessor. Thus an identifier in a prototype having the same name as that of an existing macro is treated as an invocation of that macro. For example:
#define status 23
void exit(int status);
generates an error, since the prototype after preprocessing becomes
void exit(int 23);
Perhaps more surprising is what happens if status is defined
#define status []
Then the resulting prototype is
void exit(int []);
which is syntactically correct but semantically quite different from the intent.
To protect an implementation’s header prototypes from such misinterpretation, the implementor must write them to avoid these surprises. Possible solutions include not using identifiers in prototypes, or using names in the reserved name space (such as __status or _Status).
See also P J Plauger The Standard C Library (1992) for an extensive discussion of name space rules and library implementations. The book refers to C90 rather than any later version of the standard, but most of the implementation advice in it remains valid to this day.
Names with double leading underscores are reserved for use by the implementation. This does not necessarily mean they are internal per se, although they often are.
The idea is, you're not allowed to use any names starting with __, so the implementation is free to use them in places like macro expansions, or in the names of syntax extensions (e.g. __gcnew is not part of C++, but Microsoft can add it to C++/CLI confident that no existing code should have something like int __gcnew; in it that would stop compiling).
To find out what these specific extensions mean, i.e. __const you'll need to consult the documentation for your specific compiler/platform. In this particular case, you should probably consider the prototype in the documentation (e.g. http://www.kernel.org/doc/man-pages/online/pages/man3/ether_aton.3.html) to be the function's interface and ignore the __const and __THROW decorations that appear in the actual header.
By convention in some libraries, this indicates that a particular symbol is for internal use and not intended to be part of the public API of the library.
The underscore in __const means that this keyword is a compiler extension and using it is not portable (the const keyword was added to C in a later revision, C89 I think).
The __THROW is also some kind of extension. I assume that it gets defined to some __attribute__(something) if gcc is used, but I'm not sure about that and too lazy to check.
The __addr can mean anything the programmer wanted it to mean; it's just a name.