I have seen examples of labels in ARM with and without colons following the symbol name. Is the colon required?
I was under the impression that colons are required, but an example from ARM's site is missing them: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0100a/armasm_ceghjdfb.htm
The presence or absence of a colon after a label is a matter of assembler syntax; it does not get included in the object file and is therefore not an ABI concern. The GNU assembler requires a trailing colon, while some other assemblers prohibit one.
The leading underscores required by some ABIs are not treated specially by assemblers; they are required so that the assembly code is compatible with a C compiler for those ABIs.
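As a minimal sketch of that last point (the symbol and file names here are hypothetical, chosen only for illustration): the C source never spells the underscore itself; the compiler adds it when emitting the symbol reference, so on such an ABI the hand-written assembly must define _asm_counter for this to link.

/* read_counter.c -- hypothetical example: refer to a symbol defined in a
   hand-written assembly file. On an ABI that prepends an underscore, the
   assembly side must define "_asm_counter"; on others, plain "asm_counter".
   The C source is identical either way, because the compiler applies the
   decoration. */
extern unsigned int asm_counter;    /* defined in counter.s (assumed) */

unsigned int read_counter(void)
{
    return asm_counter;
}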
Related
I've been working in C for so long that the fact that compilers typically add an underscore to the start of an extern is just understood... However, another SO question today got me wondering about the real reason why the underscore is added. A Wikipedia article claims that a reason is:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
I think there's at least a kernel of truth to this, but it also seems to not really answer the question, since if the underscore is added to all externs it won't help much with preventing clashes.
Does anyone have good information on the rationale for the leading underscore?
Is the added underscore part of the reason that the Unix creat() system call doesn't end with an 'e'? I've heard that early linkers on some platforms had a limit of 6 characters for names. If that's the case, then prepending an underscore to external names would seem to be a downright crazy idea (now I only have 5 characters to play with...).
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
If the runtime support is provided by the compiler, you would think it would make more sense to prepend an underscore to the few external identifiers in the runtime support instead!
When C compilers first appeared, the basic alternative to programming in C on those platforms was programming in assembly language, and it was (and occasionally still is) useful to link together object files written in assembler and C. So really (IMHO) the leading underscore added to external C identifiers was to avoid clashes with the identifiers in your own assembly code.
(See also GCC's asm label extension; and note that this prepended underscore can be considered a simple form of name mangling. More complicated languages like C++ use more complicated name mangling, but this is where it started.)
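As an aside, here is a small sketch of the GCC asm label extension mentioned above (the function and symbol names are invented for illustration): it lets a C declaration pin the exact assembly-level symbol name, bypassing whatever decoration the ABI would otherwise apply.

/* Declare the C function get_answer with an explicit assembly-level name,
   so assembly code can call it as "raw_get_answer" regardless of whether
   the ABI would normally prepend an underscore. (GCC/Clang extension.) */
int get_answer(void) __asm__("raw_get_answer");

int get_answer(void)
{
    return 42;
}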
If the C compiler always prepended an underscore to every symbol, then the startup/C-runtime code (which is usually written in assembly) could safely use labels and symbols that do not start with an underscore, such as the symbol 'start'.
Even if you write a start() function in your C code, it is emitted as _start in the object/assembly output; in this scheme there is no way for C code to generate a symbol that does not start with an underscore. So the startup coder doesn't have to worry about inventing obscure, improbable symbols (like $_dontuse42%$) for each of his/her global variables and labels.
The linker won't complain about a name clash, and the programmer is happy. :)
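A tiny sketch of that point, assuming a platform whose ABI prepends an underscore (the behaviour is platform-dependent, so treat this as an illustration rather than a guarantee):

/* On an ABI that prepends an underscore, this user-written C function is
   emitted as the symbol "_start", so it cannot collide with a runtime's
   own assembly-level entry label "start". Inspecting the object file
   (e.g. with nm) on such a platform would show "_start". */
int start(void)
{
    return 0;
}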
The following is different from the practice of the compiler prepending an underscore in its output formats.
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
That is a convention followed for the C system libraries and other system components (and for things such as __FILE__ etc.).
(Note that such a symbol, e.g. _time, may end up with two leading underscores (__time) in the generated output.)
From what I have always heard, it is to avoid naming conflicts: not with other extern variables, but so that when you use a library it will hopefully not conflict with the variable names in the user's code.
The main function is not the real entry point of an executable. Some statically linked files have the real entry point that eventually calls main, and those statically linked files own the namespace that does not start with an underscore. On my system, in /usr/lib, there are gcrt1.o, crt1.o and dylib1.o among others. Each of those has a "start" function without an underscore that will eventually call the "_main" entry point. Everything else besides those files has external scope. The history has to do with mixing assembler and C in a project, where all C was considered external.
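To illustrate the shape of that arrangement, here is a hypothetical sketch in C of what the startup object's assembly-level "start" routine conceptually does; the real code lives in files like crt1.o, is written in assembly, and does considerably more setup than this.

#include <stdlib.h>   /* for exit() */

/* Conceptual stand-in for the runtime's "start" label: do the setup,
   call the C-level main (whose assembly name is "_main" on such ABIs),
   then exit with its return value. Purely illustrative. */
extern int main(int argc, char **argv);

void start_sketch(void)
{
    static char *argv[] = { "program", 0 };
    exit(main(1, argv));
}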
From Wikipedia:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support. Furthermore, when the C/C++ compiler needed to introduce names into external linkage as part of the translation process, these names were often distinguished with some combination of multiple leading or trailing underscores.
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
I have some confusion about a passage on variable names in K&R C. The original text is as follows:
At least the first 31 characters of an internal name are significant. For function names and external variables, the number may be less than 31, because external names may be used by assemblers and loaders over which the language has no control. For external names, the standard guarantees uniqueness only for 6 characters and a single case. Keywords like if, else, int, float, etc., are reserved: you can't use them as variable names. They must be in lower case.
It's wise to choose variable names that are related to the purpose of the variable, and that are unlikely to get mixed up typographically. We tend to use short names for local variables, especially loop indices, and longer names for external variables.
What confused me was "for external names, the standard guarantees uniqueness only for 6 characters and a single case". Does it mean that for external names only the 6 leading characters are significant and the remaining characters are all ignored? For example, if we define two external variables myexvar1 and myexvar2, will the compiler treat them as one? If that is true, why do they advise us to use longer names for external variables?
Does it mean that for external names only the 6 leading characters are significant and the remaining characters are all ignored? For example, if we define two external variables myexvar1 and myexvar2, will the compiler treat them as one?
Yes, this was true in 1990. Or rather, 6 unique leading characters of external identifiers was the minimum limit the C90 standard set for a compiler. This was of course madness, which is why the limit was increased to 31 in C99.
In practice, most C90 compilers had at least 31 unique characters for internal and external identifiers both.
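A minimal sketch of what that limit meant in practice, reusing the names from the question (whether a given old toolchain actually collapses them depends on its linker):

/* Under the old C90 minimum of 6 significant, single-case characters for
   external names, a limited linker could see both of these as the same
   symbol, "MYEXVA", and either reject the program or silently merge them.
   Modern toolchains keep them distinct. */
int myexvar1 = 1;
int myexvar2 = 2;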
If that is true, why do they advise us to use longer names for external variables?
Not sure that they advise it. But the coding style used in K&R is often plain horrible, so it is definitely not a book you should consult for coding style advice.
In modern C, it is required (C17 5.2.4.1) that we have:
63 significant initial characters in an internal identifier or a macro name
31 significant initial characters in an external identifier
So don't worry too much about which limitations the dinosaurs faced, but follow modern standard C.
As pointed out in another answer, even the restriction of 31 significant initial characters for external identifiers is listed as obsolete, meaning this might get increased even further, to 255, in future standards.
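To make the modern external-name minimum concrete, here is a small sketch (the identifiers are invented for illustration):

/* C17 guarantees only 31 significant initial characters for external
   names. These two identifiers are identical well past that point, so
   the standard alone does not promise they are distinct, although every
   mainstream toolchain today distinguishes them without trouble. */
int a_really_long_external_variable_one = 1;
int a_really_long_external_variable_two = 2;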
Truth be told, K&R is pretty old, so I assume things have changed since then.
I really don't know why they give exactly 6 characters here:
For external names, the standard guarantees uniqueness only for 6 characters and a single case.
But you have to understand that all a compiler does is translate a translation unit (usually a *.c file) into an object file (*.o). That's it. The compiler does not produce a ready-to-run program.
Those object files may contain references to unresolved symbols to be found in other object files, as well as a table of their own external symbols, the ones they provide to be referenced from the outside. The symbols have textual names, which are the names you've given to your external variables.
Linkers and dynamic loaders still have to do their jobs to build the program and get it running. Along the way they have to resolve all unresolved symbols, so they perform textual lookup for those symbols in object files. Linkers and loaders are not the compiler; they might have had their own rules about treating those names (back in the days of K&R, I guess). That's what this ...
because external names may be used by assemblers and loaders over which the language has no control.
... is about.
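A minimal sketch of that split between the compiler and the linker (the file names are invented for illustration):

/* counter.c -- this translation unit exports the external symbol
   "counter" (or "_counter", depending on the ABI) in its object file. */
int counter = 0;

/* main.c -- this translation unit records an unresolved reference to the
   same symbol in main.o; the linker later resolves it against counter.o
   by textual name, subject to whatever name-length rules it has. */
extern int counter;

int main(void)
{
    return counter;
}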
These days, though, all your K&R concerns sound outdated and irrelevant. Pick a newer standard to follow.
This is due to historical limits on the length of the symbols exported to the system's linker.
I quote from The New C Standard -- An Economic and Cultural Commentary.
The values of 6 and 10 were chosen so that the encodings \u1234 and \U12345678 could be used.
The Fortran significant character limit of six was followed by many suppliers of linkers for a long time. The need for longer identifiers to support name mangling in C++ ensured that most modern linkers support many more significant characters in an external identifier.
Common Implementations
Historically, the number of significant characters in an external identifier was driven by the behavior of the host vendor-supplied linker. Only since the success of MS-DOS have developers become used to translator vendors supplying their own linker. Previously, most linkers tended to be supplied by the hardware vendor. The mainframe world tended to be driven by the requirements of Fortran, which had six significant characters in an internal or external identifier. In this environment it was not always possible to replace the system linker by one supporting more significant characters. The importance of the mainframe environment waned in the 1990s. In modern environments it is very often possible to obtain alternative linkers.
So the main issue was to be able to link together libraries compiled in C with libraries compiled in Fortran, and Fortran imposed the limit of 6.
You can read more at the given reference.
That's a legacy of the past that is no longer important. No compiler today has those limits; they date from the time the old Unix was made. The reasons given (then and today) were the limit the compiler imposed on names in its symbol table (31 characters) and the limit the linker of that era used (6 characters).
But that's not applicable anymore. At least you can be sure that today's linkers will keep identifiers distinct even when they share a common prefix of at least 100 characters.
Let's assume the following code:
int __foo(void) {
    return 0;
}

int _BAR(void) {
    return 3;
}

int main(void) {
    return __foo() & _BAR();
}
Symbols that begin with a double underscore, or with an underscore followed by an uppercase letter, are reserved and therefore not allowed (this is a C++ question, but it also mentions the C rules).
I tried -Wall -Wextra -pedantic options on gcc and -Weverything option on clang, both do not warn about this.
Is there any way to enable a compiler warning for this?
GCC and Clang appear not to offer such a feature.
The documentation for GCC warning messages is here (for version 8.2; to seek documentation for other versions, start here). None of them mention checking for reserved identifiers or identifiers that begin with an underscore followed by an underscore or capital letter, except certain special cases (such as the built-in __FILE__) that are not useful for this question.
Clang's documentation is here (that appears to be a link to the current version, so expect it to be updated in the future). It similarly has no mention of checking for reserved identifiers.
In Clang, -Weverything enables all diagnostics, so, if no diagnostic appears when compiling sample code with -Weverything, the desired diagnostic is not implemented in Clang.
There does not appear to be any reason a compiler cannot do this. Clang does track where source text originates. For example, if macro expansions result in a syntax error, Clang prints multiple diagnostic lines showing the names, line numbers, and file names of the macros involved. Furthermore, Clang suppresses warnings in system headers and can be told to treat additional files (such as headers for libraries) similarly with #pragma clang system_header. So it seems feasible for Clang to produce a warning for any reserved identifier that does not originate in a system header. The lack of such a feature may be due to lack of demand.
A compiler can't practically warn you about this. Once the preprocessor has included any standard library files (which can of course legitimately contain double underscores), the compiler doesn't really know the origin of such code.
A good IDE or static analyser can warn you, however.
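To illustrate what such an external check might look like, here is a deliberately naive sketch of a standalone checker (not a real tool, and no substitute for a proper static analyser): it scans a single source file and flags identifiers that begin with a double underscore or with an underscore followed by an uppercase letter. It does not run the preprocessor, so it only sees the file you name (and therefore never complains about names coming from standard headers), but it also does not skip comments or string literals, so treat its output as a rough hint only.

#include <ctype.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f = argc > 1 ? fopen(argv[1], "r") : NULL;
    if (!f) {
        fprintf(stderr, "usage: %s file.c\n", argv[0]);
        return 1;
    }

    char word[256];
    int c, len = 0, line = 1;
    while ((c = fgetc(f)) != EOF) {
        if (isalnum((unsigned char)c) || c == '_') {
            /* Accumulate an identifier-like token. */
            if (len < (int)sizeof word - 1)
                word[len++] = (char)c;
        } else {
            if (len > 0) {
                word[len] = '\0';
                /* Flag tokens that look like reserved identifiers:
                   "__..." or "_X..." with X an uppercase letter. */
                if (word[0] == '_' &&
                    (word[1] == '_' || isupper((unsigned char)word[1])))
                    printf("line %d: reserved-looking identifier '%s'\n",
                           line, word);
                len = 0;
            }
            if (c == '\n')
                line++;
        }
    }
    fclose(f);
    return 0;
}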