Name decoration in C

Does any standard mandate name decoration?
As far as I know, most (all?) conforming implementations add an underscore prefix to the name of each exported symbol. Is this guaranteed by the C standard, POSIX, or some other standard?

I'm not sure about a "standard", but the prepended underscore seems to be a very common convention dating from the 1970s. From What is the reason function names are prefixed with an underscore by the compiler?:
At the time that UNIX was rewritten in C in about 1974, its authors already had extensive assembler language libraries, and it was easier to mangle the names of new C and C-compatible code than to go back and fix all the existing code.
It is required for C names if you want to interoperate with the Microsoft compilers on Windows (Win32 only; Win64 does not use this decoration, since it has only a single standard calling convention).
https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names
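As a minimal sketch of what that decoration looks like in practice, the snippet below can be built with a 32-bit MSVC toolchain and the resulting symbols inspected with dumpbin; the decorated names in the comments follow the conventions documented on the page above, while the function names themselves are made up for illustration.

    /* decoration.c - compile with a 32-bit toolchain: cl /c decoration.c
       then inspect the object file: dumpbin /symbols decoration.obj */
    int __cdecl    cdecl_func(int x)    { return x; }  /* emitted as _cdecl_func      */
    int __stdcall  stdcall_func(int x)  { return x; }  /* emitted as _stdcall_func@4  */
    int __fastcall fastcall_func(int x) { return x; }  /* emitted as @fastcall_func@4 */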
See also: Why do C compilers prepend underscores to external names?

Why not all the standard headers are preceded with std prefix?

Why not all the standard headers are preceded with std prefix? I.e. why complex.h and not stdcomplex.h?
Why? Why not? Who knows? The header files that make up the standard library began evolving into that category before a standard existed, over years of revisions by developers and scrutiny by C committee members. Many of the original authors and committee members who developed and canonized these files are now part of the big compiler in the sky and not available to answer the question of why the standard naming convention is not really conventional or standard. But reading the Wikipedia page on the topic may at least give you a little history and context.
The naming historically has nothing to do with formal ISO standardization, and I don't think there was ever an ambition to toss std in front of everything in the standard library.
The earliest mention of a standard library seems to be K&R, The C Programming Language, 1st edition, from 1978, where under chapter 8.5 we can read:
The data structure that describes a file is contained in the file stdio.h, which must be included (by #include) in any source file that uses routines from the standard library.
Notably K&R refers to it as the standard library, not the standard input/output library. So maybe (and this is speculation) this header was originally intended to be the whole standard library for C. This is the only one of the current ISO standard library headers I can find mentioned in the book and it pre-dates formal standardization by more than ten years.
Then later, during ANSI/ISO standardization, the headers stdarg.h, stddef.h, stdio.h and stdlib.h were added to the first standard, but these are just 4 out of its 15 standard headers; the rest do not use the std prefix. Various C or Unix de-facto standard headers just got added to the standard pretty much arbitrarily. There was no sound rationale for anything, least of all API design or naming. They just tossed various already-present "good to have" Unix libs into the standard.
Notably, the original C standard only guaranteed 6 unique letters for standard header names, and all the original headers have names of 6 or fewer letters. This was expanded to 8 letters in C99.
C99 continued the tradition of arbitrary naming, adding a whole bunch of new headers, of which only stdint.h and stdbool.h have the std prefix. That C11 named most new headers with a std prefix might be some influence from C++, but notably C11 also added uchar.h without the prefix.

Why symbols are prefixed with an underscore in compiled files? [duplicate]

This question was closed as a duplicate; its answers are the same as those under "Why do C compilers prepend underscores to external names?" below.

What is the purpose of Microsoft's underscore C functions?

This question is about the same subject as strdup or _strdup? but it is not the same. That question asks how to work around MS's renamings, this question asks why they did it in the first place.
For some reason Microsoft has deprecated a whole slew of POSIX C functions and replaced them with _-prefixed variants. One example among many is isatty:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/posix-isatty
This POSIX function is deprecated. Use the ISO C++ conformant _isatty instead.
What exactly is ISO C++ conformant about _isatty? It appears to me that the MSDN help is totally wrong.
The other question's answer explains how to deal with this problem: you add the _CRT_NONSTDC_NO_DEPRECATE define. Fine. But I want to know what Microsoft's thinking was. What was their point in renaming and deprecating these functions? Was it just to make C programmers' lives even harder?
The fact that _isatty() is ISO C++ conformant makes sense if you think of it like a language-lawyer.
Under ISO C++, the compiler is only supposed to provide the functions in the standard (at least for the standard headers) -- they're not allowed to freely add extra functions, because it could conflict with functions declared in the code being compiled. Since isatty() is not listed in the standard, providing an isatty() function in a standard header would not be ISO C++ compliant.
However, the standard does allow the compiler to provide any function it wants as long as the function starts with a single underscore. So -- language lawyer time -- _isatty() is compliant with ISO C++.
I believe that's the logic that leads to the error message being phrased the way it is.
(Now, in this specific case, isatty() was provided in io.h, which is not actually a C++ standard header, so technically Microsoft could provide it and still claim to be standards-conformant. But, they had other non-compliant functions like strcmpi() in string.h, which is a standard header. So, for consistency, they deprecated all of the POSIX functions the same way and they all report the same error message.)
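As a concrete sketch of the conformant spelling (_isatty and _fileno are documented MSVC CRT functions; the program itself is only an illustration):

    /* tty_check.c - build with MSVC: cl tty_check.c */
    #include <io.h>      /* _isatty */
    #include <stdio.h>   /* _fileno, printf */

    int main(void)
    {
        /* The underscore-prefixed names live in the namespace reserved
           for the implementation, so providing them can never conflict
           with identifiers in user code. */
        if (_isatty(_fileno(stdout)))
            printf("stdout is a terminal\n");
        else
            printf("stdout is redirected\n");
        return 0;
    }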
Names starting with an underscore, like _isatty are reserved for the implementation. They do not have a meaning defined by ISO C++, nor by ISO C, and you can't use them for your own purposes. So Microsoft is entirely right in using this prefix, and POSIX is actually wrong.
C++ has namespaces, so a hypothetical "Posix C++" could define namespace posix, but POSIX has essentially become fossilized; there is no new innovation in that area.
isatty & co., although POSIX, are not standard C, and are provided as "extensions" by the VC++ runtime [1].
As such, they are prefixed with an underscore supposedly to avoid name clashes - as names starting with an underscore followed by a lowercase letter are reserved for implementation-defined stuff at global scope. So if, for example, you wanted to use an actual POSIX compatibility layer providing its own versions of these functions, they wouldn't have to fight with the VC++-provided "fake" ones for the non-underscored names.
[1] Extensions which make no claim of actually being POSIX-compliant, by the way.
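For completeness, here is a minimal sketch of the workaround mentioned in the question; _CRT_NONSTDC_NO_DEPRECATE is a documented MSVC macro and must be defined before any CRT header is included.

    /* posix_names.c - build with MSVC: cl posix_names.c */
    #define _CRT_NONSTDC_NO_DEPRECATE  /* keep the traditional POSIX names */
    #include <io.h>
    #include <stdio.h>

    int main(void)
    {
        /* Without the macro above, MSVC emits deprecation warning C4996
           here and suggests _isatty/_fileno instead. */
        if (isatty(fileno(stdout)))
            printf("stdout is a terminal\n");
        return 0;
    }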

Why do C compilers prepend underscores to external names?

I've been working in C for so long that the fact that compilers typically add an underscore to the start of an extern is just understood... However, another SO question today got me wondering about the real reason why the underscore is added. A wikipedia article claims that a reason is:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
I think there's at least a kernel of truth to this, but also it seems to not really answer the question, since if the underscore is added to all externs it won't help much with preventing clashes.
Does anyone have good information on the rationale for the leading underscore?
Is the added underscore part of the reason that the Unix creat() system call doesn't end with an 'e'? I've heard that early linkers on some platforms had a limit of 6 characters for names. If that's the case, then prepending an underscore to external names would seem to be a downright crazy idea (now I only have 5 characters to play with...).
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
If the runtime support is provided by the compiler, you would think it would make more sense to prepend an underscore to the few external identifiers in the runtime support instead!
When C compilers first appeared, the basic alternative to programming in C on those platforms was programming in assembly language, and it was (and occasionally still is) useful to link together object files written in assembler and C. So really (IMHO) the leading underscore added to external C identifiers was to avoid clashes with the identifiers in your own assembly code.
(See also GCC's asm label extension; and note that this prepended underscore can be considered a simple form of name mangling. More complicated languages like C++ use more complicated name mangling, but this is where it started.)
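For illustration, GCC's asm label extension lets a declaration opt out of the default decoration entirely (the names here are hypothetical):

    /* Tell GCC to use exactly the symbol "asm_entry" for this function,
       instead of the decorated default ("_from_asm" on platforms that
       prepend an underscore): */
    extern int from_asm(void) asm("asm_entry");

    int call_it(void)
    {
        return from_asm();  /* the linker resolves this against "asm_entry" */
    }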
If the C compiler always prepended an underscore before every symbol, then the startup/C-runtime code (which is usually written in assembly) can safely use labels and symbols that do not start with an underscore, such as the symbol 'start'.
Even if you write a start() function in the C code, it is generated as _start in the object/asm output. (Note that in this case there is no possibility for the C code to generate a symbol that does not start with an underscore.) So the startup coder doesn't have to worry about inventing obscure, improbable symbols (like $_dontuse42%$) for each of his/her global variables/labels.
So the linker won't complain about a name clash, and the programmer is happy. :)
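A minimal demonstration of that guarantee, on a platform whose C compiler decorates names (e.g. macOS, or a classic a.out Unix; the file name is made up):

    /* demo.c */
    int start(void) { return 42; }

    /* $ cc -c demo.c && nm demo.o
       0000000000000000 T _start
       The C function "start" became the symbol "_start", so an assembler
       label named "start" can never collide with it. */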
The following is different from the practice of the compiler prepending an underscore in its output formats:
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
That is a source-level convention, followed for the C system libraries and other system components (and for things such as __FILE__ etc.).
(Note that such a symbol, e.g. _time, may end up with two leading underscores, __time, in the generated output.)
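A small sketch showing both layers at once (the identifier is hypothetical, and declaring it in user code is deliberately shown as a bad idea):

    /* _time is reserved for the implementation at file scope, so user
       code should never declare it; it appears here only to illustrate. */
    extern long _time;
    /* On a decorating platform this would show up in the object file as
       "__time": one underscore from the source name, plus one added by
       the compiler. */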
From what I have always heard, it is to avoid naming conflicts: not with other extern variables, but rather so that when you use a library, it will hopefully not conflict with variable names in the user's code.
The main function is not the real entry point of an executable. Some statically linked files have the real entry point that eventually calls main, and those statically linked files own the namespace that does not start with an underscore. On my system, in /usr/lib, there are gcrt1.o, crt1.o and dylib1.o among others. Each of those has a "start" function without an underscore that will eventually call the "_main" entry point. Everything else besides those files has external scope. The history has to do with mixing assembler and C in a project, where all C was considered external.
From Wikipedia:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support. Furthermore, when the C/C++ compiler needed to introduce names into external linkage as part of the translation process, these names were often distinguished with some combination of multiple leading or trailing underscores.
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
