C variable name beginning with _ allowed, and (0x8) meaning?

I am trying to understand when a developer needs to define a C variable with preceding '_'. What is the reason for it?
For example:
uint32_t __xyz_ = 0;

Maybe this helps, from C99, 7.1.3 ("Reserved Identifiers"):
All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
Moral: For ordinary user code, it's probably best not to start identifiers with an underscore.
(On a related note, I think you should also stay clear from naming types with a trailing _t, which is reserved for standard types.)

It is a trick used in the header files of C implementations for global symbols, in order to prevent eventual conflicts with other symbols defined by the user.
Since C lacks a namespace feature, this is a rudimentary approach to avoid name collisions with the user.
Declaring such symbols in your own header and source files is not encouraged because it can introduce naming conflicts between your code and the C implementation. Even if that doesn't produce a conflict on your current implementation, you are still prone to strange conflicts across different/future implementations, since they are free to use other symbols prefixed with underscores.

Whether it's C or not, the leading underscore gives the programmer a status indication so he does not have to go look it up. In PHP, or any object-oriented language where we deal with tens of thousands of properties and methods written by thousands of authors, seeing an underscore prefix removes the need to dig through the class and look up whether it's declared private, protected or public. That's an immense time saver. The practice started before C, I am sure...

Does the ISO 9899 standard reserve any use of the _t suffix for identifiers?

I have read in many books and in other SO questions that the standard may expand the set of identifiers such as size_t or int32_t, so it reserves any use of the _t suffix for identifiers.
Is that true?
I could not find anything that discourages the use of this suffix in the ISO9899:1999 standard, but that standard is hard to read :(
No, it's not true.
The standard reserves the right to add identifiers starting with int or uint and ending _t to the stdint.h header (§7.31.10). Those identifiers are technically only reserved if that header is included but since it almost always is, they should be treated as reserved.
In general, the standard reserves identifiers defined in standard headers, or mentioned in the future directions for standard headers (§7.31). Identifiers having external linkage (library functions) are reserved for that use (which doesn't stop you from using them as local or static variables, for example). If the library header is included, then its identifiers are reserved for use at file scope. Read §7.1.3 for details.
As that section indicates, the only identifiers unconditionally reserved are those which start with an underscore followed by a capital letter or a second underscore.
While reading the standard, it's important to understand the difference between contexts in which a name is reserved:
Reserved for any use (identifiers starting with an underscore followed by another underscore or a capital letter): these identifiers may be used by the implementation as macros or special symbols which are handled in some idiosyncratic way by the compiler. Do not ever define one of these in your code, and use the ones which are documented only as indicated by the documentation. Do not use such a symbol if it is not documented, even if you see it used in some standard library header. Or someone else's code.
Reserved at file scope (other identifiers starting with an underscore, not part of any standard header): These identifiers will not be used as macros, and you must not define them as macros either. You may use them as local variables, labels, parameters, and struct or union members. Personally, I wouldn't do this, but it's permitted. I prefer to put an underscore at the end of an identifier which is used in some internal context.
Reserved at file scope and as a macro name (any identifier mentioned in an included standard header, including in the future directions clause): Again, since these identifiers may be macros, you should treat them as off limits if you #include the associated header. The standard does allow you to #undef an identifier used as a function name in the standard library, although you might find that performance suffers because the macro wraps a construct with equivalent semantics but optimised performance.
Reserved for use as an identifier with external linkage (any identifier defined in any standard library header as having external linkage, whether or not the header is included, including the identifier errno): The weakest reservation. If you don't include the associated header, you're free to use such an identifier, even at file scope, as long as it is not externally visible. So it could be a file-scope static or an enumeration member or the tag of a struct or union. The point of this clause is not to allow you to deliberately shadow the name of a standard library function. Rather, it is to protect you from future additions to the standard library which might export an external symbol you're currently using. Of course, if your current use is as an externally visible identifier, you're still going to have a future problem. But on the whole, externally visible symbols should be prefixed with a package name to avoid name collisions with other libraries.
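The "reserved at file scope" category above can be sketched as follows. This is only an illustration of what the answer says is permitted, not a recommendation; the `demo_` names are hypothetical and chosen so that nothing at file scope starts with an underscore.

```c
/* File scope: identifiers starting with an underscore are reserved here,
   so this hypothetical module uses a prefix and a trailing underscore. */
static int demo_calls_ = 0;

struct demo_point {
    int _x;   /* struct members are not file-scope identifiers: permitted */
    int _y;
};

int demo_sum(int _a, int _b)      /* parameters: permitted */
{
    int _total = _a + _b;         /* block-scope local: permitted */
    demo_calls_++;
    return _total;
}

/* Accessor for the internal counter. */
int demo_calls(void)
{
    return demo_calls_;
}
```

None of these names begins with a double underscore or an underscore followed by a capital letter; those remain off limits in every context.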
Having said all that, it's unwise to use an identifier that looks like it might be a standard identifier. Posix includes a list of over a hundred patterns for identifier names that it might use in the future, including all identifiers ending _t, so if you expect your code to be used in a Posix environment, you'll want to avoid those names. And while future C standard revisions might avoid adding new type names to existing headers (aside from the integer typenames mentioned above), you don't really want to preclude using any such new types, since they may well be useful. (And, according to a comment by @JensGustedt, who knows a lot more about the workings of the C working group than I do, there will be a couple of new type names in existing headers in C2x.)
The _t suffix is not reserved by ISO 9899 as such. The future library directions in C11 say only (C11 7.31.10):
Typedef names beginning with int or uint and ending with _t may be
added to the types defined in the <stdint.h> header. [...]
That said, there are great many types with _t suffix defined in C11:
char16_t, char32_t, clock_t, cnd_t, constraint_handler_t, div_t, double_t, errno_t, fenv_t, fexcept_t, float_t, fpos_t, imaxdiv_t,
int_fastN_t, int_leastN_t, intmax_t, intN_t, intptr_t, ldiv_t, lldiv_t, max_align_t, mbstate_t, mtx_t, ptrdiff_t, rsize_t,
sig_atomic_t, size_t, thrd_start_t, thrd_t, time_t, tss_dtor_t, tss_t, uint_fastN_t, uint_leastN_t,
uintmax_t, uintN_t, uintptr_t, wchar_t, wctrans_t, wctype_t, wint_t
POSIX, on the other hand, reserves the _t suffix for system use. The POSIX 1003.1 rationale has this excerpt:
To allow implementors to provide their own types, all conforming applications are required to avoid symbols ending in _t, which permits the implementor to provide additional types.
All in all, considering that you will probably want to use your C code on a POSIX system now or later, steer away from using _t for your own types.
Standard C allows you to use the _t suffix so long as you don't end up with a token that starts with a double underscore. (Note that C++ restricts this further in that a double underscore is not allowed anywhere in the token; worth adhering to should you anticipate your code reaching C++.)
It's POSIX that reserves _t.

Why does C have keywords starting with underscore

Most keywords in C (or in any language for that matter) start with a letter. But there are some keywords that start with an underscore. These keywords are: _Alignas, _Alignof, _Atomic, _Bool, _Complex, _Generic, _Imaginary, _Noreturn, _Static_assert and _Thread_local.
I find it amazingly strange. If it was a hidden global constant or internal function that's not really a part of the API, I would understand it. But these are keywords.
I find it extra strange that C actually has macros called bool and static_assert, whose implementations use the very keywords I just mentioned.
C developed and became very popular before it was planned by a standards committee. In consequence, there was a lot of existing code.
When setting a C standard, or updating an old standard, an important goal is not to “break” old code. It is desirable that code that worked with previous compilers continue to work with new versions of the C language.
Introducing a new keyword (or any new definition or meaning of a word) can break old code, since, when compiling, the word will have its new keyword meaning and not the identifier meaning it had with the previous compilers. The code will have to be edited. In addition to the expense of paying people to edit the code, this has a risk of introducing bugs if any mistakes are made.
To deal with this, a rule was made that identifiers starting with underscore were reserved. Making this rule did not break much old software, since most people writing software choose to use identifiers beginning with letters, not underscore. This rule gives the C standard a new ability: By using underscore when adding new keywords or other new meanings for words, it is able to do so without breaking old code, as long as that old code obeyed the rule. For example, adding a new keyword, _Bool for a Boolean type, would not break any code that had not used identifiers beginning with an underscore.
New versions of the C standard sometimes introduce new meanings for words that do not begin with an underscore, such as bool. However, these new meanings are generally not introduced in the core language. Rather, they are introduced only in new headers. In making a bool type, the C standard provided a new header, <stdbool.h>. Since old code could not be including <stdbool.h> since it did not exist when the code was written, defining bool in <stdbool.h> would not break old code. At the same time, it gives programmers writing new code the ability to use the new bool feature by including <stdbool.h>, which defines bool as a macro that is replaced by _Bool.
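As a minimal sketch of the mechanism described above (the function names are hypothetical), the keyword `_Bool` is usable with no header at all, while the friendlier spelling `bool` is only relied upon after including `<stdbool.h>`:

```c
#include <stdbool.h>   /* provides the spelling `bool` */

/* Uses the underscore keyword directly. */
int flag_from_keyword(int x)
{
    _Bool b = x;       /* any nonzero value converts to 1 */
    return b;
}

/* Uses the spelling made available by <stdbool.h>. */
int flag_from_header(int x)
{
    bool b = x;        /* same type, same conversion rule */
    return b;
}
```

Both functions return 1 for any nonzero argument and 0 for zero, since conversion to the Boolean type collapses every nonzero value to 1.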
In the standard, any name that begins with a double underscore or an underscore followed by an uppercase letter is reserved. This is useful because C lacks named namespaces. By reserving all such symbols, new and implementation specific keywords can be introduced to the language without clashing with symbols defined in existing code.
The macros such as bool and static_assert are "convenience macros": they allow you to use the reserved keyword symbols without the underscores and capitals, at the small risk of a name clash. However, they provide a means to resolve a name clash because, unlike a keyword, a macro may be #undef'd, or the header that defines it can be left out and the internal keyword used directly. Moreover, unmodified legacy code will not be broken because by definition it will not include headers that did not exist at the time of writing.
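For instance, a translation unit that prefers not to include the convenience headers can use the underscore-spelled keywords directly. The sketch below assumes a C11-or-later compiler; `int_alignment` is a hypothetical helper name.

```c
/* No <assert.h> or <stdalign.h> is needed for the underscore keywords. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Reports the alignment requirement of int using the keyword itself. */
unsigned long int_alignment(void)
{
    return (unsigned long)_Alignof(int);
}
```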
The unadorned keywords have been defined in the language since the language's inception (with the exception of inline and restrict defined since C99), so will not cause a conflict with legacy code symbols. All _Xxxx keywords have been defined at or since C99.
Unlike many languages in common use today, C has been around since the 1970's and standardised since 1989 - there is a huge amount of existing code that must remain compilable on modern compilers while at the same time the language cannot remain unchanged - if it did it might no longer be in such common use.
Eric and Clifford have provided good answers, but I will add a quote from the C11 standard to support them.
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
https://port70.net/~nsz/c/c11/n1570.html#7.1.3
One might also consider this:
Typedef names beginning with int or uint and ending with _t may be added to the types defined in the stdint.h header. Macro names beginning with INT or UINT and ending with _MAX, _MIN, or _C may be added to the macros defined in the stdint.h header.
https://port70.net/~nsz/c/c11/n1570.html#7.31.10p1

C universal macro names - gcc -fextended-identifiers

I'm looking for a way to write identifier names with characters like [ ' " or #.
Every time I try to do that, I get the error:
error: macro names must be identifiers
But while learning about gcc, I found this option:
-fextended-identifiers
But it does not seem to work the way I wanted. Does anybody know how to accomplish that?
Identifiers can't include such characters. The language syntax defines it that way: identifiers consist of letters, digits or underscores (and must not begin with a digit, to avoid ambiguity with numeric literals).
If it were possible, it would conflict with the C compiler syntax (which uses [ for arrays) and the C preprocessor syntax (which uses #). The extended identifiers extension only allows characters not forbidden by the language syntax inside identifiers (basically non-ASCII Unicode letters, etc.).
But if you really, really want to do this, nothing forbids you from preprocessing your source files with your own "extended macro preprocessor", practically creating a new "C like" language. That looks like a terrible idea, but it's not really hard to do. Then you'll see soon enough by yourself why it's not a good idea...
According to this link, -fextended-identifiers only enables UTF-8 support for identifiers, so it won't help in your case.
So, answer is: You can't use such characters in macro identifiers.
Even if the extended identifier characters support was fully enabled, it wouldn't help you get characters such as:
[ ' " #
enabled for identifiers. The standard allows 'universal character names' or 'other implementation-defined characters' to be part of an identifier, but they cannot be part of the basic character set. Out of the basic character set, only _, letters and digits can be part of an identifier name (6.4.2.1 Identifiers/General).

What style to use when naming types in C

According to this stack overflow answer, the "_t" postfix on type names is reserved in C. When using typedef to create a new opaque type, I'm used to having some sort of indication in the name that this is a type. Normally I would go with something like hashmap_t but now I need something else.
Is there any standard naming scheme for types in C? In other languages, using CapsCase like Hashmap is common, but a lot of C code I see doesn't use upper case at all. CapsCase works fairly nicely with a library prefix too, like XYHashmap.
So is there a common rule or standard for naming types in C?
Yes, POSIX reserves names ending _t if you include any of the POSIX headers, so you are advised to stay clear of those - in theory. I work on a project that has run afoul of such names two or three times over the last twenty or so years. You can minimize the risk of collision by using a corporate prefix (your company's TLA and an underscore, for example), or by using mixed case names (as well as the _t suffix); all the collisions I've seen have been short and all-lower case (dec_t, loc_t, ...).
Other than the system-provided (and system-reserved) _t suffix, there is no specific widely used convention. One of the mixed-case systems (camelCase or InitialCaps) works well. A systematic prefix works well too - the better libraries tend to be careful about these.
If you do decide to use lower-case and _t suffix, do make sure that you use long enough names and check diligently against the POSIX standard, the primary platforms you work on, and any you think you might work on to avoid unnecessary conflicts. The worst problems come when you release some name example_t to customers and then find there is a conflict on some new platform. Then you have to think about making customers change their code, which they are always reluctant to do. It is better to avoid the problem up front.
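A minimal sketch of the prefix-plus-mixed-case approach, using the XYHashmap name from the question. The `xy_` function prefix and the struct contents are assumptions made purely for illustration; a real hashmap would hold buckets rather than just a counter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Public, opaque type: InitialCaps with a library prefix, no _t suffix. */
typedef struct xy_hashmap XYHashmap;

/* Private definition; normally this lives in the .c file only. */
struct xy_hashmap {
    size_t count;   /* number of stored entries (placeholder field) */
};

/* Allocates an empty map; returns NULL on allocation failure. */
XYHashmap *xy_hashmap_new(void)
{
    return calloc(1, sizeof(XYHashmap));
}

/* Returns the number of entries in the map. */
size_t xy_hashmap_count(const XYHashmap *map)
{
    return map->count;
}

/* Releases the map; passing NULL is harmless. */
void xy_hashmap_free(XYHashmap *map)
{
    free(map);
}
```

Every externally visible name carries the same prefix, so a clash with POSIX or another library would require that library to pick the same prefix.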
The Indian Hill style guidelines have some suggestions:
Individual projects will no doubt have
their own naming conventions. There
are some general rules however.
Names with leading and trailing underscores are reserved for system
purposes and should not be used for
any user-created names. Most systems
use them for names that the user
should not have to know. If you must
have your own private identifiers,
begin them with a letter or two
identifying the package to which they
belong.
#define constants should be in all CAPS.
Enum constants are Capitalized or in all CAPS
Function, typedef, and variable names, as well as struct, union, and
enum tag names should be in lower
case.
Many macro "functions" are in all CAPS. Some macros (such as getchar and
putchar) are in lower case since they
may also exist as functions.
Lower-case macro names are only
acceptable if the macros behave like a
function call, that is, they evaluate
their parameters exactly once and do
not assign values to named parameters.
Sometimes it is impossible to write a
macro that behaves like a function
even though the arguments are
evaluated exactly once.
Avoid names that differ only in case, like foo and Foo. Similarly,
avoid foobar and foo_bar. The
potential for confusion is
considerable.
Similarly, avoid names that look like each other. On many terminals and
printers, 'l', '1' and 'I' look quite
similar. A variable named 'l' is
particularly bad because it looks so
much like the constant '1'.
In general, global names (including
enums) should have a common prefix
identifying the module that they
belong with. Globals may alternatively
be grouped in a global structure.
Typedeffed names often have "_t"
appended to their name.
Avoid names that might conflict with
various standard library names. Some
systems will include more library code
than you want. Also, your program may
be extended someday.
C only reserves some uses of a _t suffix. As far as I can tell, this is only current identifiers ending with _t plus any identifier that starts int or uint (7.26.8). However, POSIX may reserve more.
It's a general problem in C, since you have extremely flat namespaces, and there's no silver bullet. If you're familiar with CapCase names and they work well for you, then you should continue to use them. Otherwise, you'll have to evaluate the goals of the current project and see which solution best meets them.
CapsCase is often used for types in C.
For instance, if you look at projects in the GNOME ecosystem (GTK+, GDK, GLib, GObject, Clutter, etc.), you'll see types like GtkButton or ClutterStageWindow. They only use CapsCase for data types; function names and variables are all lower-case with underscore separators - e.g. clutter_actor_get_geometry().
Type naming schemes are like indentation conventions - they generate religious wars with people asserting some sort of moral superiority for their preferred approach. It is certainly preferable to follow the style in existing code, or in related projects (e.g. for me, GNOME over the last few years.)
However, if you're starting from scratch and have no template, there's no hard-and-fast rule. If you're interested in coding efficiently and leaving work at reasonable hour so you can go home and have a beer or whatever, you certainly should pick a style and stick to it for your project, but it matters very little exactly which style you pick.
One alternate solution that works reasonably well is to use uppercase for all type names and macro names. Global variables may be CapCase (CamelBack) and all local variables lower case.
This technique helps to improve readability and also takes advantage of language syntax, which reduces the number of pollution characters in variable names; e.g. gvar, kvar, type_t, etc. For example, data types cannot be syntactically confused with any other type.
Global variables are easily distinguished from locals by having at least one upper case letter.
I agree that prefixed or postfixed underscores should be avoided in all token names.
Let's look at the example below.
It's readily clear that InvertedCount is a global due to its case. It's equally clear that INT32U and RET_ERR are types due to their syntax. It's also clear that INVERT_VAL() is a macro, because it's on the right-hand side and there is no cast, so it can't be a data type.
One thing is for sure though. Whichever method you use, it should be in line with your organization's coding standard. For me, the less clutter, the better.
Of course, style is a different issue.
#include <stdint.h>
typedef uint32_t INT32U;             // Assumed definition; not shown in the original.
#define INVERT_VAL(x) (~(x))         // Argument parenthesized for safety.
#define CALIBRATED_VAL 100u
INT32U InvertedCount;
typedef enum {
    ERR_NONE = 0,
    ...
} RET_ERR;
RET_ERR my_func (void)
{
    INT32U val;
    INT32U check_sum;
    val = CALIBRATED_VAL;            // --> Lower case local variable.
    check_sum = INVERT_VAL(val);     // --> Clear use of macros.
    InvertedCount = check_sum;       // --> Upper case global variable.
                                     // Looks different, no g prefix required.
    ...
    return (ERR_NONE);
}
There are many ideas and opinion on this subject, but there is no one universal standard for naming types. The most important thing is to be consistent. In the absence of coding standards, when maintaining code, resist the urge to use another naming convention. Introducing a new naming convention, even if it's perfect, can add unnecessary complexity.
This is actually a great topic to raise when interviewing people. I've never come across a good programmer that didn't have an opinion on this. No opinion or no passion in the answer indicates that the person isn't an experienced programmer.

epoll_data_t question (specifically about C data types)

The union epoll_data_t looks like:
typedef union epoll_data {
void *ptr;
int fd;
__uint32_t u32;
__uint64_t u64;
} epoll_data_t;
This is more of a general C question, but why are the leading double underscores __uint{32,64} types used instead of just uint{32,64} without the underscores? I don't really understand why/when you would use the underscore version, but I thought that uint32 without underscores would be the proper thing to use in a union publicly modifiable to the outside world.
A leading underscore is reserved to the compiler/library vendor to avoid creating symbols in the global namespace that collide with symbols created by their customers. Unfortunately, customers have been using this too for their own "system level" declarations, as do 3rd party library vendors, forcing the vendors to start using two underscores. Symbols with 3 underscores have been found in the wild but are not yet wide-spread.
Fixed-width integer types were standardized with C99. Before that, compiler and library authors introduced their own types, of which these might be a remnant; afaik MS still doesn't ship stdint.h with Visual Studio.
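In user code, the standard spellings from <stdint.h> are the ones to use. As a rough sketch, a user-defined union mirroring the layout above might look like this; `my_event_data` and `my_event_roundtrip` are hypothetical names, not part of epoll.

```c
#include <stdint.h>

/* User-level counterpart: standard fixed-width names, no leading underscores. */
typedef union my_event_data {
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} my_event_data;

/* Stores a 64-bit value in the union and reads the same member back. */
uint64_t my_event_roundtrip(uint64_t value)
{
    my_event_data data;
    data.u64 = value;
    return data.u64;
}
```

A system header cannot assume that the user has included <stdint.h>, nor may it define uint32_t on the user's behalf, which is one reason it falls back on reserved names such as __uint32_t.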
Directly from wikipedia [http://en.wikipedia.org/wiki/Underscore]
Many clashes were possible within the
external identifier linkage space
which potentially mingles code
generated by various high level
compilers, runtime libraries required
by each of these compilers, compiler
generated helper functions, and
program startup code, of which some
fraction was inevitably compiled from
system assembly language. Within this
collision domain the underscore
character quickly became entrenched as
the primary mechanism for
differentiating the external linkage
space. It was common practice for C
compilers to prepend a leading
underscore to all external scope
program identifiers to avert clashes
with contributions from runtime
language support. Furthermore, when
the C/C++ compiler needed to introduce
names into external linkage as part of
the translation process, these names
were often distinguished with some
combination of multiple leading or
trailing underscores.
