Can comments/identifiers impact code performance/operability? - c

Today I was presented with a weird fact (or not).
I was told:
"It is disallowed to write long, descriptive identifier names, and forbidden to write comments, for Linux drivers written in ANSI C."
When I asked "WTF? Why?" I was told it causes performance issues, errors and such...
not many details there.
I am surprised, but have to ask...
Can this be real?
Knowing that comments are stripped by the pre-processor before compilation,
and that identifiers are either way converted to addresses...
so... can it cause problems?

Well, ANSI C is a standard, and a standard is something that everyone must follow (I mean compiler designers and programmers, if they decide to support it).
The ANSI C standard only guarantees that the first 6 characters of exported identifiers are significant (yes, exported identifiers are stored as symbols in the symbol table as-is, not just as addresses), while non-exported identifiers are guaranteed at least 31 significant characters.
On commenting: apart from some obvious pitfalls, like accidentally swallowing code with a multi-line comment, I recommend you read the Coding Style article for kernel developers, which explains what kinds of comments are discouraged.
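To make the "code swallowing" pitfall concrete, here is a small sketch of my own (not from the kernel document): an unterminated comment silently absorbs the statement below it, and the code still compiles.

#include <stdio.h>

int swallowed_example(void)
{
    int x = 1;
    /* adjust the value
    x = 2;                  /* swallowed by the unterminated comment above */
    return x;               /* returns 1, not 2 */
}

int main(void)
{
    printf("%d\n", swallowed_example());   /* prints 1 */
    return 0;
}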

Absolutely not. Whatever identifiers you use in your code, they will be translated to symbols by the compiler.
Also, all comments are discarded by the pre-processor before compilation.
The only effect of comments is to help you understand the code more quickly.

The only performance impact comments can have is on compile time, though I would say it is negligible, unless you write whole books as comments.
Identifier names are translated to symbols, so there is also, at most, a performance impact at compile time, which again is negligible. Identifier names might hit a maximum length limit, but to be honest, I have never encountered a problem caused by too-long identifier names.

No. The first step of compilation is to pre-process your source code, which removes comments and does other tricks like expanding macros.
Identifiers are often translated into pointers (to symbol table entries).
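To make this concrete, here is a small sketch of my own (not from any of the answers above): the two functions below compile to the same machine code, which you can verify by comparing the assembly that gcc -O2 -S emits for each. The long name and the comments survive only in the source and, for an external function, in the symbol table.

/* Heavily commented, with a long descriptive name. */
int add_two_signed_integers(int first_operand, int second_operand)
{
    /* The pre-processor strips this comment before compilation proper. */
    return first_operand + second_operand;
}

/* Terse equivalent: same generated code. */
int add(int a, int b)
{
    return a + b;
}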

Related

Could anyone help me to solve "clang: error: Undefined symbol _revstring" [duplicate]

I've been working in C for so long that the fact that compilers typically add an underscore to the start of an extern is just understood... However, another SO question today got me wondering about the real reason why the underscore is added. A wikipedia article claims that a reason is:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
I think there's at least a kernel of truth to this, but it also seems to not really answer the question, since if the underscore is added to all externs it won't help much with preventing clashes.
Does anyone have good information on the rationale for the leading underscore?
Is the added underscore part of the reason that the Unix creat() system call doesn't end with an 'e'? I've heard that early linkers on some platforms had a limit of 6 characters for names. If that's the case, then prepending an underscore to external names would seem to be a downright crazy idea (now I only have 5 characters to play with...).
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
If the runtime support is provided by the compiler, you would think it would make more sense to prepend an underscore to the few external identifiers in the runtime support instead!
When C compilers first appeared, the basic alternative to programming in C on those platforms was programming in assembly language, and it was (and occasionally still is) useful to link together object files written in assembler and C. So really (IMHO) the leading underscore added to external C identifiers was to avoid clashes with the identifiers in your own assembly code.
(See also GCC's asm label extension; and note that this prepended underscore can be considered a simple form of name mangling. More complicated languages like C++ use more complicated name mangling, but this is where it started.)
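As a hedged sketch of that asm-label extension (the identifiers here are hypothetical), this declaration tells GCC to emit and reference the function under an explicit assembler symbol instead of the default, possibly underscore-prefixed, name; the definition itself would live in hand-written assembly.

/* GCC asm-label extension: "c_side_name" is the C identifier, but the
   object file refers to it by the assembler symbol "asm_side_name",
   with no automatic underscore prepended. */
extern int c_side_name(void) __asm__("asm_side_name");

int use_it(void)
{
    return c_side_name();   /* the call is emitted against "asm_side_name" */
}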
If the C compiler always prepends an underscore before every symbol,
then the startup/C-runtime code (which is usually written in assembly) can safely use labels and symbols that do not start with an underscore (such as the symbol 'start').
Even if you write a start() function in the C code, it comes out as _start in the object/asm output. (Note that in this case there is no way for the C code to generate a symbol that does not start with an underscore.) So the startup coder doesn't have to worry about inventing obscure, improbable symbols (like $_dontuse42%$) for each of his/her global variables/labels.
So the linker won't complain about a name clash, and the programmer is happy. :)
The following is a different matter from the practice of the compiler prepending an underscore in its output formats:
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.
That is a convention followed for the C system libraries and other system components (and for things such as __FILE__ etc.).
(Note that such a symbol (e.g. _time) may end up with two leading underscores (__time) in the generated output.)
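A small illustration of that scheme (assuming a platform whose C compiler prepends the underscore; the behaviour is platform-specific): even a C function literally named start cannot collide with the runtime's hand-written start label, because it reaches the object file as _start.

/* On an underscore-prepending platform, this function is emitted as the
   assembler symbol "_start". The startup code's own "start" label, written
   in assembly, is therefore out of reach of any C identifier. */
void start(void)
{
    /* ordinary user code; NOT the process entry point */
}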
From what I have always heard, it is to avoid naming conflicts: not with other extern variables, but so that when you use a library it will hopefully not conflict with variable names in the user code.
The main function is not the real entry point of an executable. Some statically linked files have the real entry point that eventually calls main, and those statically linked files own the namespace that does not start with an underscore. On my system, in /usr/lib, there are gcrt1.o, crt1.o and dylib1.o among others. Each of those has a "start" function without an underscore that will eventually call the "_main" entry point. Everything else besides those files has external scope. The history has to do with mixing assembler and C in a project, where all C was considered external.
From Wikipedia:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support. Furthermore, when the C/C++ compiler needed to introduce names into external linkage as part of the translation process, these names were often distinguished with some combination of multiple leading or trailing underscores.
This practice was later codified as part of the C and C++ language standards, in which the use of leading underscores was reserved for the implementation.

variable names in C by Dennis Ritchie [duplicate]


Does C have namespaces similar to C++?

From Programming Language Pragmatics by Michael Scott
Modern versions of C and C++ include a namespace mechanism
that provides module-like data hiding
Does C have namespaces similar to C++?
Are the "identifier name spaces" mentioned in C in a Nutshell the "namespaces" mentioned in Scott's book, and similar to namespaces in C++?
Thanks.
No, C does not have a namespace mechanism whereby you can provide “module-like data hiding”.
book quality
I do not know anything about the book you cited, but the word “namespaces” is one of those that gets overloaded to a lot of different meanings, just like “window”. (I question the validity of anything the author says for getting such a major point about one of the world’s oldest and most widespread computer languages so brazenly wrong.)
name spaces in C
“Name spaces” in C are a completely different mechanism, working for a completely different purpose. These are the name spaces discussed in “C in a Nutshell”. The words mean something different from C++ namespaces. Since David Rankin bothered to look up the chapter and section references in the C11 Standard, these are the name spaces used in C (a short sketch follows the list):
label names
struct/union/enum tags
struct/union members
everything else (including enum values)
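A minimal sketch of what those separate name spaces allow: the single identifier x can serve as a struct tag, a member, an ordinary variable, and a label in the same function without any conflict.

#include <stdio.h>

struct x { int x; };          /* tag "x" and member "x" */

int main(void)
{
    struct x x = { 42 };      /* ordinary identifier "x" */

    if (x.x == 42)
        goto x;               /* label "x" */
    return 1;
x:
    printf("%d\n", x.x);
    return 0;
}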
a quick blurb about scope
Keep in mind that this says nothing about scope, which is a separate mechanism. For example, a global variable and a variable local to a function may have the same name; nevertheless they share the same name space. The difference is that the global’s visibility is obscured by the local variable.
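For example, a minimal sketch of that distinction: both variables named count live in the ordinary-identifier name space; inside f() the local one simply hides the global.

#include <stdio.h>

int count = 1;                /* file scope */

void f(void)
{
    int count = 2;            /* block scope, shadows the global */
    printf("%d\n", count);    /* prints 2 */
}

int main(void)
{
    f();
    printf("%d\n", count);    /* prints 1: the global was never touched */
    return 0;
}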
value of namespaces in C++
It is still unclear whether namespaces were a very useful extension to C++, and the argument as to its righteousness continues. The C crowd (mostly) agrees that the headache that adding namespaces would involve doesn’t justify the ends. I couldn’t find anything particularly useful on the interwebs right off the top of my keyboard, except for a couple of bland blurbs about emulating them using structs or (even worse) using macro abuse. If you really want to dig, you could probably find some useful discussions archived on the comp.lang.c newsgroup.
No, C has nothing like C++ namespaces. Most people have to fake what C++ does using a kind of underscore notation at best. This is at least what I do instead of trying to pack things into structs. Your IDE will still help with code assists; you just have to get used to using the underscore instead of a '.' for everything.
C++
MyNamespace::MyObject.myMethodOrVar ...
Ends up looking like this in C
MyNamespace_MyObject_myMethodOrVar
May not be as smooth as C++ or Java, but it works and still helps avoid name collision. It's just a pain in the ass.
And yes, this doesn't give you syntactic devices like C++'s using. It is what it is, I'm afraid.
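For what it's worth, a hypothetical header following that convention (the "audio" module, its types and its functions are all made up for illustration):

/* audio.h -- every public name of this made-up module carries the
   audio_ prefix, the C stand-in for a namespace. */
#ifndef AUDIO_H
#define AUDIO_H

typedef struct audio_Device audio_Device;   /* opaque handle */

audio_Device *audio_open(const char *name);
void          audio_close(audio_Device *dev);
int           audio_volume_get(const audio_Device *dev);

#endif /* AUDIO_H */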

Large C macros. What's the benefit?

I've been working with a large codebase written primarily by programmers who no longer work at the company. One of the programmers apparently had a special place in his heart for very long macros. The only benefit I can see to using macros is being able to write functions that don't need to be passed in all their parameters (which is recommended against in a best practices guide I've read). Other than that I see no benefit over an inline function.
Some of the macros are so complicated I have a hard time imagining someone even writing them. I tried creating one in that spirit and it was a nightmare. Debugging is extremely difficult, as they collapse N+ lines of code into 1 in the debugger (e.g. there was a segfault somewhere in this large block of code. Good luck!). I had to actually pull the macro out and run it un-macro-tized to debug it. The only way I could see the person having written these is by automatically generating them out of code written in a function after he had debugged it (or by being smarter than me and writing it perfectly the first time, which is always possible I guess).
Am I missing something? Am I crazy? Are there debugging tricks I'm not aware of? Please fill me in. I would really like to hear from the macro-lovers in the audience. :)
To me the best use of macros is to compress code and reduce errors. The downside is obviously in debugging, so they have to be used with care.
I tend to think that if the resulting code isn't an order of magnitude smaller and less prone to errors (meaning the macros take care of some bookkeeping details) then it wasn't worth it.
In C++, many uses like this can be replaced with templates, but not all. A simple example of Macros that are useful are in the event handler macros of MFC -- without them, creating event tables would be much harder to get right and the code you'd have to write (and read) would be much more complex.
If the macros are extremely long, they probably make the code short but efficient. In effect, he might have used macros to explicitly inline code or remove decision points from the run-time code path.
It might be important to understand that, in the past, such optimizations weren't done by many compilers, and some things that we take for granted today, like fast function calls, weren't valid then.
To me, macros are evil. With so many side effects, and the fact that in C++ you can get the same performance gains with inline, they are not worth the risk.
For ex. see this short macro:
#define max(a, b) ((a)>(b)?(a):(b))
then try this call:
max(i++, j++)
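Here is a small self-contained sketch of what goes wrong with that call: after expansion, the argument chosen by the conditional is evaluated (and incremented) twice.

#include <stdio.h>

#define max(a, b) ((a)>(b)?(a):(b))

int main(void)
{
    int i = 1, j = 5;
    /* Expands to ((i++) > (j++) ? (i++) : (j++)):
       j is incremented twice, so m ends up as 6, not 5. */
    int m = max(i++, j++);
    printf("m=%d i=%d j=%d\n", m, i, j);   /* m=6 i=2 j=7 */
    return 0;
}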
More. Say you have
#define PLANETS 8
#define SOCCER_MIDDLE_RIGHT 8
If an error is reported, it will refer to '8', not to either of its meaningful names.
I only know of two reasons for doing what you describe.
First is to force functions to be inlined. This is pretty much pointless, since the inline keyword usually does the same thing, and function inlining is often a premature micro-optimization anyway.
Second is to simulate nested functions in C or C++. This is related to your "writing functions that don't need to be passed in all their parameters" but can actually be quite a bit more powerful than that. Walter Bright gives examples of where nested functions can be useful.
There are other reasons to use of macros, such as using preprocessor-specific functionality (like including __FILE__ and __LINE__ in autogenerated error messages) or reducing boilerplate code in ways that functions and templates can't (the Boost.Preprocessor library excels here; see Boost.ScopeExit or this sample enum code for examples), but these reasons don't seem to apply for doing what you describe.
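For instance, a minimal sketch of the __FILE__/__LINE__ use just mentioned; an ordinary function cannot capture the call site's file and line, but a macro expanded at the call site can.

#include <stdio.h>

/* Expanded at the call site, so __FILE__ and __LINE__ refer to the
   caller's location. */
#define LOG_ERROR(msg) \
    fprintf(stderr, "%s:%d: error: %s\n", __FILE__, __LINE__, (msg))

int main(void)
{
    LOG_ERROR("something went wrong");
    return 0;
}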
Very long macros will have performance drawbacks, like increased compiled binary size, and there are certainly other reasons for not using them.
For the most problematic macros, I would consider running the code through the preprocessor, and replacing the macro output with function calls (inline if possible) or straight LOC. If the macros exists for compatibility with other architectures/OS's, you might be stuck though.
Part of the benefit is code replication without the eventual maintenance cost - that is, instead of copying code elsewhere you create a macro from it and only have to edit it once...
Of course, you could also just make a method to be called but that is sort of more work... I'm against much macro use myself, just trying to present a potential rationale.
There are a number of good reasons to write macros in C.
Some of the most important are creating configuration tables using x-macros, making function-like macros that can accept multiple parameter types as inputs, and converting tables from human-readable/configurable values into the values the machine actually uses (a short x-macro sketch follows this answer).
I can't really see a reason for people to write very long macros, except for the historic use of forcing function inlining.
I would say that when debugging complex macros (when writing x-macros etc.) I tend to preprocess the source file and substitute the preprocessed file for the original.
This allows you to see the C code generated, and gives you real lines to work with in the debugger.
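Here is a minimal sketch of the x-macro/configuration-table idea mentioned above (the table contents are invented for illustration): one table expands both into an enum and into the matching name strings, so the two can never drift apart.

#include <stdio.h>

/* The single source of truth: each X(...) row is one table entry. */
#define COLOR_TABLE \
    X(RED)          \
    X(GREEN)        \
    X(BLUE)

/* First expansion: enum constants. */
#define X(name) COLOR_##name,
enum color { COLOR_TABLE COLOR_COUNT };
#undef X

/* Second expansion: matching strings, guaranteed to stay in sync. */
#define X(name) #name,
static const char *color_names[] = { COLOR_TABLE };
#undef X

int main(void)
{
    int i;
    for (i = 0; i < COLOR_COUNT; i++)
        printf("%d = %s\n", i, color_names[i]);
    return 0;
}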
I don't use macros at all. Inline functions serve every useful purpose a macro can. Macros allow you to do very weird and counterintuitive things, like splitting up identifiers (how does someone search for the identifier then?).
I have also worked on a product where a legacy programmer (who thankfully is long gone) also had a special love affair with macros. His 'custom' scripting language is the height of sloppiness. This was compounded by the fact that he wrote his C++ classes in C, meaning all class functions and variables were public. Anyway, he wrote almost everything with macros and variadic functions (another hideous monstrosity foisted on the world). So instead of writing a proper template class he would use a macro! He also resorted to macros to create factory classes, instead of normal code... His code is pretty much unmaintainable.
From what I have seen, macros can be used when they are small, are used declaratively, and don't contain moving parts like loops and other program-flow expressions. It's OK if the macro is one or at most two lines long and it declares an instance of something, something that won't break at runtime. Macros also should not contain class definitions or function definitions. If the macro contains code that needs to be stepped into with a debugger, then the macro should be removed and replaced with something else.
They can also be useful for wrapping custom tracing/debugging functionality, for instance when you want custom tracing in debug builds but not in release builds.
Anyway, when you are working in legacy code like that, just be sure to remove the macro mess a bit at a time. If you keep it up, eventually you will remove them all and make life a bit easier for yourself. I have done this in the past with especially messy macros. What I do is turn on the compiler switch to have the preprocessor generate an output file. Then I raid that file, copy the code, re-indent it, and replace the macro with the generated code. Thank goodness for that compiler feature.
Some of the legacy code I've worked with used macros very extensively in the place of methods. The reasoning was that the computer/OS/runtime had an extremely small stack, so that stack overflows were a common problem. Using macros instead of methods meant that there were fewer methods on the stack.
Luckily, most of that code was obsolete, so it is (mostly) gone now.
C89 did not have inline functions. If using a compiler with extensions disabled (which is a desirable thing to do for several reasons), then the macro might be the only option.
Although C99 came out in 1999, there was resistance to it for a long time; commercial compiler vendors didn't feel it was worth their time to implement C99. Some (e.g. MS) still haven't. So for many companies it was not a viable practical decision to use C99 conforming mode, even up to today in the case of some compilers.
I have used C89 compilers that did have an extension for inline functions, but the extension was buggy (e.g. multiple definition errors when there should not be), things like that may dissuade a programmer from using inline functions.
Another thing is that the macro version effectively forces the function to be inlined. The C99 inline keyword is only a compiler hint, and the compiler may still decide to generate a single instance of the function code, which is linked like a non-inline function. (One compiler that I still use will do this if the function is not trivial and returns void.)
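A short sketch of that difference: in C89 a function-like macro is the only portable way to guarantee expansion at the call site, while the later inline keyword remains a hint only.

/* C89-compatible: always expanded in place, no call by construction
   (with the usual macro side-effect hazards). */
#define SQUARE(x) ((x) * (x))

/* C99 and later: a hint only; the compiler may still emit an
   out-of-line copy and call it like a normal function. */
static inline int square(int x) { return x * x; }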

(K&R) At least the first 31 characters of an internal name are significant?

When taken literally, it makes sense, but what exactly does it mean to be a significant character of a variable name?
I'm a beginning learner of C using K&R. Here's a direct quote from the book:
"At least the first 31 characters of an internal name are significant. For function names and external variables, the number may be less than 31, because external names may be used by assemblers and loaders over which the language has no control. For external names, the standard guarantees only for 6 characters and a single case."
By the way, what is meant by "single case"?
Single Case usually means "lower case". Except in some OS's where it means "upper case". The point is that mixed case is not guaranteed to work.
abcdef
ABCDEF
differ only in case. This is not guaranteed to work.
The "Significance" issue is one of how many letters can be the same.
Let's say we only have 6 significant characters.
a_very_long_name
a_very_long_name_thats_too_similar
They look different, but the first 16 characters are the same. Since only 6 are significant, those are the same variable.
It means what you fear it means. For external names, the C standard at the time K&R 2nd ed. was written really does give only six case-insensitive characters! So you can't have afoobar and aFooBaz as independent entities.
This absurd limitation (which was there to accommodate legacy linkers, now long gone) is hardly relevant to any environment any more. The C99 standard offers 31 case-sensitive characters for external names and 63 internally, and commonly used linkers in practice support much longer names.
It just means that if you have two variables named
abcdefghijklmnopqrstuvwxyz78901A,
and
abcdefghijklmnopqrstuvwxyz78901B,
there is no guarantee that they will be treated as different, separate variables...
It means that:
foobar1
foobar2
might be the same external name, because only the first 6 characters need be considered. The single case means that upper and lower case names need not be distinguished.
Please note that almost all modern linkers will consider much longer names, though there will still be a limit, dependent on the linker.
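A hedged illustration of that rule (hypothetical names; no toolchain I know of still behaves this way): under a linker that honours only six significant, single-case characters, both externals below collapse to the same symbol.

/* With only six significant characters, both external names are seen
   as "foobar" and therefore collide. */
int foobar1;   /* effectively "foobar" */
int foobar2;   /* also "foobar": the same symbol on such a linker */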
G'day,
One of the problems with this limited symbol resolution occurs at link time.
Multiple symbols with the same name can exist across several libraries and the link editor usually only takes the first one it finds that matches what it is looking for.
So, using S.Lott's example from above, if your link editor is searching for the symbol "a_very_long_name" and it finds a library on its search path that contains the symbol "a_very_long_name_thats_too_similar" it will take this one. This will happen even if the library that contains the symbol that you want, i.e. "a_very_long_name" has been specified in your command. For example specifying the libraries as:
-L/my/library/path -lmy_wrong_lib -lmy_correct_lib
There are now compiler options, or more correctly compile-time options passed through to the link editor, which enforce a search for multiply-defined symbols in your link path. These are then usually raised as errors at link time.
In addition, many compilers, e.g. gcc, will default to such behaviour. You have to explicitly enable multiple definitions to allow the link editor to proceed without raising a fatal error if it finds multiple definitions for a symbol.
BTW I'd highly recommend working through the exercises in conjunction with Clovis Tondo's book "The C Answer Book 2nd ed.".
Doing this really helps make C stick in your mind.
HTH
cheers,

Resources