Is there any harm in putting comments first in C header file? - c

Is there any meaningful downside, on modern compilers, to putting comments starting at the beginning of a header file?
That is, something like the following in great_header.h:
/*
* this file defines the secret to life
* etc
* (c) 2017 ascended being
*/
#pragma once
#ifndef NAMESPACE_GREAT_HEADER_H_
#define NAMESPACE_GREAT_HEADER_H_
... (actual contents)
#endif // ifndef NAMESPACE_GREAT_HEADER_H_
In the past, I remember caveats such as "#pragma once only first if it is the first line in the file", and similar rules for include-guard optimization - but I'm not sure if that is still the case. It would be convenient for me, and for automated tools which extract top-of-header info if comments could be the first thing in the file.

According to GCC Preprocessor Internals manual, the multiple include optimization mechanism is not affected by comments:
The Multiple-Include Optimization
Under what circumstances is such an optimization valid? If the file were included a second time, it can only be optimized away if that inclusion would result in no tokens to return, and no relevant directives to process. Therefore the current implementation imposes requirements and makes some allowances as follows:
There must be no tokens outside the controlling #if-#endif pair, but
whitespace and comments are permitted.
It doesn't mention #pragma once there, which I suspect is treated separately. Referring to 2.5 Alternatives to Wrapper #ifndef:
Another way to prevent a header file from being included more than
once is with the ‘#pragma once’ directive. If ‘#pragma once’ is seen
when scanning a header file, that file will never be read again, no
matter what.

Related

Computed Includes in C

I was reading the C Preprocessor guide page on gnu.org on computed includes which has the following explanation:
2.6 Computed Includes
Sometimes it is necessary to select one of several different header
files to be included into your program. They might specify
configuration parameters to be used on different sorts of operating
systems, for instance. You could do this with a series of
conditionals,
#if SYSTEM_1
# include "system_1.h"
#elif SYSTEM_2
# include "system_2.h"
#elif SYSTEM_3 …
#endif
That rapidly becomes tedious. Instead, the preprocessor offers the
ability to use a macro for the header name. This is called a computed
include. Instead of writing a header name as the direct argument of
‘#include’, you simply put a macro name there instead:
#define SYSTEM_H "system_1.h"
…
#include SYSTEM_H
This doesn't make sense to me. The first code snippet allows for optionality based on which system type you encounter by using branching if elifs. The second seems to have no optionality as a macro is used to define a particular system type and then the macro is placed into the include statement without any code that would imply its definition can be changed. Yet, the text implies these are equivalent and that the second is a shorthand for the first. Can anyone explain how the optionality of the first code snippet exists in the second? I also don't know what code is implied to be contained in the "..." in the second code snippet.
There's some other places in the code or build system that define or don't define the macros that are being tested in the conditionals. What's suggested is that instead of those places defining lots of different SYSTEM_1, SYSTEM_2, etc. macros, they'll just define SYSTEM_H to the value that's desired.
Most likely this won't actually be in an explicit #define, instead of will be in a compiler option, e.g.
gcc -DSYSTEM_H='"system_1.h"' ...
And this will most likely actually come from a setting in a makefile or other configuration file.

Parsing C files without preprocessing it

I want to run simple analysis on C files (such as if you call foo macro with INT_TYPE as argument, then cast the response to int*), I do not want to prerprocess the file, I just want to parse it (so that, for instance, I'll have correct line numbers).
Ie, I want to get from
#include <a.h>
#define FOO(f)
int f() {FOO(1);}
an list of tokens like
<include_directive value="a.h"/>
<macro name="FOO"><param name="f"/><result/></macro>
<function name="f">
<return>int</return>
<body>
<macro_call name="FOO"><param>1</param></macro_call>
</body>
</function>
with no need to set include path, etc.
Is there any preexisting parser that does it? All parsers I know assume C is preprocessed. I want to have access to the macros and actual include instructions.
Our C Front End can parse code containing preprocesser elements can do this to fair extent and still build a usable AST. (Yes, the parse tree has precise file/line/column number information).
There are a number of restrictions, which allows it to handle most code. In those few cases it cannot handle, often a small, easy change to the source file giving equivalent code solves the problem.
Here's a rough set of rules and restrictions:
#includes and #defines can occur wherever a declaration or statement can occur, but not in the middle of a statement. These rarely cause a problem.
macro calls can occur where function calls occur in expressions, or can appear without semicolon in place of statements. Macro calls that span non-well-formed chunks are not handled well (anybody surprised?). The latter occur occasionally but not rarely and need manual revision. OP's example of "j(v,oid)*" is problematic, but this is really rare in code.
#if ... #endif must be wrapped around major language concepts (nonterminals) (constant, expression, statement, declaration, function) or sequences of such entities, or around certain non-well-formed but commonly occurring idioms, such as if (exp) {. Each arm of the conditional must contain the same kind of syntactic construct as the other arms. #if wrapped around random text used as bad kind of comment is problematic, but easily fixed in the source by making a real comment. Where these conditions are not met, you need to modify the original source code, often by moving the #if #elsif #else #end a few tokens.
In our experience, one can revise a code base of 50,000 lines in a few hours to get around these issues. While that seems annoying (and it is), the alternative is to not be able to parse the source code at all, which is far worse than annoying.
You also want more than just a parser. See Life After Parsing, to know what happens after you succeed in getting a parse tree. We've done some additional work in building symbol tables in which the declarations are recorded with the preprocessor context in which they are embedded, enabling type checking to include the preprocessor conditions.
You can have a look at this ANTLR grammar. You will have to add rules for preprocessor tokens, though.
Your specific example can be handled by writing your own parsing and ignore macro expansion.
Because FOO(1) itself can be interpreted as a function call.
When more cases are considered however, the parser is much more difficult. You can refer PDF Link to find more information.

Separate specific #ifdef branches

In short: I want to generate two different source trees from the current one, based only on one preprocessor macro being defined and another being undefined, with no other changes to the source.
If you are interested, here is my story...
In the beginning, my code was clean. Then we made a new product, and yea, it was better. But the code saw only the same peripheral devices, so we could keep the same code.
Well, almost.
There was one little condition that needed to be changed, so I added:
#if defined(PRODUCT_A)
condition = checkCat();
#elif defined(PRODUCT_B)
condition = checkCat() && checkHat();
#endif
...to one and only one source file. In the general all-source-files-include-this header file, I had:
#if !(defined(PRODUCT_A)||defined(PRODUCT_B))
#error "Don't make me replace you with a small shell script. RTFM."
#endif
...so that people couldn't compile it unless they explicitly defined a product type.
All was well. Oh... except that modifications were made, components changed, and since the new hardware worked better we could significantly re-write the control systems. Now when I look upon the face of the code, there are more than 60 separate areas delineated by either:
#ifdef PRODUCT_A
...
#else
...
#endif
...or the same, but for PRODUCT_B. Or even:
#if defined(PRODUCT_A)
...
#elif defined(PRODUCT_B)
...
#endif
And of course, sometimes sanity took a longer holiday and:
#ifdef PRODUCT_A
...
#endif
#ifdef PRODUCT_B
...
#endif
These conditions wrap anywhere from one to two hundred lines (you'd think that the last one could be done by switching header files, but the function names need to be the same).
This is insane. I would be better off maintaining two separate product-based branches in the source repo and porting any common changes. I realise this now.
Is there something that can generate the two different source trees I need, based only on PRODUCT_A being defined and PRODUCT_B being undefined (and vice-versa), without touching anything else (ie. no header inclusion, no macro expansion, etc)?
I believe Coan will do what you're looking for. From the link:
Given a configuration and some source code, Coan can answer a range of questions about how the source code would appear to the C/C++ preprocessor if that configuration of symbols had been applied in advance.
And also:
Source code re-written by Coan is not preprocessed code as produced by the C preprocessor. It still contains comments, macro-invocations, and #-directives. It is still source code, but simplified in accordance with the chosen configuration.
So you could run it twice, first specifying product A and then product B.

C macro/#define indentation?

I'm curious as to why I see nearly all C macros formatted like this:
#ifndef FOO
# define FOO
#endif
Or this:
#ifndef FOO
#define FOO
#endif
But never this:
#ifndef FOO
#define FOO
#endif
(moreover, vim's = operator only seems to count the first two as correct.)
Is this due to portability issues among compilers, or is it just a standard practice?
I've seen it done all three ways, it seems to be a matter of style, not of syntax
While usually the second example is the most common, i've seen cases where the first (or third) is used to help distinguish multiple levels of #ifdefs. Sometimes the logic can become deeply nested and the only way to understand it at a glance is to use indentation much like it is common practice to indent blocks of code between { and }.
IIRC, older C preprocessors required the # to be the first character on the line (though I've never actually encountered one that had this requirement).
I never seen your code like your first example. I usually wrote preprocessor directives as in your second example. I found that it visually interfered with the indentation of the actual code less (not that I write in C anymore).
The GNU C Preprocessor manual says:
Preprocessing directives are lines in
your program that start with '#'.
Whitespace is allowed before and after
the '#'.
For preference I use the third style, with the exception of include guards, for which I use the second style.
I don't like the first style at all - I think of #define as being a preprocessor instruction, even though really of course it isn't, it's a # followed by the preprocessor instruction define. But since I do think of it that way, it seems wrong to separate them. I expect text editors written by people who advocate that style will have a block indent/un-indent that works on code written in that style. But I would hate to encounter it using a text editor that didn't.
There's no point pandering to ancient preprocessors where the # has to be the first character of the line, unless you can also list off the top of your head all the other differences between those implementations and standard C, in order to avoid the other things you could possibly do that they would not support. Of course if you genuinely are working with a pre-standard compiler, fair enough.
Preprocessor directives are lines included in our programs that are not actually program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#).Whitespace is allowed before and after the '#'. As soon as a newline character is found, the preprocessor directive is considered to end.
There is no other rule as far the standard of C/C++ concerned,So it remains as the matter of style and readability issue,I have seen/wrote programs only in the second way that you posted,although the third one seems more readable.

How to track down cause of missing struct from include files in C?

I have a rather large project I'm porting, and in one of the MANY headers I've included a file that contains a struct definition for pmc_mdep. (prior in the file its just declared, but later its actually defined).
Trying to compile it gives me errors about that struct being an incomplete type (which I believe means that it's lacking a definition).
When I run the preprocessor over this project, it does include that file, but the preprocessor output does not have the struct definition (but does include enum's from that file).
Is there a method to figure out why some of a header file gets to the preprocessor output, and some does not?
TIA
(Also, this is not the only compile error, the port is half done but it should be at least getting past this part)
I usually just track back from the structure to find all the enclosing "#ifdef" and "#if" lines that the preprocessor will encounter and see which one is controlling the removal of the structure from the input stream into the compiler.
That generally works pretty quickly for all but the hairiest of header files (i.e., those with a great many nested conditional compile statements). For those, I generally have a look at the preprocessor output to identify the last line in the header file that made it to the compiler input stream.
Almost certainly the next line will be a conditional compile statement where you haven't met the condition for inclusion.
For example, if this is the header file, you would need to track back to see that _KERNEL should be defined in order to get the declaration and definition.
I'm afraid not; you will have to look for #ifdefs that surround your area of interest and track down why those symbols are not defined. If it's porting to Linux/UNIX and you are missing things from the standard headers, you might have not defined the right _XOPEN_SOURCE or _BSD_SOURCE in your Makefile or config.h .
The most likely reason is there's a #define somewhere around the definition. Since the corresponding symbol is not defined or defined to some other value the definition is not included even when the header itself is included. You'll have to inspect this manually.
Raymond Chen has a blog post about this.
You may find yourself in a twisty maze of #ifdefs. Or you may be wondering why your macros aren't working.
I have these lines in my header file:
#define MM_BUSY 0x0001
#define MM_IDLE 0x0002
but when I try to use them, I get errors.
sample.cpp(23): error C2065: 'MM_BUSY': undeclared identifier
sample.cpp(40): error C2065: 'MM_IDLE': undeclared identifier
Any idea why this is happening?
Solution: Use #error to track down the problem the same way you'd scatter printf around to track down a regular bug.
Source: Use the #error directive to check whether the compiler even sees you
I do not think that there is a better way beside checking the preprocessor output to know why one file is included or not. Here is the gcc's preprocessor's output format that may help you understand the preprocessor's ouput.
Also, another way you may have a try to compare the outputs between that you are porting and the existing one.
You said:
I have a rather large project I'm porting, and in one of the MANY headers I've included a file that contains a struct definition for pmc_mdep. (Prior in the file its just declared, but later its actually defined).
Trying to compile it gives me errors about that struct being an incomplete type (which I believe means that it's lacking a definition).
This error can occur if you try to embed a pmc_mdep into some other structure before you have defined a pmc_mdep fully. Note that you can embed pointers to incomplete types into structures, but not actual instances of the incomplete type.
You also discuss running the preprocessor over the file that should define the structure, and you see enums form the header, but not the structure definition. That suggests that maybe you have a stray comment that is removing the structure unintentionally, or perhaps you have the structure definition embedded between #ifdef XXX and #endif but XXX is not defined when you do the compilation. It could even be #if 0.
I'd run the C preprocessor on just the header that contains the structure definition to see what that produces; it will be shorter than trying to look at the output for the entire program (source file). If I couldn't spot the issue swiftly, I'd mark parts with something like stray enums to see which ones get through and which ones don't.

Resources