In short: I want to generate two different source trees from the current one, based only on one preprocessor macro being defined and another being undefined, with no other changes to the source.
If you are interested, here is my story...
In the beginning, my code was clean. Then we made a new product, and yea, it was better. But the code saw only the same peripheral devices, so we could keep the same code.
Well, almost.
There was one little condition that needed to be changed, so I added:
#if defined(PRODUCT_A)
condition = checkCat();
#elif defined(PRODUCT_B)
condition = checkCat() && checkHat();
#endif
...to one and only one source file. In the general all-source-files-include-this header file, I had:
#if !(defined(PRODUCT_A)||defined(PRODUCT_B))
#error "Don't make me replace you with a small shell script. RTFM."
#endif
...so that people couldn't compile it unless they explicitly defined a product type.
All was well. Oh... except that modifications were made, components changed, and since the new hardware worked better we could significantly re-write the control systems. Now when I look upon the face of the code, there are more than 60 separate areas delineated by either:
#ifdef PRODUCT_A
...
#else
...
#endif
...or the same, but for PRODUCT_B. Or even:
#if defined(PRODUCT_A)
...
#elif defined(PRODUCT_B)
...
#endif
And of course, sometimes sanity took a longer holiday and:
#ifdef PRODUCT_A
...
#endif
#ifdef PRODUCT_B
...
#endif
These conditions wrap anywhere from one to two hundred lines (you'd think that the last one could be done by switching header files, but the function names need to be the same).
This is insane. I would be better off maintaining two separate product-based branches in the source repo and porting any common changes. I realise this now.
Is there something that can generate the two different source trees I need, based only on PRODUCT_A being defined and PRODUCT_B being undefined (and vice-versa), without touching anything else (i.e. no header inclusion, no macro expansion, etc.)?
I believe Coan will do what you're looking for. From the link:
Given a configuration and some source code, Coan can answer a range of questions about how the source code would appear to the C/C++ preprocessor if that configuration of symbols had been applied in advance.
And also:
Source code re-written by Coan is not preprocessed code as produced by the C preprocessor. It still contains comments, macro-invocations, and #-directives. It is still source code, but simplified in accordance with the chosen configuration.
So you could run it twice, first specifying product A and then product B.
Original Question
What I'd like is not a standard C pre-processor, but a variation on it which would accept from somewhere - probably the command line via -DNAME1 and -UNAME2 options - a specification of which macros are defined, and would then eliminate dead code.
It may be easier to understand what I'm after with some examples:
#ifdef NAME1
#define ALBUQUERQUE "ambidextrous"
#else
#define PHANTASMAGORIA "ghostly"
#endif
If the command were run with '-DNAME1', the output would be:
#define ALBUQUERQUE "ambidextrous"
If the command were run with '-UNAME1', the output would be:
#define PHANTASMAGORIA "ghostly"
If the command were run with neither option, the output would be the same as the input.
This is a simple case - I'd be hoping that the code could handle more complex cases too.
To illustrate with a real-world but still simple example:
#ifdef USE_VOID
#ifdef PLATFORM1
#define VOID void
#else
#undef VOID
typedef void VOID;
#endif /* PLATFORM1 */
typedef void * VOIDPTR;
#else
typedef mint VOID;
typedef char * VOIDPTR;
#endif /* USE_VOID */
I'd like to run the command with -DUSE_VOID -UPLATFORM1 and get the output:
#undef VOID
typedef void VOID;
typedef void * VOIDPTR;
Another example:
#ifndef DOUBLEPAD
#if (defined NT) || (defined OLDUNIX)
#define DOUBLEPAD 8
#else
#define DOUBLEPAD 0
#endif /* NT */
#endif /* !DOUBLEPAD */
Ideally, I'd like to run with -UOLDUNIX and get the output:
#ifndef DOUBLEPAD
#if (defined NT)
#define DOUBLEPAD 8
#else
#define DOUBLEPAD 0
#endif /* NT */
#endif /* !DOUBLEPAD */
This may be pushing my luck!
Motivation: large, ancient code base with lots of conditional code. Many of the conditions no longer apply - the OLDUNIX platform, for example, is no longer made and no longer supported, so there is no need to have references to it in the code. Other conditions are always true. For example, features are added with conditional compilation so that a single version of the code can be used for both older versions of the software where the feature is not available and newer versions where it is available (more or less). Eventually, the old versions without the feature are no longer supported - everything uses the feature - so the condition on whether the feature is present or not should be removed, and the 'when feature is absent' code should be removed too. I'd like to have a tool to do the job automatically because it will be faster and more reliable than doing it manually (which is rather critical when the code base includes 21,500 source files).
(A really clever version of the tool might read #include'd files to determine whether the control macros - those specified by -D or -U on the command line - are defined in those files. I'm not sure whether that's truly helpful except as a backup diagnostic. Whatever else it does, though, the pseudo-pre-processor must not expand macros or include files verbatim. The output must be source similar to, but usually simpler than, the input code.)
Status Report (one year later)
After a year of use, I am very happy with 'sunifdef' recommended by the selected answer. It hasn't made a mistake yet, and I don't expect it to. The only quibble I have with it is stylistic. Given an input such as:
#if (defined(A) && defined(B)) || defined(C) || (defined(D) && defined(E))
and run with '-UC' (C is never defined), the output is:
#if defined(A) && defined(B) || defined(D) && defined(E)
This is technically correct because '&&' binds tighter than '||', but it is an open invitation to confusion. I would much prefer it to include parentheses around the sets of '&&' conditions, as in the original:
#if (defined(A) && defined(B)) || (defined(D) && defined(E))
However, given the obscurity of some of the code I have to work with, for that to be the biggest nit-pick is a strong compliment; it is a valuable tool to me.
The New Kid on the Block
Having checked the URL for inclusion in the information above, I see that (as predicted) there is a new program called Coan that is the successor to 'sunifdef'. It is available on SourceForge and has been since January 2010. I'll be checking it out...further reports later this year, or maybe next year, or sometime, or never.
I know absolutely nothing about C, but it sounds like you are looking for something like unifdef. Note that it hasn't been updated since 2000, but there is a successor called "Son of unifdef" (sunifdef).
Also you can try this tool http://coan2.sourceforge.net/
Something like this will remove ifdef blocks:
coan source -UYOUR_FLAG --filter c,h --recurse YourSourceTree
I used unifdef years ago for just the sort of problem you describe, and it worked fine. Even if it hasn't been updated since 2000, the syntax of preprocessor ifdefs hasn't changed materially since then, so I expect it will still do what you want. I suppose there might be some compile problems, although the packages appear recent.
I've never used sunifdef, so I can't comment on it directly.
Around 2004 I wrote a tool that did exactly what you are looking for. I never got around to distributing the tool, but the code can be found here:
http://casey.dnsalias.org/exifdef-0.2.zip (that's a dsl link)
It's about 1.7k lines and implements enough of the C grammar to parse preprocessor statements, comments, and strings using bison and flex.
If you need something similar to a preprocessor, the flexible solution is Wave (from boost). It's a library designed to build C-preprocessor-like tools (including such things as C++03 and C++0x preprocessors). As it's a library, you can hook into its input and output code.
I was reading the C Preprocessor guide page on gnu.org on computed includes which has the following explanation:
2.6 Computed Includes
Sometimes it is necessary to select one of several different header
files to be included into your program. They might specify
configuration parameters to be used on different sorts of operating
systems, for instance. You could do this with a series of
conditionals,
#if SYSTEM_1
# include "system_1.h"
#elif SYSTEM_2
# include "system_2.h"
#elif SYSTEM_3
…
#endif
That rapidly becomes tedious. Instead, the preprocessor offers the
ability to use a macro for the header name. This is called a computed
include. Instead of writing a header name as the direct argument of
‘#include’, you simply put a macro name there instead:
#define SYSTEM_H "system_1.h"
…
#include SYSTEM_H
This doesn't make sense to me. The first code snippet allows for optionality based on which system type you encounter by using branching if elifs. The second seems to have no optionality as a macro is used to define a particular system type and then the macro is placed into the include statement without any code that would imply its definition can be changed. Yet, the text implies these are equivalent and that the second is a shorthand for the first. Can anyone explain how the optionality of the first code snippet exists in the second? I also don't know what code is implied to be contained in the "..." in the second code snippet.
There are some other places in the code or build system that define or don't define the macros that are being tested in the conditionals. What's suggested is that instead of those places defining lots of different SYSTEM_1, SYSTEM_2, etc. macros, they'll just define SYSTEM_H to the value that's desired.
Most likely this won't actually be in an explicit #define; instead it will be in a compiler option, e.g.
gcc -DSYSTEM_H='"system_1.h"' ...
And this will most likely actually come from a setting in a makefile or other configuration file.
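To make that concrete, here is a minimal sketch of the pattern, not anything taken from the GCC manual. It assumes the build may or may not pass -DSYSTEM_H=..., and it uses the standard header <limits.h> as a stand-in for a hypothetical "system_1.h" so that the snippet compiles on its own. The function name system_char_bits is made up for illustration.

```c
/* If the build system did not choose a header, fall back to a default.
   <limits.h> stands in for a hypothetical "system_1.h". */
#ifndef SYSTEM_H
#define SYSTEM_H <limits.h>
#endif

/* Computed include: the macro is expanded to get the header name. */
#include SYSTEM_H

/* Uses something the selected header is expected to provide. */
int system_char_bits(void)
{
    return CHAR_BIT;
}
```

The choice between headers still exists; it has just moved out of this file and into whatever defines SYSTEM_H (a makefile variable, a configure script, or the fallback above).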
Is there any meaningful downside, on modern compilers, to putting comments starting at the beginning of a header file?
That is, something like the following in great_header.h:
/*
* this file defines the secret to life
* etc
* (c) 2017 ascended being
*/
#pragma once
#ifndef NAMESPACE_GREAT_HEADER_H_
#define NAMESPACE_GREAT_HEADER_H_
... (actual contents)
#endif // ifndef NAMESPACE_GREAT_HEADER_H_
In the past, I remember caveats such as "#pragma once only works if it is the first line in the file", and similar rules for include-guard optimization - but I'm not sure if that is still the case. It would be convenient for me, and for automated tools which extract top-of-header info, if comments could be the first thing in the file.
According to the GCC Preprocessor Internals manual, the multiple-include optimization mechanism is not affected by comments:
The Multiple-Include Optimization
Under what circumstances is such an optimization valid? If the file were included a second time, it can only be optimized away if that inclusion would result in no tokens to return, and no relevant directives to process. Therefore the current implementation imposes requirements and makes some allowances as follows:
There must be no tokens outside the controlling #if-#endif pair, but
whitespace and comments are permitted.
It doesn't mention #pragma once there, which I suspect is treated separately. Referring to 2.5 Alternatives to Wrapper #ifndef:
Another way to prevent a header file from being included more than
once is with the ‘#pragma once’ directive. If ‘#pragma once’ is seen
when scanning a header file, that file will never be read again, no
matter what.
I am looking through some C source code and I don't understand the following part
#if 1
typedef unsigned short PronId;
typedef unsigned short LMId;
# define LM_NGRAM_INT
#else
typedef unsigned int LMId;
typedef unsigned int PronId;
# undef LM_NGRAM_INT
#endif
Why would someone do #if 1? Isn't it true that only the first block will ever be processed?
Yes. Only the first block will be processed, until someone changes the 1 to a 0. Then the other block will be compiled. This is a convenient way to temporarily switch blocks of code in and out while testing different algorithms.
So that one can quickly choose which part to compile by changing the #if 1 to #if 0.
One of the fundamental properties of software is that a computer program is cheap to modify.
That's why certain code is written in a way that makes modification easier. That's why there are various patterns, like "interface" or "proxy".
And that's why you sometimes see weird constructs like #if 1-#else-#endif, the only purpose of which is to make it easy to switch which part of the code gets compiled, with little effort: changing 1 to 0.
I put that in my code when I need to test different sets of parameters. Usually my product will ship with different defaults than what I can work with in a debug environment, so I put the shipping defaults in a #if 1 and the debug defaults in the #else with a #warning to warn me it's being built with debug defaults.
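A rough sketch of what that shipping/debug arrangement might look like. The macro names, values, and accessor functions here are all hypothetical, and #warning is assumed to be available (it is a long-standing GCC/Clang extension and is standard as of C23).

```c
/* Flip the 1 to 0 to build with the debug defaults instead. */
#if 1
/* Shipping defaults (hypothetical values). */
#define DEFAULT_TIMEOUT_MS 30000
#define DEFAULT_RETRIES    3
#else
/* Debug defaults; the warning makes an accidental debug build visible. */
#warning "Building with debug defaults"
#define DEFAULT_TIMEOUT_MS 500
#define DEFAULT_RETRIES    0
#endif

int default_timeout_ms(void) { return DEFAULT_TIMEOUT_MS; }
int default_retries(void)    { return DEFAULT_RETRIES; }
```

Because the #warning sits in the branch that is skipped, a normal build stays quiet; it only fires once someone changes the 1 to 0.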
For experimenting with various code paths.
It is just a different way to comment out a big piece of code without the editor's auto-indentation breaking the layout (a commented-out block of code would be indented as text, not as code).
I'm actually using it as a kludge to make code folding easier; if I wrap a section of code in an #if 1 ... #endif, I can fold it in my editor. (The code in question is very macro-heavy, and not written by me, so more traditional ways of making a huge block of code manageable won't work.)
The cleaner way of doing it is probably doing something like:
#if ALGO1
#else
#endif
But, you will have to pass in ALGO1 to the compiler args somewhere...for example in a makefile, you need to add -DALGO1=1 (if no 1 is provided, 1 is assumed). Ref: http://www.amath.unc.edu/sysadmin/DOC4.0/c-compiler/user_guide/cc_options.doc.html
This is more work...so, usually, for quick checks, #if 1 is used. And in some cases, forgotten and left behind as well :-)
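For what it's worth, here is a minimal sketch of the named-switch version. ALGO1 is the macro from above; the default of 1 and the two sum_to bodies are invented purely to have something to switch between.

```c
/* Default to variant 1 unless the build passes -DALGO1=0. */
#ifndef ALGO1
#define ALGO1 1
#endif

#if ALGO1
/* Variant 1: straightforward loop. */
long sum_to(long n)
{
    long total = 0;
    for (long i = 1; i <= n; ++i)
        total += i;
    return total;
}
#else
/* Variant 2: closed form. */
long sum_to(long n)
{
    return n * (n + 1) / 2;
}
#endif
```

With the #ifndef default in place, the file still builds when nobody passes the flag, and adding -DALGO1=0 to the compiler options selects the other variant without editing the source.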
It's another way of saying #if true. It was most likely the result of code that previously checked another symbol and was then refactored to always be true.
What would the purpose of this construct in a c file be?:
#define _TIMERC
#include "timer.h"
#undef _TIMERC
I am aware of the guard for preventing multiple inclusion of a header file. This doesn't appear to be what's happening though.
thanks!
Here's a scenario to illustrate...
Let's say that timer.h provides a macro tick_count() that returns the number of timer interrupts that occurred.
One module (rpm_reader.h) uses timer A for interval timing:
#define _TIMERA
#include "timer.h"
#undef _TIMERA
Another module (lap_time.h) uses timer C for its interval timing:
#define _TIMERC
#include "timer.h"
#undef _TIMERC
The rpm_reader would return the tick count from timer A when it called tick_count() and lap_time would get its count from timer C.
(My apologies for answering my own question, but asking the question helped me come to this revelation.)
Often times a library header file will have multiple options, that are enabled and disabled by macro defines. This will enable such an option.
More typically these are set at a global scope by configuring your build system to add (e.g. with gcc) -D_TIMERC to the compiler's command line.
I was wondering if it could be this:
The header file in this case is intended to allow multiple inclusions with different defines established before each #include.
Suppose timer.h contains a block of code (interrupt code) for each of timers A, B and C in the microcontroller. In some cases timer A is required in one module and timer C is required in another module.
I think your self-answer is right. There is most likely conditional stuff in the included header and the "calling" file knows which specific set of conditional "stuff" it wants to include.
It does not necessarily have to do with multiple includes - it can just be special cases depending on the "calling" context.
I am not exactly sure why one would undefine though. I can't think of a case where I would mix and match so not sure why an undefine is necessary.
At the risk of stating the obvious, "timer.h" expects to have _TIMERC and the rest of your code does not.
Clearly not good practice in the general case, but I have seen similar when including third party code. Can get nasty when you have #defs that clash...
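Here is a small single-file sketch of the define / include / undef idea. The real timer.h isn't shown in the question, so the block between the "stand-in" comments is a guess at the kind of conditional content it might have; ACTIVE_TIMER, active_timer and selector_leaked are made-up names.

```c
/* The includer selects a variant just for the duration of the include. */
#define _TIMERC

/* ---- begin: stand-in for a hypothetical timer.h ---- */
#if defined(_TIMERA)
#define ACTIVE_TIMER 'A'
#elif defined(_TIMERC)
#define ACTIVE_TIMER 'C'
#else
#error "Define _TIMERA or _TIMERC before including timer.h"
#endif
/* ---- end: stand-in for timer.h ---- */

#undef _TIMERC   /* the selector no longer exists past this point */

/* What the header configured is still available... */
char active_timer(void) { return ACTIVE_TIMER; }

/* ...but the selector macro itself did not leak into the rest of the file. */
int selector_leaked(void)
{
#ifdef _TIMERC
    return 1;
#else
    return 0;
#endif
}
```

So the #undef is just housekeeping: the header has already acted on _TIMERC by the time it is removed, and nothing later in the file can accidentally depend on it.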
For the record, common practice to avoid multiple includes of the same header file is to put the guard in the file itself, not to rely on some external define... ^_^
The headers start with:
#ifndef header_name_h
#define header_name_h
and end with:
#endif
Of course, the def style can vary.
Thus, on first inclusion, we go past the #ifndef (not yet defined) and we set the macro.
On second inclusion, if any, we just jump to end of file, nothing is included.