I am looking through some C source code and I don't understand the following part
#if 1
typedef unsigned short PronId;
typedef unsigned short LMId;
# define LM_NGRAM_INT
#else
typedef unsigned int LMId;
typedef unsigned int PronId;
# undef LM_NGRAM_INT
#endif
Why would someone do #if 1? Isn't it true that only the first block will ever be processed?
Yes. Only the first block will be processed, until someone changes the 1 to a 0. Then the other block will be compiled. This is a convenient way to temporarily switch blocks of code in and out while testing different algorithms.
So that one can quickly choose which part to compile by changing the #if 1 to #if 0.
One of the fundamental properties of software is that a computer program is cheap to modify.
That's why certain code is written in a way that makes modification easier, and why patterns like "interface" or "proxy" exist.
And that's why you sometimes see weird constructs like #if 1-#else-#endif, whose only purpose is to make it easy to switch which part of the code gets compiled with a small effort: changing 1 to 0.
I put that in my code when I need to test different sets of parameters. Usually my product will ship with different defaults than what I can work with in a debug environment, so I put the shipping defaults in a #if 1 and the debug defaults in the #else, with a #warning to warn me it's being built with debug defaults.
For experimenting with various code paths.
It is just a different way to comment out a big piece of code, so that the editor's auto-indentation does not break (a commented-out block of code would be indented as text, not as code).
I'm actually using it as a kludge to make code folding easier; if I wrap a section of code in an #if 1 ... #endif, I can fold it in my editor. (The code in question is very macro-heavy, and not written by me, so more traditional ways of making a huge block of code manageable won't work.)
The cleaner way of doing it is probably doing something like:
#if ALGO1
#else
#endif
But you will have to pass ALGO1 to the compiler somewhere. For example, in a makefile you need to add -DALGO1=1 (plain -DALGO1 also defines it as 1). Ref: http://www.amath.unc.edu/sysadmin/DOC4.0/c-compiler/user_guide/cc_options.doc.html
This is more work...so, usually, for quick checks, #if 1 is used. And in some cases, forgotten and left behind as well :-)
It's another way of saying #if true. It was most likely the result of code that previously checked another symbol and was then refactored to always be true.
Related
Original Question
What I'd like is not a standard C pre-processor, but a variation on it which would accept from somewhere - probably the command line via -DNAME1 and -UNAME2 options - a specification of which macros are defined, and would then eliminate dead code.
It may be easier to understand what I'm after with some examples:
#ifdef NAME1
#define ALBUQUERQUE "ambidextrous"
#else
#define PHANTASMAGORIA "ghostly"
#endif
If the command were run with '-DNAME1', the output would be:
#define ALBUQUERQUE "ambidextrous"
If the command were run with '-UNAME1', the output would be:
#define PHANTASMAGORIA "ghostly"
If the command were run with neither option, the output would be the same as the input.
This is a simple case - I'd be hoping that the code could handle more complex cases too.
To illustrate with a real-world but still simple example:
#ifdef USE_VOID
#ifdef PLATFORM1
#define VOID void
#else
#undef VOID
typedef void VOID;
#endif /* PLATFORM1 */
typedef void * VOIDPTR;
#else
typedef mint VOID;
typedef char * VOIDPTR;
#endif /* USE_VOID */
I'd like to run the command with -DUSE_VOID -UPLATFORM1 and get the output:
#undef VOID
typedef void VOID;
typedef void * VOIDPTR;
Another example:
#ifndef DOUBLEPAD
#if (defined NT) || (defined OLDUNIX)
#define DOUBLEPAD 8
#else
#define DOUBLEPAD 0
#endif /* NT */
#endif /* !DOUBLEPAD */
Ideally, I'd like to run with -UOLDUNIX and get the output:
#ifndef DOUBLEPAD
#if (defined NT)
#define DOUBLEPAD 8
#else
#define DOUBLEPAD 0
#endif /* NT */
#endif /* !DOUBLEPAD */
This may be pushing my luck!
Motivation: large, ancient code base with lots of conditional code. Many of the conditions no longer apply - the OLDUNIX platform, for example, is no longer made and no longer supported, so there is no need to have references to it in the code. Other conditions are always true. For example, features are added with conditional compilation so that a single version of the code can be used for both older versions of the software where the feature is not available and newer versions where it is available (more or less). Eventually, the old versions without the feature are no longer supported - everything uses the feature - so the condition on whether the feature is present or not should be removed, and the 'when feature is absent' code should be removed too. I'd like to have a tool to do the job automatically because it will be faster and more reliable than doing it manually (which is rather critical when the code base includes 21,500 source files).
(A really clever version of the tool might read #include'd files to determine whether the control macros - those specified by -D or -U on the command line - are defined in those files. I'm not sure whether that's truly helpful except as a backup diagnostic. Whatever else it does, though, the pseudo-pre-processor must not expand macros or include files verbatim. The output must be source similar to, but usually simpler than, the input code.)
Status Report (one year later)
After a year of use, I am very happy with 'sunifdef' recommended by the selected answer. It hasn't made a mistake yet, and I don't expect it to. The only quibble I have with it is stylistic. Given an input such as:
#if (defined(A) && defined(B)) || defined(C) || (defined(D) && defined(E))
and run with '-UC' (C is never defined), the output is:
#if defined(A) && defined(B) || defined(D) && defined(E)
This is technically correct because '&&' binds tighter than '||', but it is an open invitation to confusion. I would much prefer it to include parentheses around the sets of '&&' conditions, as in the original:
#if (defined(A) && defined(B)) || (defined(D) && defined(E))
However, given the obscurity of some of the code I have to work with, for that to be the biggest nit-pick is a strong compliment; it is a valuable tool to me.
The New Kid on the Block
Having checked the URL for inclusion in the information above, I see that (as predicted) there is a new program called Coan that is the successor to 'sunifdef'. It is available on SourceForge and has been since January 2010. I'll be checking it out...further reports later this year, or maybe next year, or sometime, or never.
I know absolutely nothing about C, but it sounds like you are looking for something like unifdef. Note that it hasn't been updated since 2000, but there is a successor called "Son of unifdef" (sunifdef).
Also you can try this tool http://coan2.sourceforge.net/
something like this will remove ifdef blocks:
coan source -UYOUR_FLAG --filter c,h --recurse YourSourceTree
I used unifdef years ago for just the sort of problem you describe, and it worked fine. Even if it hasn't been updated since 2000, the syntax of preprocessor ifdefs hasn't changed materially since then, so I expect it will still do what you want. I suppose there might be some compile problems, although the packages appear recent.
I've never used sunifdef, so I can't comment on it directly.
Around 2004 I wrote a tool that did exactly what you are looking for. I never got around to distributing the tool, but the code can be found here:
http://casey.dnsalias.org/exifdef-0.2.zip (that's a dsl link)
It's about 1.7k lines and implements enough of the C grammar to parse preprocessor statements, comments, and strings using bison and flex.
If you need something similar to a preprocessor, the flexible solution is Wave (from boost). It's a library designed to build C-preprocessor-like tools (including such things as C++03 and C++0x preprocessors). As it's a library, you can hook into its input and output code.
To avoid an impossible situation, one could reduce the problem to two cases.
Case 1
The first (simplest) case is the situation where the preprocessor has a chance to detect it; that is, there's a preprocessor directive that depends on whether a macro is predefined (defined before the first line of input). For example:
#ifdef FOO
#define BAR 42
#else
#define BAR 43
#endif
depends on FOO being predefined or not. However the file
#undef FOO
#ifdef FOO
#define BAR 42
#endif
does not. A harder case would be to detect whether the dependency actually matters, which it doesn't in the above cases (as neither FOO nor BAR affects the output).
Case 2
The second (harder) case is where successful compilation depends on predefined macros:
INLINE int fubar(void) {
    return 42;
}
which is perfectly fine as far as the preprocessor is concerned whether or not INLINE is predefined, but unless INLINE is carefully defined that code won't compile. Similarly, in this case it might be possible to exclude cases where the output isn't affected, but I can't find an example of that. The complication here is that in the example:
int fubar(void) {
    return 42;
}
fubar being predefined as a macro can change whether this compiles, so one would probably need to restrict it to cases where a symbol needs to be predefined in order to compile successfully.
I guess such a tool would be something similar to a preprocessor (and C parser in the second case). The question is if there is such a tool? Or is there a tool that only handles the first case? Or none at all?
In C everything can be (re)defined, so there is no way to know in advance what is intended to be (re)defined. Usually some naming conventions help us figure out what is meant to be a macro (like upper-case). Therefore it is not possible to have such a tool. Of course, if you assume that the compilation errors are caused by missing macro definitions, then you can use them to analyze what is missing.
In short: I want to generate two different source trees from the current one, based only on one preprocessor macro being defined and another being undefined, with no other changes to the source.
If you are interested, here is my story...
In the beginning, my code was clean. Then we made a new product, and yea, it was better. But the code saw only the same peripheral devices, so we could keep the same code.
Well, almost.
There was one little condition that needed to be changed, so I added:
#if defined(PRODUCT_A)
condition = checkCat();
#elif defined(PRODUCT_B)
condition = checkCat() && checkHat();
#endif
...to one and only one source file. In the general all-source-files-include-this header file, I had:
#if !(defined(PRODUCT_A)||defined(PRODUCT_B))
#error "Don't make me replace you with a small shell script. RTFM."
#endif
...so that people couldn't compile it unless they explicitly defined a product type.
All was well. Oh... except that modifications were made, components changed, and since the new hardware worked better we could significantly re-write the control systems. Now when I look upon the face of the code, there are more than 60 separate areas delineated by either:
#ifdef PRODUCT_A
...
#else
...
#endif
...or the same, but for PRODUCT_B. Or even:
#if defined(PRODUCT_A)
...
#elif defined(PRODUCT_B)
...
#endif
And of course, sometimes sanity took a longer holiday and:
#ifdef PRODUCT_A
...
#endif
#ifdef PRODUCT_B
...
#endif
These conditions wrap anywhere from one to two hundred lines (you'd think that the last one could be done by switching header files, but the function names need to be the same).
This is insane. I would be better off maintaining two separate product-based branches in the source repo and porting any common changes. I realise this now.
Is there something that can generate the two different source trees I need, based only on PRODUCT_A being defined and PRODUCT_B being undefined (and vice-versa), without touching anything else (ie. no header inclusion, no macro expansion, etc)?
I believe Coan will do what you're looking for. From the link:
Given a configuration and some source code, Coan can answer a range of questions about how the source code would appear to the C/C++ preprocessor if that configuration of symbols had been applied in advance.
And also:
Source code re-written by Coan is not preprocessed code as produced by the C preprocessor. It still contains comments, macro-invocations, and #-directives. It is still source code, but simplified in accordance with the chosen configuration.
So you could run it twice, first specifying product A and then product B.
I'm curious as to why I see nearly all C macros formatted like this:
#ifndef FOO
# define FOO
#endif
Or this:
#ifndef FOO
#define FOO
#endif
But never this:
#ifndef FOO
    #define FOO
#endif
(moreover, vim's = operator only seems to count the first two as correct.)
Is this due to portability issues among compilers, or is it just a standard practice?
I've seen it done all three ways; it seems to be a matter of style, not of syntax.
While usually the second example is the most common, I've seen cases where the first (or third) is used to help distinguish multiple levels of #ifdefs. Sometimes the logic can become deeply nested and the only way to understand it at a glance is to use indentation, much as it is common practice to indent blocks of code between { and }.
IIRC, older C preprocessors required the # to be the first character on the line (though I've never actually encountered one that had this requirement).
I've never seen code like your first example. I usually wrote preprocessor directives as in your second example. I found that it visually interfered with the indentation of the actual code less (not that I write in C anymore).
The GNU C Preprocessor manual says:
Preprocessing directives are lines in your program that start with '#'. Whitespace is allowed before and after the '#'.
For preference I use the third style, with the exception of include guards, for which I use the second style.
I don't like the first style at all - I think of #define as being a preprocessor instruction, even though really of course it isn't, it's a # followed by the preprocessor instruction define. But since I do think of it that way, it seems wrong to separate them. I expect text editors written by people who advocate that style will have a block indent/un-indent that works on code written in that style. But I would hate to encounter it using a text editor that didn't.
There's no point pandering to ancient preprocessors where the # has to be the first character of the line, unless you can also list off the top of your head all the other differences between those implementations and standard C, in order to avoid the other things you could possibly do that they would not support. Of course if you genuinely are working with a pre-standard compiler, fair enough.
Preprocessor directives are lines included in our programs that are not actually program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). Whitespace is allowed before and after the '#'. As soon as a newline character is found, the preprocessor directive is considered to end.
There is no other rule as far as the C/C++ standards are concerned, so it remains a matter of style and readability. I have seen and written programs only in the second way that you posted, although the third one seems more readable.
I was asked a question in C last night and I did not know the answer since I have not used C much since college so I thought maybe I could find the answer here instead of just forgetting about it.
If a person has a define such as:
#define count 1
Can that person find the variable name count using the 1 that is inside it?
I did not think so since I thought the count would point to the 1 but do not see how the 1 could point back to count.
Building on @Cade Roux's answer, if you use a preprocessor #define to associate a value with a symbol, the code won't have any reference to the symbol once the preprocessor has run:
#define COUNT (1)
...
int myVar = COUNT;
...
After the preprocessor runs:
...
int myVar = (1);
...
So as others have noted, this basically means "no", for the above reason.
The simple answer is no, they can't. #defines like that are dealt with by the preprocessor, and they only point in one direction. Of course the other problem is that even the compiler wouldn't know, as a "1" could point to anything: multiple variables can have the same value at the same time.
Can that person find the variable name "count" using the 1 that is inside it?
No
As I'm sure someone more eloquent and versed than me will point out, #define'd things aren't compiled into the program; what you have is a pre-processor macro, and the pre-processor will go through the source and replace every instance of 'count' it finds with a '1'.
However, to shed more light on the question you were asked, because C is a compiled language down to the machine code you are never going to have the reflection and introspection you have with a language like Java, or C#. All the naming is lost after compilation unless you have a framework built around your source/compiler to do some nifty stuff.
Hope this helps. (excuse the pun)
Unfortunately this is not possible.
#define statements are instructions for the preprocessor, all instances of count are replaced with 1. At runtime there is no memory location associated with count, so the effort is obviously futile.
Even if you're using variables, after compilation there will be no remnants of the original identifiers used in the program. This is generally only possible in dynamic languages.
One trick used in C is using the # syntax in macros to obtain the string literal of the macro parameter.
#include <stdio.h>

/* #val turns the argument's source text into a string literal. */
#define displayInt(val) printf("%s: %d\n",#val,val)
/* %f, not %d: a float argument is promoted to double when passed to printf. */
#define displayFloat(val) printf("%s: %f\n",#val,val)
#define displayString(val) printf("%s: %s\n",#val,val)
int main(){
    int foo=123;
    float bar=456.789;
    char thud[]="this is a string";
    displayInt(foo);
    displayFloat(bar);
    displayString(thud);
    return 0;
}
The output should look something like the following:
foo: 123
bar: 456.789001
thud: this is a string
#define count 1 is a very bad idea, because it prevents you from naming any variables or structure fields count.
For example:
void copyString(char* dst, const char* src, size_t count) {
...
}
Your count macro will cause the variable name to be replaced with 1, preventing this function from compiling:
void copyString(char* dst, const char* src, size_t 1) {
...
}
C defines are a pre-processor directive, not a variable. The pre-processor will go through your C file and replace where you write count with what you've defined it as, before compiling. Look at the obfuscated C contest entries for some particularly enlightened uses of this and other pre-processor directives.
The point is that there is no 'count' to point at a '1' value. It's just a simple find/replace operation that happens before the code is even really compiled.
I'll leave this editable for someone who actually really knows C to correct.
count isn't a variable. It has no storage allocated to it and no entry in the symbol table. It's a macro that gets replaced by the preprocessor before passing the source code to the compiler.
On the off chance that you aren't asking quite the right question, there is a way to get the name using macros:
#define SHOW(sym) (printf(#sym " = %d\n", sym))
#define count 1
SHOW(count); // prints "count = 1"
The # operator converts a macro argument to a string literal.
#define is a pre-processor directive, as such it is not a "variable"
What you have there is actually not a variable, it is a preprocessor directive. When you compile the code the preprocessor will go through and replace all instances of the word 'count' in that file with 1.
You might be asking: if I know 1, can I find that count points to it? No. Because the relationship between variable names and values is not a bijection, there is no way back. Consider
int count = 1;
int count2 = 1;
perfectly legal but what should 1 resolve to?
In general, no.
Firstly, a #define is not a variable, it is a compiler preprocessor macro.
By the time the main phase of the compiler gets to work, the name has been replaced with the value, and the name "count" will not exist anywhere in the code that is compiled.
For variables, it is not possible to find out variable names in C code at runtime. That information is not kept. Unlike languages like Java or C#, C does not keep much metadata at all; it compiles down to assembly language.
Directives starting with "#" are handled by the pre-processor, which usually does text substitution before passing the code to the 'real' compiler. As such, there is no variable called count; it's as if all "count" strings in your code are magically replaced with the "1" string.
So, no, no way to find that "variable".
In the case of a macro, the source is preprocessed and the resulting output is compiled. So there is absolutely no way to find out that name, because after the preprocessor finishes its job the resulting file contains '1' instead of 'count' everywhere.
So the answer is no.
If they are looking at the C source code (which they will be in a debugger), then they will see something like
int i = count;
at that point, they can search back and find the line
#define count 1
If, however, all they have is variable iDontKnowWhat, and they can see it contains 1, there is no way to track that back to 'count'.
Why? Because the #define is evaluated at preprocessor time, which happens even before compilation (though for almost everyone, it can be viewed as the first stage of compilation). Consequently the source code is the only thing that has any information about 'count', like knowing that it ever existed. By the time the compiler gets a look in, every reference to 'count' has been replaced by the number '1'.
It's not a pointer, it's just a string/token substitution. The preprocessor replaces all the #defines before your code ever compiles. Most compilers include a -E or similar argument to emit preprocessed code, so you can see what the code looks like after all the #directives are processed.
More directly to your question, there's no way to tell that a token is being replaced in code. Your code can't even tell the difference between (count == 1) and (1 == 1).
If you really want to do that, it might be possible using source file text analysis, say using a diff tool.
What do you mean by "finding"?
The line
#define count 1
defines a symbol "count" that has value 1.
The first step of the compilation process (called preprocessing) will replace every occurrence of the symbol count with 1 so that if you have:
if (x > count) ...
it will be replaced by:
if (x > 1) ...
If you get this, you may see why "finding count" is meaningless.
The person asking the question (was it an interview question?) may have been trying to get you to differentiate between using #define constants versus enums. For example:
#define ZERO 0
#define ONE 1
#define TWO 2
vs
enum {
    ZERO,
    ONE,
    TWO
};
Given the code:
x = TWO;
If you use enumerations instead of the #defines, some debuggers will be able to show you the symbolic form of the value, TWO, instead of just the numeric value of 2.