Lines of Code as a function of preprocessor definitions - c

A project I'm working on (in C) has a lot of sections of code that can be included or omitted based on compile-time configuration, using preprocessor directives.
I'm interested in estimating how many lines of code different configurations are adding to, or subtracting from, my core project. In other words, I'd like to write a few #define and #undef lines somewhere, and get a sense of what that does to the LOC count.
I'm not familiar with LOC counters, but from a cursory search, it doesn't seem like most of the easily-available tools do that. I'm assuming this isn't a difficult problem, but just a rather uncommon metric to measure.
Is there an existing tool that would do what I'm looking for, or some easy way to do it myself? Excluding comments and blank lines would be a major nice-to-have, too.

Run it through a preprocessor. For example, under gcc, use the option -E, I believe, to get just the kind of output you seem to want.
-E Stop after the preprocessing stage; do not run the compiler proper.
The output is in the form of preprocessed source code, which is sent
to the standard output.
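A minimal sketch of how the preprocessed output could then be counted, assuming it comes from something like gcc -E -P (where -P suppresses linemarkers). The function name count_code_lines is made up for illustration and is not part of any existing tool. The preprocessor already drops comments by default, so counting the non-blank lines that are not '#' lines gives a rough LOC figure.

```c
#include <ctype.h>
#include <stdio.h>

/* Counts lines in `in` that contain something other than whitespace and
 * whose first non-blank character is not '#'. That skips linemarkers and
 * any #pragma lines that survive preprocessing. Intended for text that
 * has already been through the preprocessor. */
long count_code_lines(FILE *in)
{
    long count = 0;
    int c;
    int seen_text = 0;   /* current line has a non-whitespace character */
    int is_marker = 0;   /* first non-whitespace character was '#' */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            if (seen_text && !is_marker)
                count++;
            seen_text = 0;
            is_marker = 0;
        } else if (!isspace((unsigned char)c)) {
            if (!seen_text && c == '#')
                is_marker = 1;
            seen_text = 1;
        }
    }
    if (seen_text && !is_marker)   /* last line without a trailing newline */
        count++;
    return count;
}
```

Called on stdin from a small main, this could sit at the end of a pipeline such as gcc -E -P -DFEATURE_X file.c (FEATURE_X being a placeholder for one of your own definitions). Everything pulled in by #include is counted as well, so the difference between two configurations is more meaningful than either absolute number.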

You could get the preprocessor output from your compiler, but this might have other unwanted side effects, like expanding complex multi-line macros, and adding to the LOC count in ways you didn't expect.
Why not write your own simple pre-processor, and use your own include/exclude directives? You can make them trivially simple to parse, and then pipe your code through this pre-processor before sending it to a full featured LOC counter like CLOC.
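A rough sketch of what such a home-grown pre-processor might look like. The marker syntax (//@if NAME and //@endif) and the function name filter_features are invented here for illustration; any format that is trivial to parse would do.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `name` is in the list of enabled feature names. */
static int is_enabled(const char *name, const char *const *enabled, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, enabled[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the line, after leading blanks, starts with "//@endif". */
static int is_endif(const char *line)
{
    line += strspn(line, " \t");
    return strncmp(line, "//@endif", 8) == 0;
}

/* Copies `in` to `out`, dropping every region between "//@if NAME" and
 * its matching "//@endif" when NAME is not enabled. Marker lines are
 * never copied. Regions may nest. Lines longer than the buffer are read
 * in pieces, which is good enough for a sketch. */
void filter_features(FILE *in, FILE *out,
                     const char *const *enabled, size_t n)
{
    char line[1024];
    char name[64];
    int skip_depth = 0;   /* > 0 while inside a disabled region */

    while (fgets(line, sizeof line, in) != NULL) {
        if (sscanf(line, " //@if %63s", name) == 1) {
            /* Inside a disabled region every nested @if is disabled too. */
            if (skip_depth > 0 || !is_enabled(name, enabled, n))
                skip_depth++;
        } else if (is_endif(line)) {
            if (skip_depth > 0)
                skip_depth--;
        } else if (skip_depth == 0) {
            fputs(line, out);
        }
    }
}
```

Because the markers are ordinary comments, the unfiltered source still compiles as it is, and the filtered output can be handed to a counter like CLOC to deal with comments and blank lines.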

Related

Is it possible to see the macros of a compiled C program?

I am trying to learn C and I have this C file that I want to view the macros of. Is there a tool to view the macros of the compiled C file?
No. That's literally impossible.
The preprocessor is a textual replacement that happens before the main compile pass. There is no difference between using a macro and putting the code the macro expands to in its place.*
*Ignoring the debugger output. But even then you can do it if you use the #line directive to tell the compiler the file and line number.
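The standard directive for overriding the reported file name and line number is #line. A small sketch of its effect (the file name and the two function names are made up for illustration):

```c
/* After this directive the compiler treats the next line as line 100 of
 * a file called "expanded_by_hand.c" (a made-up name). */
#line 100 "expanded_by_hand.c"
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

Diagnostics and debug information for the code that follows would refer to that file and line, which is how hand-expanded code could be made to look as if it came from somewhere else.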
They're always defined in the header file(s) that you've imported with #include, or that those files in turn #include.
This may involve a lot of digging. It may involve going into files that make no sense to you because they're not written for casual inspection.
Any macros of any importance are usually documented. They may use other more complex implementation-specific macros that you shouldn't concern yourself with ordinarily, but if you're curious how they work the source is all there.
That being said, this is only relevant if you have the source and more specifically a complete build environment. Once compiled all these definitions, like the source itself, do not appear in the executable and cannot be inferred directly from the executable, especially not a release build.
Unlike Java or C#, C compiles directly to machine code so there's no way to easily reverse that back to the source. There are "decompilers" that try, but they can only really guess as to the original source. VM-based languages like Java and C# only lightly compile the code, so there are a lot of hints as to how that code was generated and reversing it is an easier process.

Document #define without the preprocessor

In the Doxygen documentation I am writing, I have set ENABLE_PREPROCESSING = NO, because I want all of the code to be documented, independently of any #if statements.
The problem is that there is a #define that I need to be documented, but since I have disabled the preprocessor, nothing is generated for it (the other structures in that file are being documented just fine).
One option would be to enable the preprocessor and use the PREDEFINED option to set all the #if, but that is not realistically achievable in my case (too many of them).
Are there any other ways to achieve the intended result?
Thanks!
In the Doxygen documentation I am writing, I have set ENABLE_PREPROCESSING = NO, because I want all of the code to be documented, independently of any #if statements.
That's got some code smell to it. The interface presented by your code should be documented according to how it was built. It's pretty pointless to document features that could have been built but weren't, or to document alternative ways in which your features could have been built. Generally speaking, that means having Doxygen pre-process conditional-compilation directives.
And if you have conditional compilation that you intend for users of your library to trigger when they build their own programs, then I suggest taking a different approach: split your headers, so that your users select which headers to include instead of relying on conditional compilation to customize the content of a single header.
HOWEVER, if you must document all the code in every conditional-compilation branch in a single set of documentation, and you also want to document macros, then you could consider leaving preprocessing on, and filtering out the conditional compilation directives with an input filter. The latter part might be specified like this, for example:
INPUT_FILTER = "sed '/^[ ]*#[ ]*\(if\|el\|endif\)/ d'"
That does not account for line continuations, so as to keep it relatively simple, but even in that form it might be sufficient for your purposes. It could be augmented to handle line continuations if needed.
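If line continuations do matter, the filter could be a small program instead of a sed one-liner. The following is only a sketch under that assumption: it matches the same directive prefixes as the sed expression (if, el, endif), additionally accepts tabs, and also drops the continuation lines of a directive it removes. The function names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` starts a conditional-compilation directive
 * (#if, #ifdef, #ifndef, #elif, #else, #endif), allowing blanks or tabs
 * before and after the '#'. */
static int is_conditional(const char *line)
{
    line += strspn(line, " \t");
    if (*line != '#')
        return 0;
    line++;
    line += strspn(line, " \t");
    return strncmp(line, "if", 2) == 0 ||
           strncmp(line, "el", 2) == 0 ||
           strncmp(line, "endif", 5) == 0;
}

/* Returns 1 if the line ends with a backslash, i.e. it continues on the
 * next physical line. */
static int is_continued(const char *line)
{
    size_t len = strcspn(line, "\r\n");
    return len > 0 && line[len - 1] == '\\';
}

/* Copies `in` to `out` without conditional directives and without the
 * continuation lines that belong to them. */
void strip_conditionals(FILE *in, FILE *out)
{
    char line[4096];
    int dropping = 0;   /* inside a continued conditional directive */

    while (fgets(line, sizeof line, in) != NULL) {
        if (dropping || is_conditional(line))
            dropping = is_continued(line);   /* keep dropping while continued */
        else
            fputs(line, out);
    }
}
```

Doxygen runs the INPUT_FILTER command with the input file name as its argument and reads the filtered text from standard output, so a main for this would open argv[1] and call strip_conditionals on it with stdout as the destination.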

Regular expressions in C preprocessor macro

I would like to know if there is any kind of regular expression expansion within the compiler (GCC) preprocessor. Basically, more flexible code generation macros.
If there is not a way, how do you suggest I accomplish the same result?
The C preprocessor can't do that.
You might want to use a template processor (for instance Mustache but there are many others) that generates what you need before passing it to the compiler.
Also, if you are planning a bigger project and you know this feature will be beneficial you might want to write your own preprocessor that you can run automatically from some build system. Good example of such solution would be moc which enhances C++ for the purpose of Qt framework. Purist might of course disagree.
There is this https://github.com/graph/qc qc = Quick C it allows you to do this in your source code files that end with qc.h
$replace asdf_(\d+) => asdf_ :) $1 blabla
// and now in your code anything that matches the above regular expression
asdf_123
// will become asdf_ :) 123 blabla
And it will output a .cpp and a .h that are preprocessed. It's made to avoid the need to maintain header files. It also does some other things that make it not backwards compatible with C++, but it outputs C++ code, so you can do all the C++ things you want at the end of the day.
Edit: I made it and have a bias towards qc.
You might want to look at re2c.org. It is a separate C preprocessor to generate
C code to match regular expressions. I found that and your question when looking for
something similar.

Source to source manipulations

I need to do some source-to-source manipulations in Linux kernel. I tried to use clang for this purpose but there is a problem. Clang does preprocessing of the source code, i.e. macro and include expansion. This causes clang to sometimes produce broken C code in terms of Linux kernel. I can't maintain all the changes manually, since I expect to have thousands of changes per single file.
I tried ANTLR, but the public grammars available are incomplete and not suitable for such projects as Linux kernel.
So my question is the following. Are there any ways to perform source-to-source manipulations for a C code without preprocessing it?
So assume following code.
#define AAA 1
void f1(int a){
    if(a == AAA)
        printf("hello");
}
After applying source-to-source manipulation I want to get this
#define AAA 1
void f1(int a){
    if(functionCall(a == AAA))
        printf("hello");
}
But Clang, for instance, produces following code which does not fit my requirements, i.e. it expands macro AAA
#define AAA 1
void f1(int a){
    if(functionCall(a == 1))
        printf("hello");
}
I hope I was clear enough.
Edit
The above code is only an example. The source-to-source manipulations I want to do are not restricted to if() statement substitution; they also include inserting a unary operator in front of an expression, replacing an arithmetic expression with its positive or negative value, etc.
Solution
There is one solution I found for myself. I use gcc in order to produce preprocessed source code and then apply Clang. Then I don't have any issues with macro expansion and includes, since that job is done by gcc. Thanks for the answers!
You may consider http://coccinelle.lip6.fr/ : it provides a nice semantic patching framework.
An idea would be to replace all occurrences of
if(a == AAA)
with
if(functionCall(a == AAA))
You can do this easily using, e.g., the sed tool.
If you have a finite collection of patterns to be replaced you can write a sed script to perform the substitution.
Would this solve your problem?
Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.
Our DMS Software Reengineering Toolkit with its C Front End comes relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro definitions and uses.
It does so by allowing preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur, macro calls and conditionals as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places that people commonly place them (e.g., #if foo if (...) { #endif). It parses the source code and preprocessor directives as if they were part of one language (they ARE, it's called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]
Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and the "you've got to be kidding" understandability sense). These DMS handles by expanding them away, with some guidance from the advance engineer ("always expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect since the bad cases occur with considerably less frequency than the good ones.
To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.
Not easy being green.
Clang maintains extremely accurate information about the original source code.
Most notably, the SourceManager can tell whether a given token was expanded from a macro or written as is, and Chandler Carruth recently implemented macro diagnostics that can display the actual macro expansion stack (at the various stages of expansion), tracing back to the code as actually written (3.0).
Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion or not, and if it does retrieve the original code of the expansion, but still it seems possible.
There is a rewriter module in Clang
You can dig up Chandler's code on the macro diagnosis stack
So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)
I would advise to resort to Rose framework. Source is available on github.

Writing a complex preprocessor macro for unit testing

I am working with a unit-testing suite that hijacks function calls and tests expected output values.
The normal layout requires one block of unit-testing code for each expected value.
Since my code makes use of a large number of enums, I would like to automate the automated-testing with some for loop / macro magic, and I'm looking for some advice with writing it.
Here is a block of the test code that I need to duplicate X number of times:
START_TEST("test_CallbackFn");
EXPECTED_CALLS("{{function1(param_type)#default}{function2(param_type)#default}}");
CallbackFn();
END_CALLS();
END_TEST();
Now, here is what I would envision occurring:
for (int i = 0; i < 10; i++)
{
    RUN_TEST(i)
}
Now, I would like to define RUN_TEST with the code I mentioned above, except I need to replace the string default with the current value of i. What is throwing me off is the quotes and #'s that are present in the existing EXPECTED_CALLS macro.
I think I would look at using a separate macro processor rather than trying to beat the C preprocessor into submission. The classic example that people point to is m4, but for this, you might do better with awk or perl or python or something similar.
In my experience, "complex" + "macro" = "don't do it!"
The C preprocessor was not designed to do anything this powerful. While you may be able to do some kung-fu and hack something together that works, it would be much easier to use a scripting language to generate the C code for you (it's also easier to debug since you can read through the generated code and make sure it is correct). Personally, I have used Ruby to do this several times but Python, Perl, bash (etc etc) should also work.
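The answers here suggest a scripting language for the generator. The sketch below shows the same generate-the-source idea in C, only to stay in the language of the question. The function name emit_test_block is made up, and it assumes the test framework accepts a plain number where the original block has the word default.

```c
#include <stdio.h>

/* Writes one test block to `out`, with `value` substituted where the
 * original block had the word "default". Returns a negative number on
 * output error, as fprintf does. */
int emit_test_block(FILE *out, int value)
{
    return fprintf(out,
        "START_TEST(\"test_CallbackFn\");\n"
        "EXPECTED_CALLS(\"{{function1(param_type)#%d}"
        "{function2(param_type)#%d}}\");\n"
        "CallbackFn();\n"
        "END_CALLS();\n"
        "END_TEST();\n",
        value, value);
}
```

A generator program would call this in an ordinary for loop and write the result to a file that the test suite then includes, so the generated blocks can be read and checked before they are compiled.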
I'm not sure I fully understand the question, but if you want EXPECTED_CALLS to receive a string where default is replaced with the string value of whatever default is, you need to remove the #default from the string, i.e.
EXPECTED_CALLS("{{function1(param_type)#default}{function2(param_type)#default}}");
should be
EXPECTED_CALLS("{{function1(param_type)"#default"}{function2(param_type)"#default"}}");
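A small sketch of how the stringizing operator behaves in that position. The macro and function names below are hypothetical helpers, not part of the test suite from the question.

```c
/* Builds the EXPECTED_CALLS argument at compile time. `#value` turns the
 * macro argument into a string literal, and adjacent string literals are
 * concatenated by the compiler. */
#define EXPECTED_STRING(value) \
    "{{function1(param_type)" #value "}{function2(param_type)" #value "}}"

/* Extra level of indirection so that an argument which is itself a macro
 * is expanded before it is stringized. */
#define EXPECTED_STRING_EXPANDED(value) EXPECTED_STRING(value)

#define SAMPLE_VALUE 3   /* made-up constant for the demonstration */

/* Argument is a literal: the string contains "3". */
const char *literal_case(void) { return EXPECTED_STRING(3); }

/* Argument is a macro name, stringized directly: the string contains
 * the text "SAMPLE_VALUE", not its value. */
const char *unexpanded_case(void) { return EXPECTED_STRING(SAMPLE_VALUE); }

/* Same argument through the indirection: the string contains "3". */
const char *expanded_case(void) { return EXPECTED_STRING_EXPANDED(SAMPLE_VALUE); }
```

Note that this only works on text known at preprocessing time. A loop variable such as i would be stringized as the letter i, because the preprocessor runs before the loop exists, so a run-time for loop cannot drive this kind of macro.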
It's probably possible: Boost.Preprocessor is quite impressive as it is.
For an enum it may be a bit more difficult, but there are for each loops in Boost.Preprocessor, etc..
The problem of the generative approach using external scripts is that it may require to externalize more than just the tests. Unless you plan on implementing a C++ parser which is known to be tricky at the best of times...
So you would need to generate the enums (store them in json for example) to be able to generate the tests for these enums afterward... and things begin to get hairy :/
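One way to keep the enum and its tests in sync without an external generator is the so-called X-macro pattern. This is a different technique from Boost.Preprocessor, shown here as a sketch with a made-up enum and a stand-in run_test function.

```c
/* Single list of enumerators; X is a macro applied to each entry. */
#define COLOR_LIST(X) \
    X(COLOR_RED)      \
    X(COLOR_GREEN)    \
    X(COLOR_BLUE)

/* Expansion 1: the enum itself, plus a trailing count. */
#define AS_ENUM(name) name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expansion 2: a parallel table of enumerator names. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

/* Stand-in for whatever a real test block would do with one value. */
static int tests_run = 0;
static void run_test(enum color value, const char *name)
{
    (void)value;
    (void)name;
    tests_run++;
}

/* Expansion 3: one test call per enumerator, with no run-time loop. */
#define AS_TEST(name) run_test(name, #name);
void run_all_tests(void)
{
    COLOR_LIST(AS_TEST)
}
```

Adding an enumerator to COLOR_LIST updates the enum, the name table and the generated test calls together, at the cost of the enum definition looking less conventional.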
