Regular expressions in C preprocessor macro - c

I would like to know if there is any kind of regular expression expansion within the compiler(GCC) pre processor. Basically more flexible code generation macros.
If there is not a way, how do you suggest i accomplish the same result

The C preprocessor can't do that.
You might want to use a template processor (for instance Mustache but there are many others) that generates what you need before passing it to the compiler.

Also, if you are planning a bigger project and you know this feature will be beneficial you might want to write your own preprocessor that you can run automatically from some build system. Good example of such solution would be moc which enhances C++ for the purpose of Qt framework. Purist might of course disagree.

There is this https://github.com/graph/qc qc = Quick C it allows you to do this in your source code files that end with qc.h
$replace asdf_(\d+) => asdf_ :) $1 blabla
// and now in your code anything that matches the above regular expression
asdf_123
// will become asdf_ :) 123 blabla
And it will output a .cpp & a .h thats preprocessed. Its made to avoid the need to maintain header files. And some other things not making it backwards compatible with c++, but it outputs c++ code so you can do all the c++ things you want at the end of the day.
Edit: I made it and have a bias towards qc.

You might want to look at re2c.org. It it a separate C preprocessor to generate
C code to match regular expressions. I found that and your question when looking for
something similar.

Related

Is it possible to see the macros of a compiled C program?

I am trying to learn C and I have this C file that I want view the macros of. Is there a tool to view the macros of the compiled C file.
No. That's literally impossible.
The preprocessor is a textual replacement that happens before the main compile pass. There is no difference between using a macro and putting the code the macro expands to in its place.*
*Ignoring the debugger output. But even then you can do it if you know the right #pragma to tell it the file and line number.
They're always defined in the header file(s) that you've imported with #include, or that those files in turn #include.
This may involve a lot of digging. It may involve going into files that make no sense to you because they're not written for casual inspection.
Any macros of any importance are usually documented. They may use other more complex implementation-specific macros that you shouldn't concern yourself with ordinarily, but if you're curious how they work the source is all there.
That being said, this is only relevant if you have the source and more specifically a complete build environment. Once compiled all these definitions, like the source itself, do not appear in the executable and cannot be inferred directly from the executable, especially not a release build.
Unlike Java or C#, C compiles directly to machine code so there's no way to easily reverse that back to the source. There are "decompilers" that try, but they can only really guess as to the original source. VM-based languages like Java and C# only lightly compile the code, sot here are a lot of hints as to how that code was generated and reversing it is an easier process.

Preprocessor-like substitution into a parser

I am making a parser currently which aims to be able to input data in a program.
The syntax used is greatly inspired from C.
I would enjoy to reproduct a kind of preprocessor inline substitution into it.
for example
#define HELLO ((variable1 + variable2 + variable3))
int variable1 = 37;
int variable2 = 82;
int variable3 = 928;
Thing is... I'm actually using C. I'm also using standard functions from stdio.h to parse through my files.
So... what techniques I could use to make this work correctly and efficiently?
Does the standard compilers substitute the text by re-copying the stream buffer and making the substitution there as the re-copying occurs or what? Is there more efficient techniques?
I guess we say preprocessor because it first substitutes everything until theres no preproc directives (recursive approach maybe?), and then, it starts doing the real compile job?
Excuse my lack of knowledge!
Thanks!
No, modern C compilers don't implement the preprocessor as a text processor, but they have the different compiler phases (preprocessing being one of them) tangled. This is particularly important for the efficiency of the compiler itself and to be able to track errors back into the original source code.
Also implementing a preprocessor by yourself is a tedious task. Think twice before you start such a project.
Yes, you are right about preprocessors. It has the job of bringing together all files which are requires for the execution of the program to 1 file for eg. stdio.h. Then it allows the compiler to compile the program. The file you want to compile is given as argument to the compiler and the techniques used by the compiler may vary according to the os and the compiler itself
The C preprocessor works on tokens not text. In particular, macro expansion cannot contain preprocessor directives. Other preprocessors, such as m4, work differently.

Solving a Variable Equation defined by the User

Answers in C, Python, C++ or Javascript would be very much appreciated.
I've read a few books, done all the examples. Now I'd like to write a simple program.
But, I already ran into the following roadblock:
My intention is to take an equation from the user and save it in a variable,
For example:
-3*X+4 or pow(2,(sin(cos(x))/5)) > [In valid C Math syntax]
And then calculate the given expression for a certain X-Value.
Something like this:
printf("%g", UserFunction(3.2)) // Input 3.2 for X in User's Function and Print Result
Any ideas? For the life of me, I can't figure this out. Adding to my frustration, the solution is likely a very simply one. Thank you in advance.
There isn't a simple way to do this in C but I think muParser may be useful to you, it is written in C++ but has C binding. ExprTk is also an option but looks like it is C++ only, on the plus side it looks much easier to get interesting results with.
Another option may be the Expression Evaluation which is part of Libav. It is in C and the eval.h header has some good descriptions of the interface.
In compiled languages like C, C++, or Java there is no easy way to do this--you basically have to rewrite a whole compiler (or use an external library with an interpreter). This is only trivial in "scripting" languages like Python and Javascript, which have a function (often called "eval()") that evaluates expressions at runtime. This function is often dangerous, because it can also do things like call functions with side effects.
Ffmpeg/libav has a nice simple function evaluator you could use.

Source to source manipulations

I need to do some source-to-source manipulations in Linux kernel. I tried to use clang for this purpose but there is a problem. Clang does preprocessing of the source code, i.e. macro and include expansion. This causes clang to sometimes produce broken C code in terms of Linux kernel. I can't maintain all the changes manually, since I expect to have thousands of changes per single file.
I tried ANTLR, but the public grammars available are incomplete and not suitable for such projects as Linux kernel.
So my question is the following. Are there any ways to perform source-to-source manipulations for a C code without preprocessing it?
So assume following code.
#define AAA 1
void f1(int a){
if(a == AAA)
printf("hello");
}
After applying source-to-source manipulation I want to get this
#define AAA 1
void f1(int a){
if(functionCall(a == AAA))
printf("hello");
}
But Clang, for instance, produces following code which does not fit my requirements, i.e. it expands macro AAA
#define AAA 1
void f1(int a){
if(functionCall(a == 1))
printf("hello");
}
I hope I was clear enough.
Edit
The above code is only an example. The source-to-source manipulations I want to do are not restricted with if() statement substitution, but also inserting unary operator in front of expression, replace arithmetic expression with its positive or negative value, etc.
Solution
There is one solution I found for my self. I use gcc in order to produce preprocessed source code and then apply Clang. Then I don't have any issues with macro expansion and includes, since that job is done by gcc. Thanks for the answers!
You may consider http://coccinelle.lip6.fr/ : it provides a nice semantics patching framwork.
An idea would be to replace all occurrences of
if(a == AAA)
with
if(functionCall(a == AAA))
You can do this easily using, e.g., the sed tool.
If you have a finite collection of patterns to be replaced you can write a sed script to perform the substitution.
Would this solve your problem?
Handling the preprocessor is one of the most difficult problems in applying transformations to C (and C++) code.
Our DMS Software Reengineering Toolkit with its C Front End come relatively close to doing this. DMS can parse C source code, preserving most preprocessor conditionals, macro defintions and uses.
It does so by allow preprocessor actions in "well-structured" places. Examples: #defines are allowed where declarations or statements can occur, macro calls and conditionals as replacements for many of the nonterminals in the language (e.g., function head, expression, statement, declarations) and in many non-structured places that people commonly place them (e.g, #if fooif (...) {#endif). It parses the source code and preprocessor directives as if they were part of one language (they ARE, its called "C"), and builds corresponding ASTs, which can be transformed and will regenerate correctly with the captured preprocessor directives. [This level of capability handles OP's example perfectly.]
Some directives are poorly placed (both in the syntax sense, e.g., across multiple fragments of the language, and the "you've got to be kidding" understandability sense). These DMS handles by expanding them away, with some guidance from the advance engineer ("alway expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals/macro calls into structured ones; this is a bit painful but more workable than one might expect since the bad cases occur with considerably less frequency than the good ones.
To do better than this, one needs to have symbol tables and flow analysis that take into account the preprocessor conditions, and capture all the preprocessor conditionals. We've done some experimental work with DMS to capture conditional declarations in the symbol table (seems to work fine), and we're just starting work on a scheme for the latter.
Not easy being green.
Clang maintains extremely accurate information about the original source code.
Most notably, the SourceManager is able to tell if a given token has been expanded from a macro or written as is, and Chandler Caruth recently implemented macro diagnosis which are able to display the actual macro expansion stack (at the various stages of expansions) tracing back to the actual written code (3.0).
Therefore, it is possible to use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to know whether it comes from a macro expansion or not, and if it does retrieve the original code of the expansion, but still it seems possible.
There is a rewriter module in Clang
You can dig up Chandler's code on the macro diagnosis stack
So I guess you should have all you need :) (And hope so because I won't be able to help much more :p)
I would advise to resort to Rose framework. Source is available on github.

Writing a complex preprocessor macro for unit testing

I am working with a unit-testing suite that hijacks function calls and tests expected output values.
The normal layout requires one block of unit-testing code for each expected value.
Since my code makes use of a large number of enums, I would like to automate the automated-testing with some for loop / macro magic, and I'm looking for some advice with writing it.
Here is a block of the test code that I need to duplicate X number of times:
START_TEST("test_CallbackFn");
EXPECTED_CALLS("{{function1(param_type)#default}{function2(param_type)#default}}");
CallbackFn();
END_CALLS();
END_TEST();
Now, here is what I would envision occuring
for (int i = 0; i < 10; i++)
{
RUN_TEST(i)
}
Now, I would like to define RUN_TEST with the code I mentioned above, except I need to replace the string default with the current value of i. What is throwing me off is the quotes and #'s that are present in the existing EXPECTED_CALLS macro.
I think I would look at using a separate macro processor rather than trying to beat the C preprocessor into submission. The classic example that people point to is m4, but for this, you might do better with awk or perl or python or something similar.
In my experiences, "complex" + "macro" = "don't do it!"
The C preprocessor was not designed to do anything this powerful. While you may be able to do some kung-fu and hack something together that works, it would be much easier to use a scripting language to generate the C code for you (it's also easier to debug since you can read through the generated code and make sure it is correct). Personally, I have used Ruby to do this several times but Python, Perl, bash (etc etc) should also work.
I'm not sure I fully understand the question, but if you want EXPECTED_CALLS to recieve a string where default is replaced with the string value of whatever default is you need to remove the #default from the string. i.e.
EXPECTED_CALLS("{{function1(param_type)#default}{function2(param_type)#default}}");
should be
EXPECTED_CALLS("{{function1(param_type)"#default"}{function2(param_type)"#default"}}");
It's probably possible: Boost.Preprocessor is quite impressive as it is.
For an enum it may be a bit more difficult, but there are for each loops in Boost.Preprocessor, etc..
The problem of the generative approach using external scripts is that it may require to externalize more than just the tests. Unless you plan on implementing a C++ parser which is known to be tricky at the best of times...
So you would need to generate the enums (store them in json for exemple) to be able to generate the tests for these enums afterward... and things begin to get hairy :/

Resources