Finding printf() calls outside #ifdef statements using Regex (POSIX) - c

I've been asked by a co-worker to come up with a regular expression (POSIX syntax) for finding calls to printf(...); in a C source file that aren't inside a #ifdef ... #endif scope.
However, seeing as I am only just learning about Regexes at Uni, I'm not completely confident in it.
The scenario would look something like this:
possibly some code
printf(some_parameters); // This should match
possibly more code
#ifdef DEBUG
possibly some code
printf(some_parameters); // This shouldn't match
possibly more code
#endif
possibly some code
printf(some_parameters); // This should also match
possibly more code
Note that a C file may not contain a #ifdef/#endif pair at all, in which case all calls to printf(); should match.
What I've tried so far is this:
(?<!(#ifdef [A-Å0-9]+)).*printf\(.*\);.*(?!(#endif))
...along with playing around with the position (and even inclusion/exclusion) of .*
Any help or hints appreciated.

Regular expressions are not a good way to approach this. They don't deal well with multi-line searches and they are limited in the patterns they can express; for example, arbitrary nesting is impossible to specify with regexen.
The proper way to tackle this problem is using tools designed to deal with conditional compilation directives in C code. This would be the C preprocessor of your compiler, or a specialized tool like unifdef:
$ unifdef -UDEBUG file.c | grep printf
printf(some_parameters); // This should match
printf(some_parameters); // This should also match
From the manual:
UNIFDEF(1) BSD General Commands Manual UNIFDEF(1)
NAME
unifdef, unifdefall — remove preprocessor conditionals from code
SYNOPSIS
unifdef [-ceklst] [-Ipath -Dsym[=val] -Usym -iDsym[=val] -iUsym] ... [file]
unifdefall [-Ipath] ... file
DESCRIPTION
The unifdef utility selectively processes conditional cpp(1) directives.
It removes from a file both the directives and any additional text that
they specify should be removed, while otherwise leaving the file alone.
The unifdef utility acts on #if, #ifdef, #ifndef, #elif, #else, and #endif
lines, and it understands only the commonly-used subset of the expression
syntax for #if and #elif lines. It handles integer values of symbols
defined on the command line, the defined() operator applied to symbols
defined or undefined on the command line, the operators !, <, >, <=, >=,
==, !=, &&, ||, and parenthesized expressions. Anything that it does not
understand is passed through unharmed. It only processes #ifdef and
#ifndef directives if the symbol is specified on the command line, other‐
wise they are also passed through unchanged. By default, it ignores #if
and #elif lines with constant expressions, or they may be processed by
specifying the -k flag on the command line.
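If neither tool is at hand, the nesting that defeats a regex can be tracked with a plain depth counter. The following is only a minimal sketch in C, not a replacement for unifdef. count_unguarded_printf is a hypothetical helper name, and the sketch assumes that directives are not split across lines and that printf( never appears inside comments or string literals. It treats #else and #elif branches as still guarded.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip spaces and tabs; stops at a newline or the terminating NUL. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Does this line contain "printf(" that is not the tail of a longer
 * identifier such as fprintf or my_printf? */
static int line_has_printf(const char *line, size_t len)
{
    static const char needle[] = "printf(";
    size_t n = sizeof needle - 1;

    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(line + i, needle, n) != 0)
            continue;
        if (i == 0 ||
            !(isalnum((unsigned char)line[i - 1]) || line[i - 1] == '_'))
            return 1;
    }
    return 0;
}

/* Hypothetical helper: count lines with a printf( call that sit outside
 * every #if/#ifdef/#ifndef ... #endif region of the text in src. */
int count_unguarded_printf(const char *src)
{
    int depth = 0, count = 0;

    while (*src) {
        const char *eol = strchr(src, '\n');
        size_t len = eol ? (size_t)(eol - src) : strlen(src);
        const char *p = skip_blanks(src);

        if (*p == '#') {
            p = skip_blanks(p + 1);
            if (strncmp(p, "if", 2) == 0)            /* #if, #ifdef, #ifndef */
                depth++;
            else if (strncmp(p, "endif", 5) == 0 && depth > 0)
                depth--;
        } else if (depth == 0 && line_has_printf(src, len)) {
            count++;
        }
        src += len;
        if (*src == '\n')
            src++;
    }
    return count;
}
```

Applied to the scenario in the question, the two calls outside the #ifdef DEBUG block are counted and the one inside is not.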

Don't need regex.
cpp -D<your #define options here> file.c | grep printf

Related

How to process macros in LEX?

How do I implement #define in yacc/bison?
For Example:
#define f(x) x*x
If anywhere f(x) appears in any function then it is replaced by the right side of the
macro substituting for the argument ‘x’.
For example, f(3) would be replaced with 3*3. The macro can call another macro too.
It's not usually possible to do macro expansion inside a parser, at least not C-style macros, because C-style macro expansion doesn't respect syntax. For example
#define IF if(
#define THEN )
is legal (although very bad style IMHO). But for that to be handled inside the grammar, it would be necessary to allow a macro identifier to appear anywhere in the input, not just where an identifier might be expected. The necessary modifications to the grammar are going to make it much less readable and are very likely to introduce parser action conflicts. [Note 1]
Alternatively, you could do the macro expansion in the lexical analyzer. The lexical analyzer is not a parser, but parsing a C-style macro invocation doesn't require much sophistication, and if macro parameters were not allowed, it would be even simpler. This is how Flex handles macro replacement in its regular expressions ({identifier}, for example). [Note 2] Since Flex macros are just raw character sequences, not token lists as with C-style macros, they can be handled by pushing the replacement text back into the input stream. (F)lex provides the unput special action for this purpose. unput pushes one character back into the input stream, so if you want to push an entire macro replacement, you have to unput it one character at a time, back to front, so that the last character unput is the first one to be read afterwards.
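The back-to-front order is easy to get wrong, so here is a small sketch of it in plain C. It only simulates a pushback stack; it does not call Flex's unput, and struct input, unput_char, unput_string and next_char are hypothetical names made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

#define PUSHBACK_MAX 64

/* Hypothetical stand-in for a scanner's input: text plus a pushback stack. */
struct input {
    const char *rest;            /* unread part of the original text */
    char pushed[PUSHBACK_MAX];   /* pushed-back characters, used as a stack */
    size_t npushed;
};

/* Push one character back; returns 0 on success, -1 if the stack is full. */
int unput_char(struct input *in, char c)
{
    if (in->npushed == PUSHBACK_MAX)
        return -1;
    in->pushed[in->npushed++] = c;
    return 0;
}

/* Push a whole replacement text back, last character first, so that its
 * first character is the next one to be read. */
int unput_string(struct input *in, const char *text)
{
    for (size_t i = strlen(text); i > 0; i--)
        if (unput_char(in, text[i - 1]) != 0)
            return -1;
    return 0;
}

/* Read the next character; pushed-back characters come before the
 * original text. Returns 0 at end of input. */
int next_char(struct input *in)
{
    if (in->npushed > 0)
        return (unsigned char)in->pushed[--in->npushed];
    return *in->rest ? (unsigned char)*in->rest++ : 0;
}
```

After unput_string(&in, "3*3"), successive next_char calls yield '3', '*', '3' and then continue with whatever was left in the original text.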
That's workable but ugly. And it's not really scalable to even the small feature list provided by the C preprocessor. And it violates the fundamental principle of software design, which is that each component does just one thing (so that it can do it well).
So that leaves the most common approach, which is to add a separate macro processor component, so that instead of dividing the parse into lexical scan/syntax analysis, the parse becomes lexical scan/macro expansion/syntax analysis. [Note 3]
A C-style macro processor which works between the lexical analyser and the syntactic analyser could itself be written in Bison. As I mentioned above, the parsing requirements are generally minimal, but there is still parsing to be done and Bison is presumably already part of the project. Although I don't know of any macro processor (other than proof-of-concept programs I've written myself) which does this, I think it's a very flexible solution. In particular, the Bison syntactic analysis phase could be implemented with a push-parser, which avoids the need to produce the entire macro-expanded token stream in order to make it available to a traditional pull-parser.
That's not the only way to design macros, though. Indeed, it has a lot of shortcomings, because the macro expansions are not hygienic, respecting neither syntax nor scope. Probably anyone who has used C macros has at one time or other been bitten by these problems; the simplest manifestation is defining a macro like:
#define NEXT(a) a + 1
and then writing
int x = NEXT(a) * 3;
which is not going to produce the expected result (unless what is expected is a violation of the syntactic form of the last statement). Also, any macro expansion which needs to use a local variable will sooner or later produce an incorrect expansion because of unexpected name collision. Hygienic macro expansion seeks to solve these issues by viewing macro expansion as an operation on syntax trees, not token streams, making the parsing paradigm lexical scan/syntax analysis/macro expansion (of the parse tree). For that operation, the appropriate tool might well be some kind of tree parser.
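To make the precedence problem concrete, here is a small sketch. NEXT_UNSAFE reproduces the macro above under a different name, and NEXT_SAFE is a fully parenthesized variant; both names are hypothetical.

```c
/* Same replacement text as NEXT above: no parentheses. */
#define NEXT_UNSAFE(a) a + 1

/* Parenthesize the parameter and the whole replacement text. */
#define NEXT_SAFE(a) ((a) + 1)

int unsafe_times_three(int a)
{
    return NEXT_UNSAFE(a) * 3;   /* expands to a + 1 * 3 */
}

int safe_times_three(int a)
{
    return NEXT_SAFE(a) * 3;     /* expands to ((a) + 1) * 3 */
}
```

With a equal to 2, the first function returns 5 and the second returns 9. Parentheses fix this particular symptom, but they do nothing about the name-collision problem described above.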
Notes
1. Also, you'd want to remove the token from the parse tree. Yacc/bison does have a poorly-documented feature, YYBACKUP, which might possibly help accomplish this. I don't know if that's one of its intended use cases; indeed, it is not clear to me what its intended use cases are.
2. The (f)lex documentation calls these definitions, but they really are macros, and they suffer from all the usual problems macros bring with them, such as mysterious interactions with surrounding syntax.
3. Another possibility is macro expansion/lexical scan/syntax analysis, which could be implemented using a macro processor like M4. But that completely divorces the macros from the rest of the language.
yacc and lex generate C source at the end, so you can use macros inside the parser and lexer actions.
The actual #define preprocessor directives can go in the first section of the lexer and parser file:
%{
// Somewhere here
#define f(x) x*x
%}
These sections will be copied verbatim to the generated C source.

How to use #ifdef inside a macro? '#' is not followed by a macro parameter

I've taken over a project written in C, and it contains lots of macros.
I want to write a new macro that checks whether a given macro is defined or not.
But the symbol # is reserved inside a macro. How can I fix my code? Thanks :)
#define CHECK_MACRO( macro )\
#ifdef macro
printf("defined "#macro"\n");\
#else
printf("not defined "#macro"\n");\
#endif
You cannot use preprocessor conditional directives inside a macro. Generally speaking, the solution is to turn that inside out: use conditional directives to define the macro differently in different cases. That will not work for a generic macro-test macro such as you propose, however, and it also is limited by the fact that it determines whether the condition holds at the point where the macro is defined, not the point where it is used.
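As a minimal sketch of turning the test inside out, the conditional directive can choose between two definitions of one macro. FEATURE_LOGGING is a hypothetical configuration macro, and CHECK_FEATURE_LOGGING and the two functions are names invented for the example. Note that the macro is tied to one specific symbol, which is exactly the limitation described above.

```c
#include <stdio.h>

/* The #ifdef sits outside the macro and picks one of two definitions.
 * FEATURE_LOGGING is a hypothetical symbol, e.g. set with -DFEATURE_LOGGING. */
#ifdef FEATURE_LOGGING
#define CHECK_FEATURE_LOGGING() "defined FEATURE_LOGGING"
#else
#define CHECK_FEATURE_LOGGING() "not defined FEATURE_LOGGING"
#endif

/* Returns the message chosen at the point where the macro was defined. */
const char *feature_logging_status(void)
{
    return CHECK_FEATURE_LOGGING();
}

/* Prints the same message, as the macro in the question tried to do. */
void print_feature_logging_status(void)
{
    puts(feature_logging_status());
}
```

Compiled without -DFEATURE_LOGGING, the status string starts with "not defined"; compiled with it, the other definition is used.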
You may perhaps take consolation in the fact that this was never going to work anyway, because the arguments to a function-like macro are expanded before being substituted into the macro's replacement text (except in a couple of special cases that don't apply to the key part of your code).
There are alternatives that could work if the possible values of all macros of interest are limited to short lists of tokens that may appear as or in identifiers. There are different alternatives that might be adequate if you can choose a small subset of macros that you're interested in testing. There are no alternatives that do what you propose in its full generality, unless you count writing the conditional compilation directives directly, without a macro, which in fact is the usual way of going about it.
Side note: the m4 preprocessor (history/legacy).
In the early days of Unix, the m4 processor was used for code generation. It has richer features than cpp (or maybe cpp is a scaled-down version of m4). In particular, it has better support for multi-line macros. It continues to be used in various packages.
It is worth mentioning that adding code generation to your build makes the code more complex to maintain and debug.
For example: a.m4
define(`CHECK_MACRO', `
#ifdef $1
printf ("defined #$1\n") ;
#else
printf ("undef #$1\n") ;
#endif
')
#include <stdio.h>
int main(void)
{
CHECK_MACRO(FOO) ;
CHECK_MACRO(BAR) ;
}
Then build/run
m4 a.m4 > a.c
cc a.c
./a.out
undef #FOO
undef #BAR
cc a.c -DFOO
./a.out
defined #FOO
undef #BAR
Usually, the generation was integrated into the Makefile with a rule such as:
%.c: %.m4
	m4 -s $< > $@
The -s option makes m4 emit #line directives, so compiler error messages report line numbers that match the a.m4 source file.

How to understand this way to add a define in C?

I am reading the OpenSSL source, and the following lines apparently define SSL_OP_NO_SSLv3 if it is not defined yet. I have never seen such magic before. Can anyone explain the syntax here?
#if !defined(OPENSSL_NO_SSL3)
| SSL_OP_NO_SSLv3
#endif
You can reference this link for full file and see line 327.
The question makes sense only with the surrounding code (slightly simplified here):
mask = SSL_OP_NO_TLSv1_1|SSL_OP_NO_TLSv1
#if !defined(OPENSSL_NO_SSL3)
|SSL_OP_NO_SSLv3
#endif
;
The preprocessor just does textual substitution. So if the preprocessor macro OPENSSL_NO_SSL3 is not defined, the preprocessed code will look like this:
mask = SSL_OP_NO_TLSv1_1|SSL_OP_NO_TLSv1
|SSL_OP_NO_SSLv3
;
otherwise the preprocessed code will look like this:
mask = SSL_OP_NO_TLSv1_1|SSL_OP_NO_TLSv1
;
It doesn't define a macro. It adds an expression to a bitmask. That operation there is bitwise or. The macro/enum SSL_OP_NO_SSLv3 must exist for it to be valid code.
If the macro OPENSSL_NO_SSL3 is defined, then the code inside the conditional isn't included in the source.
Having the preprocessor check for macro definitions is a common way to implement conditional compilation. This way the same source can be compiled under various configurations. The macros to check against can be defined in source with #define, passed by the build system (like with the gcc -D option), or be builtin to the preprocessor (such as __STDC_IEC_559__).
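Here is a self-contained sketch of the same idiom. The OPT_* flags and BUILD_WITHOUT_SSL3 are hypothetical names standing in for the OpenSSL ones, and the flag values are arbitrary.

```c
/* Hypothetical option bits; the values are made up for the sketch. */
#define OPT_NO_TLS1   0x1u
#define OPT_NO_TLS1_1 0x2u
#define OPT_NO_SSL3   0x4u

unsigned default_mask(void)
{
    /* The directive lines vanish after preprocessing, so the statement is
     * either "a | b | c;" or "a | b;" depending on BUILD_WITHOUT_SSL3. */
    unsigned mask = OPT_NO_TLS1_1 | OPT_NO_TLS1
#if !defined(BUILD_WITHOUT_SSL3)
        | OPT_NO_SSL3
#endif
        ;
    return mask;
}
```

Built normally, default_mask() returns 0x7; built with -DBUILD_WITHOUT_SSL3, the third flag is left out and it returns 0x3.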

Why are there no semicolons after preprocessor directives?

If I write
#include <stdio.h>;
there is no error, but a warning appears during compilation:
pari.c:1:18: warning: extra tokens at end of #include directive
What is the reason?
The reason is that preprocessor directives don't use semicolons. This is because they use a line break to delimit statements. This means that you cannot have multiple directives per line:
#define ABC #define DEF // illegal
But you can spread one directive over multiple lines by ending each line (except the last) with a \ (backslash).
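A short sketch of a directive continued over several lines with backslashes; CLAMP and clamp_percent are hypothetical names chosen for the example.

```c
/* One #define spread over three physical lines. The backslash must be
 * the very last character on each continued line. */
#define CLAMP(x, lo, hi)          \
    ((x) < (lo) ? (lo) :          \
     (x) > (hi) ? (hi) : (x))

int clamp_percent(int value)
{
    return CLAMP(value, 0, 100);
}
```

The preprocessor joins the three lines into one logical line before it looks for the end of the directive, so no semicolon or other terminator is involved.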
Because Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor.
These preprocessor directives extend only across a single line of code. As soon as a newline character is found, the preprocessor directive is considered to end. That's why no semicolon (;) is expected at the end of a preprocessor directive.
Preprocessor directives are a different language than C, and have a much simpler grammar, because originally they were "parsed", if you can call it that, by a different program called cpp before the C compiler saw the file. People could use that to pre-process even non-C files to include conditional parts of config files and the like.
There is a Linux program called "unifdef" that you can still use to remove some of the conditional parts of a program if you know they'll never be true. For instance, if you have some code to support non-ANSI standard compilers surrounded by #ifdef ANSI/#else/#endif or just #ifndef ANSI/#endif, and you know you'll never have to support non-ANSI any more, you can eliminate the dead code by running it through unifdef -DANSI.
Because they're unnecessary. Preprocessor directives only exist on one line, unless you explicitly use a line-continuation character (for e.g. a big macro).
During compilation, your code is processed by two separate programs, the pre-processor and the compiler. The pre-processor runs first.
Your code is actually comprised of two languages, one overlaid on top of another. The pre-processor deals with one language, which is all directives starting with "#" (and the implications of these directives). It processes the "#include", "#define" and other directives, and leaves the rest of the code untouched (well, except as side effect of the pre-processor directives, like macro substitutions etc.).
Then the compiler comes along and processes the output generated by the pre-processor. It deals with "C" language, and pretty much ignores the pre-processor directives.
The answer to your question is that "#include" is a part of the language processed by the pre-processor, and in this language ";" are not required, and are, in fact, "extra tokens".
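And if you define a macro with a trailing semicolon, such as #define MACRO(para) fun(para); then that semicolon can actually be WRONG:
if (cond)
MACRO (par1);
else
MACRO (par2);
leads to a syntax error, because MACRO (par1); expands to fun(par1);; and the extra empty statement cuts the if off from its else.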
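A minimal sketch of the form that does work, with the semicolon left out of the definition so that the caller supplies it. fun, total and pick are hypothetical names used only to make the example compile.

```c
/* Hypothetical stand-ins so that the macro has something to call. */
static int total;

static void fun(int para)
{
    total += para;
}

/* No trailing semicolon in the definition: the call site provides it. */
#define MACRO(para) fun(para)

int pick(int cond)
{
    total = 0;
    if (cond)
        MACRO(1);   /* expands to fun(1); with exactly one semicolon */
    else
        MACRO(2);
    return total;
}
```

Here the if/else compiles as intended: pick(1) returns 1 and pick(0) returns 2.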

Why is # required before #include <stdio.h>?

What is the function of #?
It denotes a preprocessor directive:
One important thing you need to remember is that the C preprocessor is not part of the C compiler.
The C preprocessor uses a different syntax. All directives in the C preprocessor begin with a pound sign (#). In other words, the pound sign denotes the beginning of a preprocessor directive, and it must be the first nonspace character on the line.
# was probably chosen arbitrarily as an otherwise unused character in C syntax. Some other unused character would have worked just as well, I presume.
If there wasn't a character denoting it, then there would probably be trouble differentiating between code intended for the preprocessor and code intended for the compiler. How would you tell whether if (FOO) was meant to be preprocessed or not?
Because # is the standard prefix for introducing preprocessor statements.
In early C compilers, the pre-processor was a separate program which would handle all the preprocessor statements (similar to the way early C++ "compilers" such as cfront generated C code) and generate C code for the compiler (it may still be a separate program but it may also be just a phase of the compiler nowadays).
The # symbol is just a useful character that can be recognised by the preprocessor and acted upon, such as:
#include <stdio.h>
#if 0
#endif
#pragma treat_warnings_as_errors
#define USE_BUGGY_CODE
and so on.
Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). The preprocessor is executed before the actual compilation of code begins, therefore the preprocessor digests all these directives before any code is generated by the statements.
Source: http://www.cplusplus.com/doc/tutorial/preprocessor/
It's because # is an indicator that it's a preprocessor statement,
meaning that before it compiles your code, it is going to include the file stdio.h.
# introduces a preprocessor directive. The preprocessor handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
When the preprocessor encounters these directives, it includes the headers, expands the macros, and then hands the result on to compilation. Directives serve other purposes too, such as halting compilation with the #error directive. Selecting code with #if and its relatives is called conditional compilation.
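A small sketch that combines conditional inclusion with #error. The 8-bit assumption and the function name bits_in_int are chosen only for illustration.

```c
#include <limits.h>

/* Stop the build early on a platform this sketch does not handle.
 * The 8-bit requirement is an assumption made for the example. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* Number of bits in an int on the platform being compiled for. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

On a platform where CHAR_BIT is 8, the #error line is skipped and the file compiles normally; anywhere else, compilation stops with the given message.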
The preprocessor runs before the program is compiled, and every preprocessor directive, such as #include or #define, starts with #. So # is required before include.
