Why are there no semicolons after preprocessor directives? - c

If I write
#include <stdio.h>;
there is no error, but a warning comes out during compilation:
pari.c:1:18: warning: extra tokens at end of #include directive
What is the reason?

The reason is that preprocessor directives don't use semicolons. This is because they use a line break to delimit statements. This means that you cannot have multiple directives per line:
#define ABC #define DEF // illegal
But you can spread one directive over multiple lines by ending each line (except the last) with a backslash (\).

Because preprocessor directives are not program statements: they are lines in our programs that are addressed to the preprocessor.
These preprocessor directives extend only across a single line of code. As soon as a newline character is found, the preprocessor directive is considered to end. That's why no semicolon (;) is expected at the end of a preprocessor directive.

Preprocessor directives are a different language than C, and have a much simpler grammar, because originally they were "parsed", if you can call it that, by a different program called cpp before the C compiler saw the file. People could use that to pre-process even non-C files to include conditional parts of config files and the like.
There is a Linux program called "unifdef" that you can still use to remove some of the conditional parts of a program if you know they'll never be true. For instance, if you have some code to support non-ANSI standard compilers surrounded by #ifdef ANSI/#else/#endif or just #ifndef ANSI/#endif, and you know you'll never have to support non-ANSI any more, you can eliminate the dead code by running it through unifdef -DANSI.

Because they're unnecessary. Preprocessor directives only exist on one line, unless you explicitly use a line-continuation character (for e.g. a big macro).

During compilation, your code is processed by two separate programs, the pre-processor and the compiler. The pre-processor runs first.
Your code is actually comprised of two languages, one overlaid on top of another. The pre-processor deals with one language, which is all directives starting with "#" (and the implications of these directives). It processes the "#include", "#define" and other directives, and leaves the rest of the code untouched (well, except as side effect of the pre-processor directives, like macro substitutions etc.).
Then the compiler comes along and processes the output generated by the pre-processor. It deals with "C" language, and pretty much ignores the pre-processor directives.
The answer to your question is that "#include" is a part of the language processed by the pre-processor, and in this language ";" are not required, and are, in fact, "extra tokens".

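And if you write #define MACRO(para) fun(para); then putting a semicolon at the end of the definition can be outright WRONG:
if (cond)
MACRO (par1);
else
MACRO (par2);
leads to a syntax error, because each call expands to fun(...);; and the extra empty statement separates the else from its if.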

Related

How to process macros in LEX?

How do I implement #define in yacc/bison?
For Example:
#define f(x) x*x
If anywhere f(x) appears in any function then it is replaced by the right side of the
macro substituting for the argument ‘x’.
For example, f(3) would be replaced with 3*3. The macro can call another macro too.
It's not usually possible to do macro expansion inside a parser, at least not C-style macros, because C-style macro expansion doesn't respect syntax. For example
#define IF if(
#define THEN )
is legal (although very bad style IMHO). But for that to be handled inside the grammar, it would be necessary to allow a macro identifier to appear anywhere in the input, not just where an identifier might be expected. The necessary modifications to the grammar are going to make it much less readable and are very likely to introduce parser action conflicts. [Note 1]
Alternatively, you could do the macro expansion in the lexical analyzer. The lexical analyzer is not a parser, but parsing a C-style macro invocation doesn't require much sophistication, and if macro parameters were not allowed, it would be even simpler. This is how Flex handles macro replacement in its regular expressions. ({identifier}, for example.) [Note 2] Since Flex macros are just raw character sequences, not token lists as with C-style macros, they can be handled by pushing the replacement text back into the input stream. (F)lex provides the unput special action for this purpose. unput pushes one character back into the input stream, so if you want to push an entire macro replacement, you have to unput it one character at a time, back to front so that the last character unput is the first one to be read afterwards.
That's workable but ugly. And it's not really scalable to even the small feature list provided by the C preprocessor. And it violates the fundamental principle of software design, which is that each component does just one thing (so that it can do it well).
So that leaves the most common approach, which is to add a separate macro processor component, so that instead of dividing the parse into lexical scan/syntax analysis, the parse becomes lexical scan/macro expansion/syntax analysis. [Note 3]
A C-style macro processor which works between the lexical analyser and the syntactic analyser could itself be written in Bison. As I mentioned above, the parsing requirements are generally minimal, but there is still parsing to be done and Bison is presumably already part of the project. Although I don't know of any macro processor (other than proof-of-concept programs I've written myself) which does this, I think it's a very flexible solution. In particular, the Bison syntactic analysis phase could be implemented with a push-parser, which avoids the need to produce the entire macro-expanded token stream in order to make it available to a traditional pull-parser.
That's not the only way to design macros, though. Indeed, it has a lot of shortcomings, because the macro expansions are not hygienic, respecting neither syntax nor scope. Probably anyone who has used C macros has at one time or other been bitten by these problems; the simplest manifestation is defining a macro like:
#define NEXT(a) a + 1
and then writing
int x = NEXT(a) * 3;
which is not going to produce the expected result (unless what is expected is a violation of the syntactic form of the last statement). Also, any macro expansion which needs to use a local variable will sooner or later produce an incorrect expansion because of unexpected name collision. Hygienic macro expansion seeks to solve these issues by viewing macro expansion as an operation on syntax trees, not token streams, making the parsing paradigm lexical scan/syntax analysis/macro expansion (of the parse tree). For that operation, the appropriate tool might well be some kind of tree parser.
Notes
1. Also, you'd want to remove the token from the parse tree. Yacc/bison does have a poorly-documented feature, YYBACKUP, which might possibly help accomplish this. I don't know if that's one of its intended use cases; indeed, it is not clear to me what its intended use cases are.
2. The (f)lex documentation calls these definitions, but they really are macros, and they suffer from all the usual problems macros bring with them, such as mysterious interactions with surrounding syntax.
3. Another possibility is macro expansion/lexical scan/syntax analysis, which could be implemented using a macro processor like M4. But that completely divorces the macros from the rest of the language.
yacc and lex generate C source in the end, so you can use macros inside the parser and lexer actions.
The actual #define preprocessor directives can go in the first section of the lexer and parser file:
%{
// Somewhere here
#define f(x) x*x
%}
These sections will be copied verbatim to the generated c source.

Can I write preprocessor directives anywhere in my C program?

Is it mandatory to write #include at the top of the program and outside the main function?
I tried using #define inside the main function and it worked fine, with one exception: the constant I defined with the directive can be used only after the #define line.
For instance,
printf("%f",PI);
#define PI 3.14
gives the error "Undefined symbol PI". But with the following code I did not encounter any error:
#define PI 3.14
printf("%f",PI);
Is this because C is a procedural language and procedural languages implement a top-down approach?
I would also like to know whether only #define can be used inside the main function, or other preprocessor directives too. If so, which ones?
Or is it the other way around: apart from #include, can all preprocessor directives be used in the main function?
The only place you can't put a preprocessor directive is in a macro expansion. The sole exception is #pragma, which can also be written _Pragma().
This has nothing to do with "procedural"; it is due to the fact that C is defined in terms of 8 translation phases, each of which is "as-if" fully completed before the next phase. For more details, see the C11 standard, section 5.1.1.2.
One example of when it is useful to use preprocessor directives after the start of a file is for the "X Macro" technique (which many people only know as "those .def files").
Preprocessor directives work pretty much anywhere. Of course, you can make your code confusing pretty easily if you abuse this.
The pre-processor does its work before the compiler translates the source code into object code. Pre-processing is mostly a text (token) replacement task, so directives can be placed just about anywhere in your code. Of course, if the resulting expansion is syntactically incorrect, the expanded source code will fail to compile.
A commonly tolerated practice is to embed conditional compilation directives inside a function to allow the function to use platform specific APIs.
void some_wrapper_function () {
#if defined(UNIX)
some_unix_specific_function();
#elif defined(WIN32)
some_win32_specific_function();
#else
#error "Compiled on an unsupported platform"
#endif
}
By their nature, the directives themselves normally have to start at the beginning of a line (optionally after whitespace), and not somewhere in the middle of a source line. But defined macros can of course appear anywhere in the source, and will be replaced according to the substitution rules defined by your directives.
The trick here is to realize that # directives have traditionally been interpreted by a pre-processor, that runs before any compilation. The pre-processor would produce a new source file, which was then compiled. I don't think any modern compiler works that way by default, but the same principles apply.
So when you say
#include "foo.h"
you're saying "insert the entire contents of foo.h into my source code starting at this line."
You can use this directive pretty much anywhere in a source file, but it's rarely useful (and not often readable) to use it anywhere other than at the start of the source.

Finding printf() calls outside #ifdef statements using Regex (POSIX)

I've been asked by a co-worker to come up with a regular expression (POSIX syntax) for finding calls to printf(...); -- in a c-code file -- which aren't in a #ifdef ... #endif scope.
However, seeing as I am only just learning about Regexes at Uni, I'm not completely confident in it.
The scenario would look something like this:
possibly some code
printf(some_parameters); // This should match
possibly more code
#ifdef DEBUG
possibly some code
printf(some_parameters); // This shouldn't match
possibly more code
#endif
possibly some code
printf(some_parameters); // This should also match
possibly more code
Note that a c-file may not contain a #ifdef/#endif statement at all, in which case all calls to printf(); should match.
What I've tried so far is this:
(?<!(#ifdef [A-Å0-9]+)).*printf\(.*\);.*(?!(#endif))
...along with playing around with the position (and even inclusion/exclusion) of .*
Any help or hints appreciated.
Regular expressions are not a good way to approach this. They don't deal well with multi line searches and they are limited in the patterns they can express, e.g. arbitrary nesting is impossible to specify with regexen.
The proper way to tackle this problem is using tools designed to deal with conditional compilation directives in C code. This would be the C preprocessor of your compiler, or a specialized tool like unifdef:
$ unifdef -UDEBUG file.c | grep printf
printf(some_parameters); // This should match
printf(some_parameters); // This should also match
From the manual:
UNIFDEF(1) BSD General Commands Manual UNIFDEF(1)
NAME
unifdef, unifdefall — remove preprocessor conditionals from code
SYNOPSIS
unifdef [-ceklst] [-Ipath -Dsym[=val] -Usym -iDsym[=val] -iUsym] ... [file]
unifdefall [-Ipath] ... file
DESCRIPTION
The unifdef utility selectively processes conditional cpp(1) directives.
It removes from a file both the directives and any additional text that
they specify should be removed, while otherwise leaving the file alone.
The unifdef utility acts on #if, #ifdef, #ifndef, #elif, #else, and #endif
lines, and it understands only the commonly-used subset of the expression
syntax for #if and #elif lines. It handles integer values of symbols
defined on the command line, the defined() operator applied to symbols
defined or undefined on the command line, the operators !, <, >, <=, >=,
==, !=, &&, ||, and parenthesized expressions. Anything that it does not
understand is passed through unharmed. It only processes #ifdef and
#ifndef directives if the symbol is specified on the command line, otherwise
they are also passed through unchanged. By default, it ignores #if
and #elif lines with constant expressions, or they may be processed by
specifying the -k flag on the command line.
You don't need a regex:
cpp -D<your #define options here> file.c | grep printf

Why is # required before #include <stdio.h>?

What is the function of #?
It denotes a preprocessor directive:
One important thing you need to remember is that the C preprocessor is not part of the C compiler.
The C preprocessor uses a different syntax. All directives in the C preprocessor begin with a pound sign (#). In other words, the pound sign denotes the beginning of a preprocessor directive, and it must be the first nonspace character on the line.
# was probably chosen arbitrarily as an otherwise unused character in C syntax. Another unused character, such as @, would have worked just as well, I presume.
If there wasn't a character denoting it, then there would probably be trouble differentiating code intended for the preprocessor from code intended for the compiler -- how would you tell whether if (FOO) was meant to be preprocessed or not?
Because # is the standard prefix for introducing preprocessor statements.
In early C compilers, the pre-processor was a separate program which would handle all the preprocessor statements (similar to the way early C++ "compilers" such as cfront generated C code) and generate C code for the compiler (it may still be a separate program but it may also be just a phase of the compiler nowadays).
The # symbol is just a useful character that can be recognised by the preprocessor and acted upon, such as:
#include <stdio.h>
#if 0
#endif
#pragma treat_warnings_as_errors
#define USE_BUGGY_CODE
and so on.
Preprocessor directives are lines included in the code of our programs that are not program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). The preprocessor is executed before the actual compilation of code begins, therefore the preprocessor digests all these directives before any code is generated by the statements.
Source: http://www.cplusplus.com/doc/tutorial/preprocessor/
It's because # is an indicator that it's a preprocessor statement, meaning that before your code is compiled, the preprocessor is going to include the file stdio.h.
# introduces a preprocessor directive. The preprocessor handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
When the preprocessor encounters these directives, it includes the headers and expands the macros, and the result then proceeds to compilation. Directives can serve other purposes too, such as halting compilation with #error. Choosing which code gets compiled with #if and its relatives is called conditional compilation.
Every preprocessor directive, whether #include, #define or any other, begins with #; that is how the preprocessor recognizes it. So # is required before include.

C macro/#define indentation?

I'm curious as to why I see nearly all C macros formatted like this:
#ifndef FOO
# define FOO
#endif
Or this:
#ifndef FOO
#define FOO
#endif
But never this:
#ifndef FOO
    #define FOO
#endif
(moreover, vim's = operator only seems to count the first two as correct.)
Is this due to portability issues among compilers, or is it just a standard practice?
I've seen it done all three ways; it seems to be a matter of style, not of syntax.
While the second example is usually the most common, I've seen cases where the first (or third) is used to help distinguish multiple levels of #ifdefs. Sometimes the logic can become deeply nested, and the only way to understand it at a glance is to use indentation, much as it is common practice to indent blocks of code between { and }.
IIRC, older C preprocessors required the # to be the first character on the line (though I've never actually encountered one that had this requirement).
I have never seen code like your first example. I usually wrote preprocessor directives as in your second example. I found that it visually interfered less with the indentation of the actual code (not that I write in C anymore).
The GNU C Preprocessor manual says:
Preprocessing directives are lines in
your program that start with '#'.
Whitespace is allowed before and after
the '#'.
For preference I use the third style, with the exception of include guards, for which I use the second style.
I don't like the first style at all - I think of #define as being a preprocessor instruction, even though really of course it isn't, it's a # followed by the preprocessor instruction define. But since I do think of it that way, it seems wrong to separate them. I expect text editors written by people who advocate that style will have a block indent/un-indent that works on code written in that style. But I would hate to encounter it using a text editor that didn't.
There's no point pandering to ancient preprocessors where the # has to be the first character of the line, unless you can also list off the top of your head all the other differences between those implementations and standard C, in order to avoid the other things you could possibly do that they would not support. Of course if you genuinely are working with a pre-standard compiler, fair enough.
Preprocessor directives are lines included in our programs that are not actually program statements but directives for the preprocessor. These lines are always preceded by a hash sign (#). Whitespace is allowed before and after the '#'. As soon as a newline character is found, the preprocessor directive is considered to end.
There is no other rule as far as the C/C++ standards are concerned, so it remains a matter of style and readability. I have only seen and written programs in the second style that you posted, although the third one seems more readable.

Resources