Are C preprocessor statements a part of the C language?


I recall a claim made by one of my professors in an introductory C course. He stated that the #define preprocessor command enables a programmer to create a constant for use in later code, and that the command was a part of the C language.
/* Is this truly C code? */
#define FOO 42
Since this was in an introductory programming class, I suspect that he was merely simplifying the relationship between the source file and the compiler, but nevertheless I wish to verify my understanding.
Are preprocessor statements completely independent from the C language (dependent on the specific compiler used) or are they explicitly described in the C99 standard? Out of curiosity, did K&R ever mention preprocessor macros?

Yes, the standard describes the preprocessor. It's a standardized part of the C language.
Note that #include, which is essential for modularization of code, is a preprocessor directive.
In the publicly available draft of the C99 standard, the preprocessor is described in section 6.10.

The preprocessor is indeed part of the C and C++ standard (chapter 16 in the C++ standard) and the standards describe how the preprocessor and the language interact (for example it is illegal to re-#define the C keywords).
However the C preprocessor can work with other languages than C for any kind of simple file preprocessing (I have seen it used with LaTeX files for example).

Yes, the preprocessor is part of the C language. Conceptually, it is run before the source is compiled.
Along with constant definitions, the preprocessor is used to implement two very important constructs:
#include which brings other files into the compilation unit.
include guards; i.e. the pattern,
#if !defined(METAWORD)
#define METAWORD 1
/* struct definition, function prototype */
#endif
As an aside, these two usages have survived into C++; constant definition can be implemented there in other (better?) ways.

Related

Reserved words vs standard identifier, how to see the difference

So I'm completely new to programming. I currently study computer science and have just read the first 200 pages of my programming book, but there's one distinction I cannot seem to grasp, and which hasn't been clearly explained in the book: reserved words vs. standard identifiers. How can I tell from code whether a name is one or the other?
I know that reserved words cannot be changed, while standard identifiers can (though that is not recommended according to my book). The problem is that while my book says reserved words are always in pure lowercase like,
(int, void, double, return)
it kinda seems to be the very same for standard identifiers like,
(printf, scanf)
so how do I know when it is what, or do I have to learn all the reserved words from the ANSI C, which is the current language we are trying to learn, (or whatever future language I might work with) to know when it is when?

First off, you'll have to learn the rules for each language you learn as it is one of the areas that varies between languages. There's no universal rule about what's what.
Second, in C, you need to know the list of keywords; that seems to be what you're referring to as 'reserved words'. Those are important; they're immutable; they can't be abused because the compiler won't let you. You can't use int as a variable name; it is always a type.
Third, the C preprocessor can be abused to hijack anything; if you compile with #define double int in effect, you get what you deserve, but there's nothing much to stop you doing that.
Fourth, the only predefined variable name is __func__, the name of the current function.
Fifth, names such as printf() are defined by the standard library, but the standard library has to be implemented by someone using a C compiler; ask the maintainers of the GNU C library. For a discussion of many of the ideas behind the treaty between the standard and the compiler writers, and between the compiler writers and the programmers using a compiler, see the excellent book The Standard C Library by P J Plauger from 1992. Yes, it is old and the modern standard C library is somewhat bigger than the one from C90, but the background information is still valid and very helpful.

Reserved words are part of the language's syntax. C without int is not C, but something else. They are built into the language and are not and cannot be defined anywhere in terms of this particular language.
For example, if is a reserved keyword. You can't redefine it and even if you could, how would you do this in terms of the C language? You could do that in assembly, though.
The standard library functions you're talking about are ordinary functions that have been included into the standard library, nothing more. They are defined in terms of the language's syntax. Also, you can redefine these functions, although it's not advised to do so as this may lead to all sorts of bugs and unexpected behavior. Yet it's perfectly valid to write:
int puts(const char *msg) {
printf("This has been monkey-patched!\n");
return -1;
}
You'd get a warning that'd complain about the redefinition of a standard library function, but this code is valid anyway.
Now, imagine reimplementing return:
unknown_type return(unknown_type stuff) {
// what to do here???
}

What does it mean that the language of preprocessor directives is weakly related to the grammar of C?

The Wikipedia article on the C Preprocessor says:
The language of preprocessor directives is only weakly related to the grammar of C, and so is sometimes used to process other kinds of text files.
How is the language of a preprocessor different from C grammar? What are the advantages? Has the C Preprocessor been used for other languages/purposes?
Can it be used to differentiate between inline functions and macros, since inline functions have the syntax of a normal C function whereas macros use slightly different grammar?

The Wikipedia article is not really an authoritative source for the C programming language. The C preprocessor grammar is a part of the C grammar. However it is completely distinct from the phrase structure grammar i.e. these 2 are not related at all, except that they both understand that the input consists of C language tokens, (though the C preprocessor has the concept of preprocessing numbers, which means that something like 123_abc is a legal preprocessing token, but it is not a valid identifier).
After the preprocessing has been completed and before the translation using the phrase structure grammar commences (the preprocessor directives have by now been removed, and macros expanded and so forth),
Each preprocessing token is converted into a token. (C11 5.1.1.2p1 item 7)
The use of C preprocessor for any other languages is really abuse. The reason is that the preprocessor requires that the file consists of proper C preprocessing tokens. It isn't designed to work for any other languages. Even C++, with its recent extensions, such as raw string literals, cannot be preprocessed by a C preprocessor!
Here's an excerpt from the cpp (GNU C preprocessor) manuals:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning of
character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.

The preprocessor creates preprocessing tokens, which are later converted into C tokens.
In general the conversion is quite direct, but not always. For example, if you have a conditional preprocessing directive that evaluates to false as in
#if 0
comments
#endif
then in comments you can write whatever you want; it will be converted into preprocessing tokens that will never be converted into C tokens, so in this way you can insert non-commented text inside a C source file.
The only link between the language of the preprocessor and C is that many tokens are defined almost the same but not always.
For example, it is valid to have preprocessing numbers (called pp-numbers in the ISO 9899 standard) such as 4MD, which are valid preprocessing numbers but not valid C numbers. Using the ## operator you can build a valid C identifier from such a preprocessing number. For example
#define version 4A
#define name TEST_
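#define PASTE(x, y) x##y
#define VERSION(x, y) PASTE(x, y)
VERSION(name, version) <= expands to TEST_4A, a valid C identifier (pasting directly with x##y would give nameversion, because ## does not macro-expand its operands)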
The preprocessor was conceived to be applicable to any language for text translation, without having C specifically in mind. In C it is useful mainly to make a clear separation between interfaces and implementations.

Conditionals in the C preprocessor are valid C expressions so the link between the preprocessor and the C language proper is intimate.
#define A (6)
#if A > 5
Here is a 6
#elif A < 0
# error
#endif
This expands to meaningless C, but may be meaningful text.
Here is a 6
Though the expanded text is invalid C, the preprocessor uses features of C to select the correct conditional lines. The C standard defines this in terms of the constant expression:
From the C99 standard, §6.10.1:
6.10.1 Conditional inclusion
Preprocessing directives of the forms
# if constant-expression new-line group_opt
# elif constant-expression new-line group_opt
check whether the controlling constant expression evaluates to nonzero.
And here is the definition of a constant-expression
6.6 Constant expressions
Syntax:
constant-expression:
conditional-expression
Description A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any
place that a constant may be.
Constraints Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when
they are contained within a subexpression that is not evaluated.
Each constant expression shall evaluate to a constant that is in the
range of representable values for its type.
Given the above, it's clear that the preprocessor requires a limited form of C language expression evaluation to work, and therefore knowledge of the C typesystem, grammar, and expression semantics.

Is #define banned in industry standards?

I am a first year computer science student and my professor said #define is banned in the industry standards along with #if, #ifdef, #else, and a few other preprocessor directives. He used the word "banned" because of unexpected behaviour.
Is this accurate? If so why?
Are there, in fact, any standards which prohibit the use of these directives?

First I've heard of it.
No; #define and so on are widely used. Sometimes too widely used, but definitely used. There are places where the C standard mandates the use of macros — you can't avoid those easily. For example, §7.5 Errors <errno.h> says:
The macros are
EDOM
EILSEQ
ERANGE
which expand to integer constant expressions with type int, distinct positive values, and which are suitable for use in #if preprocessing directives; …
Given this, it is clear that not all industry standards prohibit the use of the C preprocessor macro directives. However, there are 'best practices' or 'coding guidelines' standards from various organizations that prescribe limits on the use of the C preprocessor, though none ban its use completely — it is an innate part of C and cannot be wholly avoided. Often, these standards are for people working in safety-critical areas.
One standard you could check is the MISRA C (2012) standard; that tends to proscribe things, but even that recognizes that #define et al are sometimes needed (section 8.20, rules 20.1 through 20.14 cover the C preprocessor).
The NASA GSFC (Goddard Space Flight Center) C Coding Standards simply say:
Macros should be used only when necessary. Overuse of macros can make code harder to read and maintain because the code no longer reads or behaves like standard C.
The discussion after that introductory statement illustrates the acceptable use of function macros.
The CERT C Coding Standard has a number of guidelines about the use of the preprocessor, and implies that you should minimize the use of the preprocessor, but does not ban its use.
Stroustrup would like to make the preprocessor irrelevant in C++, but that hasn't happened yet. As Peter notes, some C++ standards, such as the JSF AV C++ Coding Standards (Joint Strike Fighter, Air Vehicle) from circa 2005, dictate minimal use of the C preprocessor. Essentially, the JSF AV C++ rules restrict it to #include and the #ifndef XYZ_H / #define XYZ_H / … / #endif dance that prevents multiple inclusions of a single header. C++ has some options that are not available in C — notably, better support for typed constants that can then be used in places where C does not allow them to be used. See also static const vs #define vs enum for a discussion of the issues there.
It is a good idea to minimize the use of the preprocessor — it is often abused at least as much as it is used (see the Boost preprocessor 'library' for illustrations of how far you can go with the C preprocessor).
Summary
The preprocessor is an integral part of C, and #define, #if, and so on cannot be wholly avoided. The professor's statement in the question is not generally valid: "#define is banned in the industry standards along with #if, #ifdef, #else, and a few other macros" is an overstatement at best, but it might be supportable with explicit reference to specific industry standards (and the standards in question do not include ISO/IEC 9899:2011, the C standard).
Note that David Hammen has provided information about one specific C coding standard, the JPL C Coding Standard, that prohibits a lot of things that many people use in C, including limiting the use of the C preprocessor (and limiting the use of dynamic memory allocation, and prohibiting recursion; read it to see why, and decide whether those reasons are relevant to you).

No, use of macros is not banned.
In fact, use of #include guards in header files is one common technique that is often mandatory and encouraged by accepted coding guidelines. Some folks claim that #pragma once is an alternative to that, but the problem is that #pragma once - by definition, since pragmas are a hook provided by the standard for compiler-specific extensions - is non-standard, even if it is supported by a number of compilers.
That said, there are a number of industry guidelines and encouraged practices that actively discourage all usage of macros other than #include guards because of the problems macros introduce (not respecting scope, etc). In C++ development, use of macros is frowned upon even more strongly than in C development.
Discouraging use of something is not the same as banning it, since it is still possible to legitimately use it - for example, by documenting a justification.

Some coding standards may discourage or even forbid the use of #define to create function-like macros that take arguments, like
#define SQR(x) ((x)*(x))
because a) such macros are not type-safe, and b) somebody will inevitably write SQR(x++), which is bad juju.
Some standards may discourage or ban the use of #ifdefs for conditional compilation. For example, the following code uses conditional compilation to properly print out a size_t value. For C99 and later, you use the %zu conversion specifier; for C89 and earlier, you use %lu and cast the value to unsigned long:
#if __STDC_VERSION__ >= 199901L
# define SIZE_T_CAST
# define SIZE_T_FMT "%zu"
#else
# define SIZE_T_CAST (unsigned long)
# define SIZE_T_FMT "%lu"
#endif
...
printf( "sizeof foo = " SIZE_T_FMT "\n", SIZE_T_CAST sizeof foo );
Some standards may mandate that instead of doing this, you implement the module twice, once for C89 and earlier, once for C99 and later:
/* C89 version */
printf( "sizeof foo = %lu\n", (unsigned long) sizeof foo );
/* C99 version */
printf( "sizeof foo = %zu\n", sizeof foo );
and then let Make (or Ant, or whatever build tool you're using) deal with compiling and linking the correct version. For this example that would be ridiculous overkill, but I've seen code that was an untraceable rat's nest of #ifdefs that should have had that conditional code factored out into separate files.
However, I am not aware of any company or industry group that has banned the use of preprocessor statements outright.

Macros can not be "banned". The statement is nonsense. Literally.
For example, section 7.5 Errors <errno.h> of the C Standard requires the use of macros:
1 The header <errno.h> defines several macros, all relating to the reporting of error conditions.
2 The macros are
EDOM
EILSEQ
ERANGE
which expand to integer constant expressions with type int, distinct
positive values, and which are suitable for use in #if preprocessing
directives; and
errno
which expands to a modifiable lvalue that has type int and thread
local storage duration, the value of which is set to a positive error
number by several library functions. If a macro definition is
suppressed in order to access an actual object, or a program defines
an identifier with the name errno, the behavior is undefined.
So, not only are macros a required part of C, in some cases not using them results in undefined behavior.

No, #define is not banned. Misuse of #define, however, may be frowned upon.
For instance, you may use
#define DEBUG
in your code so that later on, you can designate parts of your code for conditional compilation using #ifdef DEBUG, for debug purposes only. I don't think anyone in his right mind would want to ban something like this. Macros defined using #define are also used extensively in portable programs, to enable/disable compilation of platform-specific code.
However, if you are using something like
#define PI 3.141592653589793
your teacher may rightfully point out that it is much better to declare PI as a constant with the appropriate type, e.g.,
const double PI = 3.141592653589793;
as it allows the compiler to do type checking when PI is used.
Similarly (as mentioned by John Bode above), the use of function-like macros may be disapproved of, especially in C++ where templates can be used. So instead of
#define SQ(X) ((X)*(X))
consider using
double SQ(double X) { return X * X; }
or, in C++, better yet,
template <typename T> T SQ(T X) { return X * X; }
Once again, the idea is that by using the facilities of the language instead of the preprocessor, you allow the compiler to type check and also (possibly) generate better code.
Once you have enough coding experience, you'll know exactly when it is appropriate to use #define. Until then, I think it is a good idea for your teacher to impose certain rules and coding standards, but preferably they themselves should know, and be able to explain, the reasons. A blanket ban on #define is nonsensical.

That's completely false; macros are heavily used in C. Beginners often use them badly, but that's not a reason to ban them from industry. A classic bad usage is #define successor(n) n + 1. If you expect 2 * successor(9) to give 20, then you're wrong, because that expression will be translated as 2 * 9 + 1, i.e. 19, not 20. Use parentheses, as in #define successor(n) ((n) + 1), to get the expected result.

No. It is not banned. And truth be told, it is impossible to write non-trivial multi-platform code without it.

No, your professor is wrong or you misheard something.
#define is a preprocessor directive that defines macros, and preprocessor macros are needed for conditional compilation and for some conventions that aren't simply built into the C language. For example, C99 added support for booleans, but the names bool, true, and false are not built into the language; they are provided by preprocessor #defines in stdbool.h. See this reference to stdbool.h

Macros are used pretty heavily in GNU land C, and without conditional preprocessor commands there'd be no way to properly handle multiple inclusions of the same source files, so that makes them seem like essential language features to me.
Maybe your class is actually on C++, which despite many people's failure to do so, should be distinguished from C as it is a different language, and I can't speak for macros there. Or maybe the professor meant he's banning them in his class. Anyhow I'm sure the SO community would be interested in hearing which standard he's talking about, since I'm pretty sure all C standards support the use of macros.

Contrary to all of the answers to date, the use of preprocessor directives is oftentimes banned in high-reliability computing. There are two exceptions to this, the use of which are mandated in such organizations. These are the #include directive, and the use of an include guard in a header file. These kinds of bans are more likely in C++ rather than in C.
Here's but one example: 16.1.1 Use the preprocessor only for implementing include guards, and including header files with include guards.
Another example, this time for C rather than C++: JPL Institutional Coding Standard for the C Programming Language. This C coding standard doesn't go quite so far as banning the use of the preprocessor completely, but it comes close. Specifically, it says
Rule 20 (preprocessor use)
Use of the C preprocessor shall be limited to file inclusion and simple macros. [Power of Ten Rule 8].
I'm neither condoning nor decrying those standards. But to say they don't exist is ludicrous.

If you want your C code to interoperate with C++ code, you will want to declare your externally visible symbols, such as function declarations, inside an extern "C" linkage block. This is often done using conditional compilation:
#ifdef __cplusplus
extern "C" {
#endif
/* C header file body */
#ifdef __cplusplus
}
#endif

Look at any header file and you will see something like this:
#ifndef FILE_NAME_H
#define FILE_NAME_H
//Exported functions, structs, defines, etc. go here
#endif /* FILE_NAME_H */
These defines are not only allowed, but critical, because each time the header file is referenced in a file it is included separately. Without the guard you would redefine everything between the guard lines multiple times, which in the best case fails to compile and in the worst case leaves you scratching your head later about why your code doesn't work the way you want it to. (Guard names that begin with an underscore followed by an uppercase letter are best avoided, since such identifiers are reserved for the implementation.)
The compiler also provides predefined macros, as seen here with gcc, that let you test for things like the version of the compiler, which is very useful. I'm currently working on a project that needs to compile with avr-gcc, but we have a testing environment that we also run our code through. To prevent the AVR-specific files and registers from keeping our test code from running, we do something like this:
#ifdef __AVR__
//avr specific code here
#endif
Using this in the production code, the complementary test code can compile without using the avr-gcc and the code above is only compiled using avr-gcc.

If you had just mentioned #define, I would have thought maybe he was alluding to its use for enumerations, which are better off using enum to avoid stupid errors such as assigning the same numerical value twice.
Note that even for this situation, it is sometimes better to use #defines than enums, for instance if you rely on numerical values exchanged with other systems and the actual values must stay the same even if you add/delete constants (for compatibility).
However, adding that #if, #ifdef, etc. should not be used either is just weird. Of course, they should probably not be abused, but in real life there are dozens of reasons to use them.
What he may have meant could be that (where appropriate), you should not hardcode behaviour in the source (which would require re-compilation to get a different behaviour), but rather use some form of run-time configuration instead.
That's the only interpretation I could think of that would make sense.

Is the Syntax of C Language completely defined by CFGs?

I think the Question is self sufficient. Is the syntax of C Language completely defined through Context Free Grammars or do we have Language Constructs which may require non-Context Free definitions in the course of parsing?
An example of a non-CFL construct, I thought, was the declaration of variables before their use. But in Compilers (Aho, Ullman, Sethi), it is stated that the C language does not distinguish between identifiers on the basis of their names. All the identifiers are tokenized as 'id' by the lexical analyzer.
If C is not completely defined by CFGs, please can anyone give an example of Non CFL construct in C?

The problem is that you haven't defined "the syntax of C".
If you define it as the language C in the CS sense, meaning the set of all valid C programs, then C – as well as virtually every other language aside from turing tarpits and Lisp – is not context free. The reasons are not related to the problem of interpreting a C program (e.g. deciding whether a * b; is a multiplication or a declaration). Instead, it's simply because context free grammars can't help you decide whether a given string is a valid C program. Even something as simple as int main() { return 0; } needs a more powerful mechanism than context free grammars, as you have to (1) remember the return type and (2) check that whatever occurs after the return matches the return type. a * b; faces a similar problem – you don't need to know whether it's a multiplication, but if it is a multiplication, that must be a valid operation for the types of a and b. I'm not actually sure whether a context sensitive grammar is enough for all of C, as some restrictions on valid C programs are quite subtle, even if you exclude undefined behaviour (some of which may even be undecidable).
Of course, the above notion is hardly useful. Generally, when talking grammars, we're only interested in a pretty good approximation of a valid program: We want a grammar that rules out as many strings which aren't C as possible without undue complexity in the grammar (for example, 1 a or (-)). Everything else is left to later phases of the compiler and called a semantic error or something similar to distinguish it from the first class of errors. These "approximate" grammars are almost always context free grammars (including in C's case), so if you want to call this approximation of the set of valid programs "syntax", C is indeed defined by a context free grammar. Many people do, so you'd be in good company.

The C language, as defined by the language standard, includes the preprocessor. The following is a syntactically correct C program:
#define START int main(
#define MIDDLE ){
START int argc, char** argv MIDDLE return 0; }
It seems to be really tempting to answer this question (which arises a lot) "sure, there is a CFG for C", based on extracting a subset of the grammar in the standard, which grammar in itself is ambiguous and recognizes a superset of the language. That CFG is interesting and even useful, but it is not C.
In fact, the productions in the standard do not even attempt to describe what a syntactically correct source file is. They describe:
The lexical structure of the source file (along with the lexical structure of valid tokens after pre-processing).
The grammar of individual preprocessor directives
A superset of the grammar of the post-processed language, which relies on some other mechanism to distinguish between typedef-name and other uses of identifier, as well as a mechanism to distinguish between constant-expression and other uses of conditional-expression.
There are many who argue that the issues in point 3 are "semantic", rather than "syntactic". However, the nature of C (and even more so its cousin C++) is that it is impossible to disentangle "semantics" from the parsing of a program. For example, the following is a syntactically correct C program:
#define base 7
#if base * 2 < 10
&one ?= two*}}
#endif
int main(void){ return 0; }
So if you really mean "is the syntax of the C language defined by a CFG", the answer must be no. If you meant, "Is there a CFG which defines the syntax of some language which represents strings which are an intermediate product of the translation of a program in the C language," it's possible that the answer is yes, although some would argue that the necessity to make precise what is a constant-expression and a typedef-name make the syntax necessarily context-sensitive, in a way that other languages are not.

Is the syntax of C Language completely defined through Context Free Grammars?
Yes it is. This is the grammar of C in BNF:
http://www.cs.man.ac.uk/~pjj/bnf/c_syntax.bnf
If every rule has exactly one symbol on its left-hand side, then the grammar is context free. That is the very definition of context-free grammars (Wikipedia):
In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).
Since ambiguity is mentioned by others, I would like to clarify a bit. Imagine the following grammar:
A -> B x | C x
B -> y
C -> y
This is an ambiguous grammar. However, it is still a context free grammar. These two are completely separate concepts.
Obviously, the semantics analyzer of C is context sensitive. This answer from the duplicate question has further explanations.

There are two things here:
The structure of the language (syntax): this is context free as you do not need to know the surroundings to figure out what is an identifier and what is a function.
The meaning of the program (semantics): this is not context free as you need to know whether an identifier has been declared and with what type when you are referring to it.

If you mean by the "syntax of C" all valid C strings that some C compiler accepts, and after running the pre-processor, but ignoring typing errors, then this is the answer: yes but not unambiguously.
First, you could assume the input program is tokenized according to the C standard. The grammar will describe relations among these tokens and not the bare characters. Such context-free grammars are found in books about C and in implementations that use parser generators. This tokenization is a big assumption because quite some work goes into "lexing" C. So, I would argue that we have not described C with a context-free grammar yet, if we have not used context-free grammars to describe the lexical level. The staging between the tokenizer and the parser, combined with the ordering imposed by a scanner generator (prefer keywords, longest match, etc.), is a major increase in computational power which is not easily simulated in a context-free grammar.
So, if you do not assume a tokenizer which, for example, can distinguish type names from variable names using a symbol table, then a context-free grammar is going to be harder. However, the trick here is to accept ambiguity. We can describe the syntax of C, including its tokens, fully in a context-free grammar. Only the grammar will be ambiguous and produce different interpretations for the same string. For example, for A *a; it will have derivations for both a multiplication and a pointer declaration. No problem, the grammar is still describing the C syntax as you requested, just not unambiguously.
Notice that we have assumed having run the pre-processor first as well, I believe your question was not about the code as it looks before pre-processing. Describing that using a context-free grammar would be just madness since syntactic correctness depends on the semantics of expanding user-defined macros. Basically, the programmer is extending the syntax of the C language every time a macro is defined. At CWI we did write context-free grammars for C given a set of known macro definitions to extend the C language and that worked out fine, but that is not a general solution.

macro expansion order with included files

Let's say I have a macro in an inclusion file:
// a.h
#define VALUE SUBSTITUTE
And another file that includes it:
// b.h
#define SUBSTITUTE 3
#include "a.h"
Is it the case that VALUE is now defined to SUBSTITUTE and will be macro expanded in two passes to 3, or is it the case that VALUE has been set to the macro expanded value of SUBSTITUTE (i.e. 3)?
I ask this question in the interest of trying to understand the Boost preprocessor library and how its BOOST_PP_SLOT defines work (edit: and I mean the underlying workings). Therefore, while I am asking the above question, I'd also be interested if anyone could explain that.
(and I guess I'd also like to know where the heck to find the 'painted blue' rules are written...)

VALUE is defined as SUBSTITUTE. The definition of VALUE is not aware at any point that SUBSTITUTE has also been defined. After VALUE is replaced, whatever it was replaced by will be scanned again, and potentially more replacements applied then. All defines exist in their own conceptual space, completely unaware of each other; they only interact with one another at the site of expansion in the main program text (defines are directives, and thus not part of the program proper).
The rules for the preprocessor are specified alongside the rules for C proper in the language standard. The standard documents themselves cost money, but you can usually download the "final draft" for free; the latest (C11) can be found here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
For at-home use the draft is pretty much equivalent to the real thing. Most people who quote the standard are actually looking at copies of the draft. (Certainly it's closer to the actual standard than any real-world C compiler is...)
There's a more accessible description of the macro rules in the GCC manual: http://gcc.gnu.org/onlinedocs/cpp/Self_002dReferential-Macros.html
Additionally... I couldn't tell you much about the Boost preprocessor library, not having used it, but there's a beautiful pair of libraries by the same authors called Order and Chaos that are very "clean" (as macro code goes) and easy to understand. They're more academic in tone and intended to be pure rather than portable; which might make them easier reading.
(Since I don't know Boost PP I don't know how relevant this is to your question but) there's also a good introductory example of the kinds of techniques these libraries use for advanced metaprogramming constructs in this answer: Is the C99 preprocessor Turing complete?
