function-like macros and variables - c

For some inscrutable reason, in my code I have something like:
#define pippo(x) printf("%d",x)
...
... many lines down in the code
...
int pippo = 0;
The same identifier pippo has been used for both a function-like macro and a variable name! Besides the confusion this could cause for the poor maintainer, I was wondering whether this is legal according to the standard.
Both C99 and C11 (in 6.10.3.) say:
10 [...] Each subsequent instance of the function-like macro name followed by a
( as the next preprocessing token introduces the sequence of
preprocessing tokens that is replaced by the replacement list in the
definition [...]
They don't say what happens if the function-like macro name is not followed by a '(' and I'm worried that some compiler might consider that this is an error (or might just emit a warning).
Am I too much of a worrywart?

Instances of the name of a function-like macro that are not followed by ( are not replaced.
Using names thusly is not a violation of constraints in the C standard. The standard even gives an example of using this behavior. C 2018 7.1.4 1, discussing standard library functions and their potential implementations as function-like macros (in addition to a definition as a function), says:
… Any macro definition of a function can be suppressed locally by enclosing the name of the function in parentheses, because the name is then not followed by the left parenthesis that indicates expansion of a macro function name. For the same syntactic reason, it is permitted to take the address of a library function even if it is also defined as a macro…
A compiler could give a warning (although it would likely want to suppress this warning when the macro name is a library function used as the C standard suggests, above), but neither GCC 9.2 nor Clang 11.0.0 does, even with all warnings enabled.

5.1.1.2, point 4, specifies how the preprocessor is "invoked" if you will:
Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. [...] All preprocessing directives are then deleted.
So this implies that anything that is not touched by the preprocessor is left alone, including any pippo that is not followed by (.

Related

Preprocessor reserved keywords

For macros, are there any name limitations other than it needs to be an identifier? For example, would something like the following be valid?
#define assert getchar
#include <stdio.h>
int main(void)
{
assert();
}
Code link: https://godbolt.org/z/ra63na.
main:
push rbp
mov rbp, rsp
mov eax, 0
call getchar
mov eax, 0
pop rbp
ret
And does the preprocessor have any knowledge of the C language? Or is it more like a find-and-replace program?
For macros, are there any name limitations other than it needs to be an identifier?
Yes, they are subject to the provisions of section 7.1.3 of the language specification ("Reserved Identifiers"), in particular:
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use
[including as macro names].
[...]
Each macro name in any of the [standard library specification] subclauses (including the future library directions) is reserved for
use as specified if any of its associated headers is included; unless
explicitly stated otherwise
[...]
Each identifier with file scope listed in any of the [standard library specification] subclauses (including the future library
directions) is reserved for use as a macro name and as an identifier
with file scope in the same name space if any of its associated
headers is included.
[...] If the program declares or defines an identifier in a context in
which it is reserved (other than as allowed by 7.1.4), or defines a
reserved identifier as a macro name, the behavior is undefined.
The second bullet point in particular would be relevant to your example code if it also included the assert.h header. The identifier assert would then be reserved for use as a macro name. That you use it as one would trigger undefined behavior. That does not place any particular requirements on the implementation -- in fact that's exactly the meaning of "undefined behavior". It does not require the implementation to accept the code, nor to reject it, nor to emit any kind of diagnostic in either case. If it did accept it, the preprocessor would not be required to perform macro substitution on assert, nor would it be forbidden to do so, nor, in fact, would it be required to behave in a way that seems in any way rational or predictable.
Something similar would apply based on the third bullet point if you defined getchar as a macro name in code that includes stdio.h, as the example does. The code actually presented is ok, however.
You also ask,
And does the preprocessor have any knowledge of the C language? Or is
it more like a find-and-replace program?
A little. The C preprocessor is not a general-purpose macro language, and attempts to use it as one often go poorly. The preprocessor's input is a series of tokens, determined according to rules consistent with C syntax, and it uses the same syntax for identifiers that C does. Conditional inclusion directives recognize a subset of the arithmetic expressions of C, and they work in terms of one of the host implementation's integer data types. The preprocessor (or at least the tokenization stage preceding it) understands C string literals and character constants, so macro replacement does not affect the contents of these.
This is covered in sections 7.1.2 and 7.1.3 of the standard (C11). Here is a selection of rules pertaining to macros:
If used, a header shall be included outside of any external declaration or definition, and it shall first be included before the first reference to any of the functions or objects it declares, or to any of the types or macros it defines.
The program shall not have any macros with names lexically identical to keywords currently defined prior to the inclusion of the header or when any macro defined in the header is expanded.
Each macro name in any of the following subclauses (including the future library
directions) is reserved for use as specified if any of its associated headers is included;
unless explicitly stated otherwise.
Each identifier with file scope listed in any of the following subclauses (including the
future library directions) is reserved for use as a macro name and as an identifier with
file scope in the same name space if any of its associated headers is included.
So the exact program you posted is correct, since <assert.h> has not been included. But it would be undefined behaviour if you did include that header.
It's really dumb. It understands enough to do token replacement, but not much more.
For example: #define test fail will replace test in test(...) but not tested or "test".
Since C has a very basic syntax, writing a parser that can work through the source and identify tokens like that is actually not that hard. Making it understand the totality of C syntax is beyond the scope of that tool.
In other words, for an input program like:
#define test fail
int main() {
test(9, "test", tested());
return 0;
}
The C pre-processor breaks this up into tokens that end up something like:
[ "#", "define", "test", "fail" ]
[ "int", "main", "(", ")", "{" ]
[ "test", "(", "9", "\"test\"", "tested", "(", ")", ")", ";" ]
...
Where each of those is processed using the simple pre-processor grammar.
This is slightly more complicated because macros can include arguments, but you get the idea. The grammar used is a simple subset of the whole C grammar.
Yes, it is valid. No, the pre-processor is not language-aware. The pre-processor does exactly what it is told (it includes content and replaces macros); if that results in invalid syntax, the compiler must detect that.
Other than C symbol naming rules, there are no C language dependencies or reserved words. All pre-processor directives start with #, which is not valid in a C symbol name, so there is no need for reserved words.
The pre-processor can be run on its own, either via a command-line option to the compiler driver or, in the case of the GNU toolchain, as the standalone executable cpp, making it useful for purposes other than just C and C++ source pre-processing.

Preprocessor and compiler errors in C

When I have a syntax error in C, how can I know if it's a preprocessor error or a compiler error?
Let's say I type in this line: "# include header.h" (The " is part of the line to make it a string literal).
Will the preprocessor have an issue with it or will it be the compiler that will treat it as a string without assigning it to anything?
Typically compiler output doesn't distinguish "pre-processor errors" from "compiler errors", as these aren't really standardized terms.
What's called "pre-processing" is often the process of forming pre-processor tokens, followed by resolving all includes, pragmas and macros. In the C standard, this "pre-processing" roughly corresponds to "translation phases" 3 and 4:
The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment. Each comment is replaced by
one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced by
one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. If a character sequence that
matches the syntax of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
Obviously, the compiler will not complain about finding a valid string literal "# include header.h" in either of the above phases: a string literal is a valid pre-processor token. What you call "pre-processor errors" are probably errors that occur in any of the above phases.
(This is a simplified explanation, there's lots of other mildly interesting stuff happening as well, like trigraph and newline \ replacement etc.)
But in this case, I think the compiler will complain in phase 7, emphasis mine:
White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
"Will the preprocessor have an issue with it or will it be the compiler that will treat it as a string without assigning it to anything?"
I've tried your example:
"#include <stdio.h>"
I get the following errors:
For GCC:
"error: expected identifier or '(' before string constant"
For Clang:
"error: expected identifier or '('"
You can see it here.
Both GCC and Clang treat it as a string literal, which is reasonable since character sequences surrounded by " are specified as string literals:
"A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
Source: ISO/IEC 9899:2018 (C18), §6.4.5/3.
This issue is one the compiler cares about, not the preprocessor. In general, since macros are expanded before compilation, the incorrectness or failure of preprocessor directives is usually also something the compiler complains about. There is usually no explicit error detection stage for the C preprocessor.
If the assignment would be proper, f.e.:
const char* p = "#include <stdio.h>";
and you use variables, functions etc. which are declared in header.h, you can* get errors about undefined references to these variables/functions, since the compiler/linker can't see/find those declarations.
*Whether you get an error or not further depends on whether the definition of that variable/function is visible before its use in the source code, and on how you link several source files.
"When I have a syntax error in C, how can I know if it's a preprocessor error or a compiler error?"
As said above, there are no real preprocessor errors, the compiler covers these issues. The preprocessor doesn't really analyze for errors, it is just expanding. Usually it is very clear if an error belongs to a macro or not, even though the compiler evaluates the syntactical issues.
As Eugene already said in the comments, you can look at the macro-expanded version of your code by using the -E option for GCC and check whether the macros expanded as intended.

What does it mean that the language of preprocessor directives is weakly related to the grammar of C?

The Wikipedia article on the C Preprocessor says:
The language of preprocessor directives is only weakly related to the grammar of C, and so is sometimes used to process other kinds of text files.
How is the language of a preprocessor different from C grammar? What are the advantages? Has the C Preprocessor been used for other languages/purposes?
Can it be used to differentiate between inline functions and macros, since inline functions have the syntax of a normal C function whereas macros use slightly different grammar?
The Wikipedia article is not really an authoritative source for the C programming language. The C preprocessor grammar is a part of the C grammar. However it is completely distinct from the phrase structure grammar i.e. these 2 are not related at all, except that they both understand that the input consists of C language tokens, (though the C preprocessor has the concept of preprocessing numbers, which means that something like 123_abc is a legal preprocessing token, but it is not a valid identifier).
After the preprocessing has been completed and before the translation using the phrase structure grammar commences (the preprocessor directives have by now been removed, and macros expanded and so forth),
Each preprocessing token is converted into a token. (C11 5.1.1.2p1 item 7)
The use of C preprocessor for any other languages is really abuse. The reason is that the preprocessor requires that the file consists of proper C preprocessing tokens. It isn't designed to work for any other languages. Even C++, with its recent extensions, such as raw string literals, cannot be preprocessed by a C preprocessor!
Here's an excerpt from the cpp (GNU C preprocessor) manuals:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning of
character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.
The preprocessor creates preprocessing tokens, which are later converted into C tokens.
In general the conversion is quite direct, but not always. For example, if you have a conditional preprocessing directive that evaluates to false, as in
#if 0
comments
#endif
then in place of comments you can write whatever you want: it will be converted into preprocessing tokens that are never converted into C tokens, so in this way you can insert non-commented text into a C source file.
The only link between the language of the preprocessor and C is that many tokens are defined in almost the same way, though not always.
For example, it is valid to have preprocessing numbers (called pp-numbers in the ISO 9899 standard) such as 4MD, which are valid preprocessing numbers but not valid C numbers. Using the ## operator, you can build a valid C identifier from such preprocessing numbers. For example:
#define version 4A
#define name TEST_
#define PASTE(x, y) x##y
#define VERSION(x, y) PASTE(x, y)
VERSION(name, version) <= this expands to TEST_4A, a valid C identifier
The preprocessor was conceived as a text-translation tool applicable to any language, without C specifically in mind. In C it is useful mainly to make a clear separation between interfaces and implementations.
Conditionals in the C preprocessor are valid C expressions so the link between the preprocessor and the C language proper is intimate.
#define A (6)
#if A > 5
Here is a 6
#elif A < 0
# error
#endif
This expands to meaningless C, but may be meaningful text.
Here is a 6
Though the expanded text is invalid C, the preprocessor uses features of C to select the correct conditional lines. The C standard defines this in terms of the constant expression:
From the C99 standard, §6.10.1 and §6.6:
6.10.1 Conditional inclusion
Preprocessing directives of the forms
# if constant-expression new-line group opt
# elif constant-expression new-line group opt
check whether the controlling constant expression evaluates to nonzero.
And here is the definition of a constant-expression
6.6 Constant expressions
Syntax:
constant-expression:
conditional-expression
Description A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any
place that a constant may be.
Constraints Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when
they are contained within a subexpression that is not evaluated.
Each constant expression shall evaluate to a constant that is in the
range of representable values for its type.
Given the above, it's clear that the preprocessor requires a limited form of C language expression evaluation to work, and therefore knowledge of the C typesystem, grammar, and expression semantics.

Macro Expansion: Argument with Commas

The code I'm working on uses some very convoluted macro voodoo in order to generate code, but in the end there is a construct that looks like this
#define ARGS 1,2,3
#define MACROFUNC_OUTER(PARAMS) MACROFUNC_INNER(PARAMS)
#define MACROFUNC_INNER(A,B,C) A + B + C
int a = MACROFUNC_OUTER(ARGS);
What is expected is to get
int a = 1 + 2 + 3;
This works well for the compiler it was originally written for (GHS) and also for GCC, but MSVC (2008) treats PARAMS as a single preprocessing token that it won't expand, setting A to the whole of PARAMS and B and C to nothing. The result is this
int a = 1,2,3 + + ;
while MSVC warns that there are not enough actual parameters for macro 'MACROFUNC_INNER'.
Is it possible to get MSVC to do the expansion with some tricks (another layer of macro to force a second expansion, some well-placed ## or #, ...)? Assume that changing the way the construct works is not an option. (i.e.: can I solve the problem myself?)
What does the C standard say about such a corner case? I couldn't find anything in the C11 standard that explicitly says how to handle an argument that contains a list of arguments. (i.e.: can I argue with the author of the code that he has to write it again, or is MSVC simply non-conforming?)
MSVC is non-conformant. The standard is actually clear on the point, although it does not feel the need to mention this particular case, which is not exceptional.
When a function-like macro invocation is encountered, the preprocessor:
§6.10.3/11 identifies the arguments, which are possibly empty sequences of tokens separated by non-protected commas , (a comma is protected if it is inside parentheses ()).
§6.10.3.1/1 does a first pass over the macro body, substituting each parameter which is not used in a # or ## operation with the corresponding fully macro-expanded argument. (It does no other substitutions in the macro body in this step.)
§6.10.3.4/1 rescans the substituted replacement token sequence, performing more macro replacements as necessary.
(The above mostly ignores stringification (#) and token concatenation (##), which are not relevant to this question.)
This order of operations unambiguously leads to the behaviour expected by whoever wrote the software.
Apparently (according to @dxiv, and verified here) the following standards-compliant workaround works on some versions of MS Visual Studio:
#define CALL(A,B) A B
#define OUTER(PARAM) CALL(INNER,(PARAM))
#define INNER(A,B,C) whatever
For reference, the actual language from the C11 standard, skipping over the references to # and ## handling:
§6.10.3 11 The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro. The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.…
§6.10.3.1 1 After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list… is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file…
§6.10.3.4 1 After all parameters in the replacement list have been substituted… [t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
C11 says that each appearance of an object-like macro's name
[is] replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.
[6.10.3/9]
Of function-like macros it says this:
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments [...] in an invocation of a function-like macro shall equal the number of parameters in the macro definition.
[6.10.3/4]
and this:
The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro.
[6.10.3/11]
and this:
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list [...] is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
[6.10.3.1/1]
Of macros in general it also says this:
After all parameters in the replacement list have been substituted [... t]he resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
[6.10.3.4/1]
MSVC++ does not properly expand the arguments to function-like macros before rescanning the expansion of such macros. It seems unlikely that there is any easy workaround.
UPDATE:
In light of @dxiv's answer, however, it may be that there is a solution after all. The problem with his solution with respect to standard-conforming behavior is that there needs to be one more expansion than is actually performed. That can easily enough be supplied. This variation on his approach works with GCC, as it should, and inasmuch as it is based on code that dxiv claims works with MSVC++, it seems likely to work there, too:
#define EXPAND(x) x
#define PAREN(...) (__VA_ARGS__)
#define EXPAND_F(m, ...) EXPAND(m PAREN(__VA_ARGS__))
#define SUM3(a,b,c) a + b + c
#define ARGS 1,2,3
int sum = EXPAND_F(SUM3, ARGS);
I have of course made it a little more generic than perhaps it needs to be, but that may serve you well if you have a lot of these to deal with.
Curiously enough, the following appears to work in MSVC (tested with 2010 and 2015).
#define ARGS 1,2,3
#define OUTER(...) INNER PARAN(__VA_ARGS__)
#define PARAN(...) (__VA_ARGS__)
#define INNER(A,B,C) A + B + C
int a = OUTER(ARGS);
I don't know that it's supposed to work by the letter of the standard, in fact I have a hunch it's not. Could still be conditionally compiled just for MSVC, as a workaround.
[EDIT] P.S. As pointed out in the comments, the above is (another) non-standard MSVC behavior. Instead, the alternative workarounds posted by @rici and @JohnBollinger in the respective replies are compliant, thus recommended.

C declarations before usage

All identifiers in C need to be declared before they are used, but I can't find where this is stated in the C99 standard.
I think this applies to macro definitions too, but only the macro expansion order is defined.
C99:TC3 6.5.1 §2, with footnote 79 explicitly stating:
Thus, an undeclared identifier is a violation of the syntax.
in conjunction with 6.2.1 §5:
Unless explicitly stated otherwise, [...] it [ie an identifier] refers to the
entity in the relevant name space whose declaration is visible at the point the identifier
occurs.
and §7:
[...] Any other identifier has scope that begins just after the completion of its declarator.
There are at least a couple of exceptions to the rule that all identifiers need to be declared before use:
While C99 removed implicit function declarations, you may still see C programs that rely on them, possibly unknowingly. There is even the occasional question on SO that, for example, asks why functions that return double don't work (when the header that declares the function is omitted). It seems that when compiling with pre-C99 semantics, warnings for undeclared functions are often not enabled or are ignored.
The identifier for a goto label may be used before its 'declaration'; it is declared implicitly by its syntactic appearance (followed by a : and a statement).
The exception to the rule for goto labels is pretty much a useless nitpick, but the fact that function identifiers can be used without a declaration (pre-C99) is something that can be useful to know because you might once in a while run into a problem with it as a root cause.
Also, identifiers can be used before being 'declared' (strictly speaking, before being defined) in preprocessing, where they can be tested for being defined or not, or used in preprocessor expressions where they will evaluate to 0 if not otherwise defined.
