Double hash usage - c

In C99 6.10.3.3.(2) (with my highlight)
If, in the replacement list of a function-like macro, a parameter is immediately preceded
or followed by a ## preprocessing token, the parameter is replaced by the corresponding
argument’s preprocessing token sequence; however, if an argument consists of no
preprocessing tokens, the parameter is replaced by a placemarker preprocessing token
instead.
#include <stdio.h>
#define hash_hash(a, b) a ## b
#define mkstr(a) # a
#define in_between(a) mkstr(a)
int main(void)
{
char p[] = in_between(a hash_hash(,) b);
printf("%s", p);
}
Output:
a b
I exercised the highlighted phrase with hash_hash(,), and the result seems consistent with the standard's description.
But I wonder: if the comma is omitted, as in hash_hash(), does this differ from the standard's explanation (undefined behavior)? And is the placemarker the same as white-space?

But I wonder if comma , is omitted, like hash_hash(), then does this differ from standard's explanation (undefined behavior)?
I think the relevant part is C11 6.10.3/4 emphasis mine:
Constraints
If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments (including those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal the number of parameters in the macro definition.
With the comma, the invocation supplies two arguments, each consisting of no preprocessing tokens. Without it, the number of arguments doesn't match the macro's number of parameters. So hash_hash() is a constraint violation and you should get a diagnostic message.

But I wonder if comma , is omitted, like hash_hash(), then does this differ from standard's explanation(undefined behavior)?
The comma in this case is part of the syntax of function-like macro invocation. It separates the arguments in the invocation's argument list. An invocation of a function-like macro must provide an argument (possibly an empty one) for each of that macro's named parameters, and with hash_hash() (without the comma) you would not be doing so. This would be a violation of a language constraint, so not only would the resulting behavior be undefined, but a conforming C implementation is obligated to emit a diagnostic when it encounters such a violation.
And is the placemarker the same as white-space?
No. You can conceptualize a placemarker as a zero-length preprocessing token. Such a thing is not directly representable in source code. It is neither whitespace nor absence of preprocessing tokens. "Placemarker" is a pretty good description of its nature and role.

if comma , is omitted, like hash_hash(), then does this differ from standard's explanation (undefined behavior)? And is the placemarker the same as white-space?
Yes. If a macro is defined with multiple parameters, they must be separated by commas. Attempting to define a multi-parameter macro any other way violates the rules of the language, and it will not compile:
Function-like macros can take arguments, just like true functions. To
define a macro that uses arguments, you insert parameters between the
pair of parentheses in the macro definition that make the macro
function-like. The parameters must be valid C identifiers,
separated by commas and optionally whitespace.
Furthermore, because this requirement is well defined, it would not be an example of undefined behavior.
In the phrase "and optionally whitespace" from the quoted paragraph, the "and" indicates that whitespace can be used along with the comma delimiter, but is not sufficient as a delimiter by itself.

Related

Defining a macro as a comma (to separate arguments with that macro) works properly in function arguments but not in other macros

I came up with a small example of the problem.
Assume I have this macro
#define and ,
And I use it like sum(a and b) which should be expanded to sum(a, b).
My problem here is that if sum is defined by a macro, the usage example produces a too few arguments ... error.
Another problem, which I think is related and would probably be solved by the same trick, occurs when I define an empty macro and place it between the function name and the argument list. For example
#define of
Now when I use sum of(1, 2), the compiler treats sum as a function. If sum is actually a macro, the linker then throws an undefined reference error.
#define of
#define sum(a, b) a + b
int main()
{
sum of(a, b); // undefined reference to `sum'
}
The issue with the and macro is that a and b is identified as a single macro argument before and is replaced with a comma.
6.10.3.1 Argument substitution
After the arguments for the invocation of a function-like macro have been identified, argument
substitution takes place. A parameter in the replacement list, unless preceded by a # or ##
preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding
argument after all macros contained therein have been expanded. Before being substituted, each
argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the
preprocessing file; no other preprocessing tokens are available.
Your second problem is similar but not related. After of is expanded, it is too late to expand sum because it is no longer scanned by the preprocessor:
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has
taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token
sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for
more macro names to replace.

How does C recognize macro tokens as arguments

#define function(in) in+1
int main(void)
{
printf("%d\n",function(1));
return 0;
}
The above is correctly preprocessed to:
int main(void)
{
printf("%d\n",1+1);
return 0;
}
However, if the replacement list in+1 is changed to in_1, the preprocessor does not perform the argument replacement I expected and I end up with this:
printf("%d\n",in_1);
What is the list of tokens that the preprocessor can correctly separate from the parameter so that the argument is inserted (like the + sign)?
Short answer: The replacement done by the preprocessor is not simple text substitution. In your case, the parameter is an identifier, and only identical identifier tokens in the replacement list are replaced.
The related form of preprocessing is
#define identifier(identifier-list) token-sequence
In order for the replacement to take place, the identifiers in the identifier-list and the tokens in the token-sequence must be identical in the token sense, according to C's tokenization rule (the rule to parse stream into tokens).
If you agree with the fact that
in C in and in_1 are two different identifiers (and C cannot relate one to the other), while
in+1 is not an identifier but a sequence of three tokens:
(1) identifier in,
(2) operator +, and
(3) integer constant 1,
then your question is clear: in and in_1 are just two identifiers between which C does not see any relationship, and cannot do the replacement as you wish.
Reference 1: In C, there are six classes of tokens:
(1) identifiers (e.g. in)
(2) keywords (e.g. if)
(3) constants (e.g. 1)
(4) string literals (e.g. "Hello")
(5) operators (e.g. +)
(6) other separators (e.g. ;)
Reference 2: In C, an identifier is a sequence of letters (including _) and digits (the first one cannot be a digit).
Reference 3: The tokenization rule:
... the next token is the longest string of characters that could constitute a token.
This is to say, when reading in+1, the compiler will read all the way to +, and knows that in is an identifier. But in the case of in_1, the compiler will read all the way to the white space after it, and deems in_1 as an identifier.
All references are from the Reference Manual in K&R's The C Programming Language. The language has evolved, but these rules still capture the essence.
See the C11 standard section 6.4 for the tokenization grammar.
The relevant token type here is identifier, which is defined as any sequence of letters or digits that doesn't start with a digit; also there can be \u codes and other implementation-defined characters in an identifier.
Due to the "maximal munch" principle of tokenization, character sequence in+1 is tokenized as in + 1 (not i n + 1).
If you want the argument followed by _1, use two hashes:
#define function(in) in ## _1
So...
function(dave) --> dave_1
function(phil) --> phil_1
And for completeness, you can also use a single hash to turn the arg into a text string.
#define function(in) printf(#in "\n");

Can we write a macro in many lines without the backslash at the end?

I saw some examples in the CPP manual where a macro body is written across many lines without the backslash.
#define strange(file) fprintf (file, "%s %d",
...
strange(stderr) p, 35)
output:
fprintf (stderr, "%s %d", p, 35)
Are these special cases, like directives inside macro arguments, or is this allowed only for #define?
An #include directive must always be written on one line, if I am not wrong.
Edit:
From https://gcc.gnu.org/onlinedocs/cpp/Directives-Within-Macro-Arguments.html
3.9 Directives Within Macro Arguments
Occasionally it is convenient to use preprocessor directives within
the arguments of a macro. The C and C++ standards declare that
behavior in these cases is undefined. GNU CPP processes arbitrary
directives within macro arguments in exactly the same way as it would
have processed the directive were the function-like macro invocation
not present.
If, within a macro invocation, that macro is redefined, then the new
definition takes effect in time for argument pre-expansion, but the
original definition is still used for argument replacement. Here is a
pathological example:
#define f(x) x x
f (1
#undef f
#define f 2
f)
which expands to
1 2 1 2
with the semantics described above.
The example is on many lines.
Multi-line macro definitions without backslash-newline
Since comments are replaced by spaces in translation phase 3:
The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment. Each comment is replaced by
one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced by
one space character is implementation-defined.
and the preprocessor runs as phase 4:
Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. If a character sequence that
matches the syntax of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
it is possible, but absurd, to write a multi-line macro like this:
#include <stdio.h>
#define possible_but_absurd(a, b) /* comments
*/ printf("are translated"); /* in phase 3
*/ printf(" before phase %d", a); /* (the preprocessor)
*/ printf(" is run (%s)\n", b); /* but why abuse the system? */
int main(void)
{
printf("%s %s", "Macros can be continued without backslashes",
"because comments\n");
possible_but_absurd(4, "ISO/IEC 9899:2011,\nSection 5.1.1.2"
" Translation phases");
return 0;
}
which, when run, states:
Macros can be continued without backslashes because comments
are translated before phase 4 is run (ISO/IEC 9899:2011,
Section 5.1.1.2 Translation phases)
Backslash-newline in macro definitions
Translation phases 1 and 2 are also somewhat relevant:
Physical source file multibyte characters are mapped, in an implementation-defined
manner, to the source character set (introducing new-line characters for
end-of-line indicators) if necessary. Trigraph sequences are replaced by
corresponding single-character internal representations.
The trigraph replacement is nominally relevant because ??/ is the trigraph for a backslash.
Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part
of such a splice. A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any such
splicing takes place.
This tells you that by the time phase 4 (the preprocessor) is run, macro definitions are on a single (logical) line — the trailing backslash-newline combinations have been deleted.
The standard notes that the phases are 'as if' — the behaviour of the compiler must be as if it went through the separate phases, but many implementations do not formally separate them out fully.
Avoid the GCC extension
The expanded example (quote from the GCC manual) has the invocation spread over many lines, but the definition is strictly on one line. (This much is not a GCC extension but completely standard behaviour.)
Note that if you're remotely sane, you'll ignore the possibility of putting preprocessing directives within the invocation of a macro (the #undef and #define in the example). It is a GCC extension and totally unportable. The standard says that the behaviour is undefined.
Annex J.2 Undefined behavior
There are sequences of preprocessing tokens within the list of macro arguments that would otherwise act as preprocessing directives (6.10.3).

The output of the following C code is T T , why not t t?

The output of the following C code is T T, but I think it should be t t.
#include<stdio.h>
#define T t
void main()
{
char T = 'T';
printf("\n%c\t%c\n",T,t);
}
The preprocessor does not perform substitution of any text within quotes, whether they are single quotes or double quotes.
So the character constant 'T' is unchanged.
From section 6.10.3 of the C standard:
9 A preprocessing directive of the form
# define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance of
the macro name 171) to be replaced by the replacement list of
preprocessing tokens that constitute the remainder of the
directive. The replacement list is then rescanned for more macro
names as specified below.
171) Since, by macro-replacement time, all character constants and
string literals are preprocessing tokens, not sequences possibly
containing identifier-like subsequences (see 5.1.1.2, translation
phases), they are never scanned for macro names or parameters.
TL;DR The variable name T is subject to macro replacement; the initializer 'T' is not.
To elaborate, #define macros cause textual replacements, and anything inside quotes (either '' or "") is not subject to macro replacement.
So in essence, try running the preprocessor on your code (example: gcc -E test.c) and it looks like
char t = 'T';
printf("\n%c\t%c\n",t,t);
which, as expected, prints the value of variable t twice: T T.
You can also run gcc -E main.c -o test.txt && tail -f test.txt to inspect the preprocessed output.
That said, for a hosted environment, the required signature for main() is int main(void), at least.

Macro within macro does not work

I came across one more piece of code that is even more confusing..
#include "stdio.h"
#define f(a,b) a##b
#define g(a) #a
#define h(a) g(a)
int main(void)
{
printf("%s\n",h(f(1,2)));
printf("%s\n",g(1));
printf("%s\n",g(f(1,2)));
return 0;
}
output is
12
1
f(1,2)
My assumption was
1) first f(1,2) is replaced by 12, because macro f(a,b)
concatenates its arguments
2) then the g(a) macro replaces 1 by a string literal "1"
3) the output should be 1
But why is g(f(1,2)) not substituted to 12?
I'm sure I'm missing something here.
Can someone explain this program to me?
Macro replacement occurs from the outside in. (Strictly speaking, the preprocessor is required to behave as though it replaces macros one at a time, starting from the beginning of the file and restarting after each replacement.)
The standard (C99 §6.10.3.2/2) says
If, in the replacement list, a parameter is immediately preceded by a # preprocessing
token, both are replaced by a single character string literal preprocessing token that
contains the spelling of the preprocessing token sequence for the corresponding
argument.
Since # is present in the replacement list for the macro g, the argument f(1,2) is converted to a string immediately, and the result is "f(1,2)".
On the other hand, in h(f(1,2)), since the replacement list doesn't contain #, §6.10.3.1/1 applies,
After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been
expanded.
and the argument f(1, 2) is macro expanded to give 12, so the result is g(12) which then becomes "12" when the file is "re-scanned".
Macros can't expand into preprocessing directives. From C99 6.10.3.4/3 "Rescanning and further replacement":
The resulting completely macro-replaced preprocessing token sequence
is not processed as a preprocessing directive even if it resembles
one,
Source: https://stackoverflow.com/a/2429368/2591612
But you can call f(a,b) through g by going via h, as you did. When passed to g directly, f(1,2) is turned into a string literal, as Red Alert's answer states.

Resources