When is using whitespace for readability NOT allowed?

C seems to be pretty permissive when it comes to whitespace.
We can use or omit whitespace around an operator, between a function name and its parenthesized list of arguments, between an array name and its index, etc. in order to make code more readable. I understand this is a matter of preference.
The only place I can think of where whitespace is NOT allowed is this:
#include < stdio.h > // fatal error: stdio.h : No such file or directory
What are the other contexts in C where whitespace cannot be used for readability?

In most cases, adding whitespace within a single token either makes the program invalid or changes the meaning of the token. An obvious example: "foo" and " foo " are both valid string literals with different values, because a string literal is a single token. Changing 123456 to 123 456 changes it from a single integer constant to two integer constants, resulting in a syntax error.
The exceptions to this involve the preprocessor.
You've already mentioned the #include directive. Note that given:
#include "header.h"
the "header.h" is not syntactically a string literal; it's processed before string literals are meaningful. The syntax is similar, but for example a \t sequence in a header name isn't necessarily replaced by a tab character.
Newlines (which are a form of whitespace) are significant in preprocessor directives; you can't legally write:
#ifdef
FOO
/* ... */
#endif
But whitespace other than newlines is permitted:
# if SPACES_ARE_ALLOWED_HERE
#endif
And there's one case I can think of where whitespace is permitted between preprocessor tokens but it changes the meaning. In the definition of a function-like macro, the ( that introduces the parameter list must immediately follow the macro name. This:
#define TWICE(x) ((x) + (x))
defines TWICE as a function-like macro that takes one argument. But this:
#define NOT_TWICE (x) ((x) + (x))
defines NOT_TWICE as an ordinary macro with no arguments that expands to (x) ((x) + (x)).
This rule applies only to macro definitions; a macro invocation follows the normal rules, so you can write either TWICE(42) or TWICE ( 42 ).

Whitespace is not allowed for readability (that is, it is significant) within a lexical token: within an identifier (foo bar is different from foobar), within a number (123 456 is different from 123456), within a string (that's basically your example) or within an operator (+ + is different from ++ and + = is different from +=). Between tokens you can add as much whitespace as you want, but adding whitespace inside a token breaks it into two separate tokens (or changes the value, in the case of string constants), thus changing the meaning of your code.
In most cases the code with the added white space is either equivalent to the original code or results in a syntax error. But there are exceptions. For example:
return a +++ b;
is the same as
return a ++ + b;
but is different from:
return a + ++ b;

As I recall, you need to be very careful with function-like macros, as in this dummy example:
#include <stdio.h>
#define sum(x, y) ((x)+(y))
int main(void)
{
printf("%d\n", sum(2, 2));
return 0;
}
the:
#define sum(x, y) ((x)+(y))
is a different thing than, say:
#define sum (x, y) ((x)+(y))
The latter is an object-like macro that is replaced with exactly (x, y) ((x)+(y)); that is, the parameters are not substituted (as happens in a function-like macro).

Related

Why these consecutive macro replacements do not result in an error?

This program prints 5. But after replacing all the macros, the result would be --5, which should cause a compilation error for trying to decrement 5. Yet it compiles and runs fine.
#include <stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Why is there no error?
Here are the steps for the compilation of the statement printf("The value of A is %d\n", A);:
the lexical parser produces the preprocessing tokens printf, (, "The value of A is %d\n", ,, A, ) and ;.
A is a macro that expands to the 2 tokens - and B.
B is also a macro and gets expanded to - and C.
C is again a macro and gets expanded to 5.
the tokens are then converted to C tokens, producing errors for preprocessing tokens that do not convert to proper C tokens (ex: 0a). In this example, the tokens are identical.
the compiler parses the resulting sequence according to the C grammar: printf, (, "The value of A is %d\n", ,, -, -, 5, ), ; matches a function call to printf with 2 arguments: a format string and a constant expression - - 5, which evaluates to 5 at compile time.
the code is therefore equivalent to printf("The value of A is %d\n", 5);. It will produce the output:
The value of A is 5
This sequence of macros is expanded as tokens, not strictly a sequence of characters, hence A does not expand as --5, but rather as - -5. Good C compilers would insert an extra space when preprocessing the source to textual output to ensure the resulting text produces the same sequence of tokens when reparsed. Note however that the C Standard does not say anything about preprocessing to textual output, it only specifies preprocessing as one of the parsing phases and it is a quality of implementation issue for compilers to not introduce potential side effects when preprocessing to textual output.
There is a separate feature for combining tokens into new tokens in the preprocessor called token pasting. It requires a specific operator ## and is quite tricky to use.
Note also that macros should be defined with parentheses around each argument and parentheses around the whole expansion to prevent operator precedence issues:
#define A (-B)
#define B (-C)
#define C 5
Two consecutive dashes are not combined into a single pre-decrement operator -- because the C preprocessor works with individual tokens, effectively inserting whitespace around macro substitutions. Running this program through gcc -E
#define A -B
#define B -C
#define C 5
int main() {
return A;
}
produces the following output:
int main() {
return - -5;
}
Note the space after the first -.
According to the standard, macro replacements are performed at the level of preprocessor tokens, not at the level of individual characters (§6.10.3, paragraph 9):
A preprocessing directive of the form
# define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive.
Therefore, the two dashes - constitute two different tokens, so they are kept separate from each other in the output of the preprocessor.
Whenever we use #define in a C program, the preprocessor replaces the macro name with its replacement wherever it is used.
#define A -B
#define B -C
#define C 5
So when we print A,
the expansion proceeds in the following steps:
A => -B
B => -C
A => - -C => - -5
So when we print the value of A, it comes out to be 5.
Generally, these #define directives are used to declare constants that are used throughout the code.

How can macro output double quotes

ATOMIC_JOIN(prefix, detail_platform) is a macro that outputs a token sequence such as:
base/atomic/gcc_gnu_x64
Another macro, ATOMIC_DETAIL_HEADER, is expected to output:
"base/atomic/gcc_gnu_x64.hpp" // notice: double quotes included in the output
I tried to write ATOMIC_DETAIL_HEADER in these ways:
#define ATOMIC_DETAIL_HEADER(prefix) "ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp"
#define ATOMIC_DETAIL_HEADER(prefix) \"ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp\"
#define ATOMIC_DETAIL_HEADER(prefix) "##ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp##"
... but all of them failed!
However, if the desired output is:
<base/atomic/gcc_gnu_x64.hpp>
then the following macro definition does the right thing:
#define ATOMIC_DETAIL_HEADER(prefix) <ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp>
A cpp macro cannot build strings this way. It can join tokens to form new tokens, but at every stage it must be a valid token. Your example with angle-brackets works because the bracket characters are distinct tokens whereas the double-quotes cannot exist floating-off like that, and you cannot apply ## to it.
In most contexts, the compiler will concatenate adjacent string literals, so it may be sufficient to stringify each piece with # and let the compiler do that.
While luser droog correctly stated why your use of quotes didn't work, he didn't show exactly how the goal can be accomplished. Indeed the # operator replaces a parameter by a string literal, i.e. puts quotation marks around the argument. This is slightly complicated by the fact that your token sequence has to be expanded first, so an additional level of macro substitution is needed:
#define QUOTED(a) #a
#define QUOTE(a) QUOTED(a)
#define ATOMIC_DETAIL_HEADER(prefix) QUOTE(ATOMIC_JOIN(prefix, ATOMIC_DETAIL_PLATFORM).hpp)

Macro within macro does not work

I came across one more piece of code that is even more confusing..
#include "stdio.h"
#define f(a,b) a##b
#define g(a) #a
#define h(a) g(a)
int main(void)
{
printf("%s\n",h(f(1,2)));
printf("%s\n",g(1));
printf("%s\n",g(f(1,2)));
return 0;
}
output is
12
1
f(1,2)
My assumption was:
1) first f(1,2) is replaced by 12, because macro f(a,b) concatenates its arguments
2) then the g(a) macro replaces 1 by a string literal "1"
3) the output should be 1
But why is g(f(1,2)) not being substituted to 12?
I'm sure I'm missing something here.
Can someone explain this program to me?
Macro replacement occurs from the outside in. (Strictly speaking, the preprocessor is required to behave as though it replaces macros one at a time, starting from the beginning of the file and restarting after each replacement.)
The standard (C99 §6.10.3.2/2) says
If, in the replacement list, a parameter is immediately preceded by a # preprocessing
token, both are replaced by a single character string literal preprocessing token that
contains the spelling of the preprocessing token sequence for the corresponding
argument.
Since # is present in the replacement list for the macro g, the argument f(1,2) is converted to a string immediately, and the result is "f(1,2)".
On the other hand, in h(f(1,2)), since the replacement list doesn't contain #, §6.10.3.1/1 applies,
After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been
expanded.
and the argument f(1, 2) is macro expanded to give 12, so the result is g(12) which then becomes "12" when the file is "re-scanned".
Macros can't expand into preprocessing directives. From C99 6.10.3.4/3 "Rescanning and further replacement":
The resulting completely macro-replaced preprocessing token sequence
is not processed as a preprocessing directive even if it resembles
one,
Source: https://stackoverflow.com/a/2429368/2591612
But you can call f(a,b) from g through an extra level, like you did with h. Otherwise f(a,b) is turned into a string literal, as @Red Alert states.

Define a macro off of the content of a macro

Is it possible to define a macro off of the content of a macro?
For example:
#define SET(key,value) #define key value
SET(myKey,"value")
int main(){
char str[] = myKey;
printf("%s",str);
}
would result in
int main(){
char str[] = "value";
printf("%s",str);
}
after being preprocessed.
Why would I do this? Because I'm curious ;)
No, it's not possible to define a macro within another macro.
The preprocessor only iterates once before the compiler. What you're suggesting would require an undetermined number of iterations.
No, you can't: # in the replacement list of a macro means QUOTE NEXT TOKEN. It's more of a spelling issue than any logical puzzle :)
(If you require this kind of solution in your code, then there are ways and tricks of using macros, but you need to be specific about the use cases you need, as your example can be achieved by defining: #define myKey "value")
Here it is from the ISO C99 standard:
6.10.3.2 The # operator
Constraints
1 Each # preprocessing token in the replacement list for a
function-like macro shall be followed by a parameter as the next
preprocessing token in the replacement list.
Semantics
2 If, in the
replacement list, a parameter is immediately preceded by a #
preprocessing token, both are replaced by a single character string
literal preprocessing token that contains the spelling of the
preprocessing token sequence for the corresponding argument. Each
occurrence of white space between the argument’s preprocessing tokens
becomes a single space character in the character string literal.
White space before the first preprocessing token and after the last
preprocessing token composing the argument is deleted. Otherwise, the
original spelling of each preprocessing token in the argument is
retained in the character string literal, except for special handling
for producing the spelling of string literals and character constants:
a \ character is inserted before each " and \ character of a character
constant or string literal (including the delimiting " characters),
except that it is implementation-defined whether a \ character is
inserted before the \ character beginning a universal character name.
If the replacement that results is not a valid character string
literal, the behavior is undefined. The character string literal
corresponding to an empty argument is "". The order of evaluation of #
and ## operators is unspecified.
Macros are a simple text substitution. Generating new preprocessor directives from a macro would require the preprocessor to continue preprocessing from the beginning of the substitution. However, the standard defined preprocessing to continue behind the substitution.
This makes sense from a streaming point of view, viewing the unprocessed code as the input stream and the processed (and substituted) code as the output stream. Macro substitutions can have an arbitrary length, which means for the preprocessing from the beginning that an arbitrary number of characters must be inserted at the beginning of the input stream to be processed again.
When the processing continues behind the substitution, then the input simply is handled in one single run without any insertion or buffering, because everything directly goes to the output.
Whilst it is not possible to use a macro to define another macro, depending on what you are seeking to achieve, you can use macros to effectively achieve the same thing by having them define constants. For example, I have an extensive library of C macros I use to define Objective-C constant strings and key values.
Here are some snippets of code from some of my headers.
// use defineStringsIn_X_File to define a NSString constant to a literal value.
// usage (direct) : defineStringsIn_X_File(constname,value);
#define defineStringsIn_h_File(constname,value) extern NSString * const constname;
#define defineStringsIn_m_File(constname,value) NSString * const constname = value;
// use defineKeysIn_X_File when the value is the same as the key.
// eg myKeyname has the value @"myKeyname"
// usage (direct) : defineKeysIn_X_File(keyname);
// usage (indirect) : myKeyDefiner(defineKeysIn_X_File);
#define defineKeysIn_h_File(key) defineStringsIn_h_File(key,key)
#define defineKeysIn_m_File(key) defineStringsIn_m_File(key,@#key)
// use defineKeyValuesIn_X_File when the value is completely unrelated to the key - ie you supply a quoted value.
// eg myKeyname has the value @"keyvalue"
// usage: defineKeyValuesIn_X_File(keyname,@"keyvalue");
// usage (indirect) : myKeyDefiner(defineKeyValuesIn_X_File);
#define defineKeyValuesIn_h_File(key,value) defineStringsIn_h_File(key,value)
#define defineKeyValuesIn_m_File(key,value) defineStringsIn_m_File(key,value)
// use definePrefixedKeys_in_X_File when the last part of the keyname is the same as the value.
// eg myPrefixed_keyname has the value @"keyname"
// usage (direct) : definePrefixedKeys_in_X_File(prefix_,keyname);
// usage (indirect) : myKeyDefiner(definePrefixedKeys_in_X_File);
#define definePrefixedKeys_in_h_File_2(prefix,key) defineKeyValuesIn_h_File(prefix##key,@#key)
#define definePrefixedKeys_in_m_File_2(prefix,key) defineKeyValuesIn_m_File(prefix##key,@#key)
#define definePrefixedKeys_in_h_File_3(prefix,key,NSObject) definePrefixedKeys_in_h_File_2(prefix,key)
#define definePrefixedKeys_in_m_File_3(prefix,key,NSObject) definePrefixedKeys_in_m_File_2(prefix,key)
#define definePrefixedKeys_in_h_File(...) VARARG(definePrefixedKeys_in_h_File_, __VA_ARGS__)
#define definePrefixedKeys_in_m_File(...) VARARG(definePrefixedKeys_in_m_File_, __VA_ARGS__)
// use definePrefixedKeyValues_in_X_File when the value has no relation to the keyname, but the keyname has a common prefix
// eg myPrefixed_keyname has the value @"bollocks"
// usage: definePrefixedKeyValues_in_X_File(prefix_,keyname,@"bollocks");
// usage (indirect) : myKeyDefiner(definePrefixedKeyValues_in_X_File);
#define definePrefixedKeyValues_in_h_File(prefix,key,value) defineKeyValuesIn_h_File(prefix##key,value)
#define definePrefixedKeyValues_in_m_File(prefix,key,value) defineKeyValuesIn_m_File(prefix##key,value)
#define VA_NARGS_IMPL(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, N, ...) N
#define VA_NARGS(...) VA_NARGS_IMPL(X,##__VA_ARGS__, 11, 10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define VARARG_IMPL2(base, count, ...) base##count(__VA_ARGS__)
#define VARARG_IMPL(base, count, ...) VARARG_IMPL2(base, count, __VA_ARGS__)
#define VARARG(base, ...) VARARG_IMPL(base, VA_NARGS(__VA_ARGS__), __VA_ARGS__)
And a usage example that invokes it:
#define sw_Logging_defineKeys(defineKeyValue) \
/** start of key list for sw_Logging_ **/\
/**/defineKeyValue(sw_Logging_,log)\
/**/defineKeyValue(sw_Logging_,time)\
/**/defineKeyValue(sw_Logging_,message)\
/**/defineKeyValue(sw_Logging_,object)\
/**/defineKeyValue(sw_Logging_,findCallStack)\
/**/defineKeyValue(sw_Logging_,debugging)\
/**/defineKeyValue(sw_Logging_,callStackSymbols)\
/**/defineKeyValue(sw_Logging_,callStackReturnAddresses)\
/** end of key list for sw_Logging_ **/
sw_Logging_defineKeys(definePrefixedKeys_in_h_File);
The last part may be a little difficult to get your head around.
The sw_Logging_defineKeys() macro defines a list that takes the name of a macro as its parameter (defineKeyValue); this is then used to invoke the macro that does the actual definition process. That is, for each item in the list, the macro name passed in determines the context ("header" or "implementation", i.e. either the "h" or the "m" file, if you know the Objective-C file extensions). Whilst this is used for Objective-C, it is simply plain old C macros, used for a "higher purpose" than possibly Kernighan and Ritchie ever envisaged. :-)

C preprocessor stringification weirdness

I am defining a macro that evaluates to a constant string, holding the filename and the line number, for logging purposes.
It works fine, but I just can't figure out why 2 additional macros are needed - STRINGIFY and TOSTRING, when intuition suggests simply __FILE__ ":" #__LINE__.
#include <stdio.h>
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
#define THIS_ORIGIN (__FILE__ ":" TOSTRING(__LINE__))
int main (void) {
/* correctly prints "test.c:9" */
printf("%s", THIS_ORIGIN);
return 0;
}
This just seems like an ugly hack to me.
Can someone explain in detail what happens stage by stage so that __LINE__ is stringified correctly, and why neither of __FILE__ ":" STRINGIFY(__LINE__) and __FILE__ ":" #__LINE__ works?
Because of the order of expansion. The GCC documentation says:
Macro arguments are completely macro-expanded before they are substituted into a macro body, unless they are stringified or pasted with other tokens. After substitution, the entire macro body, including the substituted arguments, is scanned again for macros to be expanded. The result is that the arguments are scanned twice to expand macro calls in them.
So if the argument will be stringified, it is not expanded first. You are getting the literal text in the parentheses. But if it's being passed to another macro, it is expanded. Therefore if you want to expand it, you need two levels of macros.
This is done because there are cases where you do not want to expand the argument before stringification, most common being the assert() macro. If you write:
assert(MIN(width, height) >= 240);
you want the message to be:
Assertion MIN(width, height) >= 240 failed
and not some insane thing the MIN macro expands to (in gcc it uses several gcc-specific extensions and is quite long IIRC).
You can't simply use __FILE__":"#__LINE__ because the stringify operator # can only be applied to a macro parameter.
__FILE__ ":" STRINGIFY(__LINE__) would work OK with other text (e.g. __FILE__ ":" STRINGIFY(foo)), but doesn't work when used with another macro (which is all __LINE__ really is) as the parameter, because that macro doesn't get substituted before it is stringified.

Resources