What am I missing with the ## operator in C?

I have a compiler error with GCC when trying to preprocess some macros from TI code that compiles fine with the TI compiler.
The macros in question are some variation of
#define CHIP_FSET(Reg,Field,Val) _CHIP_##Reg##_FSET(##Field,Val)
and it is used in code like
CHIP_FSET(ST1_55, XF, CHIP_ST1_55_XF_OFF)
and when GCC gets a hold of that it says
error: pasting "(" and "XF" does not give a valid pre-processing token
It pre-processes successfully if I remove the ## in front of Field. If I am understanding the code correctly, the ## in front of Field seems irrelevant, because the expansion is a function call (or another macro call) that takes two parameters. So the ## is redundant; the original replacement will result in ..._FSET(Field,Val) anyway.
So what am I missing? Everything I could find on the ## preprocessor operator said it just stuck the text together. So the ## never did anything in the first place in this case.
What am I missing?
And why would GCC choke on it but the TI compiler allow it? I'm guessing the answer to that is something like "ambiguous part of the spec".
=========================
Update
I think the problem is that there is a host of nested macros that are not being completely resolved. What the compiler ends up with is invalid, so it spits the dummy at some point while processing them all.
I've managed to make the problem worse by filling in the missing macros, and that has caused some other parts to break. Such are the joys of porting code between platforms and compilers, I guess.
Thanks for the help.

No, the spec is not ambiguous. ## operates at the level of tokens. The two tokens that are pasted together are required to form a single valid token again. ( does not form a token with alphabetic characters, hence the error message.
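For illustration, here is a minimal sketch of the corrected definition (assuming the underlying _CHIP_..._FSET macros are defined elsewhere in the TI headers): only the ## before Field is removed; the two pastes that build the macro name are still needed.

#define CHIP_FSET(Reg,Field,Val) _CHIP_##Reg##_FSET(Field,Val)
/* CHIP_FSET(ST1_55, XF, CHIP_ST1_55_XF_OFF) now expands to
   _CHIP_ST1_55_FSET(XF, CHIP_ST1_55_XF_OFF), which GCC accepts. */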

Related

C Casts Inside a Macro

I am trying to implement the standard xor swap algorithm as a C macro.
I have two versions of the macro. One that doesn't worry about the types and one that attempts to cast everything to an integer.
Here are the macros:
#define XOR_SWAP(a,b) ((a)^=(b),(b)^=(a),(a)^=(b))
#define LVALUE_CAST(type,value) (*((type)*)&(value))
#define XOR_CAST_SWAP(type,a,b) (LVALUE_CAST((type),(a))=(type)(a)^(type)(b),LVALUE_CAST((type),(b))=(type)(b)^(type)(a),LVALUE_CAST((type),(a))=(type)(a)^(type)(b))
I know it's a pain to read the one with a cast, but your efforts are appreciated.
The error that I'm getting is:
some_file.c(260,3): expected expression before ')' token
Now, I'm looking at it but I still can't figure out where my problem lies.
I've even used the -save-temps option to capture the preprocessor output and the line looks like this:
((*(((intptr_t))*)&((Block1)))=(intptr_t)(Block1)^(intptr_t)(Block2),(*(((intptr_t))*)&((Block2)))=(intptr_t)(Block2)^(intptr_t)(Block1),(*(((intptr_t))*)&((Block1)))=(intptr_t)(Block1)^(intptr_t)(Block2));
Before anybody mentions it, I've since realized that I should probably make this a function instead of a macro. Or even better, just use an extra variable to do the swap; it isn't hard.
But I want to know why this macro doesn't work. The brackets seem to match exactly as I wanted them to, so why is it complaining?
The LVALUE_CAST is something I took from @Jens Gustedt's answer in this SO question.
Update:
The macro call that produces that preprocessor output looks like this:
XOR_CAST_SWAP(intptr_t, Block1, Block2);
I don't believe you can wrap types in arbitrary levels of parentheses.* So this compiles fine:
((*(intptr_t*)&((Block1)))=(intptr_t)(Block1)^(intptr_t)(Block2),(*(intptr_t*)&((Block2)))=(intptr_t)(Block2)^(intptr_t)(Block1),(*(intptr_t*)&((Block1)))=(intptr_t)(Block1)^(intptr_t)(Block2));
* Disclaimer: this is purely empirical! I don't intend to peruse the standard to figure out what the details are...
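For reference, a minimal sketch (my assumption, not from the answer) of macro definitions that produce the working expansion above; <stdint.h> supplies intptr_t. The key change is not wrapping the type parameter in extra parentheses, since a cast must be written as (type-name) and ((intptr_t))* is not a valid type name:

#include <stdint.h>

#define LVALUE_CAST(type,value) (*(type *)&(value))
#define XOR_CAST_SWAP(type,a,b)                     \
    (LVALUE_CAST(type,(a)) = (type)(a) ^ (type)(b), \
     LVALUE_CAST(type,(b)) = (type)(b) ^ (type)(a), \
     LVALUE_CAST(type,(a)) = (type)(a) ^ (type)(b))

/* Hypothetical usage, matching the question: XOR_CAST_SWAP(intptr_t, Block1, Block2); */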

What does '\' actually do in C?

As far as I know, \ in C just appends the next line as if there were no line break.
Consider the following code:
main(){\
return 0;
}
When I look at the pre-processed code (gcc -E), it shows
main(){return
0;
}
and not
main(){return 0;
}
What is the reason for this kind of behaviour? Also, how can I get the code I expected?
Yes, your expected result is the one required by the C and C++ standards. The backslash simply escapes the newline, i.e. the backslash-newline sequence is deleted.
GCC 4.2.1 from my OS X installation gives the expected result, as does Clang. Furthermore, adding a #define to the beginning and testing with
#define main(){\
return 0;
}
main()
yields the correct result
}
{return 0;
Perhaps gcc -E does some extra processing after preprocessing and before outputting it. In any case, the line break seen by the rest of the preprocessor seems to be in the right place. So it's a cosmetic bug.
UPDATE: According to the GCC FAQ, -E (or the default setting of the cpp command) attempts to put output tokens in roughly the same visual location as input tokens. To get "raw" output, specify -P as well. This fixes the observed issues.
Probably what happened:
In preserving visual appearance, tokens not separated by spaces are kept together.
Line splicing happens before spaces are identified for the above.
The { and return tokens are grouped into the same visual block.
0 follows a space and its location on the next line is duly noted.
PLUG: If this is really important to you, I have implemented my own preprocessor with correct implementation of both raw-preprocessed and whitespace-preserving "pretty" modes. Following this discussion I added line splices to the preserved whitespace. It's not really intended as a standalone tool, though. It's a testbed for a compiler framework which happens to be a fully compliant C++11 preprocessor library, which happens to have a miniature command-line driver. (The error messages are on par with GCC, or Clang, sans color, though.)
From K&R section A.12 Preprocessing:
A.12.2 Line Splicing
Lines that end with the backslash character \ are
folded by deleting the backslash and the following newline character.
This occurs before division into tokens.
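For illustration (example code, not from the question), this is why a backslash-continued macro definition behaves exactly as if it were written on one line; the splice happens before the replacement list is tokenized:

#define MAX(a, b) ((a) > (b) ? (a) \
                             : (b))

int larger = MAX(3, 5);   /* expands as if the #define had been a single physical line */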
It doesn't matter :/ The tokenizer will not see any difference. [1]
Update In response to the comments:
There seems to be a fair amount of confusion as to what the expected output of the preprocessor should be. My point is that the expectation /seems/ reasonable at a glance but doesn't actually need to be specified in this way for the output to be valid. The amount of whitespace present in the output is simply irrelevant to the parser. What matters is that the preprocessor should treat the continued line as one line while interpreting it.
In other words: the preprocessor is not a text transformation tool, it's a token manipulation tool.
If it matters to you, you're probably
using the preprocessor for something other than C/C++
treating C++ code as text, which is a ... code smell. (libclang and various less complete parser libraries come to mind).
[1] (The preprocessor is free to achieve the specified result in whichever way it sees fit. The result you are seeing is possibly the most efficient way the implementors have found to implement this particular transformation.)

Parsing C files without preprocessing it

I want to run simple analyses on C files (such as: if the foo macro is called with INT_TYPE as an argument, then the result is cast to int*). I do not want to preprocess the file, I just want to parse it (so that, for instance, I'll have correct line numbers).
I.e., I want to get from
#include <a.h>
#define FOO(f)
int f() {FOO(1);}
a list of tokens like
<include_directive value="a.h"/>
<macro name="FOO"><param name="f"/><result/></macro>
<function name="f">
<return>int</return>
<body>
<macro_call name="FOO"><param>1</param></macro_call>
</body>
</function>
with no need to set include path, etc.
Is there any preexisting parser that does it? All parsers I know assume C is preprocessed. I want to have access to the macros and actual include instructions.
Our C Front End can parse code containing preprocessor elements to a fair extent and still build a usable AST. (Yes, the parse tree has precise file/line/column number information.)
There are a number of restrictions, but within them it handles most code. In the few cases it cannot handle, a small, easy change to the source file that gives equivalent code often solves the problem.
Here's a rough set of rules and restrictions:
#includes and #defines can occur wherever a declaration or statement can occur, but not in the middle of a statement. These rarely cause a problem.
macro calls can occur where function calls occur in expressions, or can appear without semicolon in place of statements. Macro calls that span non-well-formed chunks are not handled well (anybody surprised?). The latter occur occasionally but not rarely and need manual revision. OP's example of "j(v,oid)*" is problematic, but this is really rare in code.
#if ... #endif must be wrapped around major language concepts (nonterminals) (constant, expression, statement, declaration, function) or sequences of such entities, or around certain non-well-formed but commonly occurring idioms, such as if (exp) {. Each arm of the conditional must contain the same kind of syntactic construct as the other arms. #if wrapped around random text used as a bad kind of comment is problematic, but easily fixed in the source by making it a real comment (see the sketch below). Where these conditions are not met, you need to modify the original source code, often by moving the #if #elif #else #endif a few tokens.
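For example, a rough sketch (illustrative code, not from the answer) of the kind of easy change meant here:

/* Problematic for un-preprocessed parsing: #if used as a pseudo-comment
   around free-form text that is not a language construct */
#if 0
check the watchdog timing with the hardware team
#endif

/* Equivalent, easily parsed form: a real comment */
/* check the watchdog timing with the hardware team */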
In our experience, one can revise a code base of 50,000 lines in a few hours to get around these issues. While that seems annoying (and it is), the alternative is to not be able to parse the source code at all, which is far worse than annoying.
You also want more than just a parser. See Life After Parsing, to know what happens after you succeed in getting a parse tree. We've done some additional work in building symbol tables in which the declarations are recorded with the preprocessor context in which they are embedded, enabling type checking to include the preprocessor conditions.
You can have a look at this ANTLR grammar. You will have to add rules for preprocessor tokens, though.
Your specific example can be handled by writing your own parser and ignoring macro expansion, because FOO(1) itself can be interpreted as a function call.
When more cases are considered, however, the parser becomes much more difficult. You can refer to the PDF link for more information.

Can you devise a simple macro to effectively produce a compiler error when used?

I am looking for a strange macro definition, on purpose: I need a macro defined in such a way, that in the event the macro is effectively used in compiled code, the compiler will unfailingly produce an error.
The background: Since C11 introduced several new keywords, and the new C++11 standard also added a few, I would like to introduce a header file in my projects (mostly using C89/C95 compilers with a few additions) to force developers to refrain from using these new keywords as identifier names unless, of course, they are recognized as keywords in the intended fashion.
In the ancient past, I did this for new like this:
#define new *** /* C++ keyword, do not use */
And yes, it worked. Until it didn't, when a programmer forgot the underscore in a parameter name:
void myfunction(uint16_t new parameter);
I've used variants since, but I've never been challenged again.
Now I intend to create a file with all keywords not supported by various compilers, and I'm looking for a dependable solution, at best with a not-too-confusing error message. "Syntax error" would be OK, but "parameter missing" would be confusing already. I'm thinking along the lines of
#define atomic +*=*+ /* C11 derived keyword; do not use */
and aside from my usual hesitation, I'm quite sure that any use (but not the definition) of the macro will produce an error.
EDIT: To make it even more difficult, MISRA will only allow the use of the basic source and execution character set, so # or $ are not allowed.
But I'd like to ask the community: Do you have a better macro value? As effective, but shorter? Or even longer but more dependable in some strange situation? Or a completely different method to generate an error (only using the compiler, please, not external tools!) when a "discouraged" identifier is used for any purpose?
Disclaimer:
And, yes, I know I can use grep or a parser in a nightly build and report the warnings it finds. But dropping an immediate error on the developer's desk is quicker, and certain to be fixed before checking in.
If the sport is for the shortest token sequence that always produces an error, any combination of two one-character operators that can't legally occur together will do, but:
don't use ({ or }) because gcc has a special meaning for that
don't use any sort of unbalanced parentheses because they can lead you far away until the error is recognized
don't use < or > because they could match template parameters for C++
don't use prefix operators as second character
don't use postfix operators as first character
This leaves some possibilities:
.., .| and other combinations with . since . expects a following identifier
&|, &/, &^, &,, &;
!|, !/, !^, !,, !;
But actually, to be more user-friendly, I'd also first place a _Pragma in it so the compiler would also spit out a warning:
#define atomic _Pragma("message \"some instructive text that you should read\"") ..
I think you can just use an illegal symbol:
#define bad_name #
Another one that would work would be this:
static const char *illegal_keyword = "";
#define bad_name (illegal_keyword = "bad_name")
It would give you an error saying that you are changing a constant. Also, the error message will usually be quite good:
Line 8: error: called object 'illegal_keyword = "printf"' is not a function
And the final one that is perhaps the shortest and will always work is this:
#define bad_name #
Because the preprocessor never processes its output a second time, and # is illegal outside of the preprocessor, this will always produce an error.
#define atomic do not use atomic
The expansion is not recursive so it stops. The only way to stop it from being a compilation error is:
#define do
#define not
#define use
but that's verboten because do and not are keywords.
The error message might even include 'atomic'. You can increase the probability of that by rephrasing the message:
#define atomic atomic cannot be used
(Now you are not playing with keywords in the message, though.)
I think [[]] isn't a valid sequence of tokens anywhere, so you could use that:
#define keyword [[]]
The error will be a syntax error, complaining about [ or ].
My attempt:
#define new new[-1]
#define atomic atomic[-1]
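To illustrate how this variant fails at the point of use (hypothetical code, not from the answer):

#define atomic atomic[-1]   /* guard: 'atomic' must not be used as an identifier */

int atomic;   /* expands to: int atomic[-1];  rejected (negative array size),
                 and the diagnostic points at the offending line */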

Why does the pre-processor give a space?

I want to comment out a line using the pre-processor:
#define open /##*
#define close */
main()
{
open commented line close
}
When I run gcc -E filename.c, I expected
/* commented line */
but I got
/ * commented line */
so that the compiler shows an error
Why is it giving an unwanted space?
From the GNU C Preprocessor documentation:
However, two tokens that don't together form a valid token cannot be pasted together. For example, you cannot concatenate x with + in either order. If you try, the preprocessor issues a warning and emits the two tokens. Whether it puts white space between the tokens is undefined. It is common to find unnecessary uses of '##' in complex macros. If you get this warning, it is likely that you can simply remove the '##'.
In this case '*' and '/' do not form a valid C or C++ token. So they are emitted with a space between them.
(Aside: you are likely to get C compilation errors even if you do manage to insert "comments" into the output of the C preprocessor. There aren't supposed to be any comments there.)
The error is because /* is not a valid token.
As explained from the CPP doc:
two tokens that don't together form a valid token cannot be pasted together. For example, you cannot concatenate x with + in either order.
You can get the error by pasting other nonsense stuff e.g. /##+ or +##-.
About the space: it is deliberately inserted to avoid creating a comment and messing up the rest. From the GCC source code:
/* Avoid comment headers, since they are still processed in stage 3.
It is simpler to insert a space here, rather than modifying the
lexer to ignore comments in some circumstances. Simply returning
false doesn't work, since we want to clear the PASTE_LEFT flag. */
if ((*plhs)->type == CPP_DIV && rhs->type != CPP_EQ)
*end++ = ' ';
The preprocessor runs and produces code in a form that the C compiler can understand. It only processes your code once, so even if you could produce a /* with your #define, the compiler would report an error when it sees the /*, because comment removal has already happened by that point and /* is not valid C code.
This doesn't seem like a very good thing to do.
Because comments are replaced with spaces before (and only before) the preprocessor runs. If you paste together the characters / and * using the preprocessor, you get /*, which at that point is just a couple of operators. Edit: such abuse of ## technically attempts to create /* as a single token, which is not a valid preprocessing token, so the behavior is undefined. You can paste together > ## > or < %:%: :, although you shouldn't.
See §6.4.6 of C99 for what tokens you are allowed to construct and 6.10.3.3 for the catenation process.
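As a brief illustration of the token-level rule (example code, not from the question): pasting that yields a valid preprocessing token is fine, pasting that does not is diagnosed when the macro is expanded:

#define PASTE(a,b) a##b

int PASTE(foo, bar) = 1;   /* OK: the paste forms the single identifier foobar */
/* int bad = PASTE(/, *);     '/' and '*' do not form one token, so GCC reports an error */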
If you want to comment out some code using the pre-processor, use
#if 0
...
#endif
