Macro definition for ".global" - c

I have a "c" program that uses ".global" assembly language code. My compiler does not allow this, i need to ignore the same, i tried the following:
#define .global //global
But this gives a compiler error. Is there any other option I can use? The compilation error is:
"expected an identifier"

You are almost certainly going to end up writing your own preprocessing script; it shouldn't be too difficult, if your source files are reasonably controlled. If you don't use the .global construct in string literals or comments, for example, it would be sufficient to do something like:
sed 's/\.global [_[:alpha:]][_[:alnum:]]*;//g'
(perhaps with a bit more attention to detail about whitespace).
You cannot manufacture a comment with a macro. (You also cannot define a macro whose name starts with a ., although I suppose there could be a compiler which accepts that as an extension.)
Comments are replaced with whitespace in phase 3 of the translation process. Preprocessor directives are not examined until phase 4, by which time all the comments have disappeared.
So there is no difference between
#define COMMENT //comment
and
#define COMMENT
Standards reference: §5.1.1.2/1:
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. …
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed.…
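To make this concrete, a minimal sketch (hypothetical macro name): the comment is turned into a space in phase 3, so by the time the directive is processed in phase 4, COMMENT is defined with an empty replacement list:
#define COMMENT //this text is already gone before phase 4
int COMMENT x = 1;   /* expands to: int x = 1; -- COMMENT does not comment out the rest of the line */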

Related

Preprocessor and compiler errors in C

When I have a syntax error in C, how can I know if it's a preprocessor error or a compiler error?
Let's say I type in this line: "# include header.h" (The " is part of the line to make it a string literal).
Will the preprocessor have an issue with it or will it be the compiler that will treat it as a string without assigning it to anything?
Typically compiler output doesn't distinguish "pre-processor errors" from "compiler errors", as these aren't really standardized terms.
What's called "pre-processing" is often the process of forming pre-processor tokens, followed by resolving all includes, pragmas and macros. In the C standard, this "pre-processing" roughly corresponds to "translation phases" 3 and 4:
The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment. Each comment is replaced by
one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced by
one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. If a character sequence that
matches the syntax of a universal character name is produced by token
concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing
directive causes the named header or source file to be processed from phase 1
through phase 4, recursively. All preprocessing directives are then deleted.
The compiler will obviously not complain about finding a valid string literal "# include header.h" in either of the above phases - a string literal is a valid pre-processor token. What you call "pre-processor errors" are probably errors that occur in any of the above phases.
(This is a simplified explanation; there's lots of other mildly interesting stuff happening as well, like trigraph replacement, \ line splicing, etc.)
But in this case, I think the compiler will complain in phase 7:
White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
"Will the preprocessor have an issue with it or will it be the compiler that will treat it as a string without assigning it to anything?"
I've tried your example:
"#include <stdio.h>"
I get the following errors:
For GCC:
"error: expected identifier or '(' before string constant"
For Clang:
"error: expected identifier or '('"
Both GCC and Clang treat it as a string literal, which is reasonable, since character sequences enclosed in double quotes are specified as string literals:
"A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
Source: ISO/IEC 9899:2018 (C18), §6.4.5/3.
This issue is one the compiler cares about, not the preprocessor. In general, since macros are expanded before compilation, incorrect or failing preprocessor directives are usually also reported by the compiler; there is no separate, explicit error-detection stage for the C preprocessor.
If the assignment were proper, e.g.:
const char* p = "#include <stdio.h>";
and you use variables, functions, etc. that are declared in header.h, you can* get errors about undefined references to those variables/functions, since the compiler/linker can't see/find those declarations.
*Whether you get an error or not furthermore depends on whether the definition of that variable/function is visible before its use in the source code and on how you link several source files.
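A minimal sketch (hypothetical names) of that second case: the string literal itself is fine, but using something that is declared and never defined only fails later, at link time:
const char *p = "#include <stdio.h>";   /* fine: just a pointer initialized with a string literal */

extern int missing_value;               /* declared here, assumed to be defined in another file    */

int get_value(void)
{
    return missing_value;               /* compiles; if no other file defines missing_value, the
                                           linker reports an undefined reference                   */
}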
"When I have a syntax error in C, how can I know if it's a preprocessor error or a compiler error?"
As said above, there are no real preprocessor errors; the compiler covers these issues. The preprocessor doesn't really analyze for errors, it just expands. Usually it is very clear whether an error belongs to a macro or not, even though the compiler reports the syntactical issues.
As Eugene already said in the comments, you can look at the macro-expanded version of your code by using the -E option for GCC and check whether the macros were expanded as desired.
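For example, a minimal sketch (hypothetical file square.c and macro name) that can be inspected this way:
/* square.c -- run "gcc -E square.c" to print the preprocessed translation unit
   instead of compiling it; adding -P suppresses the linemarker lines. */
#define SQUARE(x) ((x) * (x))

int nine = SQUARE(3);   /* in the -E output this line reads: int nine = ((3) * (3)); */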

C translation phase 4

Recently I encountered the following issue. My implementation was looking like this:
#define MY_CODE_VERSION PROJ_VERSION
#include "project.h"
#if (3 != MY_CODE_VERSION)
PROJ_VERSION is defined in project.h. Why didn't I get a compilation warning/error? After all, I was defining a macro in terms of another macro that was not yet known by the time the compiler reached the line #define MY_CODE_VERSION PROJ_VERSION.
I took a look at these phases in ANSI C but I can't figure out the reason (the actual behaviour of the compiler, i.e. at which phase MY_CODE_VERSION takes the value of PROJ_VERSION).
My assumption is that this replacement takes place only at the line "#if (3 != MY_CODE_VERSION)", and by this time PROJ_VERSION is already known to the compiler from the inclusion of project.h above.
Thank you in advance
I'll not hash out what you already know. What you apparently did not know:
6.10.3.4 Rescanning and further replacement
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing
tokens are removed. Then, the resulting preprocessing token sequence
is rescanned, along with all subsequent preprocessing tokens of the
source file, for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s
preprocessing tokens), it is not replaced. Furthermore, if any nested
replacements encounter the name of the macro being replaced, it is not
replaced. These nonreplaced macro name preprocessing tokens are no
longer available for further replacement even if they are later
(re)examined in contexts in which that macro name preprocessing token
would otherwise have been replaced.
The resulting completely macro-replaced preprocessing token sequence is not processed as a preprocessing directive even if it
resembles one, but all pragma unary operator expressions within it are
then processed as specified in 6.10.9 below.
In short, once a macro has been expanded and all the # (stringizing) and ## (pasting) operations have been performed, the resulting token sequence is scanned once again for more macros to replace. If the name of the macro being replaced is found again, it is not replaced.
So what you're seeing is standard-defined.
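A minimal sketch of the behaviour described above (hypothetical names, with the #include replaced by a plain #define so the example is self-contained):
#define MY_VERSION PROJ_VERSION   /* PROJ_VERSION is unknown here; no diagnostic, the body is
                                     simply stored as the single token PROJ_VERSION            */
#define PROJ_VERSION 3            /* in the original question this came from project.h         */

#if (3 != MY_VERSION)             /* only here is MY_VERSION expanded: -> PROJ_VERSION -> 3,
                                     so the condition is false and no error is raised          */
#error "version mismatch"
#endif
The bodies of object-like macros are only expanded where the macros are used, so the relative order of the two #define lines does not matter as long as both appear before the #if.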

C Preprocessor: Dynamic #Define Creation

I would like to have the expansion of these C preprocessor lines:
#define _POUND_ #define
_POUND_ _FALSE 0
_FALSE
expand so that the last line (i.e. _FALSE) expands to 0. I understand recursive CPP isn't possible directly, but that it can be done. Unfortunately, I'm not fully sure I follow the logic presented in this link.
I think I need to force an additional evaluation but I don't know how to do that in this case (i.e. I have tried and failed).
Can you help?
As indicated several times over in comments, what you are looking for is not supported. Here's what the standard has to say about it:
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.
(C2011, 6.10/2)
Translation phase 4 is the one in which preprocessing directives are executed, so it follows that macro expansion during phase 4 cannot cause bona fide preprocessing directives to be created. Macros can be expanded to text that has the form of a preprocessing directive, but such text cannot actually be a directive.
It is true that the text resulting from a macro expansion is re-scanned for more macros to expand, but that process does not involve recognizing preprocessing directives that were not already there.
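A minimal sketch (hypothetical, non-reserved names) of what actually happens; note that it does not compile, which is exactly the point:
#define POUND_DEFINE #define       /* legal: an object-like macro body may contain a '#' token   */

POUND_DEFINE FALSE_MACRO 0         /* expands to the token sequence  # define FALSE_MACRO 0,
                                      which merely resembles a directive and is never executed;
                                      the leftover tokens are rejected in later phases           */
FALSE_MACRO                        /* still just the identifier FALSE_MACRO -- it was never
                                      defined, so it does not expand to 0                        */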

Inserting a one-line comment with a preprocessor macro

Is it possible to simulate a one-line comment (//) using a preprocessor macro (or magic)? For example, can this compile with gcc -std=c99?
#define LINE_COMMENT() ???
int main() {
    LINE_COMMENT() asd(*&##)($*?><?><":}{)(#
    return 0;
}
No. Here is an extract from the standard showing the phases of translation of a C program:
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
As you can see, comments are removed before macros are expanded, so a macro cannot expand into a comment.
You can obviously define a macro that takes an argument and expands to nothing, but it's more restrictive than a comment: its argument must consist of valid preprocessing tokens, so no unmatched quotes, parentheses must balance, and (for a non-variadic macro) top-level commas are taken as argument separators. Not very useful for general commenting purposes.
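A minimal sketch (hypothetical macro name) of such a macro; making it variadic at least allows commas in the discarded text, but it still cannot swallow arbitrary characters the way // can:
#define IGNORE(...)   /* expands to nothing */

int main(void)
{
    IGNORE(this text is discarded, but it must still tokenize cleanly)
    return 0;
}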
No. Comments are removed before macros are expanded. You can do selective compilation (without regard to comments) with #if directives, as in:
#if 0
... // this stuff will not be compiled
...
#endif // up to here.
That's all the magic you can do with the limited macro preprocessor available in C/C++.

What are lexical and syntactic analysis during compilation in a C compiler?

What are lexical and syntactic analysis during the process of compiling? Does preprocessing happen after lexical and syntactic analysis?
Consider this code:
int a = 10;
if (a < 4)
{
printf("%d", a);
}
In the Lexical Analysis phase: You identify each word/token and assign a meaning to it.
In the code above, you start by identifying that i followed by n followed by t and then a space forms the word int, and that it is a language keyword; 1 followed by 0 and a space is the number 10, and so on.
In the Syntactic Analysis phase: You verify whether the code follows the language syntax (grammar rules). For example, you check that there is only one variable on the LHS of an assignment operator (considering the C language), that each statement is terminated by a ;, that if is followed by a parenthesized condition, etc.
As others have already mentioned, preprocessing usually happens before lexical or syntactic analysis.
Lexical analysis happens BEFORE syntactic analysis. This is logical, because in order to expand a macro it is first necessary to identify the boundaries of an identifier, and that is done by lexical analysis. After that, syntactic analysis kicks in. Note that compilers typically do not generate the full preprocessed source before starting syntactic analysis. They read the source picking one lexeme at a time, do the preprocessing if needed, and feed the result to syntactic analysis.
In one case lexical analysis happens twice. This is paste buffering (the ## operator). Look at the code:
#define En(x) Abcd ## x ## x
enum En(5)
{
    a, b = 20, c, d
};
This code defines an enum with the name Abcd55. When the ## operators are processed during macro expansion, the data is placed into an internal buffer. After that, this buffer is scanned much like a small #include. During this scan the compiler breaks the contents of the buffer into lexemes. It may happen that the boundaries of the scanned lexemes do not match the boundaries of the original lexemes that were placed into the buffer. In the example above, three lexemes are placed into the buffer but only one is retrieved.
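In other words, after expansion the code above is equivalent to writing:
enum Abcd55
{
    a, b = 20, c, d
};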
Preprocessing happens before lexical analysis, IIRC.
Comments get filtered out, #define directives are handled, and so on; after that, the compiler generates tokens with a scanner/lexer (lexical analysis). The compiler then builds parse trees, which belong to the syntactic analysis.
There are exceptions, but it usually breaks out like this:
Preprocess - transform program text to program text
Lexical analysis - transform program text to "tokens", which are essentially small integers with attributes attached
Syntactic analysis - transform tokens to abstract syntax
The definition of "abstract syntax" can vary. In one-pass compilers, abstract syntax amounts to target code. But these days it's usually a tree or DAG that logically represents the structure of the program.
When we are talking about the C programming language, we should note that there is an ISO (ANSI) standard for the language. Here is the last public draft of C99 (ISO/IEC 9899:1999): www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
There is a section "5.1.1.2 Translation phases" which says how a C program should be parsed. The stages are:
... some steps for multi-byte, trigraph and backslash processing...
3). The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments).
This is lexical analysis for preprocessing. Only preprocessor directives, punctuation, string constants, identifiers, and comments are lexed here.
4). Preprocessing directives are executed, macro invocations are expanded
This is preprocessing itself. This phase also includes the files named in #include directives and then deletes the preprocessing directives (like #define, #ifdef, and others).
... processing of string literals...
7). White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
Conversion to tokens means detecting language keywords and constants.
This is the final lexical analysis step, followed by syntactic and semantic analysis.
So, your question was:
Does preprocessing happen after lexical and syntactic analysis?
Some lexical analysis is needed to do preprocessing, so the order is:
lexical_for_preprocessor, preprocessing, true_lexical, other_analysis.
PS: A real C compiler may be organized in a slightly different way, but it must behave in the same way as described in the standard.
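As a minimal sketch (hypothetical names) of that ordering: the parser never sees the macro name below, because by phase 7 it has already been replaced and re-lexed as a floating constant token:
#define PI 3.14159

double circumference(double r)
{
    return 2.0 * PI * r;   /* what reaches syntactic analysis is: 2.0 * 3.14159 * r */
}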

Resources