Flex And Bison, detecting macro statements (newbie) - c

I want to teach flex & bison to detect macro definitions in pure C. I'm adding this functionality to an existing parser. The parser itself is good, but it lacks macro functionality. I successfully added detection of the #include and #pragma directives, but I have problems with the conditional directives (#ifdef, #ifndef, #else, #endif). This is the code in the parser:
macro_selection_variants
: macro_compound_statement
| include_statement
| pragma_statement
| macro_selection_statement
| statement
;
macro_selection_statement
: MACRO_IFDEF IDENTIFIER macro_selection_variants MACRO_ENDIF
| MACRO_IFDEF IDENTIFIER macro_selection_variants MACRO_ELSE macro_selection_variants MACRO_ENDIF
| MACRO_IFNDEF IDENTIFIER macro_selection_variants MACRO_ENDIF
| MACRO_IFNDEF IDENTIFIER macro_selection_variants MACRO_ELSE macro_selection_variants MACRO_ENDIF
;
statement is declared like so:
statement
: labeled_statement
| compound_statement
| expression_statement
| selection_statement
| iteration_statement
| jump_statement
;
And the lexer part for those macros is:
"#ifdef" { count(); return(MACRO_IFDEF); }
"#ifndef" { count(); return(MACRO_IFNDEF); }
"#else" { count(); return(MACRO_ELSE); }
"#endif" { count(); return(MACRO_ENDIF); }
The problem is that I get 2 reduce/reduce conflicts because I'm using statement in my macro_selection_statement. I need statement inside the conditional block, because those blocks can contain variable definitions, like this:
#ifdef USER
#include "user.h"
int some_var;
char some_text[]="hello";
#ifdef ONE
int two=0;
#endif
#endif
What would be the right move here? I have read that using %expect-rr N to silence reduce/reduce warnings is a really bad idea.

You cannot really expect to implement a preprocessor (properly) inside of a C grammar. It needs to be a *pre*processor; that is, it reads the program text, and its output is sent to the C grammar.
It is possible to (mostly) avoid doing a second lex pass, since (in theory) the preprocessor can output tokens rather than a stream of characters. That would probably work well with a bison 2.7-or-better "push parser", so you might want to give it a try. But the traditional approach is just a stream of characters, which may well be easier.
It's important to remember that the replacement text of a macro, as well as the arguments to the macro, have no syntactic constraints. (Or almost no constraints.) The following is entirely legal:
#define OPEN {
#define CLOSE }
#define SAY(whatever) puts(#whatever);
#include <stdio.h>
int main(int argc, char** argv) OPEN SAY(===>) return 0; CLOSE
And that's just a start :)

Related

Why is bison include not applying/extending to %union?

So here is the start of my program. I keep getting the error
boolExpr.y:13:2: error: unknown type name 'bool'
bool boolean;
However, when I check the bison-generated file, I can see that stdbool.h is included at the start. I can't figure out how a header can be included but then bool not be recognized. I'm thinking I missed something simple, or I need to reinstall bison or lex. I can include the rest of the program if needed.
I tried switching it to int boolean; instead of bool boolean; and that fixed the compilation problem, but it still mystifies me.
Is there some way to put a pointer to a struct into %union without getting compile errors? I tried replacing bool boolean with structName * boolean; but that also came back as an undefined type (-Wimplicit) error.
%{
#include "semantics.h"
#include <stdbool.h>
#include "IOMngr.h"
#include <string.h>
extern int yylex(); /* The next token function. */
extern char *yytext; /* The matched token text. */
extern int yyerror(char *s);
extern SymTab *table;
extern SymEntry *entry;
%}
%union{
bool boolean;   /* line 13: the error is reported here */
char * string;
}
%type <string> Id
%type <boolean> Expr
%type <boolean> Term
%type <boolean> Factor
%token Ident
%token TRUE
%token FALSE
%token OR
%token AND
%%
Prog : StmtSeq {printSymTab();};
StmtSeq : Stmt StmtSeq { };
StmtSeq : { };
Stmt : Id '=' Expr ';' {storeVar($1, $3);};
Expr : Expr OR Term {$$ = doOR($1, $3);};
Expr : Term {$$ = $1;};
Term : Term AND Factor {$$ = doAND($1, $3);};
Term : Factor {$$ = $1;};
Factor : '!' Factor {$$ = doNOT($2);};
Factor : '(' Expr ')' {$$ = $2;};
Factor : Id {$$ = getVal($1);};
Factor : TRUE {$$ = true;};
Factor : FALSE {$$ = false;};
Id : Ident {$$ = strdup(yytext);};
%%
int yyerror(char *s) {
WriteIndicator(getCurrentColumnNum());
WriteMessage("Illegal Character in YACC");
return 1;
}
OK, it was a silly mistake after all. In my lex file, I had
#include "h4.tab.h"
#include "SymTab.h"
#include <stdbool.h>
but it should have been
#include "SymTab.h"
#include <stdbool.h>
#include "h4.tab.h"
I didn't realize include order mattered!
When you use a %union declaration, Bison creates a union type called YYSTYPE which it declares in the generated header file; that type is used in the declaration of yylval, which is also in the generated header file. Putting the declarations in the generated header file means that you don't need to do anything to make yylval and its type YYSTYPE available in your lexical analyser, other than #includeing the bison-generated header file.
That's fine if all the types referred to in the %union declaration are standard C types. But there is a problem if you want to use a type which requires an #included header file, or which you define yourself. If you put the necessary lines in a bison code prologue (%{...%}) before the %union declaration, your parser will compile fine, but you will run into a problem in the lexical analyser. When you #include the generated header file there, you effectively insert the declaration of union YYSTYPE, and that fails unless all of the referenced types have already been defined.
Of course, you can solve this issue by copying all the necessary #includes and/or definitions from the .y file to the .l file. You have to put the ones needed for the %union declaration before the #include of the header file, and the ones which require YYSTYPE to be defined after the #include. But that violates the principles of good software design. Every time you change an #include or declaration in your .y file, you need to think about whether and where you need to make a similar change in your .l file. Fortunately, bison provides a more convenient mechanism.
The ideal would be to arrange for everything needed to be inserted into the generated header file. Then you can just #include "h4.tab.h" in the lexical analyzer, confident that you don't need to do anything else to ensure that needed #includes are present and in the correct order.
To this end, Bison provides a more flexible alternative to %{...%}, the %code directive:
%code [where] {
// code to insert
}
There are a few possible values for where, which are documented in the Bison documentation.
Two of these are used to maintain the generated header file:
%code requires { ... } inserts the code block in both the header file and the source file, in both cases before the union declaration. This is the code block type you should use for dependencies of the semantic and location types.
%code provides { ... } also inserts the code block into both the header and the source file, but this time after the union declaration. You can use this block type if you have some interfaces which themselves refer to YYSTYPE.
You can still use %{...%} to insert code directly into the output source code. But you might want to use this instead:
* %code { ... }
Like %{...%}, this only inserts the code in the source file. Unlike %{...%}, it inserts the code at a defined place in the source file, after the YYSTYPE and other declarations. This avoids obscure problems with %{...%} blocks, which are sometimes inserted early and sometimes late. Such blocks can therefore suddenly fail to compile if you change the order of apparently unrelated bison directives.

How to pass string as prefix of defined macro

Is there any idea to pass C string as part of the defined macro like below code?
#define AAA_NUM 10
#define BBB_NUM 20
#define PREFIX_NUM(string) string##_NUM
int main()
{
char *name_a = "AAA";
char *name_b = "BBB";
printf("AAA_NUM: %d\n", PREFIX_NUM(name_a));
printf("BBB_NUM: %d\n", PREFIX_NUM(name_b));
return 0;
}
Expected output
AAA_NUM: 10
BBB_NUM: 20
As mentioned in other posts, you can't use run-time variables in the preprocessor. You could, however, create an enum that way. It is usually not a good idea to generate identifiers with macros either. Exceptions are special cases, such as maintaining an existing code base where you are limited in how much of the existing code you can or want to change. So it should be used as a last resort only.
The least bad way to write such macros would be by using a common design pattern called "X macros". These are used when it is important that code repetition should be reduced to a single place in the project. They tend to make the code look rather alien though... Example:
#define PREFIX_LIST(X) \
/* pre val */ \
X(AAA, 10) \
X(BBB, 20) \
X(CCC, 30)
enum // used to generate constants like AAA_NUM = 10,
{
#define PREFIX_ENUMS(pre, val) pre##_NUM = (val),
PREFIX_LIST(PREFIX_ENUMS)
};
#include <stdio.h>
int main (void)
{
// one way to print
#define prefix_to_val(pre) pre##_NUM
printf("AAA_NUM: %d\n", prefix_to_val(AAA));
printf("BBB_NUM: %d\n", prefix_to_val(BBB));
// another alternative
#define STR(s) #s
#define print_all_prefixes(pre, val) printf("%s: %d\n", STR(pre##_NUM), val);
PREFIX_LIST(print_all_prefixes)
return 0;
}
A macro is only processed before compilation, not at run time. Your code example does not work.
Good practice (for example, the MISRA coding rules) recommends using macros as little as possible, since they are error-prone.
The preprocessor works at compile time, and here name_a and name_b are not constants. Even if they were (i.e. const char *str is a real constant in C++ but not in C), the preprocessor only performs literal substitution and does not know the contents of variables.
This works (notice that the parameter should be expanded by another macro in order to get a valid token):
#include <stdio.h>
#define AAA_NUM 10
#define BBB_NUM 20
/* Helper name avoids a leading underscore plus capital, which is reserved in C */
#define PREFIX_NUM_IMPL(string) string##_NUM
#define PREFIX_NUM(string) PREFIX_NUM_IMPL(string)
int main(void)
{
#define name_a AAA
#define name_b BBB
printf("AAA_NUM: %d\n", PREFIX_NUM(name_a));
printf("BBB_NUM: %d\n", PREFIX_NUM(name_b));
return 0;
}
There is no way in C to create symbols at run time and use them. C is a compiled language, and all symbols have to be known before compilation.
The preprocessor, which makes changes at the text level before compilation, does not know anything about the C language.

Is it bad practice to redefine register masks for PIC24 in order to improve readability?

I am relatively new to working with PIC chips, so this might be a novice-level question, but I am attempting to write a header file containing, among other things, the TRIS/ODC/INIT masks for all of the I/O ports.
On the PCB this chip is built into, any given component is likely to use pins from multiple ports, and there are probably a dozen or so individual components that warrant detailed commenting. For instance, interfacing with a particular SPI ADC module uses pins from ports A, D, and F.
To me, it would seem that the reader-friendly way to write this would be to organize the file by component such that, at a glance, the reader can tell what pins are being used, whether they are configured as inputs or outputs, and how they are initialized.
For instance, showing only the TRIS mask information, here is a snippet of code for a particular ADC module I am using to demonstrate what I am talking about:
#define PORTD_TRIS_MASK 0x00
#define PORTF_TRIS_MASK 0x00
// ...
// lots of hardware configuration stuff goes here
// ...
// ANALOG INPUT - THERMOCOUPLE 1
// Thermocouple ADC chip MAX31856 DNP communicates over SPI
// Accepts any type of thermocouple
// TC1_CS pulled high
// TC1_DRDY pulled high
#define TC1_MOSI LATAbits.LATA14
#define TC1_MISO PORTDbits.RD10
#define TC1_SCK LATDbits.LATD11
#define TC1_CS LATFbits.LATF6
#define TC1_DRDY PORTFbits.RF7
#define TC1_MISO_SHIFT 1<<10
#define TC1_DRDY_SHIFT 1<<7
#define PORTD_TRIS_MASK ( PORTD_TRIS_MASK | TC1_MISO_SHIFT )
#define PORTF_TRIS_MASK ( PORTF_TRIS_MASK | TC1_DRDY_SHIFT )
The above code does not throw any errors, but it does throw a warning:
HardwareProfile.h:1173:0: warning: "PORTD_TRIS_MASK" redefined
HardwareProfile.h:1029:0: note: this is the location of the previous definition
HardwareProfile.h:1174:0: warning: "PORTF_TRIS_MASK" redefined
HardwareProfile.h:1095:0: note: this is the location of the previous definition
The fact that the compiler is complaining about it suggests to me that this may not be encouraged behavior, but nothing about it seems intrinsically problematic to me. Am I missing something, or is this a reasonable way to organize code such that pin configuration details are kept near their definitions?
Alternatively, is there a more conventional way to accomplish what I want to accomplish as far as maintaining readability that is more widely used or acceptable?
Update:
Perhaps I was not clear enough in my original post. It is structured that way because there are a dozen or so such blocks of code in the header file. Supposing there are exactly 13 such blocks of code, any particular mask would be initially defined as 0x00 and redefined 13 times, the idea being that each redefinition adds the configuration information relevant to a particular block.
Update:
In response to a question about how these macros are used, they are simply used to configure all pins in a port at once. On this PIC24, each port has 16 pins, each of which has a TRIS (data direction control) register, ODC (open drain control) register, and LAT (latch) register that, if configured as an output, will need an initial value. Conventionally, writing to one bit at a time sixteen times is discouraged in favor of writing to the entire port once. For instance, consider a simplified case where there are four registers instead of sixteen. Instead of writing this:
// In source file
TRISAbits.TRISA0 = 1;
TRISAbits.TRISA1 = 1;
TRISAbits.TRISA2 = 0;
TRISAbits.TRISA3 = 0;
It is conventional (as I understand it) to write this:
// In header file
#define BIT_0_SHIFT ( 1<<0 )
#define BIT_1_SHIFT ( 1<<1 )
#define BIT_2_SHIFT ( 0<<2 )
#define BIT_3_SHIFT ( 0<<3 )
#define TRISA_MASK ( BIT_0_SHIFT | BIT_1_SHIFT | BIT_2_SHIFT | BIT_3_SHIFT )
// In source file
TRISA = TRISA_MASK;
Regarding a different question about readability, my argument in favor of this structure is that the way ports are organized on this chip is not physically meaningful. Pins on any particular port are not necessarily near each other or in order, and no individual device (e.g. external SPI module) is confined to a single port. Organizing the header file by port means that the reader needs to scroll through the entire header file to examine the configuration of a single device, while organizing the file by device allows an entire device's definitions and configurations to be clearly visible on a single screen.
The preprocessor does not work the same way as C code does. For example, consider the following code:
int main(void)
{
int A = (B+C);
int B = (C+2);
int C = 3;
int x = A;
return x;
}
That doesn't work because B and C are used before being declared. The output from the compiler is:
cc -Wall demo.c -o demo
demo.c:3:14: error: use of undeclared identifier 'B'
    int A = (B+C);
             ^
demo.c:3:16: error: use of undeclared identifier 'C'
    int A = (B+C);
               ^
demo.c:4:14: error: use of undeclared identifier 'C'
    int B = (C+2);
             ^
Now try the same thing using #defines for A, B, and C:
#define A (B+C)
#define B (C+2)
#define C 3
int main(void)
{
int x = A;
return x;
}
This time there are no warnings or errors, even though the #defines are out of order. When the preprocessor sees a #define it simply adds an entry to its dictionary. So after reading the three #defines the dictionary contains
Search text    Replacement text
-----------    ----------------
A              (B+C)
B              (C+2)
C              3
Note that the preprocessor has not evaluated the replacement text. It simply stores the text. When the preprocessor finds a search term in the code, then it uses the replacement text. So the line
int x = A;
becomes
int x = (B+C);
After performing the substitution, the preprocessor rescans the text to see if more substitutions are possible. After a second scan, we have:
int x = ((C+2)+3);
and the final result is:
int x = ((3 +2)+3);
Most compilers have an option to output the code after preprocessing is finished. With gcc or clang, use the -E option to see the preprocessor output.
OK, so now we should have enough background to actually address your question. Consider the following definitions:
#define PORTD_TRIS_MASK 0x00
#define PORTD_TRIS_MASK ( PORTD_TRIS_MASK | TC1_MISO_SHIFT )
#define PORTD_TRIS_MASK ( PORTD_TRIS_MASK | SB1_DATA_SHIFT )
We have 3 major problems here:
1. The substitution text is not being evaluated, so the bits are not being OR'd together.
2. Only one of those definitions (the last one) will be kept in the preprocessor dictionary. That's the reason for the warning. The preprocessor is telling you that it already has an entry for that search term, and it's going to discard the previous replacement text.
3. When PORTD_TRIS_MASK is found in the code, the preprocessor replaces it with ( PORTD_TRIS_MASK | SB1_DATA_SHIFT ). It then rescans and finds PORTD_TRIS_MASK again, which would naively lead to infinite recursion. Fortunately, the preprocessor protects against this: a macro name that appears inside its own expansion is not replaced again. So PORTD_TRIS_MASK is left in the code as a plain identifier, and the compiler later reports it as undeclared.
The solution is to create uniquely named definitions for each component:
#define TRIS_MASK_D1 TC1_MISO_SHIFT
#define TRIS_MASK_F1 TC1_DRDY_SHIFT
#define TRIS_MASK_D2 SB1_DATA_SHIFT
#define TRIS_MASK_F2 0
And then OR them all together:
#define PORTD_TRIS_MASK (TRIS_MASK_D1 | TRIS_MASK_D2 | ... | TRIS_MASK_D13)
#define PORTF_TRIS_MASK (TRIS_MASK_F1 | TRIS_MASK_F2 | ... | TRIS_MASK_F13)
Macros like PORTD_TRIS_MASK are defined by PIC based on the microcontroller used.
I would not recommend that you change or redefine these macros.
What you can do instead is to use your own macros for particular functionality. e.g.
#define TC1_MISO_SHIFT (1<<10)
#define TC1_DRDY_SHIFT (1<<7)
#define TC1_MISO_MASK ( PORTD_TRIS_MASK | TC1_MISO_SHIFT )
#define TC1_DRDY_MASK ( PORTF_TRIS_MASK | TC1_DRDY_SHIFT )

C: How to Shield Commas in Macro Arguments?

Is there a general method to shield commas in macro arguments in C? I know that parentheses can be used for this purpose, but that will not work in cases where the added parentheses result in syntax errors in the macro output. I've heard that ({ }) works to shield commas in GCC, but I need this code to also work in VC++ (one of the recent versions, which does conform to the C standard with regard to commas in macros). I also cannot use variadic macros in my case.
The specific case I'm trying to do is this (lengthof is a macro defined elsewhere). I'm trying to write a single macro for the entire thing because this will be used many times, and having a multi-macro solution would add a large amount of additional testing code.
#define TEST_UNFUNC(func, res_type, res_set, op_type, op_set) \
{ \
static const res_type res[] = res_set; \
static const op_type op[] = op_set; \
int i; \
for (i = 0; i < MIN(lengthof(res), lengthof(op)); i++) \
assert(func(op[i]) == res[i]); \
}
If possible I would like a general answer and not merely a workaround specific to this particular macro.
Use parentheses to shield the comma, and then pass them through a special unparen macro, defined in the example below:
#include <stdio.h>
#define really_unparen(...) __VA_ARGS__
#define invoke(expr) expr
#define unparen(args) invoke(really_unparen args)
#define fancy_macro(a) printf("%s %s\n", unparen(a))
int main()
{
fancy_macro(("Hello", "World"));
}
The trick here is that the invoke macro forces an extra expansion, allowing really_unparen to be called even though it's not followed by parentheses in the source.
Edit: per a comment, the extra invoke step appears not to be necessary in this case. Still, I'm sure I've hit a case where I needed it at some point, and it doesn't hurt.

yyin doesn't read all data in a text file

I have a new problem, following on from my question "Call a function in a Yacc file from another c file". This time the problem is with yyin in Lex and Yacc. My code is as follows:
calc.l
%{
#include <stdlib.h>  /* for atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER;}
[ \t]    ;  /* skip blanks and tabs */
\n return 0;
. return yytext[0];
%%
calc.y
%{
#include <stdio.h>
#include <string.h>
extern FILE* yyin;
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf("= %d\n",$1);}
;
expression: NUMBER '+' NUMBER {$$=$1+$3;}
| NUMBER '-' NUMBER {$$=$1-$3;}
| NUMBER 'x' NUMBER {$$=$1*$3;}
| NUMBER '/' NUMBER
{ if($3 == 0)
yyerror("Error, cannot divided by zero");
else
$$=$1/$3;
}
| NUMBER {$$=$1;}
;
%%
void parse(FILE* fileInput)
{
yyin= fileInput;
while(feof(yyin)==0)
{
yyparse();
}
}
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void parse(FILE* fileInput); /* defined in calc.y */
int main(int argc,char* argv[])
{
FILE* fileInput;
char inputBuffer[36];
char lineData[36];
if((fileInput=fopen(argv[1],"r"))==NULL)
{
printf("Error reading files, the program terminates immediately\n");
exit(0);
}
parse(fileInput);
}
test.txt
2+1
5+1
1+3
This is how my code works:
I created main.c to open a file and then call a function, parse(), with the parameter fileInput.
In calc.y, I set the pointer yyin to fileInput so that the parser can loop and read all the lines in a text file (test.txt).
The problem is that yyin didn't read all the lines in the file. The result I got is
= 3
The result I should get is
= 3
= 6
= 4
How can I solve this problem? Note that I want main.c to stay as it is.
This has nothing to do with yyin. Your problem is that your grammar does not recognize a series of statements; it only recognizes one. Having parsed the first line of the file and, ultimately, reduced it to a statement, the parser has nothing more it can do. It has no way to reduce a series of symbols starting with a statement to anything else. Since there is still input left when the parser jams up, it will return an error code to its caller, but the caller ignores that.
To enable the parser to handle multiple statements, you need a rule for that. Presumably you'll want that resulting symbol to be the start symbol, instead of statement. This would be a reasonable rule:
statements: statement
| statements statement
;
Your lexical analyzer returns 0 when it reads a newline.
Parsers generated by Yacc or Bison treat a token value of 0 as EOF (end of file). You told your parser that it had reached the end of the file, and it believed you.
Don't return 0 for newline. You'll probably need to have your first grammar rule iterate (accept a sequence of statements) — as suggested by John Bollinger in another answer.
