I am writing a lex program to tokenize a C program. I've written the following rule to match a C preprocessor directive
#.* {printf("\n%s is a PREPROCESSOR DIRECTIVE",yytext);}
But when I use a file as the input for yyin, preprocessor directives in the file are matched, but the yytext that is displayed is empty.
E.g. I get
is a PREPROCESSOR DIRECTIVE
There is no problem when yyin is stdin; this happens only when the input is a file. Is there an alternative lex rule?
Focus on the fact that it fails only with a file rather than on the lex specification, because the file handling is more likely to be the cause. The printf in the lex file should always print at least the #. The following does work with a file:
%{
#include <stdio.h>
#include <stdlib.h> /* for exit() */
%}
%%
#.* { printf("'%s' preproc\n", yytext); }
%%
int yywrap(void)
{
return 1;
}
int main(int argc, char ** argv)
{
if (argc > 1)
{
if ((yyin = fopen(argv[1], "r")) == NULL)
{
fprintf(stderr, "Can't open `%s'.\n", argv[1]);
exit(1);
}
}
return (yylex());
}
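One possible explanation, which this thread does not confirm, is that the input file has Windows (CRLF) line endings. In that case `.*` also matches the trailing '\r', so yytext is not empty: the terminal moves the cursor back to column 0 and the rest of the message overwrites the directive. A minimal sketch of a helper that an action could call on yytext before printing (the name strip_trailing_cr is hypothetical, not part of lex):

```c
#include <string.h>

/* Hypothetical helper: removes trailing carriage returns in place
 * and returns the new length of the string. */
size_t strip_trailing_cr(char *text)
{
    size_t len = strlen(text);

    while (len > 0 && text[len - 1] == '\r')
        text[--len] = '\0';
    return len;
}
```

If this is the cause, excluding the carriage return in the pattern itself, for example with `#[^\r\n]*`, would be another way to avoid it.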
My code is something like below:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAXFILENAMESIZE 100
int main(){
char file_name[MAXFILENAMESIZE];
scanf("%s", file_name);
printf("%s\n", file_name);
FILE *fp;
fp = fopen(file_name, "r");
if(!fp){
fprintf(stderr, "failed to open file reading\n");
return 1;
}
fclose(fp);
return 0;
}
My input is something like below:
C:\Users\User\Desktop\user_missions.csv
I expect my executable to be able to find the file, yet my file pointer is always NULL.
I am pretty sure that I do have the file in the directory.
Since I just started to learn programming, if my question is not clear enough, please remind me to improve it.
Thanks.
Bison always prints the input instead of running the action.
I am just starting with Bison and am trying to get it working with the simplest rule possible.
Lexer
%{
#include <stdio.h>
#include "wip.tab.h"
%}
%%
[\t\n ]+ ;
[a−z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
%%
Parser
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char const *);
FILE *yyin;
%}
%union{
char *sval;
}
%token IDENTIFIER
%%
input:
%empty
| input line
;
line:
'\n'
| IDENTIFIER {printf("OK\n");}
;
%%
int main(void) {
FILE *myfile = fopen("example.wip", "r");
if (!myfile) {
printf("File can't be opened\n");
return -1;
}
yyin = myfile;
yyparse();
}
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
The "example.wip" input file
hello
I expect the "OK" output in my terminal but the parser just prints the content of the file.
Thanks in advance.
Bison always prints the input instead of running the action.
Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.
However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of your lexer file that prints a proper error message for unrecognized characters:
. { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
And sure enough, this will tell us that all the characters in "hello" are unrecognized.
So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? What's wrong is the −. It's not a regular ASCII dash, but a Unicode dash that has no special meaning to flex. So flex interprets [a−z] as a character class that can match one of three characters: a, the Unicode dash or z - not as a range from a to z.
To fix this, just replace it with a regular dash.
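A quick way to spot this kind of character is to scan the pattern for bytes outside the ASCII range. The sketch below is only an illustration: the function name first_non_ascii is hypothetical, and the test string assumes the dash is U+2212 MINUS SIGN, which is encoded in UTF-8 as the three bytes E2 88 92.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first byte outside
 * the ASCII range (>= 0x80), or -1 if the text is pure ASCII. */
int first_non_ascii(const char *text)
{
    for (size_t i = 0; text[i] != '\0'; i++) {
        if ((unsigned char)text[i] >= 0x80)
            return (int)i;
    }
    return -1;
}
```

Called on "[a-z]+" it finds nothing, while on "[a\xE2\x88\x92z]+" it reports the byte right after the a.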
I have a new problem following on from the question "Call a function in a Yacc file from another c file". This time I have run into a problem with yyin in Lex and Yacc. My code is as follows:
calc.l
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER;}
[ \t];
\n return 0;
. return yytext[0];
%%
calc.y
%{
#include <stdio.h>
#include <string.h>
extern FILE* yyin;
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf("= %d\n",$1);}
;
expression: NUMBER '+' NUMBER {$$=$1+$3;}
| NUMBER '-' NUMBER {$$=$1-$3;}
| NUMBER 'x' NUMBER {$$=$1*$3;}
| NUMBER '/' NUMBER
{ if($3 == 0)
yyerror("Error, cannot divided by zero");
else
$$=$1/$3;
}
| NUMBER {$$=$1;}
;
%%
void parse(FILE* fileInput)
{
yyin= fileInput;
while(feof(yyin)==0)
{
yyparse();
}
}
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc,char* argv[])
{
FILE* fileInput;
char inputBuffer[36];
char lineData[36];
if((fileInput=fopen(argv[1],"r"))==NULL)
{
printf("Error reading files, the program terminates immediately\n");
exit(0);
}
parse(fileInput);
}
test.txt
2+1
5+1
1+3
This is how my code works:
I created main.c to open a file and read it then call a function, parse(), with parameter fileInput.
In calc.y, I set the pointer yyin to be the same as fileInput so the parser can loop and read all lines in a text file (test.txt)
The problem is that yyin didn't read all lines in the text. The result I got is
= 3
The result I should get is
= 3
= 6
= 4
How can I solve this problem? Note that I want main.c to stay as it is.
This has nothing to do with yyin. Your problem is that your grammar does not recognize a series of statements; it only recognizes one. Having parsed the first line of the file and, ultimately, reduced it to a statement, the parser has nothing more it can do. It has no way to reduce a series of symbols starting with a statement to anything else. Since there is still input left when the parser jams up, it will return an error code to its caller, but the caller ignores that.
To enable the parser to handle multiple statements, you need a rule for that. Presumably you'll want that resulting symbol to be the start symbol, instead of statement. This would be a reasonable rule:
statements: statement
| statements statement
;
Your lexical analyzer returns 0 when it reads a newline.
All grammars from Yacc or Bison recognize 0 as meaning EOF (end of file). You told your grammar that you'd reached the end of file; it believed you.
Don't return 0 for newline. You'll probably need to have your first grammar rule iterate (accept a sequence of statements) — as suggested by John Bollinger in another answer.
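The effect of returning 0 for a newline can be reproduced without yacc at all. The following is a hand-written stand-in, not generated code: next_token plays the role of yylex, count_statements plays the role of a parser that accepts a sequence of `NUMBER '+' NUMBER` statements, and the token values and the newline_is_eof switch are assumptions made for the illustration.

```c
#include <ctype.h>

enum { TOK_EOF = 0, TOK_NUMBER = 258, TOK_NEWLINE = 259 };

struct scanner {
    const char *pos;     /* next unread character */
    int newline_is_eof;  /* nonzero: mimic "\n return 0;" */
};

/* Stand-in for yylex(): returns 0 at end of input. */
static int next_token(struct scanner *s)
{
    while (*s->pos == ' ' || *s->pos == '\t')
        s->pos++;
    if (*s->pos == '\0')
        return TOK_EOF;
    if (*s->pos == '\n') {
        s->pos++;
        return s->newline_is_eof ? TOK_EOF : TOK_NEWLINE;
    }
    if (isdigit((unsigned char)*s->pos)) {
        while (isdigit((unsigned char)*s->pos))
            s->pos++;
        return TOK_NUMBER;
    }
    return (unsigned char)*s->pos++;  /* single-character token */
}

/* Stand-in for a parser whose start symbol is a list of statements.
 * Returns the number of statements seen before token 0, or -1 on a
 * syntax error. */
int count_statements(const char *input, int newline_is_eof)
{
    struct scanner s = { input, newline_is_eof };
    int count = 0;
    int tok;

    while ((tok = next_token(&s)) != TOK_EOF) {
        if (tok == TOK_NEWLINE)
            continue;
        if (tok != TOK_NUMBER || next_token(&s) != '+' ||
            next_token(&s) != TOK_NUMBER)
            return -1;
        count++;
    }
    return count;
}
```

With the contents of test.txt as input, the variant that returns 0 for a newline stops after one statement, and the variant that returns a separate newline token sees all three.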
I want to call fopen() defined in the libc.so from inside my own implementation of fopen. Is it possible to do it without relying on dlsym, dlopen (and also LD_LIBRARY)?
I don't know whether that's a good idea, but you could call your own fopen something else, e.g. do_fopen and write a macro fopen that redirects to that function. When calling fopen in your wrapper, either #undef fopen or place fopen in parentheses to avoid macro expansion:
#include <stdlib.h>
#include <stdio.h>
#define fopen(fn, mode) do_fopen(fn, mode)
FILE *do_fopen(const char *fn, const char *mode)
{
FILE *f = (fopen)(fn, mode);
fprintf(stderr,
"fopen(\"%s\", \"%s\") == %p\n", fn, mode, (void *)f); /* %p expects a void * */
return f;
}
int main(int argc, char *argv[])
{
FILE *f;
f = fopen(argv[1], "r");
if (f) {
/* do stuff */
fclose(f); /* close only a stream that was actually opened */
}
return 0;
}
That trick only applies to fopen calls from your own code, not from libraries.
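The #undef alternative mentioned above could look roughly like this. To keep the sketch free of library details it uses a made-up function square in place of fopen; square, logged_square, sum_of_squares and logged_calls are all hypothetical names. The point is only the ordering: code that should go through the wrapper sits before the #undef, and the wrapper itself sits after it.

```c
static int call_count = 0;

/* The "real" function, standing in for the library's fopen. */
static int square(int x)
{
    return x * x;
}

int logged_square(int x);  /* wrapper, defined after the #undef */

#define square(x) logged_square(x)

/* Ordinary code: square(...) expands to logged_square(...). */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

#undef square

/* From here on, square names the real function again. */
int logged_square(int x)
{
    call_count++;
    return square(x);
}

int logged_calls(void)
{
    return call_count;
}
```

Compared with the parenthesized call, this only works if the wrapper comes after all redirected code in the file or lives in its own source file.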
I'm just starting with flex and I have some questions about this tool.
%{
#include "parser.h"
int line_num = 1;
%}
%%
\n { line_num++; }
In the above code I'm just counting the lines in the file being scanned, right?
How could I read the line_num value from another .c file? With a function like:
int getLineNumber(void);
Also, how could I detect lexical errors with this tool? I know that it is done with the ".*" pattern, but how do I report the error (again from a function in a different .c file), like this:
printf ("%d: error: %s\n", getLineNumber(), message);
Thanks.
In the code you've shown, you're already defining a global variable line_num. Just declare extern int line_num; in your header file and you can access it anywhere in your program.
If you want to avoid the global variable, replace the beginning of your scanner with something like:
%{
#include "parser.h"
static int line_num = 1;
int getLineNumber(void) {
return line_num;
}
%}
And put a declaration for the getLineNumber function in your header. (This is just ordinary C stuff, flex doesn't add anything weird here.)
For error handling, you can add a rule like:
. { reportUnrecognizedToken(); }
And then put a function like this somewhere:
void reportUnrecognizedToken(void) {
printf("Unrecognized token on line %d\n", getLineNumber());
exit(1);
}
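Putting the pieces together outside of flex, here is a minimal sketch of a private counter with an accessor that produces the message format from the question. countNewlines stands in for the `\n { line_num++; }` rule and formatError is a hypothetical helper; neither is something flex provides.

```c
#include <stdio.h>

static int line_num = 1;  /* private to this file */

int getLineNumber(void)
{
    return line_num;
}

/* Stand-in for the scanner rule that bumps the counter on '\n'. */
void countNewlines(const char *text)
{
    for (; *text != '\0'; text++) {
        if (*text == '\n')
            line_num++;
    }
}

/* Hypothetical helper: writes "<line>: error: <message>\n" into buf
 * and returns what snprintf returns. */
int formatError(char *buf, size_t size, const char *message)
{
    return snprintf(buf, size, "%d: error: %s\n", getLineNumber(), message);
}
```

Any other .c file only needs the prototypes in a shared header. It never touches line_num directly.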