Lex/Flex Scanner Isn't Scanning and I Have No Idea Why

I have written a simple lex scanner in the file myscanner.l, where testlex.h is just a set of #defines mapping token names to integers (MATCH_0 == 0, etc.):
%{
#include "testlex.h"
%}
%%
"dinky" return MATCH_0;
"pinky" return MATCH_1;
"stinky" return MATCH_2;
[ \t\n] ;
. printf("unexpected character\n");
%%
int yywrap(void)
{
return 1;
}
After using lex to create the lex.yy.c file, I implement the code using this C file
#include <stdio.h>
#include "myscanner.h"
extern int yylex();
extern int yylineno;
extern char* yytext;
int main(void)
{
int l = yylex();
while (l)
{
printf("%d\n", l);
l = yylex();
}
return 0;
}
When I pass it this input stream: dinky pinky stinky stinky pinky dinky, there is absolutely no output. The output I am expecting looks like this:
0
1
2
2
1
0
Not even "unexpected character". I know my stack is set up right because I've compiled others' examples and they all scan correctly, but for some inconceivable reason my code _will_not_scan_!
What am I missing?

Looking at your expected output, what you see is the simple result of defining "dinky" -> MATCH_0 as 0.
The first value of l now becomes 0, after having scanned dinky. So while(l) is while(0) and the block is not even executed once. Subsequently your main immediately returns 0.
So don't define any tokens as 0, and then write:
int main(void)
{
int token;
while ((token = yylex()) != 0)
{
printf("%d\n", token);
}
return 0;
}
To be honest, I'm surprised you did not find this yourself. Simply trying other input would immediately have given you a clue. It is also easy to find out that yylex() returns 0 at EOF.
BTW, I think it's better to not use l as variable name as it's almost the same as 1.

The reason why your code does not print anything is that your first input happens to be "dinky", which returns MATCH_0. According to your expected output, MATCH_0 is zero. Therefore, the code will exit right away, before entering the loop even once.
Re-defining MATCH_0 to 1, MATCH_1 to 2, and so on will fix this problem.
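The renumbering can be tried without running flex at all. The sketch below is only an illustration: fake_yylex is a hypothetical stand-in for the generated yylex() that replays a fixed token sequence and then returns 0, which is what yylex() returns at end of input. The helper name print_all_tokens is also made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Token ids start at 1 so that none collides with the 0 returned at EOF. */
enum { MATCH_0 = 1, MATCH_1 = 2, MATCH_2 = 3 };

/* Hypothetical stand-in for the generated yylex(): replays the tokens for
   "dinky pinky stinky stinky pinky dinky", then returns 0 for end of input. */
static const int fake_tokens[] = {
    MATCH_0, MATCH_1, MATCH_2, MATCH_2, MATCH_1, MATCH_0
};
static size_t fake_pos = 0;

int fake_yylex(void)
{
    size_t n = sizeof fake_tokens / sizeof fake_tokens[0];
    return fake_pos < n ? fake_tokens[fake_pos++] : 0;
}

/* Same loop shape as in the answers above; returns the number of tokens printed. */
int print_all_tokens(void)
{
    int token;
    int count = 0;
    while ((token = fake_yylex()) != 0) {
        printf("%d\n", token);
        count++;
    }
    return count;
}
```

Calling print_all_tokens() from main prints one line per token. If MATCH_0 were 0 instead, the loop would stop at the very first token, which is the behaviour described in the question.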

Related

Can't use regular expression with .*

I've been trying to use regular expressions (<regex.h>) in a C project I am developing.
According to regex101, the regex is well written and identifies what I'm trying to identify, but it doesn't work when I try to run it in C.
#include <stdio.h>
#include <regex.h>
int main() {
char pattern[] = "#include.*";
char line[] = "#include <stdio.h>";
regex_t string;
int regex_return = -1;
regex_return = regcomp(&string, line, 0);
regex_return += regexec(&string, pattern, 0, NULL, 0);
printf("%d", regex_return);
return 0;
}
This is a sample code I wrote to test the expression when I found out it didn't work.
It prints 1, when I expected 0.
It prints 0 if I change the line to "#include", which is just strange to me, because it's ignoring the .* at the end.
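line and pattern are swapped.
regcomp takes the pattern and regexec takes the string to check. As written, your code compiles "#include <stdio.h>" as the regex and then searches for it inside the text "#include.*", which does not match. When you change line to "#include", that literal text does occur inside "#include.*", so you get 0.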

Using a for-loop in C to test the return value of a function

I'm pretty new to coding and especially to C, so I decided to take the CS50 course as an introduction to the language. I just finished watching the first lecture on C and, as a means to test my knowledge on the subject, I attempted to write a short little program. Also I am using the course's library for the get_int() function.
The goal is to test the user's input and check if it's less or equal to ten. If it matches the parameters, the program should print the "Success!" message and exit; otherwise, it should ask for input again. If the input value is over 10, the program responds just as expected, but if you input a value of 10 or less, it ends up asking you for input one more time before actually exiting. I think it's probably something with the "for" loop, but I just can't figure it out.
My code:
#include <stdio.h>
#include <cs50.h>
#include <stdlib.h>
int check_for_value();
int main()
{
for(check_for_value(); check_for_value() != 1; check_for_value())
{
printf("Failed!\n");
}
exit(0);
}
int check_for_value()
{
int i = get_int("Your value: \n");
if(i <= 10)
{
printf("Success!\n");
return 1;
}
else
{
printf("Try again!\n");
return 0;
}
}
That isn't doing exactly what you think it is. Every time check_for_value() appears in the for statement, the function is called. The first call (the initializer) runs once and its return value is discarded. The second call (the condition) is the only one whose return value is tested. The third call (the update) runs after each iteration, and its return value is discarded too. So a valid input entered during the first or third call is ignored, and you are prompted again. For something like this, you would usually use a while loop instead. An example:
int ret = check_for_value();
while(ret != 1) {
printf("Failed\n");
ret = check_for_value();
}
printf("Success\n");
Technically, a for loop can work too, as follows:
for(int ret = check_for_value(); ret != 1; ret = check_for_value()) {
printf("Failed\n");
}
The for loop can be written very simply:
for ( ; !check_for_value(); )
{
printf("Failed!\n");
}
In such a case it is better to use a while loop:
while ( !check_for_value() )
{
printf("Failed!\n");
}
As for your for loop
for(check_for_value(); check_for_value() != 1; check_for_value())
    ^^^^^^^^^^^^^^^^^                          ^^^^^^^^^^^^^^^^^
the return values of the underlined calls are not tested.
Also bear in mind that such a definition of a for loop
for(int ret = check_for_value(); ret != 1; ret = check_for_value()) {
printf("Failed\n");
}
is very bad programming style. The function call is written out twice, and the intermediate variable ret is not used in the body of the loop, so its declaration is redundant as well. Avoid this style.
Note that, according to the C Standard, the function main without parameters shall be declared like
int main( void )
and the statement
exit( 0 );
is redundant.

What is wrong with this Bison grammar?

I'm trying to build a Bison grammar and seem to be missing something. I have kept it very basic so far, but I am still getting a syntax error and can't figure out why:
Here is my Bison Code:
%{
#include <stdlib.h>
#include <stdio.h>
int yylex(void);
int yyerror(char *s);
%}
// Define the types flex could return
%union {
long lval;
char *sval;
}
// Define the terminal symbol token types
%token <sval> IDENT;
%token <lval> NUM;
%%
Program:
Def ';'
;
Def:
IDENT '=' Lambda { printf("Successfully parsed file"); }
;
Lambda:
"fun" IDENT "->" "end"
;
%%
main() {
yyparse();
return 0;
}
int yyerror(char *s)
{
extern int yylineno; // defined and maintained in flex.flex
extern char *yytext; // defined and maintained in flex.flex
printf("ERROR: %s at symbol \"%s\" on line %i", s, yytext, yylineno);
exit(2);
}
Here is my Flex Code
%{
#include <stdlib.h>
#include "bison.tab.h"
%}
ID [A-Za-z][A-Za-z0-9]*
NUM [0-9][0-9]*
HEX [$][A-Fa-f0-9]+
COMM [/][/].*$
%%
fun|if|then|else|let|in|not|head|tail|and|end|isnum|islist|isfun {
printf("Scanning a keyword\n");
}
{ID} {
printf("Scanning an IDENT\n");
yylval.sval = strdup( yytext );
return IDENT;
}
{NUM} {
printf("Scanning a NUM\n");
/* Convert into long to loose leading zeros */
char *ptr = NULL;
long num = strtol(yytext, &ptr, 10);
if( errno == ERANGE ) {
printf("Number was to big");
exit(1);
}
yylval.lval = num;
return NUM;
}
{HEX} {
printf("Scanning a NUM\n");
char *ptr = NULL;
/* convert hex into decimal using offset 1 because of the $ */
long num = strtol(&yytext[1], &ptr, 16);
if( errno == ERANGE ) {
printf("Number was to big");
exit(1);
}
yylval.lval = num;
return NUM;
}
";"|"="|"+"|"-"|"*"|"."|"<"|"="|"("|")"|"->" {
printf("Scanning an operator\n");
}
[ \t\n]+ /* eat up whitespace */
{COMM}* /* eat up one-line comments */
. {
printf("Unrecognized character: %s at linenumber %d\n", yytext, yylineno );
exit(1);
}
%%
And here is my Makefile:
all: parser
parser: bison flex
gcc bison.tab.c lex.yy.c -o parser -lfl
bison: bison.y
bison -d bison.y
flex: flex.flex
flex flex.flex
clean:
rm bison.tab.h
rm bison.tab.c
rm lex.yy.c
rm parser
Everything compiles just fine; I do not get any errors running make all.
Here is my testfile
f = fun x -> end;
And here is the output:
./parser < a0.0
Scanning an IDENT
Scanning an operator
Scanning a keyword
Scanning an IDENT
ERROR: syntax error at symbol "x" on line 1
Since x seems to be recognized as an IDENT, the rule should be correct, yet I am still getting a syntax error.
I feel like I am missing something important, hopefully somebody can help me out.
Thanks in advance!
EDIT:
I tried to remove the IDENT in the Lambda rule and the testfile, now it seems to run through the line, but still throws
ERROR: syntax error at symbol "" on line 1
after the EOF.
Your scanner recognizes keywords (and prints out a debugging line, but see below), but it doesn't bother reporting anything to the parser. So they are effectively ignored.
In your bison definition file, you use (for example) "fun" as a terminal, but you do not provide the terminal with a name which could be used in the scanner. The scanner needs this name, because it has to return a token id to the parser.
To summarize, what you need is something like this:
In your grammar, before the %%:
%token T_FUN "fun"
%token T_IF "if"
%token T_THEN "then"
/* Etc. */
In your scanner definition:
fun { return T_FUN; }
if { return T_IF; }
then { return T_THEN; }
/* Etc. */
A couple of other notes:
Your scanner rule for recognizing operators also fails to return anything, so operators will also be ignored. That's clearly not desirable. flex and bison allow an easier solution for single-character operators, which is to let the character be its own token id. That avoids having to create a token name. In the parser, a single-quoted character represents a token-id whose value is the character; that's quite different from a double-quoted string, which is an alias for the declared token name. So you could do this:
"=" { return '='; }
/* Etc. */
but it's easier to do all the single-character tokens at once:
[;+*.<=()-] { return yytext[0]; }
and even easier to use a default rule at the end:
. { return yytext[0]; }
which will have the effect of handling unrecognized characters by returning an unknown token id to the parser, which will cause a syntax error.
This won't work for "->", since that is not a single character token, which will have to be handled in the same way as keywords.
Flex will produce debugging output automatically if you use the -d flag when you create the scanner. That's a lot easier than inserting your own debugging printout, because you can turn it off by simply removing the -d option. (You can use %option debug instead if you don't want to change the flex invocation in your makefile.) It's also better because it provides consistent information, including position information.
Some minor points:
The pattern [0-9][0-9]* could more easily be written [0-9]+
The comment pattern "//".* does not require a $ lookahead at the end, since .* will always match the longest sequence of non-newline characters; consequently, the first unmatched character must either be a newline or the EOF. $ lookahead will not match if the pattern is terminated with an EOF, which will cause odd errors if the file ends with a comment without a newline at the end.
There is no point using {COMM}* since the comment pattern does not match the newline which terminates the comment, so it is impossible for there to be two consecutive comment matches. But anyway, after matching a comment and the following newline, flex will continue to match a following comment, so {COMM} is sufficient. (Personally, I wouldn't use the COMM abbreviation; it really adds nothing to readability, IMHO.)
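The token-id convention described above can be illustrated with a small hand-written scanner in plain C. This is only a sketch under stated assumptions, not what flex generates: the function next_token and the numeric values of T_FUN, T_END, T_ARROW and T_IDENT are made up for the example, and only two keywords are handled. Named tokens get ids above the character range, while a single-character token is returned as its own character code.

```c
#include <ctype.h>
#include <string.h>

/* Named tokens get ids above the range of a single character.
   The exact values are assumptions made for this sketch. */
enum { T_FUN = 258, T_END, T_ARROW, T_IDENT };

/* Scans one token from *src, advances *src past it, and returns its id.
   Returns 0 at end of input, as a scanner is expected to. */
int next_token(const char **src)
{
    const char *p = *src;

    /* Skip whitespace. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    if (*p == '\0') {
        *src = p;
        return 0;
    }

    /* Keywords and identifiers: every keyword must return its own token id. */
    if (isalpha((unsigned char)*p)) {
        const char *start = p;
        while (isalnum((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        *src = p;
        if (len == 3 && strncmp(start, "fun", 3) == 0) return T_FUN;
        if (len == 3 && strncmp(start, "end", 3) == 0) return T_END;
        return T_IDENT;
    }

    /* "->" is two characters long, so it needs a named token. */
    if (p[0] == '-' && p[1] == '>') {
        *src = p + 2;
        return T_ARROW;
    }

    /* Any other single character is its own token id. */
    *src = p + 1;
    return (unsigned char)*p;
}
```

Fed the test line f = fun x -> end; this yields an identifier, '=', the fun keyword, an identifier, the arrow, the end keyword and ';', followed by 0. That is the stream a parser needs in order to match rules such as Def and Lambda.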

How can I create a grammar rule for an error?

I am writing a compiler in C, and I use bison for the grammar and flex for the tokens. To improve the quality of error messages, some common errors need to appear in the grammar. This has the side effect, however, of bison thinking that an invalid input is actually valid.
For example, consider this grammar:
program
: command ';' program
| command ';'
| command {yyerror("Missing ;.");} // wrong input
;
command
: INC
| DEC
;
where INC and DEC are tokens and program is the initial symbol. In this case, INC; is a valid program, but INC is not, and an error message is generated. The function yyparse(), however, returns 0 as if the program were correct.
Looking at the bison manual, I found the macro YYERROR, which should behave as if the parser itself found an error. But even if I add YYERROR after the call to yyerror(), the function yyparse() still returns 0. I could use YYABORT instead, but that would stop on the first error, which is terrible and not what I want.
Is there any way to make yyparse() return 1 without stopping on the first error?
Since you intend to recover from syntax errors, you're not going to be able to use the return code from yyparse to signal that one or more errors occurred. Instead, you'll have to track that information yourself.
The traditional way to do that would be to use a global error count (just showing the crucial pieces):
%{
int parse_error_count = 0;
%}
%%
program: statement { yyerror("Missing semicolon");
++parse_error_count; }
%%
int parse_interface() {
parse_error_count = 0;
int status = yyparse();
if (status) return status; /* Might have run out of memory */
if (parse_error_count) return 3; /* yyparse returns 0, 1 or 2 */
return 0;
}
A more modern solution is to define an additional "out" parameter to yyparse:
%parse-param { int* error_count }
%%
program: statement { yyerror("Missing semicolon");
++*error_count; }
%%
int main() {
int error_count = 0;
int status = yyparse(&error_count);
if (status || error_count) { /* handle error */ }
}
Finally, in case you really need to export the symbol yyparse from your bison-generated code, you can do the following ugly hack:
%code top {
#define yyparse internal_yyparse
}
%parse-param { int* error_count }
%%
program: statement { yyerror("Missing semicolon");
++*error_count; }
%%
#undef yyparse
int yyparse() {
int error_count = 0;
int status = internal_yyparse(&error_count);
// Whatever you want to do as a summary
return status ? status : error_count ? 1 : 0;
}
yyerror() just prints an error message. It doesn't alter what yyparse() returns.
What you're attempting is not a good idea. You'll enormously expand the grammar and you run a major risk of making it ambiguous. All you need to do is remove the production that calls yyerror(). That input will produce a syntax error anyway, and that will cause yyparse() not to return 0. You're keeping a dog and barking yourself. What you should be checking for is semantic errors that the parser can't see.
If you really want to improve the error messages, there's enough information in the parse tables and state information to tell you what the expected next token was. However in most cases it's such a large set it's pointless to print it. But programmers are used to sorting out 'syntax error'. Don't sweat it. Writing compilers is hard enough already.
NB You should make your grammar left-recursive to avoid excessive stack usage: for example, program : program ';' command.

Why am I getting a "Segmentation Fault" error when I try to run the tests?

I've written a function that determines whether or not to assign default values (it assigns default values if the flag is not present, and it assigns values the user passes if the flag is present). And I'm trying to test my function with a string to see if it did give me the right numbers. I keep getting "Segmentation Fault" when I try to run the tests, it compiles, but the tests just don't work. :(
Here's my header file:
#ifndef COMMANDLINE_H
#define COMMANDLINE_H
#include "data.h"
#include <stdio.h>
struct point eye;
/* The variable listed above is a global variable */
void eye_flag(int arg_list, char *array[]);
#endif
Here's my implementation file:
#include <stdio.h>
#include "commandline.h"
#include "data.h"
#include "string.h"
/* Used global variables for struct point eye */
void eye_flag(int arg_list, char *array[])
{
eye.x = 0.0;
eye.y = 0.0;
eye.z = -14.0;
/* The values listed above for struct point eye are the default values. */
for (int i = 0; i <= arg_list; i++)
{
if (strcmp(array[i], "-eye") == 0)
{
sscanf(array[i+1], "%lf", &eye.x);
sscanf(array[i+2], "%lf", &eye.y);
sscanf(array[i+3], "%lf", &eye.z);
}
}
}
And here are my test cases:
#include "commandline.h"
#include "checkit.h"
#include <stdio.h>
void eye_tests(void)
{
char *arg_eye[6] = {"a.out", "sphere.in.txt", "-eye", "2.4", "3.5", "6.7"};
eye_flag(6, arg_eye);
checkit_double(eye.x, 2.4);
checkit_double(eye.y, 3.5);
checkit_double(eye.z, 6.7);
char *arg_eye2[2] = {"a.out", "sphere.in.txt"};
eye_flag(2, arg_eye2);
checkit_double(eye.x, 0.0);
checkit_double(eye.y, 0.0);
checkit_double(eye.z, -14.0);
}
int main()
{
eye_tests();
return 0;
}
The absolute easiest way to solve this one is to run it in a debugger. You probably won't even need to learn how to step through your code or anything - just fire it up, run, and read the line.
If you are on a *nix system:
1. Compile your code with the -g flag.
2. Load it with, e.g., gdb a.out.
3. Run it now that it's loaded: (gdb) run.
4. Do whatever you need to reproduce the segfault.
5. bt or where should give you a stack trace, and the exact line that is causing your problem.
I'm confident enough that you can solve it from there that I'm posting this as an answer; but if not, knowing the exact line will make it much easier to research and solve.
The errors are here:
for (int i = 0; i <= arg_list; i++)
{              ///^^
if (strcmp(array[i], "-eye") == 0)
{
sscanf(array[i+1], "%lf", &eye.x);
//           ^^^
sscanf(array[i+2], "%lf", &eye.y);
sscanf(array[i+3], "%lf", &eye.z);
}
}
i <= arg_list is wrong: you pass in 6, but array indices start at 0, so the largest valid index is 5.
i+1, i+2 and i+3 can also index out of bounds while i runs from 0 to 5 (for example, i+3 exceeds 5 once i is 3 or more).
Your loop condition is wrong. It should be i < arg_list.
Think about what happens when i == arg_list.
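Putting both observations together, a bounds-checked version might look like the sketch below. It assumes that struct point has three double members x, y and z, since data.h is not shown in the question. The loop condition i + 3 < arg_list covers both problems at once: i never reaches arg_list, and the three values after "-eye" are only read when they exist.

```c
#include <stdio.h>
#include <string.h>

/* Assumed layout of struct point; the real data.h is not shown in the question. */
struct point { double x, y, z; };

struct point eye;

void eye_flag(int arg_list, char *array[])
{
    /* Default values, used when the -eye flag is absent or incomplete. */
    eye.x = 0.0;
    eye.y = 0.0;
    eye.z = -14.0;

    /* i + 3 < arg_list guarantees that array[i+1] through array[i+3] exist. */
    for (int i = 0; i + 3 < arg_list; i++)
    {
        if (strcmp(array[i], "-eye") == 0)
        {
            sscanf(array[i + 1], "%lf", &eye.x);
            sscanf(array[i + 2], "%lf", &eye.y);
            sscanf(array[i + 3], "%lf", &eye.z);
        }
    }
}
```

One design choice to be aware of: with this condition, a trailing "-eye" that is followed by fewer than three values is silently ignored and the defaults are kept. Reporting an error in that case may be preferable.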

Resources