Need help generating three address code with lex and yacc - c

I am generating three address code for a C-like program containing declarations, arithmetic and boolean expressions, and if and while statements.
Currently I am beginning with arithmetic expressions. I am reading the C-like program from a text file.
Lex code:
parser.lex
Yacc code:
parser.yacc
Input C-like program (contents of test.txt):
a=1+2/3;
I have a make file like:
bison -d -v parser.y
flex -o parser.lex.c parser.lex
gcc -o cparser parser.lex.c parser.tab.c -lfl -lm
./cparser
When I run the parser on my input file, I get the following output:
t1=2/3/3
t2=1+2/3;+t1
a=1+2/3;=t2
Parsing Successful. The three address code is:
syntax error
Successful parsing.
Why are the $1 $2 $3...etc not containing the desired reduction?
Why is the stderr printing syntax error?

In your lexer code, you have things like:
{number} {yylval=yytext; return NUMBER;}
This sets yylval (the token's semantic value, which the parser later sees as $1, $2, and so on) to point at the lexer's internal buffer. That buffer is clobbered by the next call to yylex, so when you go to print the value in the parser, you'll print garbage. You need something like:
{number} {yylval=strdup(yytext); return NUMBER;}
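The difference between the two rules can be reproduced in plain C without flex. The following is a minimal sketch: scan_buf is a hypothetical stand-in for the scanner's internal buffer, and token_alias and token_copy are illustrative names, not part of flex or bison. token_copy does by hand what strdup does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's internal buffer (yytext in flex). */
static char scan_buf[32];

/* Returns a pointer INTO the shared buffer, like yylval = yytext. */
const char *token_alias(const char *text)
{
    snprintf(scan_buf, sizeof scan_buf, "%s", text);
    return scan_buf;
}

/* Returns a private heap copy, like yylval = strdup(yytext).
   The caller owns the result and must free() it. */
char *token_copy(const char *text)
{
    size_t len;
    char *copy;

    snprintf(scan_buf, sizeof scan_buf, "%s", text);
    len = strlen(scan_buf);
    copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, scan_buf, len + 1);
    return copy;
}
```

A pointer obtained from token_alias changes its contents as soon as the next token is scanned, while the pointer from token_copy keeps the text it was given.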
In addition, you have patterns like:
'int' return INT;
The ' character is not special in any way to flex, so this pattern matches the 5-character sequence 'int', quotes included. To match the keyword itself, write int or "int".

if (yyparse())
should be
if (yyparse() == 0)
In your lex rule "\n" {/*simply skip new line*/} you could keep track of the line number so when there is a syntax error you can print out the line number.
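For reference, the three address code one would normally expect for a=1+2/3; is t1=2/3, then t2=1+t1, then a=t2. The grammar actions are not shown in the question, so the following is only a sketch of a helper that an action could call. The name emit_binary and its parameters are hypothetical.

```c
#include <stdio.h>

/* Counter used to name temporaries t1, t2, ... */
static int temp_count = 0;

/* Writes "tN=lhs<op>rhs" into out and the new temporary's name "tN"
   into temp_name, so the caller can use it as an operand later. */
void emit_binary(char *out, size_t out_size,
                 char *temp_name, size_t name_size,
                 const char *lhs, char op, const char *rhs)
{
    snprintf(temp_name, name_size, "t%d", ++temp_count);
    snprintf(out, out_size, "%s=%s%c%s", temp_name, lhs, op, rhs);
}
```

In a yacc action the operands would come from $1 and $3, and the temporary's name would become $$. That only works if each semantic value is its own copy of the text.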

Related

How can I get all modulo(%) operation in my C source code

How can I find every modulo (%) operation in my C source code?
I don't think a regular expression can do it, because the operator can be hidden inside a C macro.
It must also ignore the % in printf format strings.
Which static analysis tool can I use to do this?
For example, CodeSonar or codecheck "only" try to find problems, not this.
I want to manually check every modulo operation for a zero divisor (like division by 0).
It's probably not the best way to do this, but it seems to work. Here's my test file:
#include <stdio.h>
#define MACRO %
int main(void) {
int a = 1%1;
int b = 1 MACRO 1;
printf("%1s", "hello");
}
Then I ran this and got some useful output:
$ clang -fsyntax-only -Xclang -dump-tokens main.c 2>&1 > /dev/null | grep percent
percent '%' Loc=<main.c:6:14>
percent '%' [LeadingSpace] Loc=<main.c:7:15 <Spelling=main.c:3:15>>
Short explanation:
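This is the output from the lexical analysis. The command seems to write to stderr instead of stdout, which is the reason for 2>&1 > /dev/null. After that I'm just using grep to find all occurrences of the modulo operator. The second hit comes from the MACRO expansion: its Spelling location points at the #define line. The % inside the printf format string is not reported, because it is part of a string literal token.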
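Once the modulo sites are located, each one can be guarded at run time instead of being audited by eye. This is a minimal sketch, assuming int operands. checked_mod is a hypothetical helper name, not something from the question or from Clang.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a % b in *result and returns true.
   Returns false, leaving *result untouched, when a % b is undefined. */
bool checked_mod(int a, int b, int *result)
{
    if (b == 0)
        return false;                 /* modulo by zero */
    if (a == INT_MIN && b == -1)
        return false;                 /* a / b overflows, so a % b is undefined too */
    *result = a % b;
    return true;
}
```

The INT_MIN case is included because C leaves a % b undefined whenever a / b is not representable.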

Why is my lex output .exe not returning any response? (Failing at hello world)

I am trying to understand lex/yacc and currently I am failing at hello world. I probably messed something up, somewhere, but I can't seem to find it.
Also, I am not experienced with C language or with lex/flex/yacc/bison so this is all new to me.
test.l file;
%option noyywrap
%{
#include <stdio.h>
%}
%%
"hey" printf("hello!");
%%
int main()
{
return 0;
}
I compile this on windows, with the commands;
lex test.l
This returns lex.yy.c file without errors or warnings.
I then compile with;
cc lex.yy.c
Which, without errors or warnings, creates a.exe as it is supposed to.
When I then run the file with input from another file;
a.exe < input
It returns nothing.
Input file;
"hey"
Any information is welcome, since every single guide I found either creates errors (when literally copy pasted, even after clean install and guided-install) or is simply outdated or listed for "windows" while it uses commands that are non-windows >.<
It's the double quotes around "hey" in your .l file. In a lex pattern they don't mean "hey" with the quotes, they mean the literal text hey. If you change your input file to say just hey rather than "hey", your code should work. If you want to match the quote characters as well, your rule should be: "\"hey\""
Also lex should auto include stdio.h so you probably don't need it.
LOL you forgot to call yylex();
So:
%option noyywrap
%%
"hey" printf("hello!");
%%
int main()
{
yylex();
return 0;
}
Important subtlety
You may not notice this behaviour right away, but once yylex() is called your original input will still produce output. The match occurs even when the input is "hey" with the quotes, and you will get:
"hello!"
Notice the quotes.
That's because lex adds a default rule that echoes any character no other rule matches. Your "hey" rule matches hey, and the quotes around it in the input are echoed on either side of hello!
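The effect of the default rule can be imitated in plain C. The sketch below is not what flex generates. It only mimics the two rules in play: the text hey is replaced by hello!, and every other character is copied through unchanged. The function name scan_like_lex is made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Mimics a scanner with one rule ("hey" -> hello!) plus the default
   rule that echoes every unmatched character. Output goes to out. */
void scan_like_lex(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';
    while (*in != '\0' && used + 1 < out_size) {
        if (strncmp(in, "hey", 3) == 0) {
            /* The "hey" rule fires: emit the action's text. */
            used += (size_t)snprintf(out + used, out_size - used, "hello!");
            in += 3;
        } else {
            /* Default rule: copy the character through unchanged. */
            out[used++] = *in++;
            out[used] = '\0';
        }
    }
}
```

Feeding it the five characters "hey" (quotes included) gives hello! with a quote on each side, which is the behaviour described above.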
Your main does nothing - you need to call the lexer.
int main()
{
yylex();
return 0;
}

Passing a Command Line Argument and a textfile to a program

I have a Read and Reverse Coding assignment where I need to pass a Textfile and a character (-L, or -W), depending on whether the operator wants the textfile returned in reverse by lines or by words. (I should also note that the assignment requires that nothing is asked of the user during the code. It must be decided which variation is wanted in the command line.)
I don't need help with the code to reverse the lines or words, but do need help with understanding how to take in character and the textfile, then use them in the code. I've tried using the parameters (int argc, char *argv[]) on the main, but anytime I try to pass in just the -L the terminal either says Command not found or clang: error: argument to '-L' is missing (expected 1 value)
Also, when my teacher passes a textfile to a program he often uses a >. Can someone explain how to use this?
Ex. program.c > hello.txt
Then he would end up using that .txt in the program.
Consider this:
program -L < data.txt
or
program -W < data.txt
or
cat data.txt | program -L
The "-L" or "-W" will be in argv[1].
Good luck!
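Inside the program, picking the option out of argv[1] could look like the sketch below. parse_mode is a hypothetical helper name. The text to reverse would then be read from standard input, since the shell handles < data.txt.

```c
#include <string.h>

/* Returns 'L' for -L, 'W' for -W, or 0 if the option is missing or unknown. */
char parse_mode(int argc, char *argv[])
{
    if (argc < 2)
        return 0;                     /* no option given */
    if (strcmp(argv[1], "-L") == 0)
        return 'L';                   /* reverse by lines */
    if (strcmp(argv[1], "-W") == 0)
        return 'W';                   /* reverse by words */
    return 0;
}
```

It would be called as parse_mode(argc, argv) at the top of main(int argc, char *argv[]), printing a usage message when it returns 0.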
The idea of passing a command line argument is the following.
argc: the argument count, the number of strings (arguments) passed for execution.
It is normally 1 or greater, because the program name itself counts as an argument.
argv: the argument vector, an array of pointers to each of the arguments received from the command line.
Example of a program call:
./myprogram -w
argc=2
argv will have two pointers to strings (char *):
argv[0]= "./myprogram"
argv[1]= "-w"
Now to your problem.
When executing a program from the command line you have several options, among them:
1) Give the program a file as input (the file's contents are passed character by character to standard input, and the end is signalled by EOF, usually -1, which is not an ASCII character).
This can be done the following way (./program is the compiled executable, not the program.c source file):
./program < hello.txt
2) Redirect the output of the program to a file:
./program > hello.txt
What you are looking to do is feed in a file while also passing an argument. This can be done the following way:
./program < hello.txt -L
IMPORTANT: "< hello.txt" will NOT count as an argument, so in this case argc and argv will be as follows:
argc=2
argv[0]="./program"
argv[1]="-L"
Hope this helps. Comment if you need any more help or something isn't clear. Good luck with your course!!!
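Reading redirected input character by character until EOF, as described in option 1, can be sketched like this. count_lines is a hypothetical example function. A real program would pass stdin as the stream.

```c
#include <stdio.h>

/* Counts newline characters on a stream, reading one character at a
   time until EOF is returned. */
long count_lines(FILE *in)
{
    long lines = 0;
    int c;                            /* int, not char, so EOF stays distinguishable */

    while ((c = getc(in)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

Calling count_lines(stdin) works the same whether the input comes from < hello.txt, from a pipe, or from the keyboard.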

Debugging lexical analyzer issue

I am writing a lexical analyzer. Here is the code:
%{
#include <stdio.h>
void showToken(char*);
%}
%%
int main(){
void showToken(char* name){
printf("<%s,%s>",name,yytext);
}
return 0;
}
%%
I am getting the following :
~/hedor1>cc -c -o lexical.o lexical.c
lexical.l:40: error: expected identifier or ‘(’ before ‘%’ token
I can't find where the problem is. Moreover, in the code section, must I write:
int main(){}
What happens if I don't write the main function above?
Primary problem
You can only have two %% lines in a Lex (Flex) analyzer.
...definitions...
%%
...lexical patterns...
%%
...everything else...
The programs Lex and Flex simply copy the content of the file after the second %% verbatim into the generated C code. And C doesn't like %% at any time.
Nitpick
You shouldn't nest functions inside each other like you're trying to with:
int main(){
void showToken(char* name){
printf("<%s,%s>",name,yytext);
}
return 0;
}
You need to separate main() from showToken(). (There is a GCC-specific extension that does allow nested functions. Don't use it.)
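A sketch of the separated layout follows, with both functions at file scope. To keep it compilable without flex, the lexeme is passed in as a parameter instead of being read from yytext, and the output goes to a buffer. show_token and run_demo are hypothetical names, and run_demo only plays the part that main() would play.

```c
#include <stdio.h>

/* Defined at file scope, not inside another function.
   Formats "<name,lexeme>" into out and returns snprintf's result. */
int show_token(char *out, size_t out_size, const char *name, const char *lexeme)
{
    return snprintf(out, out_size, "<%s,%s>", name, lexeme);
}

/* Stands in for main(): a separate function that calls show_token. */
int run_demo(char *out, size_t out_size)
{
    return show_token(out, out_size, "ID", "count");
}
```

In the .l file both definitions would go after the second %% line, or in the %{ ... %} block at the top.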
Also, when you have a line number in an error message, it is helpful to insert a comment in the source that identifies the line, or to describe the line that is identified. We shouldn't have to count the lines in your code. If the error is in lines 1-5, marking it probably isn't crucial; for lines 6-12 it's generally better to indicate the line number; by the time it has reached the teens, it is close to essential.

How to get the entire string reported by __LINE__

Is it possible to get the entire string on the line reported through the __LINE__ macro?
Sample code:
#include <stdio.h>
#define LOG(lvl) pLog(lvl, __LINE__, __FILE__)
pLog(const char *str, int line, const char *file)
{
printf("Line [%u]: File [%s]", line, file);
}
int main ()
{
LOG("Hello"
"world");
return 0;
}
The output is: Line [13]: File [macro.c]
Now, in a large code base, I want to search this file and print the string "Hello world" present at the reported line (in this case, 13).
One way I was thinking of: first generate the preprocessed output with gcc -E, grep it for pLog and save the strings. Then grep for LOG in the actual source file and save the line numbers. Then match those line numbers against the reported line number, pair the two lists up by index, and print the string.
Since a string can be spread across multiple lines (in the code, Hello is on one line and world is on another), I also need to take care of that.
Is there a better and faster way of doing this, or does gcc provide some option to convert a line and file back to the actual code?
This is very easy to do with Clang. The following command dumps the Abstract Syntax Tree (AST) for the file test.c to the file out:
clang -cc1 -ast-dump test.c > out
Looking at the AST in the generated file you can easily find the information you need:
(StringLiteral 0x1376cd8 <line:12:9, line:13:13> 'char [11]' lvalue "Helloworld")))
Clang gives the start of the first token of the string (line:12:9), the start of the last token of the string (line:13:13) and the full string ("Helloworld").
You can either parse the AST dump or use the Clang API to get the same information. If this is not a one-time task, I'd go for the API, since the AST dump format is more likely to change in the future.
All this of course makes sense only if you have a reason not to print the string in pLog itself.
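If printing the string in the logging function is acceptable, a minimal sketch could look like the following. format_log and LOG_TO are hypothetical names chosen for this example. The question's pLog would simply print the formatted text instead of storing it.

```c
#include <stdio.h>

/* Formats the message together with its location into out. */
int format_log(char *out, size_t out_size,
               const char *str, int line, const char *file)
{
    return snprintf(out, out_size, "Line [%d]: File [%s]: %s", line, file, str);
}

/* out must be a char array (not a pointer) so that sizeof gives its size. */
#define LOG_TO(out, msg) format_log((out), sizeof (out), (msg), __LINE__, __FILE__)
```

Adjacent string literals such as "Hello" "world" are concatenated by the compiler before the call, so the function receives the single string Helloworld however many source lines the literal spans.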

Resources